STOCHQUANT PROBABILISTIC DETECTION AND RELATED METHODS AND SYSTEMS

FIELD

The present disclosure relates to detection technology and in particular to stochastic quantification of molecules. More particularly, the present disclosure relates to StochQuant probabilistic detection and related methods and systems.

BACKGROUND

Confidence is an inherent problem of any type of detection. It stems from the knowledge that a value obtained as a result of a detection process may not correctly represent a detected item, in view of inaccuracies introduced by the detection technique used.

A confidence score is often used as a measure of the probability that a value provided in outcome of detection correctly corresponds to a detected item.

In particular, with respect to detections, such as molecular detection, performed through a sampling process and/or in samples or environments including target molecules present at low absolute and/or relative abundance, improving the confidence of qualitative and/or quantitative presence remains challenging in view of the inherent stochasticity of the detection system, as understood by a skilled person.

SUMMARY

The present disclosure describes methods and systems to perform molecular detection according to a quantitative stochastic approach (herein StochQuant approach or StochQuant), which provides probability distributions in place of single values for a parameter used in molecular detection.

In particular, in StochQuant detection methods and systems of the disclosure, a probability distribution of a target molecule abundance in an environment (herein StochQuant probability distribution), detected in outcome of a testing measurement, is obtained as a function of i) a molecular count of the target molecule detected in the environment or a sample thereof, ii) a molecular count of a reference molecule added to, or detected in, the environment, a sample or a subsample thereof, in combination with iii) an absolute anchoring value of the reference molecule; and in some embodiments also iv) a quantitatively measured amount (e.g. volume) of a sample or a subsample of the environment.

In StochQuant detection methods and systems of the disclosure, the testing measurement comprises or consists of a measuring workflow in which physical manipulations of the environment, a sample and/or a subsample thereof are performed to provide the molecular counts of the target molecule and of the reference molecule, as well as the anchoring measurement required to provide the StochQuant probability distribution.

In StochQuant detection methods and systems of the disclosure, the StochQuant probability distribution is obtained from the molecular counts detected during the measuring workflow of the testing measurement in the form of one or more testing parameters such as read counts from sequencing or fluorescence intensity in flow cytometry as well as additional testing parameters identifiable by a skilled person.

The StochQuant probability distribution so obtained enables a quantitative and/or qualitative detection of the target molecule that takes into account the stochasticity inherent to the detection system, due in particular to the need of performing physical manipulations of the environment, a sample and/or a subsample thereof, such as sampling and/or additional manipulations inherent to the detection workflow of the testing measurement used for performing detection of the target molecule in the environment, a sample and/or a subsample thereof.

The stochasticity inherent to the detection system characterizes in particular detection workflows performed in an environment, sample or subsample thereof comprising a known or expected small number of molecules from an environment, and/or obtained during the testing measurement, as understood by a skilled person upon reading of the disclosure.

Accordingly, in StochQuant detection methods and systems of the disclosure, performing in an environment, a sample and/or a subsample thereof a testing measurement in which a detection workflow configured to detect molecular counts is modeled according to the StochQuant methods and systems herein described provides, in place of a single value of one or more testing parameters, a probability distribution of values indicative of the detected target molecule abundance in the environment, which accounts for the probability that the target molecule is present or absent in the environment, as well as the probable count of the target molecule in the environment.

As a consequence, the StochQuant detection methods and systems of the disclosure provide an improvement in detection technology because StochQuant testing measurements enable detection of a target molecule in an environment with an increased confidence with respect to corresponding testing measurement performed without StochQuant detection as understood by a skilled person upon reading of the present disclosure.

In particular, according to a first aspect, a method and a system are described to improve a testing measurement for detection of an abundance of a target molecule in a physical environment. In the method and system according to the first aspect, the testing measurement comprises a measuring workflow for the molecular count of a target molecule and a reference molecule.

The method comprises: i) dividing the measuring workflow into one or more measuring segments arranged in a measuring workflow order, each of the one or more measuring segments comprising one or more physical manipulations impacting the molecular count of the target molecule and/or of the reference molecule.

The method further comprises: ii) calibrating the one or more measuring segments by building corresponding stochastic representations of each of the one or more measuring segments into a computer-based system, the stochastic representations taking as inputs physical parameters of the measuring workflow.

The method also comprises: iii) chaining the corresponding stochastic representations together into a model of the measuring workflow by connecting outputs of measuring segments into inputs of other measuring segments in the measuring workflow order, such that the model takes as model inputs the physical parameters including at least a target molecule molecular count, a reference molecule molecular count, and an absolute anchoring value of the reference molecule.

The method additionally comprises: iv) configuring the computer-based system to provide a probability distribution of an abundance of the target molecule based on the model of the measuring workflow when provided the model inputs.
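By way of non-limiting illustration only, the dividing, calibrating and chaining of steps i) to iii) can be sketched in Python as follows. The sketch assumes two hypothetical measuring segments that are not prescribed by the disclosure: a subsampling segment in which each molecule is carried over independently with a fixed probability, and a read-sampling segment in which a fixed number of reads is drawn in proportion to the loaded counts. All function names, dictionary keys and numerical values are illustrative assumptions, and the sketch runs the model in the forward direction only.

```python
import random
from collections import Counter


def make_subsampling_segment(fraction):
    """Hypothetical segment: each molecule is carried over with probability `fraction`."""
    def segment(state, rng):
        return {name: sum(rng.random() < fraction for _ in range(count))
                for name, count in state.items()}
    return segment


def make_read_sampling_segment(read_depth):
    """Hypothetical segment: `read_depth` reads are drawn in proportion to loaded counts."""
    def segment(state, rng):
        total = state["target"] + state["reference"]
        if total == 0:  # nothing was loaded, so nothing can be read
            return {"target": 0, "reference": 0}
        p_target = state["target"] / total
        target_reads = sum(rng.random() < p_target for _ in range(read_depth))
        return {"target": target_reads, "reference": read_depth - target_reads}
    return segment


def chain(segments):
    """Connect the output of each segment to the input of the next, in workflow order."""
    def model(state, rng):
        for segment in segments:
            state = segment(state, rng)
        return state
    return model


# Assumed physical parameters (illustrative values only, not taken from the disclosure).
forward_model = chain([
    make_subsampling_segment(fraction=0.1),
    make_read_sampling_segment(read_depth=500),
])

rng = random.Random(0)
environment = {"target": 50, "reference": 2000}  # assumed molecule numbers
runs = [forward_model(environment, rng) for _ in range(200)]

# Empirical distribution of the target read count produced by the chained model.
counts = Counter(run["target"] for run in runs)
read_count_distribution = {k: v / len(runs) for k, v in sorted(counts.items())}
```

Representing each segment as a function of the same state makes the measuring workflow order explicit in the list passed to `chain`, so that a segment can be recalibrated or replaced without touching the others.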

The related system comprises reagents and/or equipment to perform a testing measurement and embodiments of methods described in the first aspect. Examples of system components include computing devices configured to carry out one or more embodiments of the methods, computer-readable non-transient mediums encoded with programs configured to carry out one or more embodiments of the methods, PCR kits, biotech library preparation kits, flow cells, microfluidic devices, genetic tags, etc.

According to a second aspect a method and system are described to build a computer-readable program that improves a measuring workflow of a testing measurement for detection of an abundance of a target molecule in a physical environment.

The method comprises: i) dividing the measuring workflow into one or more measuring segments arranged in a measuring workflow order, each of the one or more measuring segments comprising one or more physical manipulations of a molecular count of the target molecule and/or of a reference molecule in the environment, a sample and/or a subsample thereof.

The method further comprises: ii) calibrating the one or more measuring segments by building corresponding stochastic representations of each of the one or more measuring segments into a computer-readable program, the stochastic representations taking as inputs physical parameters of the measuring workflow.

The method also comprises: iii) chaining the corresponding stochastic representations together into a model of the measuring workflow by connecting outputs of measuring segments into inputs of other measuring segments in the measuring workflow order, such that the model takes as its inputs the physical parameters including at least a target molecule molecular count, a reference molecule molecular count, and an absolute anchoring value of the reference molecule.

The method additionally comprises: iv) configuring the computer-readable program to provide a probability distribution of an abundance of the target molecule based on the model of the measuring workflow when run on a computer system and given the inputs by a user of the computer-readable program.

The related system comprises reagents and/or equipment to perform a testing measurement and embodiments of methods described in the second aspect. Examples of system components include computing devices configured to carry out one or more embodiments of the methods, computer-readable non-transient mediums encoded with programs configured to carry out one or more embodiments of the methods, PCR kits, biotech library preparation kits, flow cells, microfluidic devices, genetic tags, etc.

According to a third aspect, a method and a system are described to probabilistically detect a target molecule in an environment through a measuring workflow of a testing measurement to measure abundance of the target molecule in the environment in combination with a reference molecule.

The method comprises: i) performing the measuring workflow on the environment, a sample and/or a subsample thereof, the measuring workflow comprising one or more physical manipulations of the target molecule and/or the reference molecule in the environment, the sample and/or the subsample thereof impacting a molecular count of the target molecule and/or of the reference molecule.

The method also comprises ii) providing a molecular count of the target molecule in the environment from performing the measuring workflow by detecting the molecular count of the target molecule in the environment, the sample and/or the subsample thereof.

The method further comprises iii) providing a molecular count of a reference molecule from performing the measuring workflow by adding a known amount of the reference molecule and/or by detecting the molecular count of the reference molecule in the environment, the sample and/or the subsample thereof.

The method additionally comprises iv) providing an absolute anchoring value of the reference molecule.

The method also comprises v) based on at least the absolute anchoring value of the reference molecule, the molecular count of the target molecule, and the molecular count of the reference molecule, forming a probability distribution of abundances of the target molecule in the environment based on a modeling of the measuring workflow, the modeling taking into account stochastic properties of the physical manipulations of the target molecule and/or the reference molecule in the environment, the sample and/or the subsample thereof.

The related system comprises reagents and/or equipment to perform a testing measurement and embodiments of methods described in the third aspect. Examples of system components include computing devices configured to carry out one or more embodiments of the methods, computer-readable non-transient mediums encoded with programs configured to carry out one or more embodiments of the methods, PCR kits, biotech library preparation kits, flow cells, microfluidic devices, genetic tags, etc.

According to a fourth aspect, a method and a system to probabilistically detect a target molecule in an environment are described. The method comprises:

- performing a testing measurement comprising
  - obtaining a molecular count of the target molecule in an environment or a sample thereof; and
  - obtaining a molecular count of a reference molecule; and
- providing an absolute anchoring value of the reference molecule in the sample; and
- obtaining a probability distribution of the target molecule abundance in the sample as a function of
  - the molecular count of the target molecule;
  - the molecular count of the reference molecule; and
  - the absolute anchoring value of the reference molecule.

In the method to probabilistically detect a target molecule in an environment of the fourth aspect, the probability distribution of the target molecule abundance in the environment is indicative of the confidence of detection or non-detection or confidence of the quantitative value of the target molecule detected in the environment.

The related system comprises reagents and/or equipment to perform a testing measurement and embodiments of methods described in the fourth aspect. Examples of system components include computing devices configured to carry out one or more embodiments of the methods, computer-readable non-transient mediums encoded with programs configured to carry out one or more embodiments of the methods, PCR kits, biotech library preparation kits, flow cells, microfluidic devices, genetic tags, etc.

According to a fifth aspect a method and a system are described to probabilistically measure an abundance of a target molecule in an environment.

The method comprises: i) determining a) an absolute anchoring value of a reference molecule in the environment.

The method further comprises ii) performing a testing measurement comprising a measurement workflow, producing quantitative testing measurements, on the environment, a sample and/or a subsample thereof, to establish:

- b) a corresponding molecular count of the target molecule in the environment; and
- c) a corresponding molecular count of the reference molecule in the environment.

The method also comprises iii) inputting a), b) and c) into a computer-based system, the computer-based system being configured to generate a probability distribution of abundance of the target molecule in the sample on the basis of a), b) and c) by a model of the quantitative testing measurements.

The method additionally comprises iv) based on the probability distribution, producing, through the computer-based system, one or more of:

- confidence level of abundance values above and below a threshold abundance value of the target molecule input to the computer system;
- confidence interval of abundance values based on an abundance value confidence level of the target molecule input to the computer system; and
- abundance value confidence level based on a confidence interval of abundance values input to the computer system.
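By way of non-limiting illustration only, the three outputs listed above can be computed from a discretized probability distribution as sketched below in Python. The sketch assumes that the distribution is available as abundance values sorted in increasing order with matching probabilities, and that the confidence interval is a central interval obtained by cutting equal probability from each tail; both choices, the function names and the toy distribution are assumptions made for illustration and are not taken from the disclosure.

```python
from bisect import bisect_left
from itertools import accumulate


def confidence_above_below(values, probs, threshold):
    """Probability that the abundance is at or above, and below, `threshold`."""
    above = sum(p for v, p in zip(values, probs) if v >= threshold)
    return above, 1.0 - above


def interval_for_level(values, probs, level):
    """Central interval of abundance values covering at least `level` probability."""
    cdf = list(accumulate(probs))
    tail = (1.0 - level) / 2.0
    last = len(values) - 1
    lower = values[min(bisect_left(cdf, tail), last)]
    upper = values[min(bisect_left(cdf, 1.0 - tail), last)]
    return lower, upper


def level_for_interval(values, probs, lower, upper):
    """Probability that the abundance lies within [lower, upper]."""
    return sum(p for v, p in zip(values, probs) if lower <= v <= upper)


# Hypothetical distribution over abundance values (arbitrary units), sorted ascending.
values = [0, 100, 200, 300, 400]
probs = [0.05, 0.20, 0.50, 0.20, 0.05]

above, below = confidence_above_below(values, probs, threshold=200)
interval = interval_for_level(values, probs, level=0.8)
level = level_for_interval(values, probs, lower=100, upper=300)
```

Because the toy distribution is discrete, the interval returned for a requested level can cover more probability than requested; a finer abundance grid narrows that gap.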

The related system comprises reagents and/or equipment to perform a testing measurement and embodiments of methods described in the fifth aspect. Examples of system components include computing devices configured to carry out one or more embodiments of the methods, computer-readable non-transient mediums encoded with programs configured to carry out one or more embodiments of the methods, PCR kits, biotech library preparation kits, flow cells, microfluidic devices, genetic tags, etc.

According to a sixth aspect a computer-based system is described comprising a processor, memory, input components, and output components.

The computer-based system is configured to: i) receive, process and store, through the input components, the processor and the memory, a) an absolute anchoring value of a reference molecule in an environment, a sample and/or a subsample thereof, b) a molecular count of a target molecule in the environment as determined by a measuring workflow performed in the environment, the sample and/or the subsample thereof, and c) a molecular count of the reference molecule in the environment as determined by the measuring workflow performed in the environment, the sample and/or the subsample thereof.

The computer-based system is further configured to: ii) process, through the processor, a), b) and c) from i) into a model of the measuring workflow configured to obtain probabilistically distributed abundance values of the target molecule in the environment; and at least one of:

- iiia) receive, through the input components, a threshold abundance value of the target molecule and process, through the processor, the threshold abundance value of the target molecule through the probabilistically distributed abundance values of the target molecule to obtain and output, through the output components, a confidence level of abundance values above and below the threshold abundance value of the target molecule; or
- iiib) receive, through the input components, an abundance value confidence level of the target molecule and process, through the processor, the abundance value confidence level of the target molecule through the probabilistically distributed abundance values of the target molecule to obtain and output, through the output components, a confidence interval of abundance values of the target molecule; or
- iiic) receive, through the input components, a confidence interval of abundance values of the target molecule and process, through the processor, the confidence interval of abundance values of the target molecule through the probabilistically distributed abundance values of the target molecule to obtain and output, through the output components, an abundance value confidence level of the target molecule.

The related method comprises the system running a program encoded to carry out one or more of the methods described herein, including from other aspects.

According to a seventh aspect, a method is described to probabilistically detect a target molecule in an environment, the method comprising:

- separating a portion of the environment to obtain a sample of the environment, the sample having a quantitatively measurable amount;
- providing an absolute anchoring value of a reference molecule in the sample;
- performing a testing measurement comprising
  - obtaining a molecular count of the target molecule in the sample; and
  - obtaining a molecular count of the reference molecule in the sample; and
- obtaining a probability distribution of the target molecule abundance in the sample as a function of
  - the molecular count of the target molecule;
  - the molecular count of the reference molecule;
  - the absolute anchoring value of the reference molecule; and
  - a quantitatively measured amount of the sample;

the probability distribution of the target molecule abundance in the sample being indicative of the confidence of detection or non-detection or confidence of the quantitative value of the target molecule detected in the sample, which is indicative of the probabilistic detection of the target molecule in the environment.
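By way of non-limiting illustration only, a probability distribution of target abundance can be obtained from the four listed quantities as sketched below in Python. The sketch assumes an amplicon-sequencing-like workflow in which the number of target copies in the sample is Poisson distributed around abundance times sample volume, the target read count is binomially distributed given the loaded target fraction, the total number of loaded copies is treated as fixed, and the prior over a grid of candidate abundances is flat. These modeling choices, the function names and all numerical values are assumptions made for illustration and are not taken from the disclosure.

```python
import math


def poisson_pmf(t, lam):
    """Poisson probability of t copies when lam copies are expected."""
    if lam == 0.0:
        return 1.0 if t == 0 else 0.0
    return math.exp(t * math.log(lam) - lam - math.lgamma(t + 1))


def binom_pmf(k, n, p):
    """Binomial probability of k target reads out of n reads."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_comb + k * math.log(p) + (n - k) * math.log1p(-p))


def abundance_posterior(target_count, reference_count, anchoring_value,
                        volume_ml, grid, max_copies=200):
    """Posterior over candidate abundances in `grid` (copies/mL), flat prior assumed."""
    loaded_total = anchoring_value * volume_ml  # total copies in the sample, held fixed
    read_likelihood = [binom_pmf(target_count, reference_count, t / loaded_total)
                       for t in range(max_copies + 1)]
    weights = []
    for abundance in grid:
        lam = abundance * volume_ml  # expected target copies in the sample
        # Marginalize over the unobserved number of target copies in the sample.
        weights.append(sum(poisson_pmf(t, lam) * read_likelihood[t]
                           for t in range(max_copies + 1)))
    norm = sum(weights)
    return [w / norm for w in weights]


# Assumed inputs: 200 target reads out of 10,000 reads, total load of
# 5e4 copies/mL as the anchoring value, and a 0.01 mL sample.
grid = list(range(0, 4001, 100))
posterior = abundance_posterior(200, 10_000, 5e4, 0.01, grid)

probability_present = 1.0 - posterior[0]  # grid[0] is zero abundance
posterior_mean = sum(a * p for a, p in zip(grid, posterior))
```

Marginalizing over the unobserved copy number is what widens the distribution relative to a read-count-only estimate: the binomial term alone would attribute nearly all uncertainty to sequencing, whereas the Poisson term adds the uncertainty introduced when the sample is separated from the environment.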

The related system comprises reagents and/or equipment to perform a testing measurement and embodiments of methods described in the seventh aspect. Examples of system components include computing devices configured to carry out one or more embodiments of the methods, computer-readable non-transient mediums encoded with programs configured to carry out one or more embodiments of the methods, PCR kits, biotech library preparation kits, flow cells, microfluidic devices, genetic tags, and additional system components identifiable by a skilled person.

In StochQuant detection methods and systems of the disclosure, the StochQuant probability distribution will thus provide an advantageous probabilistic detection (probability function) of the target molecule in the sample which is indicative of, and relates back to, the probabilistic detection (quantitative or qualitative) of the target molecule in the environment from which the sample is obtained, as understood by a skilled person upon reading of the present disclosure.

StochQuant methods and systems provide an improvement to various fields of technology in which molecular detection is performed by methods and systems that determine molecular counts. In particular, StochQuant methods and systems enable detection that accounts for the inherent stochasticity introduced by the manipulations required by a detection workflow, thus augmenting the accuracy, precision, confidence in, and reliability of the results of the detection, and solving a problem arising from the technology itself. Accordingly, StochQuant methods and systems also improve various technical fields, such as diagnostics, in-vitro diagnostics, cancer diagnostics, prenatal diagnostics, biotherapeutics, medical drug design and development, biotic treatment, bioanalysis, biotechnology, agricultural biotechnology, food testing, genetic testing, and immunology.

The StochQuant detection methods and systems herein described can be used in connection with various applications wherein accurate and/or reliable detection of a molecular count is desired, in particular in a target environment including a target molecule in low abundance. For example, the StochQuant detection methods and systems herein described allow, in several embodiments herein described, for qualitative and/or quantitative microbiome profiling and/or detection of target molecules in environments such as tissues, organs, stool, biopsies and bodily fluids in human and veterinary medicine, or environmental sample analyses (e.g., soil and water), or samples thereof. Exemplary applications of the StochQuant detection methods and systems herein described comprise biotherapeutics, medical drug development, clinical application, diagnostic applications, in-vitro diagnostics, cancer diagnostics, prenatal diagnostics, drug development, biotic treatment, biotechnology, agricultural biotechnology, food testing, bioanalysis, genetic testing, immunology and additional applications identifiable by a skilled person.

The details of one or more embodiments of the disclosure are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated into and constitute a part of this specification, illustrate one or more embodiments of the present disclosure and, together with the detailed description and example sections, serve to explain the principles and implementations of the disclosure. Exemplary embodiments of the present disclosure will become more fully understood from the detailed description and the accompanying drawings, wherein:

FIG. 1 shows a schematic representation of a StochQuant Workflow where an exemplary set of steps directed to Build the StochQuant Workflow and an exemplary set of steps directed to use the StochQuant Workflow are schematically identified.

FIG. 2 shows schematic representations of uses of probability distribution provided by a detection method StochQuantized by a workflow, such as the one exemplified in Example 00, based on the concept of the confidence interval (FIG. 2, Panel A). These include providing a confidence level for a given confidence interval of target abundance (FIG. 2, Panel B), or providing a confidence interval for a given confidence level (FIG. 2, Panel C).

FIGS. 3A-3E show charts and schematics reporting the results of experiments demonstrating limitations of current approaches to 16S rRNA gene sequencing and analysis of low-to-moderate biomass samples that highlight the problem of confidence in making determinations based upon 16S rRNA gene sequencing measurements. In particular, FIG. 3A shows a schematic illustration of the experimental design of the sequencing experiment of a defined microbial community prepared at a range of dilutions. In this example, each dilution is an environment. Symbols correspond to different taxa (target 16S rRNA gene molecules of a taxon), as given in FIG. 3B. FIG. 3B shows rates of differential abundance type I errors (incorrectly determining taxa to be differentially abundant when they are not) for each taxon among different dilutions. FIG. 3C shows an exemplary trial (described in Example 4) in which a taxon in dilution 4 (MD4) was found to be over 2.3× lower in mean relative abundance compared with MD1, and an exemplary trial in which the same taxon in MD4 was found to be nearly 2.5× higher in mean relative abundance compared with MD1. FIG. 3D shows the results of experiments in which four sequencing replicates of a no-template control (NTC) were performed, with the 10 most abundant taxa shown, highlighting the problem of confidence in the detection and quantitative detection of low numbers of target molecules in an NTC environment that will be used for the determination of the presence of targets in the dilution environments compared to the abundance of the targets in the NTC environment. FIG. 3E shows the results of exemplary experiments illustrating variability of PCA of relative abundances with four trials, highlighting the problem of confidence in the PCA results.

FIGS. 4A-4C show charts, diagrams and schematics illustrating exemplary stochastic representations of a measuring workflow (also referred to as a forward measurement model) of amplicon sequencing provided as a representative testing measurement. The illustration of FIGS. 4A-C provides a mathematical representation of the amplicon sequencing testing measurement and therefore a model of the measuring workflow. The stochastic representations of the measuring workflow of FIGS. 4A-C describe the intrinsic variability of amplicon sequencing data from low-to-moderate biomass samples as arising from two sequential stochastic sampling events. In particular, the model of the exemplary measuring workflow of the amplicon sequencing of FIGS. 4A-C allows identification of i) segments of the measuring workflow (also referred to as steps of the detection workflow) which are characterized by stochasticity and can be represented as probabilistic stochastic mathematical functions, and ii) physical parameters (measurable parameters of the testing measurement workflow that affect the molecular count of the target) which parameterize the probabilistic stochastic mathematical functions to account for the stochasticity. In particular, FIG. 4A shows a schematic of an exemplary model of the measuring workflow (forward measurement model) of the amplicon sequencing testing measurement described in FIGS. 4A-C. In particular, the forward measurement model of FIG. 4A describes how probable molecular counts of a target molecule are stochastically yielded from the combination of two stochastic processes that occur during an amplicon sequencing molecular detection workflow. FIGS. 4B-C describe simulations generated by the exemplary forward measurement model illustrating the stochastic sampling of input target/reference molecules during separation of a sample of the environment (also referred to as the loading of template DNA) into a library-preparation reaction. This illustration is of the first segment of the forward measurement model. In particular, FIG. 4B describes a schematic representation of results of experiments showing that when taxon absolute abundance (the number of target molecules in an environment) and total load (the number of reference molecules in an environment) are low (10³ 16S rRNA gene copies/mL of target and 5×10⁴ reference 16S rRNA gene copies/mL) but relative abundance and the molecular count of the reference molecule (read depth) are sufficiently high (2% relative abundance and 100,000 total reads), detection and measurement noise are driven by the stochastic loading of molecules. The illustration of FIG. 4B shows that the stochasticity of the first segment of the forward measurement model accounts for the majority of the stochasticity/variability of the entire representation of the measurement workflow. FIG. 4C presents results of experiments showing that when taxon absolute abundance and total load are high (e.g. 10⁶ 16S rRNA gene copies/mL of target and 10¹⁰ total 16S rRNA gene copies/mL) but relative abundance and read depth are low (e.g. 0.01% relative abundance and 5,000 reads), detection and measurement noise are driven by the stochastic sampling of reads on the flow cell. Accordingly, the results presented in FIG. 4C show that the stochasticity of the second segment of the forward measurement model accounts for the majority of the stochasticity/variability of the entire model, as will be understood by a skilled person upon reading of the present disclosure.
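By way of non-limiting illustration only, the relative weight of the two stochastic sampling events under the conditions stated for FIG. 4B and FIG. 4C can be estimated as sketched below in Python. The sketch assumes Poisson loading of target copies and binomial sampling of reads, and uses the first-order approximation that the squared coefficient of variation of the measured relative abundance is the sum of a loading term, 1/(abundance × loading volume), and a read-sampling term, (1 − f)/(read depth × f), where f is the relative abundance. The 0.002 mL loading volume is an assumed value that is not stated in the figure description, and the function name is hypothetical.

```python
def noise_contributions(target_abundance, total_load, loading_volume_ml, read_depth):
    """First-order squared-CV contributions of template loading and of read sampling."""
    relative_abundance = target_abundance / total_load
    expected_loaded_target = target_abundance * loading_volume_ml
    loading_cv2 = 1.0 / expected_loaded_target  # Poisson loading of target copies
    read_cv2 = (1.0 - relative_abundance) / (read_depth * relative_abundance)  # binomial reads
    return loading_cv2, read_cv2


ASSUMED_VOLUME_ML = 0.002  # assumption for illustration; not given in the source

# Conditions stated for FIG. 4B: low abundance and load, high relative abundance and depth.
fig_4b = noise_contributions(1e3, 5e4, ASSUMED_VOLUME_ML, 100_000)

# Conditions stated for FIG. 4C: high abundance and load, low relative abundance and depth.
fig_4c = noise_contributions(1e6, 1e10, ASSUMED_VOLUME_ML, 5_000)
```

Under the assumed volume, the loading term is the larger of the two for the FIG. 4B conditions and the read-sampling term is the larger for the FIG. 4C conditions, consistent with the first and second segments, respectively, driving the noise in those regimes.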

FIGS. 5A-5F show charts and schematics reporting results of experiments showing that the exemplary model of the measuring workflow provided by the forward measurement model accurately describes/represents detectability and measurement noise from the actual amplicon sequencing testing measurement. In particular, FIG. 5A shows a comparison of a StochQuant simulation of a sequencing experiment (simulated read counts transformed to relative abundance by dividing the molecular count of the target molecule by the molecular count of the reference molecule), assuming a taxon relative abundance of 1.9% under conditions identical to those of the experimentally observed results (reported in FIGS. 3A-3E) of Pseudomonas, a taxon that was present in the defined microbial community at 1.9% relative abundance. FIGS. 5B-D report results of experiments directed to evaluate the accuracy of the exemplary representation of the measurement workflow provided by the forward measurement model by comparing the results yielded by the forward measurement model to the results yielded by the testing measurement. In particular, FIG. 5B shows a comparison of a StochQuant simulation of a sequencing experiment provided as a representative testing measurement, assuming a constant absolute abundance of 500 16S rRNA gene copies/mL, with experimentally observed results from a contaminant taxon. FIG. 5C shows an experimental result validating the performance of the StochQuant model of the molecular detection workflow of amplicon sequencing provided as a representative example of a testing measurement. The result measured the detection frequency of Salmonella (0.04% relative abundance) under each of the four dilution conditions (environments) (MD1-MD4) compared to StochQuant model simulations of the same experiment. FIG. 5D shows an experimental result validating the performance of the StochQuant model of the molecular detection workflow of amplicon sequencing. The result measured the coefficient of variation (% CV) for each of the top 5 defined-community taxa under each of the four dilution conditions compared to StochQuant simulations of the same experiment. FIG. 5E shows an exemplary frequency of detection of taxa in an environment based on StochQuant detection of molecular counts of a 16S RNA as a biomarker of the taxa. In the illustration of FIG. 5E the frequencies of detection are simulated as a function of taxon absolute abundance, total bacterial load, template loading volume, and read depth. Plot gradients (on a 0.0-1.0 scale) indicate the probability of detection. Limits of detection (at least 95% probability of detection) are shown for relative abundance (diagonal line) and absolute abundance (horizontal line). FIG. 5F shows a generalized schematic of the relationship between detectability of a taxon at a given absolute abundance (taxon 16S rRNA gene copies/mL) as a function of total bacterial load (total 16S rRNA gene copies/mL), template loading volume, and read depth. “Sliders” indicate how the 4 detection zones are affected by changing read depths and template loading volumes.
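By way of non-limiting illustration only, a probability of detection and a limit of detection of the kind plotted in FIG. 5E can be computed as sketched below in Python. The sketch assumes that a taxon is detected when at least one target read is obtained, that the number of loaded target copies is Poisson distributed, that reads are drawn binomially from the loaded target fraction, and that the total number of loaded copies is fixed. The function names and all parameter values are assumptions made for illustration and do not reproduce the simulations of the figure.

```python
import math


def detection_probability(abundance, total_load, volume_ml, read_depth, max_copies=500):
    """P(at least one target read) under assumed Poisson loading and binomial reads."""
    lam = abundance * volume_ml            # expected target copies loaded
    loaded_total = total_load * volume_ml  # total copies loaded, treated as fixed
    if lam == 0.0:
        return 0.0
    p_zero_reads = 0.0
    for t in range(max_copies + 1):
        p_t = math.exp(t * math.log(lam) - lam - math.lgamma(t + 1))  # P(t copies loaded)
        fraction = min(t / loaded_total, 1.0)
        p_zero_reads += p_t * (1.0 - fraction) ** read_depth  # no read hits the target
    return 1.0 - p_zero_reads


def limit_of_detection(grid, total_load, volume_ml, read_depth, level=0.95):
    """Lowest abundance in `grid` detected with probability of at least `level`."""
    for abundance in grid:
        if detection_probability(abundance, total_load, volume_ml, read_depth) >= level:
            return abundance
    return None


# Assumed conditions: 1e5 total copies/mL, 0.01 mL loaded, 10,000 reads.
p_100 = detection_probability(100, total_load=1e5, volume_ml=0.01, read_depth=10_000)
lod = limit_of_detection(range(0, 1001, 100), total_load=1e5, volume_ml=0.01,
                         read_depth=10_000)
```

With these assumed values the read depth is high enough that detection is limited almost entirely by whether any target copy is loaded, which corresponds to the absolute-abundance limit described for FIG. 5E; lowering `read_depth` or raising `total_load` shifts the limit toward the relative-abundance regime.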

FIG. 6A show a set of charts reporting results of experiments illustrating a comparison of the detectability of molecular counts in form of detected frequencies of targets yielded by StochQuant simulated detection frequencies, versus experimentally observed detection frequencies (from the amplicon sequencing testing measurement) for each mock community taxon under each dilution condition. The results reported in the illustration of FIG. 6A enable evaluating the accuracy of the exemplary StochQuant simulated detection frequencies as a model of the amplicon sequencing testing measurement providing the frequencies.

FIGS. 6B-D show a set of charts reporting results of experiments illustrating a comparison of the measurement noise (% CV) yielded by a StochQuant simulated relative abundance CV versus experimentally observed relative abundance CV from the testing measurement of amplicon sequencing. In particular, the illustration of FIG. 6B shows a diagram reporting the comparison when only the second segment of the StochQuant simulated relative abundance CV (stochastic sampling of reads on the sequencing flow cell) is considered, FIG. 6C shows a diagram reporting the comparison when only the first segment of the StochQuant simulated relative abundance CV (stochastic loading of target/reference template DNA molecules) is considered, and FIG. 6D shows a diagram reporting the comparison when the entire measurement workflow representation of the StochQuant simulated relative abundance CV (both the stochastic loading of target/reference molecules and sampling of reads) is considered. The results reported in FIGS. 6B-D can be used to evaluate the accuracy of each representation as will be understood by a skilled person upon reading of the present disclosure.

FIGS. 7A-G show charts, diagrams and schematics reporting results of experiments showing that the StochQuant representation of a detection workflow directed to detect abundance of a 16S RNA marker for a target taxon in an environment enables inference of taxon abundance and measurement uncertainty by yielding a probability distribution of taxon (target) abundance from a molecular count of the target molecule (a read count measurement), an absolute anchoring value (total load measurement), and other StochQuant parameters including a molecular count of the reference molecule obtained via the testing measurement (“read depth”) and a measurable amount of sample separated from an environment. The illustrations of FIGS. 7A-G show that StochQuant probability distributions of the number of target molecules in an environment (or abundances computed from the number of target molecules in an environment) are obtained from a molecular count of the target (read count measurement), an absolute anchoring value of the reference molecule (total load measurement), a measurable amount of sample separated from an environment (experimental parameter), and a molecular count of the reference molecule (experimental parameter). In particular, FIG. 7A shows a schematic of an exemplary StochQuant representation of a detection workflow yielding a probability distribution of the number of target molecules detected through amplicon sequencing as the testing measurement. In this example, the StochQuant Detection Workflow infers taxon abundance and measurement uncertainty by generating a probability distribution of taxon abundance from observed molecular counts in the form of read counts and read depths, in combination with additional physical parameters provided by the total microbial load (absolute anchoring value of the reference) and the volumes used. FIG. 7B shows a chart reporting simulated data yielded by a StochQuant representation of a measuring workflow of the amplicon sequencing of 16S rRNA as a marker of a taxon at 1% relative abundance in a sample with a total microbial load of 40,000 16S rRNA gene copies/mL, with 2 μL of template loaded into the library preparation reaction, sequenced with 100,000 reads. Probability distributions of molecular counts in the form of read counts from the simulation, colored by the number of molecules loaded into the library preparation reaction, are shown. For visualization purposes, only numbers of loaded molecules less than or equal to 3 are shown.
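As a non-limiting illustration of the conditions recited for FIG. 7B (1% relative abundance, 40,000 16S rRNA gene copies/mL, 2 μL loaded, 100,000 reads), the following Python sketch tabulates, for 0 to 3 loaded target molecules, the probability of that loading outcome and the mean and standard deviation of the resulting read count. The sketch rests on assumptions that are not taken from the present disclosure: Poisson loading of target molecules, a fixed number of non-target molecules, and binomial sampling of reads without amplification bias. The function name `loading_summary` and its parameters are hypothetical.

```python
import math


def loading_summary(rel_abundance, total_copies_per_ml, load_volume_ul,
                    read_depth, max_molecules=3):
    """Return (n, P(n loaded), mean reads, sd reads) for n = 0..max_molecules.

    Illustrative assumptions: Poisson loading of the target, non-target
    molecules fixed at their expected number, binomial read sampling.
    """
    volume_ml = load_volume_ul / 1000.0
    total_loaded = total_copies_per_ml * volume_ml   # expected molecules loaded
    lam_target = rel_abundance * total_loaded        # expected target molecules
    others = total_loaded - lam_target               # expected non-target molecules
    rows = []
    for n in range(max_molecules + 1):
        p_n = math.exp(-lam_target) * lam_target ** n / math.factorial(n)
        frac = n / (n + others) if n > 0 else 0.0
        mean_reads = read_depth * frac
        sd_reads = math.sqrt(read_depth * frac * (1.0 - frac))
        rows.append((n, p_n, mean_reads, sd_reads))
    return rows


for n, p_n, mean_reads, sd_reads in loading_summary(0.01, 40_000, 2.0, 100_000):
    print(f"{n} molecule(s) loaded: P={p_n:.3f}, "
          f"reads ~ {mean_reads:.0f} +/- {sd_reads:.0f}")
```

In this sketch the expected number of target molecules loaded is below one, so a single read-count measurement is dominated by how many molecules happened to be loaded rather than by read sampling.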

FIG. 7C shows a chart reporting the StochQuant probability distributions of abundance generated from three read-count outcomes (No-detection, 1500, and 3000 reads) under the simulation conditions in FIG. 7B with the StochQuant representation of the amplicon sequencing detection workflow. FIG. 7D shows charts reporting StochQuant probability distributions of abundance generated from one sequencing replicate each from dilutions MD1, MD2, MD3, and MD4, all sequenced at similar read depths of 105,000-114,000. FIG. 7E shows charts reporting relative abundance values drawn from each of the four distributions shown in FIG. 7D compared with all experimentally observed replicate measurements. FIGS. 7F-G report results of a demonstration of how the StochQuant detection workflow yields measurement uncertainties from a single read-count measurement for each dilution (MD1 and MD4). Each of FIGS. 7F-G plots, for two taxa in the defined community, Bacillus (light) and Pseudomonas (dark): (i) the StochQuant probability distributions for each taxon's relative abundance, (ii) the relative abundances drawn from StochQuant probability distributions, and (iii) the relative abundances observed experimentally in all 20-22 replicates.

FIG. 8 shows charts reporting results of experiments showing exemplary probability distributions of (FIG. 8 Panel A) relative abundance and (FIG. 8 Panel B) absolute abundance of Bacillus in two sequencing replicates: one for which zero reads were detected (r01) and one for which nearly 1200 reads were detected (r22).

FIGS. 9A-9H show charts reporting results of experiments showing an exemplary comparison of standard detection vs StochQuant detection in the analysis of a defined microbial community. In particular FIGS. 9A-B show charts reporting absolute-abundance estimates and corresponding StochQuant probability distributions of (FIG. 9A) Spirosoma (a contaminant) and (FIG. 9B) Pseudomonas (a member of the defined community) using standard and StochQuant approaches in four NTC sequencing replicates and one sequencing replicate each from dilutions MD3 and MD4. FIG. 9C shows a chart reporting the number of genera remaining in the dataset after contamination filtering with the standard approach (based on absolute abundances) and with the StochQuant approach. FIG. 9D reports PCA of the StochQuant-derived center-log-ratio transformed relative abundances of trials from FIG. 3E. FIG. 9E shows a comparison of the total number of times each differential abundance approach (ALDEx2, DESeq2, Kruskal, and StochQuant) incorrectly determined a taxon to be differentially abundant between two groups (P-value <0.05). FIGS. 9F-H report examples for high (FIG. 9F), medium (FIG. 9G) and low (FIG. 9H) abundance taxa illustrating how increased variability at high dilutions (MD4) can lead to spurious “statistically significant” differences and how these spurious results are corrected by StochQuant by recognizing the broadening of probability distributions at high dilutions.

FIG. 10 shows charts reporting results of sampling from StochQuant probability distributions of target relative abundance, projected onto the principal components from FIG. 3E for each of the trials. The StochQuant PCA “clouds” for each sequencing replicate in each trial are shown.

FIGS. 11A-11H show charts and schematics illustrating an exemplary analysis performed with StochQuant which identifies compositional and taxon-level differences between locations along the gastrointestinal (GI) tract of Patient 12. FIG. 11A shows a schematic of a longitudinal sample collection along the GI tract. FIG. 11B shows a chart reporting PCA of relative abundances filtered by the standard absolute-abundance contamination identification approach. FIG. 11C shows a chart reporting PCA of relative abundances filtered by the StochQuant approach, with the StochQuant generated relative abundances projected onto the same principal component axes. FIG. 11D shows a chart reporting top 15 feature loadings of PC1 and PC2 with standard approaches. Each taxon is colored by its most common habitat (human, environmental, or ambiguous). FIG. 11E shows a chart reporting top 15 feature loadings of PC1 and PC2 with the StochQuant approach. FIG. 11F shows a chart reporting taxa that were identified to be differentially abundant between GI locations by any of the differential abundance methods, sorted by P_StochQuant. P-values <0.001 were set to 0.001 for plotting purposes. FIGS. 11G-H show charts reporting (Left) relative abundances from the original 16S rRNA gene sequencing data, (Middle) StochQuant probability distributions of abundance generated from the original sequencing data, and (Right) (n=2) additional sequencing replicates of each rectum biopsy. In each plot, relative abundances below the Limit of Detection (LoD) for each sample are colored in dark gray. FIG. 11G shows a chart reporting results of analysis in which Dialister was called differentially abundant by all three standard methods (DESeq2, ALDEx2, Kruskal) and by StochQuant. In the exemplary analysis illustrated in FIG. 11G, StochQuant predicted that the observed differences in Dialister relative abundance were greater than the StochQuant predicted measurement noise (confirmed by sequencing replicates), and therefore Dialister was determined to be differentially abundant between the terminal ileum (TI) and rectum (R) of Patient 12. FIG. 11H shows a chart reporting results of analysis in which R. gnavus was called differentially abundant by standard approaches but not by StochQuant. In the exemplary analysis illustrated in FIG. 11H, StochQuant correctly infers that the presence of R. gnavus in terminal ileum biopsies but absence of R. gnavus in rectum biopsies may have occurred due to the higher LoD in the rectum biopsies. StochQuant therefore correctly determines (as confirmed by sequencing replicates) that R. gnavus is not differentially abundant, as the taxon can be stochastically detected at similar abundances in the rectum as those found in the terminal ileum. LoD in FIG. 11G is not shown, as it is below the plotting limits of the y-axis of the plot.

FIG. 12 shows charts reporting results of experiments in which total bacterial loads were measured via digital PCR with universal 16S rRNA gene primers in each of the biological replicate biopsies from the terminal ileum (TI), ascending colon (AC), descending colon (DC), and rectum (R) in (a) Patient 12 and (b) Patient 13. (c) Total bacterial loads of the no template control (NTC) processing blanks.

FIGS. 13A-13D show charts reporting results of experiments showing that StochQuant improves PCA analysis of low-bacterial load biopsies in Patient 13. In particular FIG. 13A shows PCA of relative abundances filtered by the standard absolute-abundance contamination identification approach. FIG. 13B shows PCA of relative abundances filtered by the StochQuant approach (including contamination removal), with the StochQuant generated relative abundances projected onto the same principal component axes. FIG. 13C shows top 15 feature loadings of PC1 and PC2 with standard approaches. Each taxon is colored by its most common habitat (human, environmental, or ambiguous). FIG. 13D shows top 15 feature loadings of PC1 and PC2 with the StochQuant approach. The separation among sampling sites observed in standard PCA is driven by contaminants, which play a larger role as total microbial load decreases along the GI tract of this patient (FIG. 12 Panel B), leading to the spurious trend observed in FIG. 13A.

FIGS. 14A-14D show charts reporting relative abundances of the additional four taxa that were differentially abundant by all three standard differential abundance approaches (DESeq2, ALDEx2, Kruskal) and by StochQuant between biological replicates collected from the terminal ileum (TI) (n=3) and rectum (R) (n=3) from Patient 12 (see FIGS. 14A-D). Each figure shows (Left) relative abundances from the original 16S rRNA gene sequencing data, (Middle) StochQuant probability distributions of abundance generated from the original sequencing data, and (Right) (n=2) additional sequencing replicates of each rectum biopsy, with relative abundances below the Limit of Detection (LoD) for each sample colored in dark gray. LoDs in FIGS. 14B-C are not shown, as the LoDs were below the plotting limits of the y-axis of the plots.

FIGS. 15A-E show charts reporting results of experiments showing that StochQuant improves interpretation of differential abundance analysis of low-bacterial load biopsies in Patient 13. In particular FIG. 15A shows results for taxa that were identified to be differentially abundant between GI locations by any of the differential abundance methods, sorted by P_StochQuant. P-values <0.001 were set to 0.001 for plotting purposes. 46 taxa were differentially abundant by DESeq2, 33 by Kruskal, 21 by ALDEx2, and 2 by StochQuant (with contamination filtering). FIG. 15B shows that subtle differences in Bacteroides (one of just two taxa determined to be differentially abundant by all standard approaches and StochQuant) relative abundance were observed in initial sequencing data. StochQuant accurately predicts the small measurement noise among TI biopsies and larger measurement noise among R biopsies, and correctly infers that the difference in abundance between TI and R in Patient 13 is greater than the measurement uncertainty within each biopsy. FIG. 15C shows results for Anaerococcus, which is initially undetected in TI biopsies and determined to be differentially abundant by standard approaches, but StochQuant correctly predicts that Anaerococcus can be stochastically detected in terminal ileum biopsies, which is confirmed by re-sequencing the biopsies. FIG. 15D shows that statistically significant differences in abundance of Streptococcus are initially observed by all standard approaches. Even though a 100-fold difference (1% relative abundance in the TI vs less than 0.01% in the R) in relative abundance is observed, StochQuant correctly infers that these differences at low total microbial loads may have been stochastically observed. Upon re-sequencing, Streptococcus is indeed stochastically detected at similar relative abundances (1%) in the rectum as in the terminal ileum. FIG. 15E similarly shows that Senegalimassilia is detected in all (n=3) TI biopsies but completely undetected in all (n=3) R biopsies of Patient 13, and is thus differentially abundant by all tested standard approaches. StochQuant correctly infers that Senegalimassilia may not be differentially abundant with statistical significance. Furthermore, StochQuant accurately predicts that Senegalimassilia is above LoD in the TI, and thus should be consistently detected upon resequencing, while Senegalimassilia is below LoD in the R, and thus can be stochastically detected upon resequencing of rectum biopsies.

FIGS. 16A-16D show charts reporting results of exemplary experiments illustrating how StochQuant probability distributions of taxon abundance and StochQuant predictions of individual sample LoDs improve interpretation and conclusions from differential abundance analyses of human samples with different total microbial loads and read depths. In particular, FIGS. 16A-D show (Left) relative abundances from the original 16S rRNA gene sequencing data, (Middle) StochQuant probability distributions of abundance generated from the original sequencing data, and (Right) (n=2) additional sequencing replicates of each rectum biopsy, with relative abundances below the Limit of Detection (LoD) for each sample colored in dark gray. LoDs in FIGS. 16A-B are not shown, as the LoDs were below the plotting limits of the y-axis of the plots. More particularly, FIG. 16A shows that StochQuant probability distributions of abundance enable sensitive detection of subtle changes in taxon abundance, such as the increase in relative abundance of Lachnospiraceae between the TI and R of Patient 12 that conservative standard methods such as ALDEx2 miss. FIG. 16B shows that when the measurement uncertainty predicted by StochQuant is larger than the observed difference in the relative abundance of Negativibacillus, StochQuant accurately determines (as confirmed by sequencing replicates) that the taxon is not differentially abundant, a determination that is missed by non-parametric approaches such as Kruskal-Wallis.

FIG. 16C shows that StochQuant accurately determines that Aggregatibacter is not differentially abundant, and that the observed differences in abundance and lack of detection of Aggregatibacter in the rectum biopsies are due to the higher LoD of the rectum biopsies. FIG. 16D shows that StochQuant accurately determines that the observed differential abundance of Campylobacter between TI and R biopsies is irreproducible. Even though Campylobacter was initially detected in the 3 rectum biopsies, StochQuant predicted that Campylobacter was below the LoD of each biopsy, which was confirmed by the stochastic detection of Campylobacter in the re-sequenced biopsies.

FIG. 17 shows a schematic illustration of an exemplary detection workflow of a testing measurement comprising sampling of an environment as a physical manipulation directed to perform the absolute anchoring measurement to obtain the absolute anchoring value (Sample 1) and as a physical manipulation part of the detection workflow (Sample 2).

FIG. 18 shows a schematic illustration of an exemplary detection workflow of a testing measurement comprising sampling of an environment as a physical manipulation part of the workflow, and addition of a reference molecule (“Spike in”) into the environment to obtain the absolute anchoring value.

FIG. 19 shows a schematic illustration of an exemplary detection workflow of a testing measurement comprising sampling of an environment as a physical manipulation part of the workflow, and addition of a reference molecule (“Spike in”) into the sample to obtain the absolute anchoring value.

FIG. 20 shows a schematic illustration of an exemplary detection workflow of a testing measurement comprising sampling and subsampling of an environment as physical manipulations which are part of the workflow.

FIG. 21 shows a schematic illustration of an exemplary detection workflow of a testing measurement performed in an environment which is a solution of nucleic acids in a tube. In the exemplary workflow illustrated in FIG. 21 the contents of the tube can be a sample from somewhere else, but the environment is the tube, since that is the container for which the user is quantitatively detecting the target. Then, the same manipulations are performed as in Example 19.

FIG. 22 shows probability distributions of target relative abundance yielded by StochQuant in connection to Example 7.

FIG. 23. This is an example of using a “spike-in” of a reference molecule in an environment to obtain an absolute anchoring value of the reference molecule in an environment. This is also an example of an Assessment of a Measurement Representation Workflow. Here, the detectability of the 16S rRNA gene of a particular taxon (Escherichia), is compared between the Measurement Workflow Representation and the amplicon sequencing measurement workflow.

FIG. 24. This is an example of using qPCR with a standard curve to measure the PCR amplification efficiency of a target molecule (Listeria 16S rRNA gene). The target was amplified using the same primers and reagents that were used for the PCR amplification manipulation of the target/reference molecules as part of an exemplary amplicon sequencing measurement workflow.

FIG. 25: A comparison between (i) the probability distributions of target abundances in each environment yielded by the StochQuant measurement representation workflow described in Example 35 and (ii) the observed molecular count of the target yielded by the shotgun sequencing testing measurement for each environment (dashed vertical line in each plot). The target in this example is provided by the “ELFBKOLN_00270” geneid in the gene annotation provided as part of the product specification sheet of the defined microbial community (Zymo Cat #D6311).

FIG. 26: This shows a comparison between the probability distributions of target abundance (in this example, the probability distributions of the concentration, in target molecules per microliter, of the gene target from FIG. 26) and the “ground truth” concentration of the target from the defined community (dashed horizontal line). This example shows that, for this gene target, the “ground truth” concentration was within the bounds of each of the probability distributions of target abundance yielded by using the StochQuant workflow.

FIGS. 27A-27G: This is part of Example 37 that shows stripplots of (FIG. 27A) the total reads sequenced from each environment of each participant in the example, (FIG. 27B) the normalized read counts per million of the gene target, (FIG. 27C) the non-normalized read counts of the gene target, and the longitudinal timeseries of the (FIG. 27D) total reads, (FIG. 27E) target read counts, (FIG. 27F) normalized target counts per million reads, and (FIG. 27G) StochQuant probability distribution of target molecules per microliter. In FIG. 27G, the horizontal dashed line is the inferred lower limit of detection.

FIG. 28: An example of comparing the detectability of four ERCC RNA targets in (n=1030) cells in connection with the assessment of the accuracy of the measurement workflow for an exemplary measurement workflow of single cell RNA sequencing as part of Example 38.

FIG. 29: An example of comparing the detectability of four ERCC RNA targets in (n=1030) cells in connection with the assessment of the accuracy of the measurement workflow for an exemplary measurement workflow of single cell RNA sequencing as part of Example 39.

FIG. 30. This is part of an example showing that a Neural Network trained on the measurement workflow representation and physical parameters of the measurement workflow can yield probability distributions of target abundances (in this example target copies/μL or target concentration) that yield similar mean abundances (left) and measurement uncertainties (right), expressed as a % CV of the target concentrations of the probability distributions.

Additional, exemplary embodiments, features, objects, and advantages of the present disclosure will be apparent to a skilled person from the detailed description, the examples section and the claims and the instant disclosure in its entirety.

DETAILED DESCRIPTION

The present disclosure describes methods and systems to perform detection of a target molecule in an environment according to a quantitative stochastic approach.

The term “environment” as used herein indicates a sum total of all the elements in a defined space of interest and subject to investigation. An environment can be a biological environment if it includes at least one biological element. Elements of an environment comprise molecules of any source and in particular biological molecules, whether originated by living organisms or synthetically produced and/or engineered. Accordingly, environments can include different defined spaces of interest, such as tissues, organs, and/or biofluids of an individual, or aquatic or terrestrial environments. An environment in the sense of the disclosure can be subject to sampling. For example, for a blood test the environment could be the person, or the blood tube, or the plasma obtained from the blood, or the nucleic acids extracted from the plasma.

The term “molecule” as used herein indicates any group of two or more atoms held together by chemical bonds, subject to detection in the form of a molecular count. Molecules in the sense of the disclosure can comprise biological molecules (produced by cells and living organisms) and/or artificial molecules (artificially manufactured in a laboratory), the latter sometimes mimicking a biological molecule, as understood by a skilled person.

Accordingly, exemplary molecules in the sense of the disclosure comprise naturally occurring or synthetic nucleic acids as well as other substances attaching a nucleic acid or a nucleic acid mimic, e.g., as part of a molecular complex or as a barcode or a tag [1]. The term “nucleic acid” or “polynucleotide” as used herein indicates an organic polymer composed of two or more monomers including nucleotides, nucleosides or analogs thereof. The term “nucleotide” refers to any of several compounds that consist of a ribose or deoxyribose sugar joined to a purine or pyrimidine base and to a phosphate group and that is the basic structural unit of nucleic acids. The term “nucleoside” refers to a compound (such as guanosine or adenosine) that consists of a purine or pyrimidine base combined with deoxyribose or ribose and is found especially in nucleic acids. The term “nucleotide analog” or “nucleoside analog” refers respectively to a nucleotide or nucleoside in which one or more individual atoms have been replaced with a different atom or with a different functional group. Exemplary functional groups that can be comprised in an analog include methyl groups and hydroxyl groups and additional groups identifiable by a skilled person. Exemplary monomers of a polynucleotide comprise deoxyribonucleotides, ribonucleotides, LNA nucleotides and PNA nucleotides as understood by a skilled person.

The term “nucleic acid” or “polynucleotide” thus includes nucleic acids of any length, and in particular DNA, RNA, analogs thereof, such as LNA and PNA, and fragments thereof, each of which can be isolated from natural sources, recombinantly produced, or artificially synthesized. Polynucleotides can typically be provided in single-stranded form or double-stranded form (herein also duplex form, or duplex). A “single-stranded polynucleotide” refers to an individual string of monomers linked together through an alternating sugar phosphate backbone. The 5′-end of a single strand polynucleotide designates the terminal residue of the single strand polynucleotide that has the fifth carbon in the sugar-ring of the deoxyribose or ribose at its terminus (5′ terminus). The 3′-end of a single strand polynucleotide designates the residue terminating at the hydroxyl group of the third carbon in the sugar-ring of the nucleotide or nucleoside at its terminus (3′ terminus). A “double-stranded polynucleotide” or “duplex polynucleotide” refers to two single-stranded polynucleotides bound to each other through complementary binding. The duplex typically has a helical structure, such as a double-stranded DNA (dsDNA) molecule or a double stranded RNA, which is maintained largely by non-covalent bonding of base pairs between the strands and by base stacking interactions. The term “5′-3′ terminal base pair” with reference to a duplex polynucleotide refers to the base pair positioned at an end of the duplex polynucleotide that is formed by the 5′ end of one single strand of the two single strands forming the duplex polynucleotide base-paired with the 3′ end of the single strand forming the duplex polynucleotide complementary to the one single strand.

Additional molecules in the sense of the disclosure comprise naturally occurring or synthetic proteins. The term “protein” as used herein indicates a polypeptide with a particular secondary and tertiary structure that can interact with another molecule and in particular, with other biomolecules including other proteins, DNA, RNA, lipids, metabolites, hormones, chemokines, and/or small molecules. The term “polypeptide” as used herein indicates an organic linear, circular, or branched polymer composed of two or more amino acid monomers and/or analogs thereof. The term “polypeptide” includes amino acid polymers of any length including full length proteins and peptides, as well as analogs and fragments thereof. A polypeptide of three or more amino acids is also called a protein oligomer, peptide, or oligopeptide. In particular, the terms “peptide” and “oligopeptide” usually indicate a polypeptide with less than 100 amino acid monomers. In particular, in a protein, the polypeptide provides the primary structure of the protein, wherein the term “primary structure” of a protein refers to the sequence of amino acids in the polypeptide chain covalently linked to form the polypeptide polymer. A protein “sequence”indicates the order of the amino acids that form the primary structure. Covalent bonds between amino acids within the primary structure can include peptide bonds or disulfide bonds, and additional bonds identifiable by a skilled person. Polypeptides in the sense of the present disclosure are usually composed of a linear chain of alpha-amino acid residues covalently linked by peptide bond or a synthetic covalent linkage. The two ends of the linear polypeptide chain encompassing the terminal residues and the adjacent segment are referred to as the carboxyl terminus (C-terminus) and the amino terminus (N-terminus) based on the nature of the free group on each extremity. 
Unless otherwise indicated, counting of residues in a polypeptide is performed from the N-terminal end (NH₂ group), which is the end where the amino group is not involved in a peptide bond, to the C-terminal end (-COOH group), which is the end where a COOH group is not involved in a peptide bond. Proteins and polypeptides can be identified by x-ray crystallography, direct sequencing, immunoprecipitation, and a variety of other methods as understood by a person skilled in the art. Proteins can be provided in vitro or in vivo by several methods identifiable by a skilled person. In some instances where the proteins are synthetic proteins, in at least a portion of the polymer two or more amino acid monomers and/or analogs thereof are joined through chemically mediated condensation of an organic acid (-COOH) and an amine (-NH₂) to form an amide bond or a “peptide” bond. As used herein the term “amino acid”, “amino acid monomer”, or “amino acid residue” refers to organic compounds composed of amine and carboxylic acid functional groups, along with a side-chain specific to each amino acid. In particular, alpha- or α-amino acid refers to organic compounds composed of amine (-NH₂) and carboxylic acid (-COOH), and a side-chain specific to each amino acid connected to an alpha carbon. Different amino acids have different side chains and have distinctive characteristics, such as charge, polarity, aromaticity, reduction potential, hydrophobicity, and pKa. Amino acids can be covalently linked to form a polymer through peptide bonds by reactions between the amine group of a first amino acid and the carboxylic acid group of a second amino acid. Amino acid in the sense of the disclosure refers to any of the twenty naturally occurring amino acids and non-natural amino acids, and includes both D and L optical isomers.

Molecules in the sense of the disclosure include aptamers, which are short sequences of artificial nucleic acids or peptides that bind a specific target substance, or family of target substances, exhibiting a range of affinities (K_D in the pM to μM range) with variable levels of off-target binding, and are sometimes classified as chemical antibodies. [2] [3]

Molecules in the sense of the disclosure can also comprise any additional molecules that can be directly detected, e.g., through use of a label or additional visualizing techniques such as microscopy. Direct single-molecule detection can be performed via methods such as the detection of RNA molecules via smFISH (as described e.g., in “Imaging individual mRNA molecules using multiple singly labeled probes”, ref. [4], and “Third-generation in situ hybridization chain reaction: multiplexed, quantitative, sensitive, versatile, robust”, ref. [5]).

Molecules in the sense of the disclosure can be distinguished in different types based on their capability to provide a unique molecular count following detection. Accordingly, a “type of molecule” in the sense of the present disclosure is a molecule that can provide a unique molecular count following detection. Examples comprise nucleic acid comprising different sequences of a same gene, nucleic acid from different genes, proteins labeled with different barcodes and additional types identifiable by a skilled person.

Molecules in the sense of the disclosure can also comprise molecules that can be conjugated to a nucleic acid, the nucleic acid which can be quantitatively detected via a testing measurement such as next generation sequencing. Examples of these types of molecules comprise synthetic or naturally occurring polymers, fatty acids, phospholipids, triglycerides, carbohydrates, nanoparticles, or macromolecules.

The term “target” as used herein indicates any referenced item which is selected as an item of interest. Therefore, a “target molecule” in the sense of the disclosure refers to a molecule selected as the molecule type of interest within the detection method: it can be formed by one type of molecule, or it can be formed by a population of different types of molecules which are of interest and subject to investigation.

The term “detection” or “measurement” in the sense of the disclosure indicates the determination of the existence, presence or fact of a target in a limited portion of space, including but not limited to a sample, a reaction mixture, a molecular complex and a substrate.

A detection in the sense of the disclosure can be quantitative or qualitative. A detection is “qualitative” when it refers, relates to, or involves identification of a quality or kind of the target or signal in terms of relative abundance to another target or signal, which is not quantified, such as presence or absence. A detection is “quantitative” when it refers, relates to, or involves the measurement of quantity or amount of the target or signal (also referred as quantitation), which comprises any analysis designed to determine the amounts or proportions of the target or signal.

Accordingly, a quantitative detection or measurement in the sense of the disclosure indicates a detection referring, relating to, or involving the measurement of quantity or amount of the target or signal (also referred to as quantitation), which comprises any analysis designed to determine the amounts or proportions of the target or signal. In quantitative detection in the sense of the disclosure, the detection can be directed to detect an amount expressed as a discrete value confined by integers, based on a number of molecules or an elaboration thereof.

For example, quantitative detection of a nucleic acid can be provided using a fluorescence or spectrophotometric based method (e.g., Nanodrop or Qubit), the output of which is considered to be proportional to the levels of the nucleic acid to be quantified, as understood by a skilled person. As a further example, as described e.g., in ref. [6] US Appl. Publ. 20210079447 (incorporated by reference in its entirety herein), absolute quantification of a nucleic acid can be provided by cell counting based methods such as flow cytometry, optical density, or plating, the output of which is also considered to be proportional to the desired 16S nucleic acid levels. Absolute quantification of a nucleic acid can be provided by sequencing a spike-in (adding a 16S sequence not in the sample at a known level, usually determined by dPCR/qPCR, and then using the relative abundance after sequencing together with the known abundance level that was inputted as the anchor), as will be understood by a skilled person. Absolute quantification of a nucleic acid can also be provided by detection of unique molecular identifiers (UMIs) via sequencing.
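By way of non-limiting illustration, the spike-in anchoring arithmetic described above can be sketched as follows. The function name and the numerical inputs are hypothetical and do not come from the disclosure; the sketch returns a single point estimate only and does not represent the stochasticity of the measurement.

```python
def absolute_abundance(target_reads, spike_reads, spike_copies_added):
    """Point estimate of target copies anchored on a spike-in (hypothetical helper).

    target_reads:       reads assigned to the target after sequencing
    spike_reads:        reads assigned to the spike-in sequence
    spike_copies_added: known number of spike-in copies added (e.g., from dPCR)
    """
    if spike_reads <= 0:
        raise ValueError("spike-in was not detected, so the anchor is undefined")
    # Relative abundance of target to spike-in, scaled by the known spike-in level.
    return target_reads * spike_copies_added / spike_reads


# Hypothetical numbers chosen only to show the arithmetic.
estimate = absolute_abundance(target_reads=120, spike_reads=3000, spike_copies_added=50_000)
print(estimate)  # 2000.0
```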

A quantitative measurement of a total number of a referenced item, provided in the form of total counts or of a probability distribution of the total counts, is herein indicated also as an “absolute detection” or “absolute measurement” as understood by a skilled person upon reading of the disclosure.

In particular, in embodiments of the disclosure, the quantitative measurement in the sense of the disclosure can take the form of a molecular count. The term “molecular count” as used herein indicates a measurement indicative of the copy number of a molecule (e.g., number of reads counted for a target nucleic acid, number of copies of a target gene as detected by digital PCR). Molecular count is a parameter related to (and often proportional to) absolute measurements. Molecular counts can be detected by a user (or software) who can count the number of molecules identified as the target based on physical characteristic(s) of the target as will be understood by a skilled person.

StochQuant methods and systems of the disclosure can be used in connection with one or more testing measurements directed to obtain a molecular count of the target molecule in the environment in connection with detection of a reference molecule.

The term “reference” as used herein indicates an item that is selected as an item of comparison with respect to a target item. Accordingly, the term “reference molecule” as used herein indicates a molecule measured for comparison purposes in connection with the measurements of a target molecule. As a consequence, a “reference molecule” in the sense of the disclosure is a molecule that i) can be detected, providing a molecular count, with a testing measurement providing a molecular count for the target in the sample and ii) can be measured with an absolute anchoring measurement and/or can be added in a known number of molecules.

In particular, the testing measurement of StochQuant methods and systems comprises at least one manipulation of the target molecules and/or the reference molecules which is known or expected to affect the number of the target molecules counted in the environment in view of the required manipulation of the target and/or reference molecules, and thus the molecular count which is detected in outcome of the testing measurement, thereby impacting the accuracy and reliability of the measurement.

Accordingly, StochQuant methods and systems are preferably used in connection with testing measurements directed to detect target molecules known or expected to be present in the environment at a low abundance or moderate abundance, since the related molecular count will be more impacted by the stochasticity introduced by the detection process, as will be understood by a skilled person.

In StochQuant methods and systems of the present disclosure, the wording “low abundance” of a target molecule in an environment indicates a non-zero target molecule abundance that is expected to lead to irreproducible detection by a given testing measurement. Accordingly, low abundance indicates embodiments in which the target molecule is known or expected to give rise to non-zero detected molecular counts less than a certain percent of the time if the testing measurement were repeated, as understood by a skilled person. In other words, low abundance can be identified based on the ability (or lack thereof) to consistently detect a target molecule via a testing measurement. For example, a threshold of less than 99% of the time, less than 97.5% of the time, or less than 95% of the time can be chosen. An example of a low abundance target can be one for which the probability of detecting the target molecule at a given abundance via the testing measurement is less than 99% of the measurements, less than 97.5% of the measurements or less than 95% of the measurements, as will be understood by a skilled person.
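By way of non-limiting illustration, the detection-probability criterion above can be sketched under the simplifying assumption, made here for illustration only, that the detected molecular count is Poisson distributed with a known expected count. The function names are hypothetical, and the default threshold is one of the exemplary values named above.

```python
import math


def detection_probability(expected_count):
    """Probability of at least one detected molecule, assuming a Poisson count."""
    return 1.0 - math.exp(-expected_count)


def is_low_abundance(expected_count, threshold=0.95):
    """True when a non-zero abundance is detected less often than the threshold."""
    if expected_count <= 0:
        raise ValueError("low abundance is defined for non-zero abundance only")
    return detection_probability(expected_count) < threshold


# An expected count of 1 molecule per measurement is detected about 63% of the time.
print(round(detection_probability(1.0), 3))  # 0.632
print(is_low_abundance(1.0))                 # True
```

Under this assumption, an expected count of 3 molecules is detected slightly more than 95% of the time, so the same target is not low abundance at the 95% threshold but is low abundance at the 99% threshold.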

In StochQuant methods and systems of the present disclosure, the wording “moderate abundance” of a target molecule in an environment indicates a non-zero target molecule abundance that is expected to be consistently detected by a given testing measurement, but for which measurement uncertainty from the testing measurement is above a certain value, expected to impact the downstream analyses, conclusions, or decisions based on the testing measurement. For example, values of 50% uncertainty, 2× or 3× uncertainty can be used, as understood by a skilled person. An example of a moderate abundance target can be one for which the probability of quantifying the target molecule within 2× of the expected value of the testing measurement is less than 95%.
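By way of non-limiting illustration, the exemplary 2× criterion above can be sketched under the assumption, made here for illustration only, that the detected molecular count is Poisson distributed around its expected value. The function names are hypothetical, and the sketch checks only the uncertainty criterion, not whether the target is consistently detected.

```python
import math


def poisson_pmf(k, mean):
    """Poisson probability of observing exactly k counts (computed in log space)."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def prob_within_fold(mean, fold=2.0):
    """Probability that a Poisson count falls within [mean / fold, mean * fold]."""
    lowest = math.ceil(mean / fold)
    highest = math.floor(mean * fold)
    return sum(poisson_pmf(k, mean) for k in range(lowest, highest + 1))


def is_moderate_abundance(mean, fold=2.0, confidence=0.95):
    """True when quantification within the fold range is less likely than required."""
    return prob_within_fold(mean, fold) < confidence


print(is_moderate_abundance(5))    # True: counts of 3 to 10 occur about 86% of the time
print(is_moderate_abundance(100))  # False
```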

In embodiments herein described low abundance and moderate abundance can refer to a molecule known or expected to be present in an environment at low absolute and/or low relative abundance and that is detected with a testing measurement as will be understood by a skilled person upon reading of the present disclosure.

A “testing measurement” in the sense of the disclosure indicates quantitative detection performed through detection of a feature of a tested molecule which provides a molecular count. In particular, in StochQuant methods and systems herein described, a molecular count can be obtained by detection of structural features of a molecule to be counted, such as the sequence of polynucleotides (typically DNA and RNA) or polypeptides (typically proteins or peptides), the spatial conformation of the molecule resulting in specific binding of antibodies, and the generation of a specific mass spectrum which can be used to perform the count. Mass photometry can be used to count biomolecules and investigate their binding affinities, as described in ref. [7].

In particular, mass spectrometry can be used to detect a molecular count in connection with a measured sequence of a polynucleotide or a polypeptide, and/or with a detected molecular mass of the molecule, primarily by measuring the mass-to-charge ratio of ionized molecules. Accordingly, a measurement by mass spectrometry can be used in connection with specific structural features that can include molecular mass, isotopic composition, fragmentation patterns of the molecule, functional groups of the molecule, degree of unsaturation of the molecule, and charge state of the molecule, as will be understood by a skilled person.

Additional structural features that can be detected to provide a molecular count comprise the amino acid composition and amino acid structure of the molecular target, detected based on the antibody-epitope interactions of a measurement performed, for example, by digital ELISA.

Further structural features that can be detected to provide a molecular count include the presence of a tag, the detection of which can advantageously be performed for molecules that are not normally detected by sequencing. In some of those embodiments, the tag is provided by a nucleic acid sequence added in connection with a structural feature to be detected.

Additional structural features that can be used to perform quantitative detection with a testing measurement of the disclosure are identifiable by a skilled person.

In embodiments of StochQuant methods and systems of the disclosure, a testing measurement is directed to provide a molecular count of detected molecules through detection of one or more structural features of the molecule, which can be provided by many detection methods comprising a workflow directed to detect a molecular count.

Exemplary detection methods that can be used to perform one or more testing measurements in the sense of the disclosure comprise sequencing methods to detect a nucleic acid target, such as amplicon sequencing (16S rRNA gene sequencing, described in the exemplary applications reported in Examples 3 to 15 and Examples 21 to 43 as well as in Appendix B of U.S. Provisional Application No. 63/579,291 incorporated by reference in its entirety; ITS gene sequencing; 18S rRNA gene sequencing; COI gene sequencing; ITS2 gene sequencing; RBP1 gene sequencing; RBP2 gene sequencing; V(D)J region sequencing; mitochondrial gene sequencing; functional gene sequencing). Sequencing methods may generate cDNA from either template DNA or template RNA (following reverse-transcription). Further examples of sequencing methods comprise bulk RNA sequencing (RNA-seq) to detect RNA target molecules, single cell RNA-seq to detect RNA target molecules or cell target molecules, metagenomic sequencing to detect DNA target molecules, metatranscriptomic sequencing to detect RNA target molecules, spatial transcriptomics to detect RNA target molecules, Chromatin Immunoprecipitation Sequencing (ChIP-seq) to detect DNA complex targets or DNA-protein complex targets, exome sequencing to detect exome (nucleic acid) target molecules, whole genome sequencing to detect nucleic acid target molecules, target capture gene panels, small RNA sequencing (microRNA-seq), methyl DNA sequencing, single-cell DNA-Seq, or Mate-Pair Sequencing. These sequencing methods can be performed with short read or long read sequencing technologies. Additional methods to detect molecules such as target protein molecules include single molecule protein counting assays such as digital immunoassays (e.g., SIMOA, as described e.g., in ref. [8]), single molecule fluorescence in situ hybridization (smFISH), hybridization chain reaction (HCR) FISH, and next generation sequencing (NGS) adapted for protein quantification.

Further examples of sequencing methods which can provide a testing measurement in StochQuant methods and systems herein described comprise bulk RNA sequencing (RNA-seq), single cell RNA-seq, metagenomic sequencing, metatranscriptomic sequencing, spatial transcriptomics, and Chromatin Immunoprecipitation Sequencing (ChIP-seq). These exemplary sequencing methods can be performed with short read or long read sequencing technologies as will be understood by a skilled person.

Additional methods that can be used to obtain molecular counts and can provide a testing measurement in StochQuant methods and systems herein described comprise single molecule protein counting assays such as digital immunoassays (e.g., SIMOA), single molecule fluorescence in situ hybridization (smFISH), hybridization chain reaction (HCR) FISH, and next generation sequencing (NGS) adapted for protein quantification.

Additional methods that can be used to obtain molecular counts and can provide a testing measurement in StochQuant methods and systems herein described comprise mass spectrometry directed to detect molecular counts, for example from the sequence of a polypeptide or polynucleotide, or from the molecular mass of the molecule, typically detected in the form of the mass-to-charge ratio of ionized molecules, as will be understood by a skilled person.

Further methods that can be used to obtain molecular counts and can provide a testing measurement in StochQuant methods and systems herein described comprise digital ELISA directed to detect molecular counts through detection of the amino acid composition and amino acid structure of the molecular target based on the antibody-epitope interactions of the measurement, as will be understood by a skilled person.

Additional methods that can be used to obtain molecular counts and can provide a testing measurement in StochQuant methods and systems herein described comprise detection of tagged molecules, e.g. by sequencing of a polynucleotidic tag, as will be understood by a skilled person.

Accordingly, a testing measurement in the sense of the disclosure can be performed according to any detection method configured to detect molecular counts of a target molecule as will be understood by a skilled person.

The molecular counts obtained in outcome of different measurements can take the form of one or more testing parameters which characterize the testing measurement. For example, in a testing measurement comprising RNA sequencing, the molecular count of a detected RNA can be indicated in the form of read counts. Additional exemplary molecular counts include molecular counts of a target that are based on the exact match of physical characteristics of the target (e.g., the exact nucleic acid sequence). For example, the initial output of NGS is generally a set of files that contain the physical characteristics of each sequenced “read” from the testing measurement, and the molecular count could be a count of the number of reads that contain a sequence that perfectly matches the sequence of the target of interest. Molecular counts also include molecular counts of a target identified by software or algorithms that identify key characteristics of the target to determine the number of detected target molecules, for example a sequencing alignment software, as will be understood by a skilled person.
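By way of non-limiting illustration, an exact-match molecular count of the kind described above can be sketched as follows. The function name and the read sequences are hypothetical; the sketch considers the forward strand only and does not parse any sequencing file format.

```python
def count_exact_matches(reads, target_sequence):
    """Count the reads that contain a perfect match to the target sequence."""
    return sum(1 for read in reads if target_sequence in read)


# Hypothetical reads; the second read carries a single mismatch and is not counted.
reads = ["ACGTACGT", "ACGTTCGT", "GGACGTACGTCC"]
print(count_exact_matches(reads, "ACGTACGT"))  # 2
```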

Accordingly, molecular counts that can be obtained with a testing measurement in the sense of the disclosure comprise, for example, molecular counts obtained by sequencing nucleic acid target molecules, nucleic acid tags associated with target molecules, and/or amplicons generated from nucleic acid target molecules, and/or nucleic acid tags associated with one or more target molecules, as will be understood by a skilled person. Examples of sequencing methods include: amplicon sequencing (16S rRNA gene sequencing (as described in the exemplary applications reported in Examples 3 to 15 and Appendix B of U.S. Provisional Application No. 63/579,291 incorporated by reference in its entirety), ITS gene sequencing, 18S rRNA gene sequencing, COI gene sequencing, ITS2 gene sequencing, RBP1 gene sequencing, RBP2 gene sequencing, V(D)J region sequencing, mitochondrial gene sequencing, functional gene sequencing). Amplicons that can be generated by sequencing methods and then sequenced comprise cDNA from either template DNA or template RNA (following reverse-transcription).

Other examples of molecular counting include quantifying protein-protein interactions by molecular counting with mass photometry [7] and single molecule multiplexed protein counting via modified DNA carriers with nanopore sequencing [9].

In StochQuant methods and systems, the testing measurement comprises or consists of a workflow (herein indicated as measuring workflow, detection workflow or measurement workflow) that yields a measurement of a molecular count of a molecule of interest (e.g., target molecule or reference molecule) from a target molecule in an environment. The testing measurement is formed by a set of activities which i) are required to perform the testing and ii) comprise manipulations that affect the number of detected target molecules and/or reference molecules.

The term “manipulation” as used herein in connection with a molecule indicates a modification of the physical, biological and/or chemical status of a molecule resulting from activities which form part of a testing measurement and are performed to enable detection of the molecule. A manipulation of a molecule in the sense of the disclosure is typically associated with a manipulation of the environment, sample and/or subsample thereof, where the molecule is known or expected to be present, the manipulation comprising or consisting of a modification of the physical, biological and/or chemical status of said environment, sample and/or subsample thereof.

Exemplary manipulations of a molecule in the sense of the disclosure comprise sampling, fractionation, ligation of a barcode or an adapter, extraction such as liquid-phase extraction, fragmentation, cDNA synthesis, and amplification such as amplification by PCR or other amplification techniques. Additional exemplary manipulations comprise centrifugation, filtration, heat treatment, lyophilization, ultrasonication, mechanical shearing, electroporation, enzymatic digestion, cell lysis, hybridization, transfection, editing (e.g. by CRISPR/Cas9), chemical crosslinking, chemical de-crosslinking, chemical denaturation, heat denaturation, precipitation, methylation/demethylation, chemical labeling, redox reactions, solid-phase extraction, chromatography, immunoprecipitation, encapsulation into droplets, microfluidic manipulations, and in situ hybridization. Further exemplary manipulations in the sense of the disclosure include manipulations involved in the measurement/detection of the target/reference molecule such as fluorescent dye incorporation, nucleotide labeling, fluorophore quenching, real-time fluorescence detection, detecting emitted light from a fluorescent product, photometric detection, and spectrophotometric detection. Another example is target enrichment, such as using capture probes that preferably bind to the target and/or reference molecules. Additional manipulations are identifiable by a skilled person.

In StochQuant methods and systems, the set of activities comprised in the measuring workflow of a testing measurement further comprises iii) detection of one or more physical parameters (herein also StochQuant parameters, StochQuant physical parameters or physical parameters) which are used to model the workflow and comprise at least: a) a molecular count of one or more target molecules, b) a molecular count of one or more reference molecules, and c) an absolute anchoring measurement providing a corresponding detected value.

The term “absolute anchoring measurement” in the sense of the disclosure indicates a quantitative measurement of the total number of a reference molecule. The total number of the reference molecules is also indicated as the absolute anchoring value. The anchoring value can be provided in the form of a total number of molecular counts, or a probability distribution of a total number of molecular counts.

In StochQuant detection methods and systems of the disclosure, the absolute anchoring measurement and the molecular counts of the reference molecule obtained during a testing procedure provide a standard for comparison against the molecular counts of the target molecule during the testing measurement, as understood by a skilled person upon reading of the present disclosure.

In StochQuant methods and systems, the StochQuant parameters are used to provide stochastic representations of the activities of the workflow, including manipulations which impact the count of detected target molecules and/or reference molecules. These stochastic representations form a model of the measuring workflow, herein also indicated as measurement workflow representation, as will be understood by a skilled person upon reading of the present disclosure.

In StochQuant methods and systems, a measurement workflow representation can thus be defined as a mathematical representation of the manipulations of the testing measurement that yields a distribution of probable molecular counts of the target that approximates the number and/or variability in the number of molecules counted resulting from the testing measurement. The measurement workflow representation can be used in a StochQuant detection workflow to obtain the probability distribution of the target molecule abundance in the environment based on the physical parameters.
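By way of non-limiting illustration, obtaining a probability distribution of a target molecule abundance from the physical parameters can be sketched as follows. The sketch is not the measurement workflow representation of the disclosure: it assumes, for illustration only, a single Poisson sampling step, a capture fraction estimated as the reference molecular count divided by the absolute anchoring value, and a flat prior over an integer grid of candidate abundances. All names and numbers are hypothetical.

```python
import math


def poisson_log_pmf(k, mean):
    """Log-probability of observing k counts from a Poisson with the given mean."""
    return k * math.log(mean) - mean - math.lgamma(k + 1)


def abundance_posterior(target_count, reference_count, anchoring_value, max_abundance):
    """Posterior over target abundance (1..max_abundance) under a flat prior."""
    if reference_count <= 0 or anchoring_value <= 0:
        raise ValueError("the reference molecule must be detected and anchored")
    # Fraction of molecules in the environment that reach the final count (assumed).
    capture_fraction = reference_count / anchoring_value
    grid = range(1, max_abundance + 1)
    log_weights = [poisson_log_pmf(target_count, n * capture_fraction) for n in grid]
    peak = max(log_weights)  # subtract the peak to avoid underflow when exponentiating
    weights = [math.exp(w - peak) for w in log_weights]
    total = sum(weights)
    return {n: w / total for n, w in zip(grid, weights)}


posterior = abundance_posterior(
    target_count=3, reference_count=100, anchoring_value=10_000, max_abundance=2_000
)
most_probable = max(posterior, key=posterior.get)
print(most_probable)  # 300
```

The output is a distribution over candidate abundances rather than a single value, so a credible interval can be read from it by accumulating probabilities across the grid.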

In StochQuant methods and systems, a measurement workflow representation can be performed in connection with any testing measurement which results in a molecular count of a target molecule, and which affects the number of target molecules counted in an environment in view of the required manipulation of the molecules of importance (target or reference molecules), as will be understood by a skilled person upon reading of the present disclosure.

In StochQuant methods and systems of the disclosure, a measurement workflow representation can include one or more measurement workflow representation segments (each referred to as a measuring segment or a “segment” for short).

Accordingly, in StochQuant methods and systems herein described, a testing measurement representation segment is a segment identified within the testing measurement workflow directed to detect a molecular count, which comprises at least one set of activities that is known or expected to impact the molecular count. The set of activities/manipulations that is selected to form a segment of a measurement workflow representation depends on the abundance of the molecule, the specific activities that form part of the detection workflow, and the desired accuracy of the measurement workflow representation, as will be understood by a skilled person upon reading of the present disclosure.

Exemplary segments include separation of a sample from an environment, flow cell binding (which is an example of a sampling step), amplification manipulations (e.g., PCR), isolation of target (e.g., nucleic acid extraction), and reverse transcription (RT). Other segments would be understood by one skilled in the art. In preferred embodiments, StochQuant detection methods and systems comprise a detection workflow comprising one or more of: (Segment 1) Separation of a sample from an environment and (Segment 2) a Measurement Segment.

For example, in the measurement workflow representation of amplicon sequencing provided as a proof of principle to investigate taxon abundance in a microbial community, two segments can be identified that comprise the measurement workflow representation (see e.g. Example 5). In this example, these segments are stochastic representations of segments of the testing measurement that affect the molecular count of the target/reference molecules. It can be understood that segments of a measurement workflow representation can occur in sequence, such that the output number of molecules of a segment is the input number of molecules into the subsequent segment. It can also be understood that the final segment of a measurement workflow representation yields a molecular count of the target molecule (or target molecules, in a workflow that includes more than one target molecule type).
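By way of non-limiting illustration, the chaining of segments in sequence can be sketched as follows. The sketch assumes, for illustration only, that each segment is a binomial thinning in which every input molecule survives independently with a fixed probability; segments that increase the number of molecules, such as amplification, would require a different representation. The segment names and probabilities are hypothetical.

```python
import random


def binomial_draw(n, p, rng):
    """Number of successes in n independent trials with success probability p."""
    return sum(1 for _ in range(n) if rng.random() < p)


def run_workflow(molecules_in_environment, segments, rng):
    """Pass a molecule number through the segments in order.

    segments: list of (name, survival_probability) pairs; the output number of
    molecules of each segment is the input number of the subsequent segment.
    """
    count = molecules_in_environment
    for _name, survival_probability in segments:
        count = binomial_draw(count, survival_probability, rng)
    return count  # output of the final segment: the molecular count


rng = random.Random(0)  # fixed seed so that the sketch is repeatable
segments = [("separation of a sample", 0.1), ("measurement segment", 0.5)]
counts = [run_workflow(200, segments, rng) for _ in range(5)]
print(counts)  # five simulated molecular counts for the same environment
```

Repeating the simulated workflow for the same input yields different molecular counts, which is the variability that a measurement workflow representation is meant to approximate.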

In StochQuant methods and systems of the disclosure, a measurement workflow representation segment can be identified by identifying a manipulation or series of manipulations of a testing measurement workflow that: (i) can impact the molecular count of the target/reference molecule obtained via the testing measurement, (ii) can be measured via a segmental calibration that can yield a representation of the segment that can yield output numbers of target/reference molecules that approximate the output numbers of target/reference molecules of the manipulation(s) of the testing measurement, and (iii) for which the segment representation can be parameterized by the number of input target/reference molecules and/or the physical parameter of the manipulation(s) of the testing measurement that can impact the molecular count of the target/reference.

Accordingly, a user can identify the manipulations of a testing measurement workflow based on obtaining the procedures of the testing measurement workflow. These manipulations are commonly referred to as “steps of a protocol” that describe the sequential manipulations of a molecule of interest to yield a molecular count of the molecule of interest.

In StochQuant methods and systems, given the manipulations of a testing measurement workflow, a user can identify the manipulation or series of manipulations for which a segmental calibration is to be performed.

In StochQuant methods and systems, at least one of the segments of a testing measurement workflow comprises a manipulation affecting at least one StochQuant parameter selected from the molecular count of one or more target molecules, the molecular count of one or more reference molecules, and an absolute anchoring measurement of the detection workflow providing a corresponding detected value. In StochQuant methods and systems, one or more segments of the workflow can comprise additional StochQuant parameters which are associated with and characterize the step of the protocol performed in the segment and affect the molecular count of one or more target molecules and/or one or more reference molecules. For example, in a segment comprising performing sampling and a polymerase chain reaction (PCR), a quantitatively measured amount of the sample and the PCR amplification rate each provide an additional StochQuant parameter for the representation of the segment, as will be understood by a skilled person upon reading of the present disclosure.

In StochQuant methods and systems, at least one of the segments of a testing measurement workflow can be evaluated and the impact of the manipulations on molecular counts modeled through a segmental calibration. A “segmental calibration” can be defined as a calibration procedure that generates or acquires the data that characterize the properties of the manipulation that impact the molecular count, in order to provide the physical parameters of the manipulation that will be used to parameterize the segment representation. Accordingly, data generated or acquired during segmental calibration comprise values for one or more StochQuant parameters, as will be understood by a skilled person.

In StochQuant methods and systems, the data generated or acquired by the segmental calibration are used to understand the physical properties of the manipulation, such that the understanding can provide the physical parameters of the manipulation and the mathematical representation of the manipulation. It can be understood that generating and/or acquiring calibration data across a wider range of numbers of target molecules, increasing the number of different numbers of target molecules used for the calibration, and performing more repeated measurements to obtain the calibration data can each result in improved segmental calibration.

In some embodiments of the StochQuant methods and systems, performing a segmental calibration for a particular manipulation can be challenging, as will be understood by a skilled person, in view of technological limitations that can make it difficult to accurately characterize the properties of the manipulation that impact the molecular count. In some embodiments of the StochQuant methods and systems, a segmental calibration can be performed in view of time and/or cost constraints, which can limit the number of segments considered by a skilled person when identifying the segments of a measuring workflow that can be used for StochQuant segmental calibration.

Accordingly, in some embodiments of the StochQuant methods and systems, a segment of a measuring workflow can comprise more than one manipulation combined into a series of manipulations in a single segment of the workflow, to be used for a single segmental calibration in accordance with the disclosure. For example, in those embodiments of StochQuant methods and systems, for a series of manipulations, Manipulation 1 and Manipulation 2, a segmental calibration can be performed by using a known number of molecules of interest in Manipulation 1, then subsequently performing Manipulation 2, and then obtaining calibration data that characterize the properties of the series of Manipulation 1 and Manipulation 2. A non-limiting example is the isolation of nucleic acids from a biological specimen. In this example, the isolation of nucleic acids involves a series of manipulations. Measuring the number of molecules affected by each manipulation would be challenging, so it is common practice to measure the “extraction efficiency” or “extraction variability” that describes the number of molecules yielded by the series of manipulations that are grouped collectively to describe the manipulations of the workflow required to isolate the nucleic acids. In this case, extraction efficiency and extraction yield are physical parameters of the series of manipulations that characterize the properties of the manipulations that impact the molecular count. As such, these physical parameters characterize the fraction of molecules and the stochasticity of molecules that are yielded by the series of manipulations, as will be understood by a skilled person.
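By way of non-limiting illustration, a segmental calibration of a grouped series of manipulations such as nucleic acid isolation can be sketched as follows. The sketch assumes, for illustration only, that the same known number of molecules is input into each replicate and that the variability is summarized as a coefficient of variation; the function name and the replicate data are hypothetical.

```python
import statistics


def calibrate_extraction(known_input, recovered_counts):
    """Estimate extraction efficiency and variability from replicate recoveries.

    known_input:      known number of molecules input into each replicate
    recovered_counts: number of molecules recovered in each replicate (two or more)
    """
    fractions = [count / known_input for count in recovered_counts]
    efficiency = statistics.mean(fractions)                  # mean fraction recovered
    variability = statistics.stdev(fractions) / efficiency   # coefficient of variation
    return efficiency, variability


# Hypothetical replicate data for a known input of 1000 molecules.
efficiency, variability = calibrate_extraction(1000, [400, 500, 600])
print(round(efficiency, 3), round(variability, 3))  # 0.5 0.2
```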

In some embodiments, identification of a segment of a measuring workflow for related segmental calibration can be performed for a “proxy” manipulation which shares the same physical, biological and/or chemical properties as the manipulation comprised within the measuring workflow of the testing measurement which impacts the molecular count of the target and/or reference molecule detected by the testing measurement. A skilled person can understand that if a manipulation (Manipulation 1) shares the same properties that impact the molecular count as another manipulation (Manipulation 2), then the segmental calibration for Manipulation 1 can be used for Manipulation 2.

An exemplary proxy manipulation is provided by separating a sample from an environment. One may perform a segmental calibration for target molecule A (e.g., a DNA molecule) (Manipulation 1). Based on the results of the segmental calibration for molecule A and the physical features of molecule A, one may use this segmental calibration for molecule A as a proxy for the segmental calibration for the manipulation of target molecule B (e.g., another DNA molecule) (Manipulation 2). Another exemplary proxy manipulation is provided by binding of a DNA molecule to a flow cell. One may perform a segmental calibration for molecules of interest with a MiSeq v2 Kit Flow Cell (Manipulation 1), and one may use this segmental calibration for a manipulation with a MiSeq v3 Kit Flow Cell (Manipulation 2). Additional proxy manipulations can be identified by a skilled person upon reading of the present disclosure.

In StochQuant methods and systems, the mathematical representation and physical parameters selected by the user can be guided by the desired accuracy of the measurement workflow representation. Accordingly, a skilled person will understand that in StochQuant methods and systems herein described, a StochQuant Detection Accuracy can be selected by balancing the gain in accuracy obtained via a Segment of the measurement workflow representation that approximates the output numbers of molecules of the manipulations of a testing measurement against the cost of detection (the cost of performing the segmental calibrations, the increased complexity of the StochQuant detection, and increased computational requirements).

In some embodiments of StochQuant methods and systems, the data generation of a segmental calibration is performed by the user.

In some embodiments of StochQuant methods and systems, the data generation of a segmental calibration has been previously performed by the user or by others (e.g., the data from the calibration is available in the literature) and as such a user can acquire the data (see e.g. Examples 29, 35, 37).

In some embodiments of the StochQuant methods and systems, a segmental calibration is performed by retrieving data generated by measurements previously performed by the user or by others. In some embodiments of the StochQuant methods and systems, the physical parameters of a segment to be used in a StochQuant segmental calibration are already known (see e.g. Example 29, Example 33, Example 38).

In StochQuant methods and systems of the disclosure, segmental calibration, preferably performed in combination with assessing the accuracy of the calibrated segment, results in a mathematical representation of the stochasticity introduced by manipulations of the workflow segments. Exemplary common mathematical representations of the measurements of the segmental calibration can include a Poisson distribution, binomial distribution, Bernoulli distribution, normal distribution, exponential distribution, hypergeometric distribution, negative binomial distribution, and/or negative hypergeometric distribution.
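By way of a non-limiting illustration, the following minimal sketch shows how one segment could be represented by a binomial distribution whose physical parameter is estimated from replicate calibration data. The function names, the calibration counts, and the choice of a binomial representation are assumptions made for illustration only and are not taken from the Examples of the disclosure.

```python
import random

def estimate_efficiency(input_molecules, output_counts):
    """Estimate the per-molecule yield of a segment from replicate calibration runs."""
    total_out = sum(output_counts)
    total_in = input_molecules * len(output_counts)
    return total_out / total_in

def simulate_binomial_segment(n_in, efficiency, rng):
    """Draw one output count, assuming each input molecule is yielded independently."""
    return sum(1 for _ in range(n_in) if rng.random() < efficiency)

# Hypothetical calibration: 100 molecules input into each of five replicates.
calibration_counts = [46, 52, 49, 55, 48]
efficiency = estimate_efficiency(100, calibration_counts)

# Use the calibrated segment representation to predict a distribution of outputs.
rng = random.Random(0)
draws = [simulate_binomial_segment(100, efficiency, rng) for _ in range(1000)]
```

A different representation (for example a Poisson or negative binomial distribution) may be more appropriate when the calibration data show more or less variability than a binomial distribution predicts.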

It can also be understood that the mathematical representation and physical parameters selected by the user can be guided by the desired accuracy of the measurement workflow representation.

In some embodiments of the StochQuant methods and systems, the method comprises determining accuracy of a segment of a measurement workflow representation. Below are two examples:

Option 1 (verify segments in order to string them together):

- 1) For a measurement workflow representation, start with the final segment that yields the molecular count.
- 1a) Input known amounts of target/reference into an environment.
- 1b) Perform the manipulation(s) of the segment repeatedly on replicate environments containing the target/reference. Because this is the final segment, the manipulation(s) will yield a molecular count of the target/reference.
- 1c) The repeated manipulations yield a distribution of outcomes of the manipulation (in this case distributions of molecular counts of the target/reference).
- 1d) Compare the distribution of molecular counts of the target/reference from the manipulation(s) of the segment to the distribution of molecular counts yielded by the Segment Representation. The comparison can use the same techniques/procedures described for the assessment of the Accuracy of the entire workflow representation.
- 2) Next, assess the accuracy of the preceding segment (the second to last segment).
- 2a) Repeat the procedure above, except perform the manipulations of the final two segments.
- 2b) Assess the Accuracy of representation of the two segments together.
- 3) Next, assess the accuracy of the preceding segment (the third to last segment) . . . repeat until you are at the start of the workflow (the first manipulation of the target in the environment).

Option 2 (verify a segment independently of all other segments):

- 1) Input known amounts of a molecule of interest (the molecule of interest can either be the target/reference or a molecule that shares the key physical features of the target/reference) into an environment.
- 2) Perform the manipulation repeatedly on replicate environments containing the molecule of interest.
- 3) Perform a measurement indicative of the number or state of the molecule of interest in each of the replicate environments to obtain a distribution of outcomes of the manipulation.
- 4) Compare the distribution of outcomes to the distribution of outcomes predicted by the segment representation (can use any of the assessment techniques/procedures described for the assessment of an entire representation workflow).

In StochQuant methods and systems of the disclosure, mathematical representations provided in outcome of a segmental calibration are chained together to provide a mathematical representation of a measuring workflow of the testing measurement as will be understood by a skilled person upon reading of the present disclosure.
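A minimal sketch of such chaining is given here, assuming two hypothetical segments (an extraction and a subsampling step), each represented by a binomial distribution. The segment names and parameter values are illustrative assumptions; the output count of each segment is passed as the input count of the next segment in the workflow order.

```python
import random

def binomial(n, p, rng):
    """Binomial draw using the standard library only."""
    return sum(1 for _ in range(n) if rng.random() < p)

def extraction(n_in, rng, efficiency=0.6):
    """Hypothetical segment 1: nucleic acid extraction with an assumed efficiency."""
    return binomial(n_in, efficiency, rng)

def subsampling(n_in, rng, fraction=0.1):
    """Hypothetical segment 2: transferring an assumed fraction of the extract."""
    return binomial(n_in, fraction, rng)

def run_workflow(n_in, segments, rng):
    """Chain the segment representations in workflow order."""
    count = n_in
    for segment in segments:
        count = segment(count, rng)
    return count

rng = random.Random(1)
segments = [extraction, subsampling]
# Distribution of output counts for 1000 input molecules.
outputs = [run_workflow(1000, segments, rng) for _ in range(500)]
```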

In particular, in StochQuant methods and systems of the disclosure, a molecular count of a target molecule and a molecular count of a reference molecule detected during the testing measurement, and typically modeled through a segmental calibration of one or more segments of a workflow of the testing measurement, are used together with an absolute anchoring value of the reference molecule to obtain a probability distribution of the abundance of the target molecule in the environment. The probability distribution provides a StochQuant detection in outcome of the testing measurement.

Accordingly, in StochQuant methods, the probability distribution of the abundance of the target molecule in an environment is obtained as a function of i) the molecular count of the target molecule; ii) the molecular count of the reference molecule; and iii) the absolute anchoring value of the reference molecule. The molecular count of the target molecule and the molecular count of the reference molecule are obtained in outcome of the testing measurement. The molecular count of the target molecule, the molecular count of the reference molecule, and the absolute anchoring measurement of the reference molecule are collectively referred to as the Physical parameters or StochQuant Parameters.

The term “probability distribution” as used herein indicates a mathematical expression (data, list, function, etc.) that describes the probability of different possible values for a given quantity of interest as understood by a skilled person.

A probability distribution can take many different forms as understood by a skilled person. For example, a probability distribution can be provided in non-parametric form as one or more target abundances, each with a probability of being the true target abundance. A probability distribution can be further provided in the form of parameters for a known discrete probability distribution. An example is containing the information of the probability distribution in the form of the parameters n and p of a negative binomial distribution. A probability distribution can also be provided in the form of a list of target abundances where the representation of each target abundance (e.g., how many times the target abundance “2” occurs) is correlated with its probability. If abundance “2” is the most likely, it will appear more times than any other abundance.
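As a brief illustration of two of these forms, the following sketch converts a list of target abundances (in which more probable abundances occur more often) into a non-parametric mapping from abundance to probability. The sample values and the function name are hypothetical.

```python
from collections import Counter

def samples_to_distribution(samples):
    """Convert a list of abundances into a mapping of abundance -> probability."""
    counts = Counter(samples)
    total = len(samples)
    return {abundance: n / total for abundance, n in sorted(counts.items())}

# Hypothetical list form: abundance "2" occurs most often, so it is the most likely.
samples = [1, 2, 2, 2, 3, 3, 2, 1, 2, 4]
dist = samples_to_distribution(samples)
most_likely = max(dist, key=dist.get)
```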

In StochQuant methods and systems herein described, a probability distribution of the target molecule abundance in the environment is obtained as a function of the molecular count of the target molecule; the molecular count of the reference molecule; the absolute anchoring value of the reference molecule; and possibly additional StochQuant Parameters, such as a quantitatively measured amount of the sample, as will be understood by a skilled person upon reading of the disclosure.

In StochQuant methods and systems herein described, the specific measurement workflow representation is used to obtain the probability distribution reporting the probable molecular counts of target molecule obtained via the testing measurement. The probable molecular count is thus based on the physical parameters modeled with segmental calibration and/or with a model of the entire workflow of the testing measurement selected to correspond to the molecular count and variability in the molecular count of the target resulting from the actual testing measurement performed.

Accordingly, in StochQuant methods and systems herein described, the number and the variation in the molecular count of the target molecule resulting from the specific activities of the testing measurement can be obtained by performing multiple testing measurements running the entire measuring workflow, or multiple calibrations of one or more segments of the measuring workflow, as will be understood by a skilled person. In some embodiments, the number and the variation in the molecular count of the target molecule resulting from the specific activities of the testing measurement can be obtained by combining one or more measurements with data and/or representations of one or more segments previously obtained by the user or others, as will be understood by a skilled person. The StochQuant parameters so obtained can be used to obtain a mathematical representation of the segments and/or of the testing measurement.

In some embodiments of StochQuant methods and systems herein described, the mathematical representation selected for a manipulation or series of manipulations is in the form of a known discrete probability distribution together with the physical parameters which are representative of the number, and the variability in the number, of molecules of interest yielded by the manipulation of a testing measurement as part of a StochQuant workflow (See Example 2).

In some of those embodiments, the mathematical representation of the manipulation or series of manipulations can be identified with the aid of an artificial intelligence (AI) approach, for example machine learning approaches such as supervised learning, unsupervised learning, semi-supervised learning, reinforcement learning, deep learning through deep neural networks, neural networks, transfer learning, generative models, ensemble learning, and dimensionality reduction techniques. For example, the relevant parameter can be input into a neural network trained to produce an expected distribution of outputs for the segment or series of segments (See Example 48).

In some embodiments of StochQuant methods and systems herein described the measurement workflow representation has been pre-identified and therefore the user can perform the StochQuant detection by inputting the detected values of StochQuant physical parameters in the pre-determined measurement workflow representation (See Example 2).

In some embodiments, the measurement workflow representation can be pre-identified and loaded in a device (e.g. a microfluidic device) with an algorithm which inputs the detected values for the StochQuant parameters in the model and displays the probability distribution, confidence level, and/or a determination based upon the probability distribution or confidence level related to the target molecule abundance.

In some embodiments, the measurement workflow representation can comprise more than one probability distribution, which correspond to, and are representative of, the changes in molecular count due to the manipulation of the biological environment required by the detection activities of one or more segments.

In particular, a measurement workflow representation can be prepared to account for additional factors due to the detection activities such as intra-operator variability (that can arise due to several factors including a user's mistake), inter-operator variability (that can arise due to differing levels of consistency/variability between different users performing the same workflow), or variability of equipment performance.

In some embodiments, the probabilistic abundance of a reference molecule is used to determine the probabilistic abundance (absolute or relative) of a target molecule. This is beneficial because, if the target molecule is in low or moderate absolute and/or relative abundance, one or more sampling steps can provide a highly variable number of target molecules. This variable number of molecules can give rise to a variable ratio of target to non-target molecules. Therefore, StochQuant takes this into account by treating the loading process(es) stochastically. This can be accomplished, for example, by taking virtual random samples and simulating the molecular counts at different quantities. For each quantitative value, a tally is taken of the simulations where the simulated read count matches the observed read count, thereby building a probability distribution over multiple values, each probability score representing the confidence that the abundance of the target molecule matches that given abundance value.
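The virtual random sampling described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of any Example: the whole workflow is collapsed into a single binomial capture step, the capture probability is taken as the reference molecular count divided by the absolute anchoring value of the reference, and candidate abundances are restricted to a finite range with equal prior weight. All names and numeric values are hypothetical.

```python
import random

def binomial(n, p, rng):
    """Binomial draw using the standard library only."""
    return sum(1 for _ in range(n) if rng.random() < p)

def abundance_distribution(target_count, ref_count, ref_abundance,
                           candidates, n_sim, rng):
    """Tally, per candidate abundance, the simulations matching the observed count."""
    # Assumption: the reference fixes the per-molecule capture probability.
    capture_p = ref_count / ref_abundance
    matches = {}
    for abundance in candidates:
        matches[abundance] = sum(
            1 for _ in range(n_sim)
            if binomial(abundance, capture_p, rng) == target_count
        )
    total = sum(matches.values())
    if total == 0:
        raise ValueError("no simulation matched; widen candidates or raise n_sim")
    # Normalize the tallies into a probability distribution over abundances.
    return {a: m / total for a, m in matches.items()}

rng = random.Random(42)
posterior = abundance_distribution(
    target_count=3, ref_count=100, ref_abundance=1000,
    candidates=range(0, 81), n_sim=300, rng=rng,
)
mode = max(posterior, key=posterior.get)
```

Requiring an exact match keeps the sketch simple; for larger counts a tolerance around the observed count, or an analytical likelihood, would typically be needed to keep the number of simulations manageable.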

In StochQuant methods and systems of the disclosure an Inference Procedure is performed with the measurement workflow representation to yield a probability distribution of target abundances in an environment.

In some embodiments of StochQuant methods and systems, the inference is an algorithm that uses the physical parameters of the measurement workflow representation and the measurement workflow representation to identify probable target abundances in an environment that yield molecular counts of the target that are approximately equal to the molecular count of the target yielded by the testing measurement (see e.g. Example 6, Example 35, Example 37, Example 38).

In some embodiments, the Inference Procedure is implemented in the form of a Bayesian Inference method. Examples of Bayesian Inference methods can include Markov Chain Monte Carlo (using common algorithms such as Metropolis-Hastings, Gibbs Sampling, Hamiltonian Monte Carlo, or the No-U-Turn Sampler); Variational Inference (using common techniques such as Mean-Field Variational Inference, Stochastic Variational Inference, or Black-Box Variational Inference); Laplace Approximation; Expectation Propagation; Sequential Monte Carlo (SMC)/Particle Filters; Approximate Bayesian Computation; Integrated Nested Laplace Approximation; Bayesian Model Averaging; Empirical Bayes methods; and Bayesian Nonparametrics methods such as Dirichlet Process mixtures. These and similar approaches can be implemented via a software package. Examples of a software package that can implement a Bayesian Inference method can include Stan, PyMC/PyMC3, JAGS, BUGS, TensorFlow Probability, Emcee, Greta, LibBi, Edward/Edward2, BayesPy, Infer.NET, Turing.jl, SVI in Pyro, R-INLA, TMB, Pyro, SMCTC, SMC, ABC-SysBio, PyABC, EasyABC, abc, DABC, BMA, BayesVarSel, BMS, EBglmnet, limma, ashr, vmbp, DPpackage, BNP, LibDAI, pgmpy, GraphLab Create.

In other embodiments, other forms of inference can perform the same inference task of taking the measurement workflow representation and the StochQuant physical parameters (molecular count of the reference molecule obtained via the testing measurement, molecular count of the target molecule obtained via the testing measurement, the absolute anchoring value of the reference molecule, and quantifiable measured amounts) and producing a probability distribution of target molecule abundance. For example, one can take StochQuant inputs and outputs, and train a neural network to perform the regression task of predicting the probability distributions (see Example 48).
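For illustration only, a Metropolis-Hastings sampler, one of the Markov Chain Monte Carlo algorithms named above, can be sketched with the standard library as follows. The sketch assumes a deliberately simple measurement workflow representation (the observed target count is Poisson distributed with rate equal to abundance times a capture probability) and a uniform prior over integer abundances from 1 to an assumed maximum. It does not use, and is not a substitute for, any of the software packages listed.

```python
import math
import random

def log_likelihood(abundance, target_count, capture_p):
    """Poisson log-likelihood of the observed count (assumed workflow model)."""
    rate = abundance * capture_p
    return target_count * math.log(rate) - rate - math.lgamma(target_count + 1)

def metropolis_hastings(target_count, capture_p, n_steps, rng,
                        start=30, max_abundance=10_000):
    """Sample integer abundances under a uniform prior on 1..max_abundance."""
    current = start
    current_ll = log_likelihood(current, target_count, capture_p)
    chain = []
    for _ in range(n_steps):
        # Symmetric proposal: move up or down by 1 to 10 molecules.
        proposal = current + rng.choice([-1, 1]) * rng.randint(1, 10)
        if 1 <= proposal <= max_abundance:
            proposal_ll = log_likelihood(proposal, target_count, capture_p)
            accept_p = math.exp(min(0.0, proposal_ll - current_ll))
            if rng.random() < accept_p:
                current, current_ll = proposal, proposal_ll
        # Proposals outside the prior support are rejected; the chain stays put.
        chain.append(current)
    return chain

rng = random.Random(3)
chain = metropolis_hastings(target_count=3, capture_p=0.1, n_steps=20_000, rng=rng)
samples = chain[2_000:]  # discard an assumed burn-in period
posterior_mean = sum(samples) / len(samples)
```

The retained samples are themselves a probability distribution in list form, in which more probable abundances appear more often.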

Accordingly, StochQuant is a combined experimental and computational approach as would be understood by a skilled person, that improves the quality of detection and in particular, sequencing analysis, of target molecule with particular reference to low-to-moderate abundance targets, which are difficult to analyze with standard methods.

In preferred embodiments, StochQuant detection methods and systems comprise a detection workflow configured to measure from one or more of the following environments: a sample obtained from a human such as blood, biopsy, swab (vaginal, rectal, urethral, oral, nasal), urine, stool, respiratory specimen; material derived from the sample obtained from a human, such as purified, cleaned-up, or isolated material (e.g., nucleic acids); cells and organisms (plants, seeds, fungi, bacteria, animals, mammalian cells) for genetic identification of an organism or for detecting a contaminating cell or organism (such as for genetic testing of seeds/plants in agriculture or yeasts/fungi/bacteria/mammalian cells in biomanufacturing); sample/material as above, but from a non-human animal instead of a human (e.g. an animal that underwent a treatment for drug discovery, or an animal for agriculture like a cow or a pig); food (e.g., testing for pathogens, sterility, genetic composition); DNA-encoded/DNA-tagged library of target molecules; wastewater, built environment, sterility filtration collection; and pooled samples of any of the preceding. In preferred embodiments, StochQuant detection methods and systems comprise a workflow configured to measure one or more target molecules related to: prenatal, cancer, infectious diseases, STIs, and BV.

In preferred embodiments, StochQuant detection methods and systems comprise a detection workflow utilizing one or more of the following reference molecules: A synthetic nucleic acid that contains a unique sequence that can easily be differentiated from target sequence and other sequences in the environment; a synthetic nucleic acid that contains similar physical properties to the target molecule(s) such that the manipulations of the workflow have a similar effect on the target and the reference. For example, a reference of similar length and GC composition to the target; plurality of 16S rRNA gene molecules (e.g., those obtained from 16S with universal primers); a molecule that is expected to be in the environment of interest, such as a gene marker of a commensal organism expected to be in the environment; and a molecule that is expected to be in the environment of interest, such as a non-mutated human sequence expected to be in the environment. In preferred embodiments, StochQuant detection methods and systems comprise a detection workflow comprising one or more of the following testing measurements: amplicon sequencing; multiplex amplicon sequencing; shotgun metagenomic sequencing; bulk RNA sequencing; and single cell RNA sequencing.

In preferred embodiments, StochQuant detection methods and systems comprise a detection workflow utilizing absolute anchoring values determined by one or more of: spike-in of a target into an environment for the absolute anchoring value and/or measurement of the efficiency and/or variability of a segment or workflow; digital PCR measurement to yield the absolute anchoring value of the reference; and qPCR with a standard curve.

In preferred embodiments, StochQuant detection methods and systems comprise manipulations comprising one or more of: separation of a sample from an environment, flow cell binding (which is an example of a sampling step), amplification manipulations (e.g., PCR), isolation of target (e.g., nucleic acid extraction), reverse transcription (RT), and target enrichment (e.g., via capture probes).

In some embodiments, StochQuant can be used in methods and systems to improve a testing measurement for detection of an abundance of a target molecule in a physical environment. In those embodiments, according to the first aspect, the testing measurement comprises a measuring workflow for the molecular count of a target molecule and a reference molecule, to be improved by providing a molecular detection that accounts for stochasticity impacting the detection itself introduced by the measuring workflow.

In those embodiments the method comprises: dividing the measuring workflow into one or more measuring segments arranged in a measuring workflow order, each of the one or more measuring segments comprising one or more physical manipulations impacting the molecular count of the target molecule and/or of the reference molecule.

In some embodiments, at least one of the one or more physical manipulations comprises sampling the environment or a sample or a subsample thereof from a previous measuring segment.

In some embodiments, at least one of the one or more measuring segments includes amplicon sequencing.

In some embodiments, at least one stochastic representation of the one or more measuring segments comprises calculating a distribution of data for output for said at least one stochastic representation.

In some embodiments, the distribution is one of: a Poisson distribution, binomial distribution, discrete random uniform distribution, or a negative binomial distribution.

In some embodiments, the method includes configuring the computer-based system to also provide a confidence level of an abundance of the target molecule based on the model of the measuring workflow when further provided with a threshold abundance value.

In some embodiments, the computer-based system provides the confidence level by determining a total amount of probability above the threshold abundance value within the probability distribution.

In some embodiments, the computer-based system is also configured to provide a confidence level of an abundance of the target molecule by calculating a total amount of probability within a confidence interval within the probability distribution.

In some embodiments, the confidence interval is a pre-set value.

In some embodiments, the computer-based system is also configured to provide a confidence interval of an abundance of the target molecule matching a given confidence level by calculating a total amount of probability matching the given confidence level within the confidence interval within the probability distribution.

In some embodiments, the given confidence level is input by the user of the computer-based system.

In some embodiments, StochQuant can be used in methods and systems to build a computer-readable program that improves a measuring workflow of a testing measurement for detection of an abundance of a target molecule in a physical environment. The improvement of the measuring workflow is performed by StochQuant by enabling a probabilistic detection which accounts for, and informs the user of, the stochasticity impacting the detected molecular count and resulting from the activities of the detection workflow.

The method comprises: i) dividing the measuring workflow into one or more measuring segments arranged in a measuring workflow order, each of the one or more measuring segments comprising one or more physical manipulations of a molecular count of the target molecule and/or of a reference molecule in the environment, a sample and/or a subsample thereof.

The method further comprises: ii) calibrating the one or more measuring segments by building corresponding stochastic representations of each of the one or more measuring segments into a computer-readable program, the stochastic representations taking as inputs physical parameters of the measuring workflow.

The method also comprises: iii) chaining the corresponding stochastic representations together into a model of the measuring workflow by connecting outputs of measuring segments into inputs of other measuring segments in the measuring workflow order, such that the model takes as its inputs the physical parameters including at least a target molecule molecular count, a reference molecule molecular count, and an absolute anchoring value of the reference molecule.

In some embodiments, at least one of the one or more measuring segments is a step of taking samples from the environment or from a result from a previous measuring segment.

In some embodiments, at least one of the one or more measuring segments includes amplicon sequencing.

In some embodiments, the distribution is one of: a Poisson distribution or a negative binomial distribution.

In some embodiments, the computer-readable program is further configured to provide a confidence level of an abundance of the target molecule based on the model of the measuring workflow when further provided with a threshold abundance value.

In some embodiments, the computer-readable program provides the confidence level by determining a total amount of probability above the threshold abundance value within the probability distribution.

In some embodiments, the computer-readable program is further configured to provide a confidence level of an abundance of the target molecule by calculating a total amount of probability within a confidence interval within the probability distribution.

In some embodiments, the confidence interval is a pre-set value.

In some embodiments, the computer-readable program is further configured to provide a confidence interval of an abundance of the target molecule matching a given confidence level by calculating a total amount of probability matching the given confidence level within the confidence interval within the probability distribution.

In some embodiments, the given confidence level is input by the user of the computer-readable program.

In some embodiments, StochQuant can be used in methods and systems to probabilistically detect a target molecule in an environment by performing a measuring workflow of a testing measurement to measure abundance of the target molecule in the environment in combination with a reference molecule. In those embodiments, StochQuant enables detection of the abundance of the target molecule by providing probability distributions which inform the user of the impact of stochasticity introduced by the detection workflow on the detected abundance, thus improving the related testing measurement.

The method additionally comprises iv) providing an absolute anchoring value of the reference molecule.

In some embodiments, the absolute anchoring value of the reference molecule is obtained by performing in a sample of the environment an absolute anchoring measurement of the reference molecule.

In some embodiments, the absolute anchoring value of the reference molecule is a known value because the reference molecule would be added for the measuring workflow in a known amount.

In some embodiments, the reference molecule is not present in the environment but is added to the measuring workflow at some point.

In some embodiments, the absolute anchoring value is an adjusted value of an absolute anchoring measurement of the reference molecule.

In some embodiments, the measuring workflow includes amplicon sequencing.

In some embodiments, the amplicon sequencing includes one or more of: 16S rRNA gene sequencing, ITS gene sequencing, 18S rRNA gene sequencing, COI gene sequencing, ITS2 gene sequencing, RBP1 gene sequencing, RBP2 gene sequencing, V(D)J region sequencing, mitochondrial gene sequencing, and functional gene sequencing.

In some embodiments, the reference molecule is a mRNA of a gene.

In some embodiments, the reference molecule is selected from: Glyceraldehyde-3-phosphate dehydrogenase (GAPDH), Phosphoglycerate kinase 1 (PGK1), Peptidylprolyl isomerase A (PPIA), ribosomal protein L13a (RPL13A), ribosomal protein large P0 (RPLP0), Beta-2-microglobulin (B2M), YWHAZ, SDHA, TFRC, GUSB, HMBS, HPRT1, TBP; bacterial housekeeping genes such as 16S, tus, rpoD, glyA, dnaB, gyrA, pykA/F, pfkA/B, mdoG, arcA; fungal housekeeping genes such as DUF221, ubcB, ADA, fis1, Cu-ATPase, psm1, spo7, spt3, DUF500, sac7, AP-2 beta, npl1, Beta-tubulin, Arabinofuranosidase-B2, Xylanase C.

In some embodiments, the reference molecule is a plurality of types of molecules simultaneously detected during the testing measurement to provide a same count.

In some embodiments, the reference molecule is multiple 16S genes which all amplify from the same primer.

In some embodiments, the plurality of molecule types that are simultaneously detected during the testing measurement are selected from multiple genes, portions of genes, regions, or portions of regions which all amplify from the same primer, Lipopolysaccharides (LPS), Peptidoglycan, Teichoic acids, and specific DNA or RNA targets.

In some embodiments, the reference molecule is a plurality of types of molecules each separately detected during the testing measurement to provide separate unique counts that are used to determine at least the molecular count of the reference molecule.

In some embodiments, forming the probability distribution of abundances of the target molecule is further based on multiple molecular counts of the reference molecule.

In some embodiments, the plurality of types of molecules are selected from multiple RNA expression reference molecules.

In some embodiments, the method also includes determining a probability that an actual abundance of the target molecule in the environment is above (or below) a threshold abundance by calculating a total area of the probability distribution higher than (or lower than) the threshold abundance. Calculating the area of the probability distribution can be done by calculating the area under the curve, by integration, by Monte Carlo integration, and other analytical, numerical, algebraic, and discrete methods identifiable by a skilled person.

In some embodiments, the method also includes determining a probability that an actual abundance of the target molecule in the environment is above (or below) or equal to a threshold abundance by calculating a total area of the probability distribution higher than (or lower than) or equal to the threshold abundance.
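For a probability distribution provided in discrete non-parametric form, the total area above or below a threshold reduces to a sum of probabilities. The following sketch illustrates this under the assumption that the distribution is held as a mapping of abundance to probability; the function name and example values are hypothetical.

```python
def tail_probability(distribution, threshold, above=True, inclusive=False):
    """Sum the probability above (or below) a threshold abundance."""
    total = 0.0
    for abundance, prob in distribution.items():
        if above:
            selected = abundance >= threshold if inclusive else abundance > threshold
        else:
            selected = abundance <= threshold if inclusive else abundance < threshold
        if selected:
            total += prob
    return total

# Hypothetical StochQuant probability distribution (abundance -> probability).
distribution = {0: 0.05, 10: 0.15, 20: 0.40, 30: 0.30, 40: 0.10}
p_above = tail_probability(distribution, 20)
p_at_or_above = tail_probability(distribution, 20, inclusive=True)
p_below = tail_probability(distribution, 20, above=False)
```

For a continuous distribution, the same quantity would instead be obtained by integration or Monte Carlo integration.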

In some embodiments, the method also includes determining a confidence level by calculating the area of the probability distribution within a given confidence interval.

In some embodiments, the method also includes determining a confidence interval by calculating what interval within the probability distribution provides a given confidence level.

In some embodiments, the interval is centered around a given abundance value.
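The two interval computations above can be sketched for a discrete distribution as follows. The sketch assumes the distribution is a mapping of abundance to probability and that the interval is symmetric around the given abundance value; other interval constructions (for example equal-tailed or highest-density intervals) are equally possible. Names and values are hypothetical.

```python
def confidence_level(distribution, low, high):
    """Probability that the abundance lies within [low, high]."""
    return sum(p for a, p in distribution.items() if low <= a <= high)

def centered_interval(distribution, center, level):
    """Smallest symmetric interval around `center` holding at least `level`."""
    half_widths = sorted({abs(a - center) for a in distribution})
    for half_width in half_widths:
        low, high = center - half_width, center + half_width
        # Small tolerance guards against floating-point rounding of the sum.
        if confidence_level(distribution, low, high) >= level - 1e-12:
            return low, high
    raise ValueError("requested confidence level cannot be reached")

# Hypothetical StochQuant probability distribution (abundance -> probability).
distribution = {0: 0.05, 10: 0.15, 20: 0.40, 30: 0.30, 40: 0.10}
level_10_to_30 = confidence_level(distribution, 10, 30)
interval_80 = centered_interval(distribution, center=20, level=0.80)
```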

In some embodiments, the StochQuant methods and systems comprise determining accuracy of the measurement workflow representation, to assess whether the measurement workflow representation yields a sufficiently accurate approximation of the testing measurement.

In some embodiments of the StochQuant methods and systems, the accuracy of the measurement workflow representation can be measured/assessed by comparing (a) the molecular counts of the target molecule yielded by the measurement workflow representation to (b) the molecular counts yielded by a testing measurement for which the number of target molecules in an environment is known. In some embodiments, a user can perform multiple (replicate) testing measurements to obtain a distribution of molecular counts of a target yielded by the testing measurement. Then, the user can use the measurement workflow representation (with the known number of molecules in an environment and the physical parameters obtained for the corresponding testing measurement) to yield a distribution of target molecular counts yielded by the measurement workflow representation. Then the distributions of molecular counts of the target yielded by the testing measurement and by the measurement workflow representation can be compared to yield a measure of accuracy.

Exemplary procedures to perform an assessment of accuracy comprise comparing the detectability of a target via the testing measurement, e.g. by comparing the number of times a target is detected to the number of times the measurement workflow representation predicts the target should be detected (see Example B6). In those embodiments, the comparison in detectability between the testing measurement and the measurement workflow representation is a measure of accuracy. In those embodiments, the measurement workflow representation is considered “accurate enough” if the actual detectability from the testing measurement falls within the range of detectability predicted by the measurement workflow representation.
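A minimal sketch of such a detectability comparison is given here. It assumes a single-step binomial representation of the workflow, defines detection as a molecular count greater than zero, and takes the central 95% of simulated detection counts as the predicted range; these choices, and all numeric values, are illustrative assumptions rather than the procedure of Example B6.

```python
import random

def binomial(n, p, rng):
    """Binomial draw using the standard library only."""
    return sum(1 for _ in range(n) if rng.random() < p)

def predicted_detection_range(n_target, capture_p, n_replicates, n_sim, rng,
                              lower_q=0.025, upper_q=0.975):
    """Range of detection counts predicted by the (assumed) representation."""
    detections = []
    for _ in range(n_sim):
        # One simulated experiment: count the replicates with a non-zero count.
        detected = sum(
            1 for _ in range(n_replicates)
            if binomial(n_target, capture_p, rng) > 0
        )
        detections.append(detected)
    detections.sort()
    low = detections[int(lower_q * (n_sim - 1))]
    high = detections[int(upper_q * (n_sim - 1))]
    return low, high

rng = random.Random(7)
low, high = predicted_detection_range(
    n_target=5, capture_p=0.2, n_replicates=20, n_sim=2000, rng=rng
)
observed_detections = 13  # hypothetical: target detected in 13 of 20 replicates
accurate_enough = low <= observed_detections <= high
```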

Another exemplary procedure to perform an assessment of accuracy comprises comparing the measurement noise of the testing measurement of the target with that of the measurement workflow representation, e.g. by comparing the measurement noise, in the form of a coefficient of variation (CV), of a target relative abundance (target molecular count divided by reference molecular count) yielded by the testing measurement to the CV yielded by the measurement workflow representation (see Example 5). Alternatively, the comparison can be performed using a statistical test such as the Kolmogorov-Smirnov (KS) test to compare the distributions of molecular counts.

In some embodiments of the StochQuant methods and systems, a user can identify an accuracy threshold for a given measure of accuracy. An “accuracy threshold” can be defined as a minimum value, maximum value, interval of values of a measurement of accuracy, or similar indication of accuracy. For example, in the exemplary procedure of comparing the measurement noise between the testing measurement and the measurement workflow representation, one can set an “accuracy threshold” of 3×, meaning that the measurement noise yielded by the representation must be within 3× of the measurement noise yielded by the testing measurement.
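The noise comparison with a 3× accuracy threshold can be sketched as below. The replicate counts are made-up illustrative numbers, the function names are hypothetical, and "within 3×" is interpreted here (as an assumption) to mean that the ratio of the two CVs lies between 1/3 and 3.

```python
import statistics


def relative_abundances(target_counts, reference_counts):
    """Target molecular count divided by reference molecular count, per replicate."""
    return [t / r for t, r in zip(target_counts, reference_counts)]


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def within_fold_threshold(cv_measured, cv_simulated, fold=3.0):
    """Assumed reading of 'within 3x': the CV ratio lies in [1/fold, fold]."""
    ratio = cv_simulated / cv_measured
    return 1.0 / fold <= ratio <= fold


# Illustrative replicate data (not from the disclosure).
measured = relative_abundances([12, 9, 15, 11], [1000, 950, 1100, 1020])
simulated = relative_abundances([10, 13, 8, 12], [1000, 1000, 1000, 1000])

cv_measured = coefficient_of_variation(measured)
cv_simulated = coefficient_of_variation(simulated)
print(within_fold_threshold(cv_measured, cv_simulated))
```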

Exemplary accuracy thresholds can include a percentage (e.g. 5%) which can be used in embodiments in which the accuracy is assessed by comparing the detectability of a target via the testing measurement.

Exemplary accuracy thresholds can also comprise a signal to noise ratio which can be used in embodiments in which the accuracy is assessed by comparing the measurement noise of the testing measurement of the target.

Exemplary accuracy thresholds can further comprise a p-value threshold (significance level), which can be used in embodiments in which a statistical test, such as the KS test comparing the distributions of molecular counts, is used to assess accuracy. In those embodiments, if the obtained p-value is below the significance level (e.g., 0.05), then the null hypothesis is rejected and one can determine that the distribution of molecular counts yielded from the measurement workflow representation differs from the distribution of molecular counts yielded from the testing measurement. In this example, if a p-value greater than 0.05 is obtained, then the measurement workflow representation is within the accuracy threshold and can be used in a StochQuant detection workflow. In embodiments of the StochQuant methods and systems, a user can perform this procedure repeatedly and for different numbers of target molecules in an environment to improve the accuracy assessment. It can be understood that testing a larger set of different numbers of target molecules in an environment and performing more replicate measurements can improve the assessment of accuracy.
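A two-sample KS comparison with a 0.05 significance level can be sketched as follows. This is a self-contained approximation, not the procedure of the disclosure: the p-value uses the asymptotic Kolmogorov series with a common small-sample correction, which is only approximate for small replicate numbers and for discrete counts with ties. All function names are hypothetical.

```python
import math
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Maximum absolute difference between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_p_value(d, n_a, n_b, terms=100):
    """Approximate two-sided p-value from the asymptotic Kolmogorov distribution."""
    n_eff = n_a * n_b / (n_a + n_b)
    lam = (math.sqrt(n_eff) + 0.12 + 0.11 / math.sqrt(n_eff)) * d
    if lam < 0.1:  # series converges slowly here; the p-value is effectively 1
        return 1.0
    total = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam) for k in range(1, terms + 1)
    )
    return min(1.0, max(0.0, 2.0 * total))


def within_accuracy_threshold(measured_counts, simulated_counts, significance=0.05):
    """True if the KS test does not reject equality of the two count distributions."""
    d = ks_statistic(measured_counts, simulated_counts)
    p = ks_p_value(d, len(measured_counts), len(simulated_counts))
    return p > significance


# Illustrative counts only.
print(within_accuracy_threshold([3, 5, 4, 6, 5, 4], [4, 5, 3, 6, 5, 5]))
```

In practice an established routine such as SciPy's `scipy.stats.ks_2samp` would normally be preferred over a hand-written approximation.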

In embodiments of the StochQuant methods and systems, the desired accuracy of the measurement workflow representation can be defined as a measure of how closely the distribution of probable molecular counts of the target yielded by the measurement workflow representation approximates the actual distribution of probable molecular counts of the target obtained via the testing measurement (see Example 6).

In some embodiments of the StochQuant methods and systems, a measurement workflow representation is not accurate in accordance with a desired accuracy, indicated e.g. as a pre-set confidence level. In such cases, a user can improve the measurement workflow representation by any one or a combination of (i) acquiring more segmentation calibration data, (ii) further splitting the manipulations of a segment into additional segments, and/or (iii) using an alternative (but potentially more complicated and/or more computationally intensive) mathematical representation of the segment.

In many embodiments of the StochQuant methods and systems, StochQuant thus takes advantage of 1) an absolute anchoring measurement, 2) in combination with other known experimental parameters (physical parameters or StochQuant parameters), in particular detected molecular counts of the target molecule and of the reference molecule as well as the quantified amount of the sample, to apply a measurement workflow representation (that in some cases utilizes Poisson statistics) to derive a probabilistic relationship between actual target molecule abundance in an environment and molecular counts obtained via a testing measurement. StochQuant was demonstrated on amplicon sequencing (16S rRNA gene sequencing) in connection with determination of taxon abundance, as explained in the exemplary experiments of Appendix A and Appendix B of U.S. Provisional Application No. 63/579,291, incorporated by reference in its entirety.

In embodiments of the disclosure, the probability distribution of abundance of a target molecule in an environment determined by StochQuant detection methods allows the user to identify confidence intervals of target molecule abundances, each interval having a confidence level that can be calculated based on the probability distribution of target molecule abundances.

The wording “confidence interval” indicates an interval, e.g., a range of abundances of target molecule in an environment from a minimum abundance value of the confidence interval to a maximum abundance value of the confidence interval. The units of the confidence interval (number of molecules, number of molecules per unit of volume, ratio of target molecules to another target molecule or reference molecule) should match the units of the probability distribution of target abundance in the environment. For example, if the probability distribution of target molecules in an environment is in molecules per microliter, then the confidence interval is provided in molecules per microliter. Examples of a “confidence interval” can include: from 500 target molecules to 1000 target molecules in an environment, from 50 target molecules/μL to 1000 target molecules/μL in an environment, from 5 Target A molecules per Target B molecule to 10 Target A molecules per Target B molecule.

In some embodiments, the methods and systems use a provided abundance threshold to determine a confidence level above and/or a confidence level below that threshold. In some embodiments, the methods and systems use a provided confidence interval to determine a confidence level for that interval. In some embodiments, the methods and systems use a provided confidence level to determine a confidence interval that has that level (see FIG. 2).

The wording “confidence level” indicates the probability that the target molecule abundance is within the range of the confidence interval. In practice, the confidence level can be obtained from a probability distribution of target abundances in an environment by integrating over the probability distribution from the lower bound of the confidence interval to the upper bound of the confidence interval. This can be described as “the area under the curve” of the probability distribution, or the sum of probabilities within a given confidence interval (see Examples 13-15 and Examples 40-47).

Mathematically, this can be represented:

$Prob({CI}_{LowerBound} \leq X \leq {CI}_{UpperBound}) = \int_{{CI}_{LowerBound}}^{{CI}_{UpperBound}} f(x)\,dx$

- where f(x) is the probability distribution of the target molecule abundance X in the environment.
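For a discrete probability distribution, the integral above reduces to a sum of probabilities inside the interval. A minimal sketch, assuming the distribution is held as a mapping from abundance value to probability (the data structure, function name and numbers are illustrative assumptions, not taken from the disclosure):

```python
def confidence_level(distribution, lower_bound, upper_bound):
    """Sum of probabilities for abundances within [lower_bound, upper_bound]."""
    return sum(p for x, p in distribution.items() if lower_bound <= x <= upper_bound)


# Illustrative distribution: abundance (molecules) -> probability.
example_distribution = {0: 0.05, 100: 0.15, 200: 0.40, 300: 0.25, 400: 0.15}

level = confidence_level(example_distribution, 100, 300)
print(f"confidence level for 100-300 molecules: {level:.2f}")
```

The interval bounds must be expressed in the same units as the abundance values of the distribution.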

In some embodiments, the confidence interval is pre-determined (e.g. +/− some set value around the measurement with maximum probability, or between two set values) and the confidence level is calculated by integrating over the probability distribution from the lower bound of the confidence interval to the upper bound of the confidence interval. In other words, in some embodiments, a probability distribution of target abundance and a confidence interval are obtained to yield a confidence level. For example, the confidence interval can be set to 5×10^5 to 1.5×10^6 molecules, and when the confidence level is calculated for a given probability distribution of target abundance in an environment, the confidence level that the number of target molecules is within that range of values is 23.4% (see Examples 40-47).

In some embodiments, the confidence level is pre-determined (e.g. 50%) and the confidence interval is calculated as the interval above and/or below a selected target abundance value that provides that confidence level. In other words, in some embodiments, a probability distribution of target abundance, a selected target abundance, and a confidence level are obtained to yield a confidence interval. For example, for a given probability distribution of target abundance, a selected target abundance of 1,000 copies/mL, and a confidence level of 50% (that the target abundance is greater than or equal to 1,000 copies/mL), the resulting calculation can yield a confidence interval of 1,000 copies/mL to 6,000 copies/mL. As another example, the interval to be determined is the range of 75% probable target molecule abundances centered around the maximum probable count, and the resulting curve can show that the interval of +/− 6×10^4 around 1.5×10^5 molecules gives the range of values that have a 75% probability to include the correct count.
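The reverse calculation, from a pre-determined confidence level to an interval centered on the most probable abundance, can be sketched as below. The approach shown (widening a symmetric interval around the mode until the enclosed probability reaches the requested level) is one simple choice assumed for illustration; the function name, tolerance and numbers are hypothetical.

```python
def interval_around_mode(distribution, level, tol=1e-12):
    """Narrowest symmetric interval around the most probable abundance whose
    enclosed probability is at least `level`. Returns (lower, upper, enclosed)."""
    mode = max(distribution, key=distribution.get)
    for half_width in sorted({abs(x - mode) for x in distribution}):
        enclosed = sum(p for x, p in distribution.items() if abs(x - mode) <= half_width)
        if enclosed >= level - tol:  # tolerance guards against floating-point rounding
            return mode - half_width, mode + half_width, enclosed
    raise ValueError("distribution does not reach the requested confidence level")


# Illustrative distribution: abundance (molecules) -> probability.
example = {100: 0.1, 150: 0.2, 200: 0.4, 250: 0.2, 300: 0.1}

lower, upper, enclosed = interval_around_mode(example, level=0.75)
print(lower, upper, round(enclosed, 3))
```

Because the distribution is discrete, the enclosed probability can exceed the requested level; the function returns the achieved value so that it can be reported alongside the interval.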

In some embodiments, a confidence level threshold is predetermined, and the confidence levels for the two options (above and below) are calculated based on confidence interval bounded by the confidence level threshold.

The wording “confidence level threshold” indicates a pre-set minimum or maximum confidence level that can be used to make a binary decision (above vs. below the threshold). For example, if a minimum confidence level threshold of 95% is needed to determine that a target is present within a confidence interval, and a confidence level of 99% is obtained, then it is determined that the target is present within the confidence interval (see Example 14).

For example, in some embodiments of StochQuant methods and systems, a confidence level threshold of 25% is provided, with confidence levels above the confidence level threshold yielding a “positive” test result determination, and confidence levels below the confidence level threshold yielding a “negative” test result determination. Provided a probability distribution of target abundance and a confidence interval, a confidence level can be obtained. If the obtained confidence level is below the confidence level threshold (e.g., a confidence level of 10% for a confidence level threshold of 25%), a “negative” test result determination is yielded. If the obtained confidence level is above the confidence level threshold (e.g., a confidence level of 90% for a confidence level threshold of 25%), a “positive” test result determination is yielded.
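The binary determination described above amounts to a single comparison. In the sketch below the function name is hypothetical, and the handling of a confidence level exactly equal to the threshold (treated here as “negative”) is an assumption, since the text only addresses levels above and below the threshold.

```python
def classify_result(confidence_level: float, confidence_level_threshold: float = 0.25) -> str:
    """Return 'positive' above the threshold, otherwise 'negative' (tie handling assumed)."""
    return "positive" if confidence_level > confidence_level_threshold else "negative"


print(classify_result(0.10))  # confidence level of 10% against a 25% threshold
print(classify_result(0.90))  # confidence level of 90% against a 25% threshold
```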

Embodiments of StochQuant detection methods and systems can comprise obtaining a confidence level from a confidence interval and a probability distribution of target abundance, thus improving accuracy of detection.

Accordingly, in StochQuant methods and systems herein described, in embodiments where the probability distribution of target molecules in an environment is so narrow as to be approximated by a deterministic value, the StochQuantization of the related detection allows derivation of a confidence interval which corresponds to a confidence level.

Consequently, each and every detection involving a molecular count in which a reference count can be obtained can be StochQuantized, including single-step detections and completely deterministic detections. In particular, in detection workflows comprising a single-step detection approximated to a deterministic detection, the StochQuantization will add an understanding of the confidence level of the resulting count that would otherwise be absent. This confidence level can also account for background noise and other factors such as user's mistakes, if a probability distribution is chosen that accounts for those factors.

In some embodiments, StochQuant can be used to provide a method and a system to probabilistically detect a target molecule in an environment, accounting for the stochastic impact on the detection of the target molecule due to the stochasticity introduced by the detection process. The method comprises:

- performing a testing measurement comprising
  - obtaining a molecular count of the target molecule in the environment or a sample thereof; and
  - obtaining a molecular count of a reference molecule; and
- providing an absolute anchoring value of the reference molecule; and
- obtaining a probability distribution of the target molecule abundance in the environment as a function of
  - the molecular count of the target molecule;
  - the molecular count of the reference molecule; and
  - the absolute anchoring value of the reference molecule.

In the method to probabilistically detect a target molecule in an environment of the first aspect, the probability distribution of the target molecule abundance in the environment is indicative of the confidence of detection or non-detection or confidence of the quantitative value of the target molecule detected in the environment.
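How the three inputs listed above could combine into a probability distribution can be illustrated with a deliberately simplified sketch. It is not the measurement workflow representation of the disclosure. The assumptions, all hypothetical, are: the fraction of molecules recovered by the measurement is estimated as the reference molecular count divided by the absolute anchoring value; the target molecular count is Poisson-distributed around that fraction times the true target abundance; candidate abundances on a grid are equally likely beforehand; and uncertainty in the reference count itself is ignored.

```python
import math


def poisson_log_pmf(count: int, mean: float) -> float:
    """Log of the Poisson probability of `count` given `mean`."""
    if mean == 0.0:
        return 0.0 if count == 0 else float("-inf")
    return count * math.log(mean) - mean - math.lgamma(count + 1)


def abundance_distribution(target_count, reference_count, anchoring_value,
                           max_abundance, step=1):
    """Probability of each candidate target abundance on a grid (simplified model)."""
    if reference_count <= 0 or anchoring_value <= 0:
        raise ValueError("reference count and anchoring value must be positive")
    fraction = reference_count / anchoring_value  # assumed recovered fraction
    grid = range(0, max_abundance + 1, step)
    log_weights = [poisson_log_pmf(target_count, fraction * n) for n in grid]
    peak = max(log_weights)  # subtract the peak for numerical stability
    weights = [math.exp(w - peak) for w in log_weights]
    total = sum(weights)
    return {n: w / total for n, w in zip(grid, weights)}


# Illustrative inputs: 5 target counts, 1,000 reference counts,
# reference anchored at 100,000 molecules in the environment.
dist = abundance_distribution(5, 1000, 100_000, max_abundance=3000, step=10)
most_probable = max(dist, key=dist.get)
print(most_probable)
```

The resulting mapping from abundance to probability can be passed directly to confidence level or confidence interval calculations. A fuller representation would also propagate the stochasticity of the reference count and of each segment of the workflow.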

In some embodiments, the absolute anchoring value of the reference molecule is a value obtained by a previous measurement.

In some embodiments, the absolute anchoring value of the reference molecule is obtained by performing in the environment an absolute anchoring measurement of the reference molecule.

In some embodiments, the reference molecule is added to the environment and the absolute anchoring value of the reference molecule is a known absolute count or distribution of absolute counts of the reference molecule added to the environment.

In some embodiments, the absolute anchoring value is a single detected count.

In some embodiments, the absolute anchoring value is a plurality of detected counts.

In some embodiments, the plurality of detected counts is comprised in a distribution.

In some embodiments, the absolute anchoring value is a number which is proportional to the count and is adjusted to obtain the true count.

In some embodiments, the testing measurement is performed by 16S rRNA gene sequencing, ITS gene sequencing, 18S rRNA gene sequencing, COI gene sequencing, ITS2 gene sequencing, RBP1 gene sequencing, RBP2 gene sequencing, V(D)J region sequencing, mitochondrial gene sequencing, functional gene sequencing, bulk RNA sequencing (RNA-seq), single cell RNA-seq, metagenomic sequencing, metatranscriptomic sequencing, spatial transcriptomics, Chromatin Immunoprecipitation Sequencing (ChIP-seq), SIMOA, single molecule fluorescence in situ hybridization (smFISH), hybridization chain reaction (HCR) FISH, or next generation sequencing (NGS) adapted for protein quantification.

In some embodiments, the reference molecule is a single type of molecule, which is one or more of the mRNA of a gene selected from Glyceraldehyde-3-phosphate dehydrogenase (GAPDH), Phosphoglycerate kinase 1 (PGK1), Peptidylprolyl isomerase A (PPIA), ribosomal protein L13a (RPL13A), ribosomal protein large P0 (RPLP0), Beta-2-microglobulin (B2M), YWHAZ, SDHA, TFRC, GUSB, HMBS, HPRT1, TBP; 16S, tus, rpoD, glyA, dnaB, gyrA, pykA/F, pfkA/B, mdoG, arcA; DUF221, ubcB, ADA, fis1, Cu-ATPase, psm1, spo7, spt3, DUF500, sac7, AP-2 beta, npl1, Beta-tubulin, Arabinofuranosidase-B2, and Xylanase C.

In some embodiments, the reference molecule is a plurality of types of molecules simultaneously detected during the testing measurement to provide a same count such as multiple 16S genes which all amplify from the same primer.

In some embodiments, the reference molecule formed by a plurality of molecule types that are simultaneously detected during the testing measurement comprises multiple genes, portions of genes, regions, or portions of regions which all amplify from the same primer, such as ITS, ITS2, 18S, COI, or the V(D)J region.

In some embodiments, the reference molecule formed by a plurality of molecule types that are simultaneously detected during the testing measurement comprises multiple types of molecules which all give rise to a fluorescent signal when provided the same probe or fluorophore, such as Lipopolysaccharides (LPS), Peptidoglycan, Teichoic acids, or specific DNA or RNA targets.

In some embodiments, the reference molecule is a plurality of types of molecules each separately detected during the testing measurement to provide separate unique counts.

In some embodiments, the testing measurement comprises bulk RNA-seq or shotgun metagenomic sequencing.

In some embodiments, the reference molecule comprises one or more of: a fungal cell-type specific reference molecule formed by multiple DNA molecule types; a bacterial cell-type specific reference molecule formed by multiple DNA molecule types; and a reference molecule formed by a reference DNA molecule and a reference RNA molecule.

In some embodiments, the probability distribution is obtained in non-parametric form as one or more molecular counts, each with a probability of being the true molecular count.

In some embodiments, the probability distribution is obtained in the form of shape parameters for a known discrete probability distribution.

In some embodiments, the probability distribution is obtained in the form of a list of target abundances where the representation of each target abundance is correlated with its probability.
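The three forms described above (explicit counts with probabilities, shape parameters of a known discrete distribution, and a list whose entries occur in proportion to their probability) can be converted into a common mapping from abundance to probability. The sketch below is illustrative only; the function names are hypothetical, and a Poisson distribution with a single mean parameter is used as one example of a known discrete distribution.

```python
import math
from collections import Counter


def from_abundance_list(abundances):
    """List form: each abundance appears with a frequency proportional to its probability."""
    counts = Counter(abundances)
    total = len(abundances)
    return {x: c / total for x, c in sorted(counts.items())}


def from_poisson_mean(mean: float, max_count: int):
    """Shape-parameter form, using a Poisson distribution as an example (mean > 0)."""
    return {
        k: math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))
        for k in range(max_count + 1)
    }


# Non-parametric form: molecular counts, each with its probability of being the true count.
non_parametric = {100: 0.25, 200: 0.50, 300: 0.25}

# The same distribution expressed as a list of target abundances.
as_list = [100, 200, 200, 300]
print(from_abundance_list(as_list) == non_parametric)
```

Once in mapping form, any of the three representations can be used in the same downstream confidence level or confidence interval calculation.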

In some embodiments, the target molecule is known or expected to be comprised in the environment and/or the sample at a low absolute abundance.

In some embodiments, the target molecule is known or expected to be comprised in the environment and/or the sample at a low relative abundance.

In some embodiments, the target molecule is comprised in a microorganism included in a microbial community, such as a microbiome.

In some embodiments, the probabilistic detection is performed in connection with detection of abundance of a microorganism and/or related taxa.

In some embodiments, the obtaining a probability distribution is performed on a computer with a processor and a memory.

In some embodiments, the computer is a network of computers.

In some embodiments, StochQuant can be used in a method and a system to probabilistically measure an abundance of a target molecule in an environment accounting for the stochasticity impacting the detected abundance which is introduced by the measurement process.