Biological samples often comprise mixtures of different types of substances (e.g., different types of cells, such as tumor cells and healthy cells, mixtures of multiple microbes, mixtures of different biological fluids, mixtures of immune cells, and/or the like). Deconvolution is generally used to estimate proportions of substances in a given sample based on known gene expression patterns within the substances, and/or to estimate the average gene expression profile within each type of substance given a known substance ratio in a given sample.
Conventional deconvolution methods often assume an additive model for sample mixture data: $E(Y)=BX^{T}$, where $Y$ is an $n \times p$ matrix of gene expression in n samples and p genes, $X$ is a $p \times K$ matrix of prototypical gene expression of the p genes in K cell types, and $B$ is an $n \times K$ matrix of the quantities of each cell type in each sample. The additive model usually assumes that the amount of a gene transcript in a sample is the sum of the amount of the transcript in each of the sample's cell subpopulations. Additionally, by using an additive model, if a previous experiment allows estimation of the cell types' prototypical gene expression profiles X, then it is possible to estimate the matrix of cell type quantities B from X and Y. Alternatively, if B is known (e.g., by running the sample through a cell sorter before expression profiling), then the average expression profile of each cell type may be estimated. Through the introduction of prior information like the identities of genes expected to be unique to one sample type and constraints on parameters to ensure identifiability, some scientists have traditionally used this model to estimate B and X simultaneously.
The additive model, however, is problematic in a number of ways. For example, gene expression data is often log-transformed before analysis (save for qPCR data, which already exists on the log scale), and differential expression is generally measured in fold-changes, not additive increases. By transforming the data and/or utilizing it in such a manner as to incorporate it into an additive model, accuracy may be lost, resulting in incorrect results (e.g., false positives and/or false negatives of substances in a sample, or in inefficient estimates of mixing proportions and/or cell type gene expression profiles).
The methods disclosed herein describe a deconvolution method using both an additive model and a log-based calculation for more accurate gene expression calculations. This facility would be expected to be of significant benefit when analyzing sample mixtures, including but not limited to body fluid mixtures encountered in forensic analysis, and/or like sample mixtures. Specifically, described herein are statistical methods using the log or multiplicative scale and an additive model, which can calculate quantities of given fluids in a sample based on the gene expression of various targeted genes in the sample.
In some embodiments, a method for forensic biological sample identification may comprise obtaining at least one biological sample for analysis, extracting a total RNA from the biological sample, hybridizing the total RNA with at least one probe, in at least one assay, and analyzing the at least one assay using a multiplex codeset. In some implementations analyzing the assay may comprise determining a set of genes to quantify in the sample, modelling gene expression of each gene in the set of genes via generating a gene expression log function for each gene in the set of genes, and generating a maximum likelihood estimation of an amount of a biological substance in the biological sample based on the modelled gene expression of each gene in the set of genes.
In some embodiments, a method for estimating the presence of substances in at least one biological sample may comprise determining a set of biological substances to detect within a biological sample, modelling the expression of each gene in a set of unique genes in the biological substance for each biological substance in the set of biological substances, and generating an expected gene proportion model using the modelled expression of each gene in the set of unique genes in the biological substance. In some embodiments the method may further comprise generating a substance model containing a quantity of each biological substance in the set of biological substances within the biological sample, generating an expected gene expression model using the expected gene proportion model and the substance model, and estimating gene expression in the biological sample using the expected gene expression model. Further, the method may comprise generating an estimated sample profile based on a Maximum Likelihood Estimate of each biological substance in the set of biological substances using the estimated gene expression in the biological sample, calculating a likelihood ratio for each biological substance in the set of biological substances, the likelihood ratio indicating how likely the biological substance is contained in the biological sample, and determining whether each biological substance in the set of biological substances is in the biological sample based on the calculated likelihood ratio.
In some embodiments, the apparatuses, methods, and systems described herein can identify common forensically relevant body fluids and/or a variety of substances potentially present in a variety of samples, by multiplex solution hybridization of barcode probes to specific mRNA targets using a five minute direct lysis protocol. This simplified protocol with minimal hands-on requirement may facilitate routine use of mRNA profiling in casework laboratories. In contrast to most gene expression-based classifiers, the algorithm may not involve training a machine learning algorithm to optimize the ability to call samples correctly; rather, it may define a biologically reasonable model of gene expression in body fluid samples and use that model to evaluate the strength of evidence a sample provides for the presence of a particular fluid. This algorithm may allow the calculation of log-likelihoods for detection of each fluid type, making the algorithm's results more defensible in courtroom settings.
A further benefit of approaches according to some embodiments of the present disclosure is that they allow evaluation of the algorithm on all samples, including those used in training. Because the algorithm is based on an a priori model of gene expression in body fluid mixtures, and because its parameters may be estimated without regard to model performance, the algorithm may only minimally overfit the training data.
In some implementations, the apparatuses, methods, and systems described herein may be applied to gene expression data, protein data, metabolite data, and miRNA expression data, and/or any other data with log-scale variability. In some embodiments, the output of the methods described here can be used in classification, clustering and/or other machine learning problems. In some embodiments, the methods described here can be used to test for differential expression of a gene between samples or classes. In some embodiments, the methods described here can be used to test for the expression of a gene in a sample type.
In preferred embodiments, NanoString Technologies®'s nCounter® systems and methods are used. Probes and methods for binding and identifying specific mRNA targets have been described in, e.g., US2003/0013091, US2007/0166708, US2010/0015607, US2010/0261026, US2010/0262374, US2010/0112710, US2010/0047924, and US2014/0371088, each of which is incorporated herein by reference in its entirety.
Any aspect or embodiment described herein can be combined with any other aspect or embodiment as disclosed herein. While the disclosure has been described in conjunction with the detailed description thereof, the foregoing description is intended to illustrate and not limit the scope of the disclosure, which is defined by the scope of the appended claims. Other aspects, advantages, and modifications are within the scope of the following claims.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawings will be provided by the Office upon request and payment of the necessary fee.
In some embodiments, statistical analysis may be performed on a sample including at least one identifiable substance, in order to determine the composition of the sample and the gene expression within the sample. In some embodiments, exemplary cases may include forensic samples containing a plurality of substances (e.g., skin, venous blood, vaginal secretion, saliva, menstrual blood, semen, and bio-particles), and/or any sample (e.g., a biological sample) containing a plurality of substances (e.g., biological substances), which may need to be identified and/or quantified, e.g., using the gene expression of targeted genes known to be in each of the substances.
In some embodiments, referring to
Three exemplary properties of casework samples include: they often (i) comprise mixtures of two or more fluids, (ii) are limited in size and (iii) could be either partially or highly degraded. Thus, one exemplary approach to dealing with casework samples is as follows:
In some embodiments, gene expression may be best modeled on the log (multiplicative) scale. For example, a doubling of a gene's expression level may be generally considered a change comparable in magnitude to a halving of its expression level, and a gene increasing from 200 to 400 mRNA transcripts is as meaningful a difference in gene expression as a gene increasing from 2000 to 4000 counts. However, the mathematics of mixtures may be additive. For example, if a sample is half blood and half saliva, a gene's cumulative expression level may result from the summation of its expression levels in each tissue sample. Therefore, the contributions of each fluid to a mixture may be modeled on a linear scale, but discrepancies between observed and predicted expression may be measured on the log scale.
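The two scales described above can be illustrated with a few lines of arithmetic. This is a minimal sketch; the per-fluid counts and the observed count are hypothetical values chosen only for illustration.

```python
import math

# Fold changes: a doubling is the same size on the log scale
# regardless of the starting level.
low = math.log2(400 / 200)      # 200 -> 400 transcripts
high = math.log2(4000 / 2000)   # 2000 -> 4000 counts

# Mixtures: contributions add on the linear scale.
# Hypothetical expression of one gene in pure blood and in pure saliva.
blood_counts, saliva_counts = 1000.0, 40.0
mixture = 0.5 * blood_counts + 0.5 * saliva_counts

# The discrepancy between an observed count and the additive prediction
# is then measured on the log scale.
observed = 260.0
log_error = math.log(observed) - math.log(mixture)
```

Here the prediction is built additively (520 counts), while the error is a log ratio: an observation of half the predicted value gives the same error magnitude as an observation of twice the predicted value.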
In some embodiments, a model for gene expression in a sample from a single fluid may be defined and then extended to mixtures of fluids. In some implementations, various models may be implemented, generated, stored, and/or utilized on a computing device. From there, a calculation of maximum likelihood estimates (MLEs) of fluid quantities in a sample, and the use of likelihood ratios to test for the presence of a fluid in a sample may be described.
Model for Gene Expression in a Sample from a Single Body Fluid
In some embodiments, each gene represents a given proportion of total gene expression in each fluid. For example, in an average blood sample one might expect 15% of total RNA to be HBB, 1% to be ALAS1, etc. In some embodiments these may be referred to as expected proportions $X_{HBB}$, $X_{ALAS1}$, and/or the like. Therefore, in a given blood sample, the vector of expected gene expression may be $\beta\,(X_{HBB}, X_{ALAS1}, \ldots)^{T}$, where $\beta$ is the total amount of RNA in the sample.
Due to both biological and technical noise, actual expression may vary around its expectation. Per the multiplicative behavior of gene expression, the variability may be modelled as arising from a log-normal distribution, wherein each gene may be assumed to be equally variable. A single gene's expression in a sample can then be modeled 310 using the following exemplary function:
$$\log(y_{HBB}) \sim N(\log(X_{HBB}\,\beta), \sigma^{2}),$$
where $y_{HBB}$ may be the expression of HBB in the sample, and $\sigma^{2}$ may be the variance (on the log scale) of HBB's expression around its expectation.
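For a sample from a single fluid, the model above has a closed-form maximum likelihood estimate: minimizing the squared log-scale errors over the total RNA amount gives log(β) equal to the mean, across genes, of log(y) minus log(X). The sketch below illustrates this under the equal-variance assumption; the function names are hypothetical, and only the two proportions quoted in the text are used.

```python
import math

def expected_expression(proportions, beta):
    """Expected counts per gene: beta times the gene's expected proportion."""
    return {gene: beta * x for gene, x in proportions.items()}

def single_fluid_mle(counts, proportions):
    """MLE of beta under log(y_j) ~ N(log(x_j * beta), sigma^2).

    Minimizing sum_j (log y_j - log x_j - log beta)^2 gives
    log(beta_hat) = mean_j(log y_j - log x_j).
    """
    logs = [math.log(counts[g]) - math.log(proportions[g]) for g in proportions]
    return math.exp(sum(logs) / len(logs))

# Proportions quoted in the text; all other genes are omitted for brevity.
x_blood = {"HBB": 0.15, "ALAS1": 0.01}
y = expected_expression(x_blood, beta=2000.0)  # noise-free counts
beta_hat = single_fluid_mle(y, x_blood)
```

With noise-free counts the estimate recovers the total RNA amount used to generate them; with noisy counts it is the geometric mean of the per-gene ratios y/X.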
The model for mixtures may be derived from the model for single-fluid samples 312. For notation purposes, matrices may be represented with bold, uppercase letters, vectors with bold, lowercase letters, and scalars with lowercase letters. Samples may be indexed $i \in \{1, \ldots, n\}$, genes $j \in \{1, \ldots, p\}$, and tissues $k \in \{1, \ldots, K\}$. The gene expression profile for a given sample may be $y_i = (y_{i1}, \ldots, y_{ip})^{T}$, where $y_{ij}$ is the expression of gene j in sample i. $\beta_{ik}$ may be the amount of fluid k in sample i, and $\beta_i = (\beta_{i1}, \ldots, \beta_{iK})$ may be the vector of the amounts of all the fluids in sample i 316. Finally, a matrix $X$ may be defined to represent the expected proportion of each gene j in each fluid type k 314, with $x_{jk}$ being the element in the jth row and the kth column of $X$, representing the expected proportion of gene j in samples from fluid k. In some implementations, the covariance matrix of the p genes' log-transformed expression levels may be notated as $\Sigma$. Additionally, the $L_P$ norm of a matrix $A$ may be represented as $\|A\|_{P}$ (e.g., wherein $P=2$ in some implementations).
Referring to
$$E(y_{ij}) = \sum_{k=1}^{K} \beta_{ik}\, x_{jk},$$
and the expression for the sample's entire expected gene expression vector may be, in some embodiments 320:
$$E(y_i) = X\beta_i.$$
Again, assuming the variability of gene expression occurs on the log scale, gene expression in a sample may be modelled as 318:
$$\log(y_i) \sim N(\log(X\beta_i), \sigma^{2} I),$$
where $I$ is the identity matrix and $\sigma^{2}$ is the common variance (on the log scale) of all genes. (Note that if $E(y_i) = X\beta_i$, then $E(\log(y_i)) \neq \log(X\beta_i)$. However, under the values considered in this application, $E(\log(y_i))$ very closely approximates $\log(X\beta_i)$.) In some embodiments, if the data necessary to fully estimate the genes' covariance matrix is missing and/or absent, one may approximate it with $\sigma^{2} I$.
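The size of the discrepancy between the expected log expression and the log of the expected expression can be made explicit. As a sketch, assuming a single gene whose log expression is exactly normal with mean μ and variance σ2, the standard formula for the mean of a log-normal variable gives:

```latex
E(y) = e^{\mu + \sigma^{2}/2}
\quad\Longrightarrow\quad
E(\log y) = \mu = \log E(y) - \frac{\sigma^{2}}{2}
```

Under this assumption, if the expected expression of gene j equals the jth element of the additive prediction, then the expected log expression falls below the log of that element by σ2/2 for every gene. Because the offset is common to all genes, it is equivalent to rescaling the vector of fluid amounts by the factor exp(−σ2/2), which leaves the relative proportions of the fluids unchanged.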
Before applying the above model for gene expression in body fluids, one may estimate two parameters: X, e.g., the matrix of expected proportions of gene expression, and σ2, e.g., the variance of gene expression. Estimation of the X matrix is described elsewhere herein. σ2, the variance on the log scale common to all genes, may be estimated as the average variance of each gene in each fluid. In some implementations, X may be scaled to have columns summing to 1; in other implementations, β may be scaled instead of X, neither matrix may be scaled, and/or one or both of the matrices may be scaled to a variety of different values.
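The estimate of σ2 described above can be sketched as follows. The text does not specify the variance estimator or whether raw or normalized values are used, so the n − 1 denominator, the use of normalized values, the data layout, and the function names are all assumptions made for illustration; the training values are hypothetical.

```python
import math

def log_variance(values):
    """Sample variance of log-transformed values (n - 1 denominator, an assumption)."""
    logs = [math.log(v) for v in values]
    mean = sum(logs) / len(logs)
    return sum((l - mean) ** 2 for l in logs) / (len(logs) - 1)

def estimate_sigma2(normalized):
    """Average the log-scale variance over every (fluid, gene) pair.

    normalized[fluid][gene] is a list of that gene's normalized
    expression values across the training samples of that fluid.
    """
    variances = [log_variance(vals)
                 for genes in normalized.values()
                 for vals in genes.values()]
    return sum(variances) / len(variances)

# Hypothetical normalized expression values, two samples per fluid.
training = {
    "blood":  {"HBB": [0.10, 0.20], "MUC7": [0.01, 0.01]},
    "saliva": {"HBB": [0.02, 0.02], "MUC7": [0.05, 0.20]},
}
sigma2 = estimate_sigma2(training)
```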
Under the assumptions that log gene expression is normally distributed around the log of its expectation and that each gene is equally variable, the MLE 322 for $\beta_i$ can be calculated as follows:
$$\hat{\beta}_i = \underset{\beta \geq 0}{\arg\min}\; \left\| \log(y_i) - \log(X\beta) \right\|_{2}^{2},$$
i.e., $\hat{\beta}_i$ minimizes the sum of squared errors on the log scale between the observed gene expression $y_i$ and the predicted gene expression $X\beta$, subject to the constraint that all the elements of $\beta$ are non-negative (a sample cannot have negative amounts of a fluid). If a closed-form solution to this expression does not exist, numerical methods may be used to optimize it (Byrd et al., SIAM J. Scientific Computing, 1995). The expression is not convex in $\beta$; however, its estimates may be reasonably robust to differing initial conditions, returning similar estimates with very similar log-likelihoods.
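The numerical optimization described above can be sketched in a few lines. Note that this sketch swaps in a simpler technique than the bound-constrained quasi-Newton method of the cited reference: plain gradient descent with a backtracking line search on θ = log(β). That parametrization keeps every fluid amount strictly positive, so an amount can approach but never exactly reach zero. The function names, the starting point, and the example X matrix are hypothetical.

```python
import math

def predict(X, beta):
    """Expected expression per gene: row j of X dotted with beta."""
    return [sum(x * b for x, b in zip(row, beta)) for row in X]

def log_sse(y, X, beta):
    """Sum of squared errors between log(y) and log(X beta)."""
    return sum((math.log(yj) - math.log(mj)) ** 2
               for yj, mj in zip(y, predict(X, beta)))

def fit_beta(y, X, iters=500):
    """Minimize log_sse over beta > 0 by gradient descent on theta = log(beta)."""
    K = len(X[0])
    theta = [0.0] * K
    for _ in range(iters):
        beta = [math.exp(t) for t in theta]
        mu = predict(X, beta)
        loss = log_sse(y, X, beta)
        # d loss / d theta_k = -2 * sum_j r_j * x_jk * beta_k / mu_j
        grad = [-2.0 * sum((math.log(yj) - math.log(mj)) * row[k] * beta[k] / mj
                           for yj, mj, row in zip(y, mu, X))
                for k in range(K)]
        gnorm2 = sum(g * g for g in grad)
        if gnorm2 < 1e-24:
            break
        step = 1.0
        while step > 1e-12:  # backtracking line search (Armijo condition)
            trial = [t - step * g for t, g in zip(theta, grad)]
            if max(trial) < 700.0:  # keep exp() within floating-point range
                trial_loss = log_sse(y, X, [math.exp(t) for t in trial])
                if trial_loss <= loss - 1e-4 * step * gnorm2:
                    theta = trial
                    break
            step *= 0.5
        else:
            break  # no acceptable step found; stop iterating
    return [math.exp(t) for t in theta]

# Hypothetical X: three genes (rows), two fluids (columns summing to 1).
X = [[0.7, 0.1], [0.2, 0.3], [0.1, 0.6]]
beta_true = [100.0, 50.0]
y = predict(X, beta_true)  # noise-free mixture
beta_hat = fit_beta(y, X)
```

Because the objective is not convex, a practical implementation might restart from several initial values and keep the fit with the highest log-likelihood.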
In some embodiments where the algorithm may risk overexerting itself trying to fit gene expression values in the background of the assay, subsequent layers of complexity may be added to the model. For example, in addition to fitting β terms for each fluid, a β may be added for background, with a corresponding column in the X matrix with equal weights on all genes. The background β term may be further constrained to contribute no more than some number (e.g., 15 counts) to each gene. For the same reason, all gene expression values may be truncated at 5 counts in order to derive a reasonable estimate of the average background counts 324.
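The background adjustments described above can be sketched as three small helpers. Several details are assumptions rather than statements from the text: the background column is scaled to sum to 1 (so each gene has weight 1/p), "truncated at 5 counts" is read as raising smaller values to 5, and the cap on the background term is derived from the equal weights. All function names are hypothetical.

```python
def add_background_column(X):
    """Append a background column with equal weight on every gene."""
    p = len(X)
    return [row + [1.0 / p] for row in X]

def truncate_counts(y, floor=5.0):
    """Raise counts below the floor up to the floor."""
    return [max(v, floor) for v in y]

def clip_background(beta_bg, n_genes, max_per_gene=15.0):
    """Cap the background term so it adds at most max_per_gene counts per gene.

    With equal weights 1/n_genes, the per-gene contribution is
    beta_bg / n_genes, so the cap on beta_bg is max_per_gene * n_genes.
    """
    return min(beta_bg, max_per_gene * n_genes)

# Hypothetical X: three genes (rows), two fluids (columns).
X = [[0.7, 0.1], [0.2, 0.3], [0.1, 0.6]]
X_bg = add_background_column(X)
y_floored = truncate_counts([75.0, 2.0, 40.0])
beta_bg = clip_background(100.0, n_genes=3)
```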
In any given sample $y_i$, one may determine which fluids are present. In some embodiments, this may involve testing whether each element of $\beta_i$ equals 0. One exemplary approach is to calculate the likelihood of the data under the MLE $\hat{\beta}_i$ and under a constrained MLE $\hat{\beta}_{i,-k}$ 326 with the $\beta_{ik}$ term corresponding to the tissue in question forced to 0. The likelihood ratio under the full and constrained MLEs may summarize the evidence for the presence of the tissue in question.
Calculation of a log-likelihood for the data given an MLE may rely on the assumption that log gene expression is normally distributed around the log of the predicted gene expression. Then, up to a constant, the log-likelihood of $y_i$ given $\hat{\beta}_i$ is:
$$\ell(y_i \mid \hat{\beta}_i) = -\frac{1}{2\sigma^{2}} \left\| \log(y_i) - \log(X\hat{\beta}_i) \right\|_{2}^{2}.$$
To test whether fluid k is present in sample i, we evaluate the above expression using $y_i$ and $\hat{\beta}_i$ and again using $y_i$ and the constrained MLE $\hat{\beta}_{i,-k}$, and we calculate a likelihood ratio. The resulting value derived from the likelihood ratio may indicate what the sample composition is expected to include 328. In some implementations, all of the above calculations may be processed on an electronic computing device. In some implementations the electronic computing device may then present the sample composition output to a user 330, e.g., via a display module operatively coupled to the electronic computing device and configured to display the output in a digital graphical user interface, and/or the like.
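The likelihood ratio test described above can be sketched as follows, assuming the common-variance normal model on the log scale. The two prediction vectors stand in for the fitted expression under the full and constrained MLEs; their values, the variance, and the function names are hypothetical. The cutoff of 100 is the decision threshold mentioned elsewhere in this disclosure.

```python
import math

def log_likelihood(y, predicted, sigma2):
    """Log-likelihood of y, up to an additive constant, under
    log(y) ~ N(log(predicted), sigma2 * I)."""
    sse = sum((math.log(a) - math.log(b)) ** 2 for a, b in zip(y, predicted))
    return -sse / (2.0 * sigma2)

def likelihood_ratio(y, pred_full, pred_constrained, sigma2):
    """Likelihood under the full fit divided by likelihood under the constrained fit."""
    return math.exp(log_likelihood(y, pred_full, sigma2)
                    - log_likelihood(y, pred_constrained, sigma2))

y = [75.0, 35.0, 40.0]
pred_full = [75.0, 35.0, 40.0]          # hypothetical fit with all fluids
pred_constrained = [140.0, 40.0, 20.0]  # hypothetical fit with one fluid forced to 0
lr = likelihood_ratio(y, pred_full, pred_constrained, sigma2=0.05)
detected = lr > 100
```

The additive constant omitted from the log-likelihood cancels in the ratio, so it does not affect the result as long as the same variance is used for both fits.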
In some implementations, the electronic computing device may determine and implement confidence intervals around estimated X or β values, e.g., based on the log likelihood ratio between the estimated X or β matrices and an arbitrary X or β matrix, and/or the like.
In some implementations, an electronic computing device may calculate the proportion of each substance (e.g., cell types, and/or the like) in a sample (e.g., in a tissue sample, and/or the like), e.g., using a penalty value and/or like constant. The estimation may be calculated using a function resembling the following exemplary function:
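$$S = \underset{\beta}{\arg\min} \left\{ \left\| (\log(y) - \log(X\beta))^{T}\, \Sigma^{-1}\, (\log(y) - \log(X\beta)) \right\|_{P} + \mathrm{Penalty}(\beta) \right\}$$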
wherein S=the proportions of the substances in the sample, and wherein the function is subject to the constraint that the elements in β are all non-negative, and wherein Penalty(β) represents a further penalty on the elements of β (including but not limited to an “elastic net” penalty, the Dantzig selector, an Lp penalty, a group or fused lasso penalty if appropriate, any combination thereof, and/or the like). In some implementations, β may be a K*1 matrix.
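To make the penalized objective concrete, the sketch below evaluates it for a candidate β under two simplifying assumptions that are not stated in the text: the covariance matrix is diagonal (one variance per gene), so the quadratic form reduces to a weighted sum of squared log-scale residuals, and the penalty is a plain L1 (lasso) penalty with weight `lam`. The function name and all numeric values are hypothetical.

```python
import math

def penalized_objective(y, X, beta, sigma2_by_gene, lam):
    """Weighted log-scale squared error plus an L1 penalty on beta.

    With a diagonal covariance, r^T Sigma^{-1} r = sum_j r_j^2 / sigma_j^2,
    where r = log(y) - log(X beta).
    """
    mu = [sum(x * b for x, b in zip(row, beta)) for row in X]
    quad = sum((math.log(yj) - math.log(mj)) ** 2 / s2
               for yj, mj, s2 in zip(y, mu, sigma2_by_gene))
    return quad + lam * sum(abs(b) for b in beta)

# Hypothetical X: three genes (rows), two substances (columns).
X = [[0.7, 0.1], [0.2, 0.3], [0.1, 0.6]]
y = [75.0, 35.0, 40.0]
sigma2_by_gene = [0.3, 0.3, 0.5]
at_truth = penalized_objective(y, X, [100.0, 50.0], sigma2_by_gene, lam=0.01)
off_truth = penalized_objective(y, X, [50.0, 50.0], sigma2_by_gene, lam=0.0)
```

A minimizer would search over non-negative β for the smallest value of this function; the penalty weight trades goodness of fit against sparsity of the estimated proportions.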
In some implementations, the above equation for estimating proportions of substances in a sample may be modified by an electronic computing device such that the electronic computing device can also estimate the gene expression profile of each substance estimated to be in the sample. For example, for a gene j, its expression may be written in n samples as $y' = (y_{1j}, \ldots, y_{nj})^{T}$. The expected expression of gene j in each substance may be represented as $x' = (X_{j,1}, \ldots, X_{j,K})^{T}$, wherein X is defined as a matrix of expected proportions of gene expression, similar to the above equations. Let $(\beta^{T})_{n \times K}$ be the matrix of the estimated proportions of each of the K cell types in the n samples. In some implementations, $\beta$ may be a $K \times n$ matrix due to the inclusion of multiple samples.
Using the above values, x′ may be calculated using a function resembling the following exemplary function:
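$$GE = \underset{x'}{\arg\min} \left\{ \left\| (\log(y') - \log(\beta^{T} x'))^{T}\, \Sigma^{-1}\, (\log(y') - \log(\beta^{T} x')) \right\|_{P} + \mathrm{Penalty}(x') \right\}$$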
wherein GE=the gene expression profile in each substance, and wherein the function is subject to the constraint that the elements of x′ are all non-negative.
In some implementations, if X and β are unknown, GE and S may be combined in order to estimate both matrices jointly. For example, beginning with the most reasonable estimate possible for either X or β, one may iterate between estimating X from β, and vice-versa, until the estimates converge at values for both matrices.
In some implementations, if one column of X is unknown and the other columns are known (e.g., when cancer cells are mixed with normal tissue, due to gene expression in cancer being much more variable than gene expression in normal cells), the statistical method may estimate β using the best available estimate of the X matrix (e.g., if cancer cells and normal cells are being analyzed, one may use the average gene expression profile of cancer cells for the unknown column of X). The expression in the substance with the uncertain expression profile (e.g., the unknown column of X) may then be estimated using a function resembling the following exemplary function:
$$y - X_{-k}\,\beta_{-k}$$
wherein $X_{-k}$ is the X matrix without the uncertain column, and wherein $\beta_{-k}$ is the β vector without the term for the uncertain substance type.
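The subtraction above can be sketched directly: the contributions of the substances with known profiles are removed from the observed expression, and what remains is attributed to the uncertain substance. The function name and the numeric values are hypothetical.

```python
def residual_expression(y, X, beta, k):
    """Expression left after subtracting the known substances' contributions.

    Computes y - X_{-k} beta_{-k}: column k of X and element k of beta
    (the uncertain substance) are left out of the prediction.
    """
    return [yj - sum(x * b
                     for idx, (x, b) in enumerate(zip(row, beta))
                     if idx != k)
            for yj, row in zip(y, X)]

# Hypothetical X: three genes (rows), two substances (columns).
X = [[0.7, 0.1], [0.2, 0.3], [0.1, 0.6]]
beta = [100.0, 50.0]
y = [75.0, 35.0, 40.0]
# Treat substance 1 (second column) as the one with the uncertain profile.
residual = residual_expression(y, X, beta, k=1)
```

In this noise-free example the residual equals the uncertain substance's own contribution (its column of X times its amount); with real data the residual also absorbs measurement noise and may be negative for some genes.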
In some implementations, one also may be able to estimate a covariance matrix Σ for each substance. Then, using substance-specific covariance matrices Σ1, . . . , Σk, the statistical method may be able to refine a global covariance matrix Σ based on the substance-specific matrices. For example, after choosing an appropriate global covariance matrix Σ (e.g., based on maximum likelihood estimation, penalized maximum likelihood estimation, the empirical covariance matrix and/or the like) in order to estimate β, an electronic computing device may use the estimated β and Σ1, . . . , Σk to determine a new covariance matrix Σ for the sample. The electronic computing device may continue to estimate β and use it and the substance-specific matrices in order to calculate a covariance matrix Σ until convergence, and/or the like.
As used in this Specification and the appended claims, the singular forms “a,” “an” and “the” include plural referents unless the context clearly dictates otherwise.
Unless specifically stated or obvious from context, as used herein, the term “or” is understood to be inclusive and covers both “or” and “and”.
Unless specifically stated or obvious from context, as used herein, the term “about” is understood as within a range of normal tolerance in the art, for example within 2 standard deviations of the mean. About can be understood as within 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, 0.5%, 0.1%, 0.05%, or 0.01% of the stated value. Unless otherwise clear from the context, all numerical values provided herein are modified by the term “about.”
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the invention pertains. Although other probes, compositions, methods, and kits similar, or equivalent, to those described herein can be used in the practice of the present invention, the preferred materials and methods are described herein. It is to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting.
In some embodiments, a ‘Codeset’ (e.g., a multiplex codeset) of 57 body fluid/tissue specific plus 10 housekeeping gene controls (TABLE 1), which is well within the 800 target technological capability of the system, may be utilized. To take advantage of the high multiplex capability of the system, biomarkers that have been demonstrated to be highly specific to a particular body fluid (e.g., PRM2 and SEMG1 for semen) may be included, as well as some that have shown a lesser degree of tissue specificity (e.g., MYOZ1 for vaginal secretions and MUC7 for saliva). See, also TABLE 2 and TABLE 3.
S. mutans 16S
S. mutans proC
S. mutans relA
S. mutans rplA
S. mutans rpoB
S. mutans rpoS
S. salivarius 16S
S. salivarius proC
S. salivarius relA
S. salivarius rplA
S. salivarius rpoB
S. salivarius rpoS
In some embodiments, datasets may include samples of highly varying RNA concentration, and may also include genes in the lower-concentration samples frequently dropped into the background noise of the assay. To ensure accurate estimates of each body fluid's average gene expression profile, samples with high expression levels of housekeeping genes may be retained for further processing.
Per the model described in the disclosure for model for gene expression in mixtures of body fluids, in some embodiments, the relative expression levels of the genes within each body fluid may be obtained; in other words, the proportion of total signature gene expression expected from each gene in a given body fluid. This is in contrast to most gene expression-based classifiers, which are more interested in each gene's absolute expression level, which can be difficult if not impossible to obtain. Therefore, each sample may be globally normalized, rescaling them so the sum of all expression values may be one value (e.g., 1) and so that each gene's expression value may be its proportion of the total signature gene expression. Then, each gene's expected proportion of expression in each fluid with its mean normalized expression value within each fluid may be estimated.
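The normalization and averaging steps described above can be sketched as follows. The result for one fluid corresponds to one column of the X matrix. The function names and the counts are hypothetical.

```python
def normalize(sample):
    """Rescale one sample so its expression values sum to 1."""
    total = sum(sample.values())
    return {gene: v / total for gene, v in sample.items()}

def estimate_profile(samples):
    """Mean normalized expression of each gene across samples of one fluid."""
    normed = [normalize(s) for s in samples]
    return {gene: sum(n[gene] for n in normed) / len(normed)
            for gene in normed[0]}

# Hypothetical counts for two saliva samples with very different RNA input.
saliva = [{"HTN3": 300.0, "MUC7": 100.0, "HBB": 100.0},
          {"HTN3": 20.0, "MUC7": 20.0, "HBB": 10.0}]
x_saliva = estimate_profile(saliva)
```

Because each sample is normalized before averaging, a sample with ten times more RNA does not dominate the estimated profile, and the resulting proportions still sum to 1.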
The five exemplary body fluids and skin, in some embodiments, may demonstrate highly distinct gene expression profiles, and although the signature genes may vary between samples of the same fluid, their differences between fluids may be much greater. In at least some fluids, the average expression profile may exhibit elevated expression of the fluid's putative characteristic genes, although this trend may under some circumstances be distinctly weaker in saliva samples. (See,
In some embodiments, HBB expression may dominate the blood profiles, far exceeding other blood markers such as ALAS2, ALOX5AP, AMICA1, ANK1, AQP9, ARHGAP26, C1QR1, C5R1, CASP2, CD3G, GYPA, HBA, HMBS (PBGD), MNDA, NCFS2, and SPTB, although ALAS2 levels in blood may greatly exceed those of other genes. The putative blood marker ANK1 may not be enriched in blood samples, and may appear most prominently in saliva samples. In some circumstances, expression in semen samples may primarily come from the semen-specific genes IZUMO1, MSP, PSA (KLK3), PRM1, PRM2, SEMG1, SEMG2, and TGM4, although other genes, particularly HBB, may also be detectable. Saliva samples may have the most diffuse profile, with saliva-specific genes such as HTN3, MUC7, S. mutans 16S, S. mutans proC, S. mutans relA, S. mutans rplA, S. mutans rpoB, S. mutans rpoS, S. salivarius 16S, S. salivarius proC, S. salivarius relA, S. salivarius rplA, S. salivarius rpoB, S. salivarius rpoS, SMR3B, and STATH contributing, in some circumstances, only 28% of total measured expression. Vaginal secretion samples may have highly elevated levels of vaginal markers such as DKK4, CYP2B7P1 and to a lesser extent FUT6. Menstrual blood samples may show elevated expression of their characteristic genes, including LEFTY2, MMP1, MMP10, and MMP11. Menstrual blood samples may also contain blood (HBB, ALAS2) and vaginal secretion (CYP2B7P1) biomarkers. Skin samples may show elevated expression of skin genes such as LCE1C, IL1F7 and CCL27, although these genes may also be slightly elevated in vaginal secretions and menstrual blood. In some circumstances, HBB may be the most prevalent gene in the commercial skin preparation, in part due to the potential presence of contaminating endothelial tissue in such preparations.
At least some of the genes may be present at a non-negligible proportion of total expression in the saliva samples. If a gene highly expressed in saliva were measured, the relative expression of the other fluids' characteristic genes in saliva may shrink dramatically.
As described above, an exemplary algorithm according to some embodiments for a body fluid detection method is provided. Below is a summary of its performance in predicting the body fluid composition of samples. A likelihood ratio cutoff of 100 may be used to declare whether a body fluid was detected in a given sample; that is, in some embodiments, a fluid may be called detected if its likelihood ratio exceeds 100. The algorithm may be successful in identifying the correct body fluid. If the characteristic genes for a given substance are not generally informative (e.g., there are few unique and easily detected genes in the substance), refinement of the algorithm may be performed in order to determine ways of improving the calculation in the absence of informative genetic data. In some embodiments, the sensitivity of the algorithm may be improved if samples are not degraded and/or minuscule.
In some embodiments, the algorithm may achieve better performance via varying the LR>100 cutoff.
As a preliminary indication of the ability of the method to discern admixtures of body fluids, five mixtures may be prepared by combining ½ of a 50 μl stain or single cotton swab from each body fluid. An exemplary set of mixtures could comprise four binary mixtures (2× vaginal secretions/semen, 2× blood/saliva) and one ternary mixture (semen/saliva/vaginal secretions). The blood/saliva and vaginal secretions/semen mixtures may be biological, as opposed to technical, replicates. Using an LR of 100 as a decision threshold, several of the mixtures may be called perfectly, namely one of the vaginal secretions/semen and one of the blood/saliva samples (e.g.,
To facilitate routine analysis, a 5 minute room temperature cellular lysis protocol may be employed as an alternative to standard RNA isolation for forensic sample processing using the procedures outlined above. The method may be based upon the RLT buffer from QIAGEN, which contains a high concentration of guanidine thiocyanate as well as a proprietary mix of detergents. β-mercaptoethanol (1% v/v) may also be added before use to inactivate RNases in the lysate. Unlike most direct lysis reagents, the RLT buffer permits many biochemical reactions, such as hybridization, to take place. The released nucleic acids may be principally in the form of single stranded RNA and double stranded DNA, the latter of which therefore cannot hybridize to the single stranded probes. This fact, together with the lack of DNA titration of the assay probes to homologous DNA sequences and other reagents, may thus increase RNA assay sensitivity and specificity.
The reproducibility of the assay between standard RNA isolation/purification and direct lysis protocols from the same source material can be compared. In general, excellent concordance between the two protocols for all genes with a moderate to high degree of expression may be observed. The correlation between the protocols may break down for very lowly-expressed genes, reflecting the greater noise in the assay when measuring vanishing target. The most dramatic differences between replicates may be attributable to expected variance in RNA input amounts between lysate and purified RNA, since lysate concentration is not reliably measurable by current methods. The concordance observed between lysis and purified protocols suggests that the simpler, 5 minute lysis protocol would be an efficient option for routine forensic casework workflow. (See,
Additionally, the samples excluded from training may suffer no overfitting. In some embodiments, the algorithm may utilize an LR >100 as the decision threshold for all body fluid types; in other embodiments, an alternative approach using body fluid specific thresholds may be utilized.
In some implementations, further optimization of the Codeset may be possible. For example, attenuating the HBB signal with the addition of precisely defined quantities of specifically designed unlabeled oligonucleotides complementary to the HBB RNA prior to hybridization with the full Codeset may aid in avoiding false positives arising from low level contamination with vascular tissue products. These oligonucleotides competitively inhibit the hybridization reaction with the labeled probes. In contrast to the need to attenuate one of the blood biomarkers, the signal for the saliva biomarkers may be enhanced. Signal intensification may be accomplished by designing multiple probes that bind along a single HTN3 mRNA. In addition, the current probes may be designed to hybridize to both HTN3 and HTN1, the latter of which is also saliva specific. Alternative novel biomarkers identified by RNA-Seq studies may also be employed if the HTN3 intensification strategies fall short of expectations. In some embodiments, the ANK1 probes may be re-synthesized or re-designed, and a similar approach may be taken with any non-optimally performing biomarkers. In some embodiments, additional body fluid specific biomarkers (e.g., commensal bacteria from the vagina, such as Lactobacillus sp.) may also be incorporated in order to improve assay performance.
In some embodiments, the algorithm may discern admixtures of body fluids, e.g., as shown in
Any and all references to publications or other documents, including but not limited to, patents, patent applications, articles, webpages, books, etc., presented in the present application, are herein incorporated by reference in their entirety, except insofar as the subject matter may conflict with that of the embodiments of the present disclosure (in which case what is present herein shall prevail). The referenced items are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that any invention disclosed herein is not entitled to antedate such material by virtue of prior invention.
Although example embodiments of the apparatuses, methods and systems have been described herein, other modifications to such embodiments are possible. These embodiments have been described for illustrative purposes only and are not limiting. Other embodiments are possible and are covered by the disclosure, which will be apparent from the teachings contained herein. Thus, the breadth and scope of the disclosure should not be limited by any of the above-described embodiments but should be defined only in accordance with claims supported by the present disclosure and their equivalents. In addition, any logic flow depicted in the above disclosure and/or accompanying figures may not require the particular order shown, or sequential order, to achieve desirable results. Moreover, embodiments of the subject disclosure may include methods, systems and devices which may further include any and all elements from any other disclosed methods, systems, and devices, including any and all elements corresponding to gene expression and the utilization of samples. In other words, elements from one and/or another disclosed embodiment may be interchangeable with elements from other disclosed embodiments. In addition, one or more features/elements of disclosed embodiments may be removed and still result in patentable subject matter (and thus, resulting in yet more embodiments of the subject disclosure). Still further, some embodiments of the present disclosure may be distinguishable from the prior art for expressly not requiring one and/or another feature disclosed in the prior art (e.g., some embodiments may include negative limitations). Some of the embodiments disclosed herein are within the scope of at least some of the following exemplary claims of the numerous claims which are supported by the present disclosure which may be presented.
This application claims the benefit of U.S. Provisional Application No. 62/035,019, filed Aug. 8, 2014. The contents of the aforementioned patent application are incorporated herein by reference in their entireties.