The present disclosure relates to blood tests and proxy methods of conducting standard blood tests using genetic markers, for example, a complete blood count, comprehensive metabolic panel, chemistry panel, and thyroid-related blood tests (thyroxine, T3, and TSH levels).
Blood tests offer a variety of information for the diagnosis of diseases or conditions or maintenance of a subject's health. A well-chosen complement of blood tests, such as a complete blood count panel, comprehensive metabolic panel, or chemistry panel, can thoroughly assess one's overall state of health, as well as detect the silent warning signals that precede the development of serious diseases such as diabetes and heart disease. However, the current technology for conducting blood tests requires more than a few drops of blood. These tests require venipuncture to obtain cells and extracellular fluid (plasma) from the body for analysis. Although minimally invasive, venipuncture still requires a technician, and thus these tests cannot be performed without visiting a laboratory, whether one within a hospital or clinic or a standalone testing site. Another limitation of these tests is that for each test conducted, often at least one tube of blood collection required. For example, if a patient has orders for a complete blood count panel, comprehensive metabolic panel, and thyroid-related tests, it can require the collection of four tubes of blood. With the increased frequency of blood test monitoring, the subject can develop iatrogenic anemia, which is low red blood cell counts due to too much removal of blood. The amount of blood collected and the need to visit a laboratory for blood collection are significant obstacles for greater use of these tests as monitors of one's state of health. Thus, more convenient alternatives for obtaining the same results as standard blood tests are needed.
One aspect of the invention is directed to a method of performing a blood test. The method of performing the blood test generally includes extracting RNA from a blood sample; determining an mRNA level associated with a predictive gene in the blood sample; and converting the mRNA level into a blood test result for a target blood component, wherein the mRNA level of the predictive gene in the blood sample relates to the target blood component. In certain aspects, the method further includes selecting the predictive gene.
In an exemplary embodiment, the method comprises: extracting an RNA from a blood sample; selecting a predictive gene, wherein an mRNA level of the predictive gene in the blood sample relates a target blood component; determining the mRNA level of the predictive gene in the blood sample; and converting the mRNA level into a blood test result of the target blood component.
In certain exemplary embodiments, the blood sample is whole blood, plasma, or dried blood spot. In those embodiments wherein the blood sample is a dried blood spot, the quality of the dried blood spot may be determined by assessing quality of the extracted RNA.
Other exemplary aspects of the invention, the blood sample has a volume in the range of: 10 μl-3 ml, 10 μl-2.5 ml, 15 μl-2.5 ml, 15 μl-2 ml, 20 μl-2 ml, 25 μl-2 ml, 25 μl-1.5 ml, 30 μl-1.5 ml, 30 μl-1 ml, 10-300 μl, 10-250 μl, 15-250 μl, 15-200 μl, 20-200 μl, 25-200 μl, 25-150 μl, 30-150 μl, or 30-100 μl. In further particular aspects, the blood sample has a volume of between 10 μl and 1 ml or a volume of between 10-100 μl.
The mRNA level can be determined using many methods, for example, RNA sequencing, quantitative PCR, and hybridization. In certain preferred embodiments, the mRNA level is determined using next-generation sequencing and normalized using DESeq2 algorithm or edgeR algorithm.
In an exemplary embodiment, the blood test is reported as an amount of the target blood component; a concentration of the target blood component; a volume of the target blood component; a distribution of the target blood component; a ratio of the target blood component to a second blood component; or combinations thereof.
In one specific embodiment, the blood test is reported as a volume ratio of red blood cells to total blood (hematocrit level). In other aspects, the blood test is reported as a volume ratio of mean corpuscular hemoglobin (MCH) to mean corpuscle (cell) (MCV) (mean corpuscular hemoglobin concentration (MCHC)).
Examples of the blood test or blood component targeted by the blood test include: Absolute Basophils, Absolute Eosinophil, Absolute Lymphocyte, Absolute Monocyte, Absolute Neutrophil, Alanine Aminotransferase, Albumin, Alkaline Phosphatase, Anion Gap, Aspartate Aminotransferase, Total Bilirubin, Blood Urea Nitrogen (BUN), Calcium, Chloride, Cholesterol, CO2, Creatinine, Eosinophils, Gamma-Glutamyl Transferase (GGT), Globulin, Glucose, HDL Cholesterol, Hemoglobin, Immature Granulocyte, Lactic Dehydrogenase, LDL Cholesterol, Lymphocytes, mean corpuscular hemoglobin (MCH), mean corpuscle (cell) volume (MCV), Monocytes, mean platelet volume (MPV), Non-HDL Cholesterol, Osmolality, Inorganic Phosphorus, Platelet Count, Potassium, Total Protein, Red Blood Cell (RBC), red cell distribution width (RDW), Segmented Neutrophils, Sodium, Total T3, T3 Uptake, T7 Index, Thyroxine (T4), Triglycerides, Thyroid Stimulating Hormone (TSH), Uric Acid, VLDL Cholesterol, and White Blood Cell (WBC).
In preferred embodiments, the blood sample is whole blood, plasma, dried blood spot, or combinations thereof, and the target blood component is selected from the group consisting of: Segmented Neutrophils, Eosinophils, Prostate-Specific Antigen, red blood cells, monocytes, creatinine, lymphocytes, eosinophil, alanine aminotransferase, electrolytes, and non-HDL cholesterol.
In other preferred embodiments, the blood test includes: Prostate-Specific Antigen (PSA_total), Red Blood Cell count (RBC_m.mm3), Absolute Eosinophil, Anion Gap (AG), red cell distribution width (RDW_sd), Thyroid Index (T7), or combinations thereof.
In a particular non-limiting embodiment, converting the mRNA level into a blood test result uses the following formula: blood test result=C+C1*(gene), C and C1 are constants, and (gene) represents the mRNA level of the predictive gene. In particular preferred embodiment, the mRNA level is normalized gene count.
In a specific embodiment, the target blood component is Segmented Neutrophils and the predictive gene is: MNDA, STX3, TNFRSF1A, MSL1, or TLR1. In a specific exemplary aspects, for MNDA, C is: 21.7-40.3, 21.7-37.2, 24.8-37.2, 24.8-34.1, and 27.9-34.1; and C1 is: 21.4-39.7, 21.4-36.6, 24.4-36.6, 24.4-33.6, and 27.5-33.6; for STX3: C is 23.1-43.0, 23.1-39.7, 26.4-39.7, 26.4-36.4, and 29.8-36.4; and C1 is: 19.9-36.9, 19.9-34.1, 22.7-34.1, 22.7-31.3, and 25.6-31.3; for TNFRSF1A: C is: 20.8-38.6, 20.8-35.7, 23.8-35.7, 23.8-32.7, and 26.8-32.7, and C1 is: 22.2-41.3, 22.2-38.1, 25.4-38.1, 25.4-35.0, and 28.6-35.0; for MSL1: C is: 20.1-37.3, 20.1-34.5, 23.0-34.5, 23.0-32.7, and 25.9-31.6, and C1 is: 22.9-42.5, 22.9-39.3, 26.2-39.3, 26.2-36.0, and 29.5-36.0; for TLR1: C is: 24.9-46.3, 24.9-42.8, 28.5-42.8, 28.5-39.2, and 32.1-39.2, and C1 is: 18.2-33.7, 18.2-31.1, 20.8-31.1, 20.8-28.5, and 23.3-28.5.
In a particular exemplary embodiment, the blood sample is whole blood, the target blood component is Eosinophils, the predictive gene is: SLC29A1, SIGLEC8, IL5RA, TMIGD3, or SMPD3. In a further specific exemplary embodiment, for SLC29A1: C is: between −0.57 and −0.31, between −0.52 and −0.31, between −0.52 and −0.35, between −0.48 and −0.35, and between −0.48 and −0.39, and C1 is: 2.19-4.07, 2.19-3.75, 2.50-3.75, 2.50-3.44, and 2.81-3.44; for SIGLEC8: C is: 0.34-0.62, 0.34-0.57, 0.38-0.57, 0.38-0.53, and 0.43-0.53; and C1 is: 1.6-2.9, 1.6-2.7, 1.8-2.7, 1.8-2.5, and 2.0-2.5; for IL5RA: C is: between −0.124 and −0.067, between −0.115 and −0.067, between −0.115 and −0.076, between −0.105 and −0.076, and between −0.105 and −0.086, etc., and C1 is: 2.0-3.7, 2.0-3.4, 2.2-3.4, 2.2-3.1, and 2.5-3.1; for TMIGD3: C is: between −0.00104 and −0.00056, between −0.00096 and −0.00056, between −0.00096 and −0.00064, between −0.00088 and −0.00064, and between −0.00088 and −0.00072, and C1 is: 1.8-3.4, 1.8-3.2, 2.1-3.2, 2.1-2.9, and 2.4-2.9; for SMPD3: C is: 0.11-0.20, 0.11-0.18, 0.12-0.18, 0.12-0.17, and 0.14-0.17, and C1 is: 1.8-3.3, 1.8-3.1, 2.0-3.1, 2.0-2.8, and 2.3-2.8.
In another nonlimiting exemplary embodiment, the blood sample is dried blood spot, the target blood component is PSA_total, the predictive gene is: CTC-265F19.1, ADAM9, RAB1FIP5, SNAPC4, or LMNA. In a further specific exemplary embodiment, for CTC-265F19.1: C is: 0.30-0.56, 0.30-0.52, 0.35-0.52, 0.35-0.48, and 0.39-0.48, and C1 is: 0.37-0.68, 0.37-0.63, 0.42-0.63, 0.42-0.58, and 0.47-0.58; for ADAM9, C is: 0.30-0.56, 0.30-0.52, 0.35-0.52, 0.35-0.48, and 0.39-0.48, and C1 is: 1.2-2.2, 1.2-2.0, 1.3-2.0, 1.3-1.9, and 1.5-1.9; for RAB11FIP5: C is: 0.31-0.58, 0.31-0.53, 0.36-0.53, 0.36-0.49, and 0.40-0.49, and C1 is: 0.42-0.77, 0.42-0.71, 0.48-0.71, 0.48-0.65, and 0.53-0.65; for SNAPC4: C is: 0.31-0.58, 0.31-0.53, 0.36-0.53, 0.36-0.49, and 0.40-0.49, and C1 is: 0.43-0.80, 0.43-0.74, 0.49-0.74, 0.49-0.67, and 0.55-0.67; for LMNA, C is: 0.29-0.53, 0.29-0.49, 0.33-0.49, 0.33-0.45, and 0.37-0.45, and C1 is: 0.24-0.45, 0.24-0.42, 0.28-0.42, 0.28-0.38, and 0.31-0.38.
In yet other particular embodiments, the blood sample is dried blood spot, the target blood component is Eosinophils, the predictive gene is: SCARNA22, SNORA36C, SNORA11, RN7SL4P, or SNHG15. In a further specific exemplary embodiment, for SCARNA22: C is: 0.9-1.7, 0.9-1.6, 1.0-1.6, 1.0-1.4, and 1.2-1.4, and C1 is: 1.1-2.0, 1.1-1.8, 1.2-1.8, 1.2-1.7, and 1.4-1.7; for SNORA36C: C is: 0.9-1.7, 0.9-1.6, 1.1-1.6, 1.1-1.5, and 1.2-1.5, and C1 is: 1.0-1.9, 1.0-1.8, 1.2-1.8, 1.2-1.6, and 1.3-1.6, for SNORA11: C is: 0.9-1.6, 0.9-1.5, 1.0-1.5, 1.0-1.4, and 1.1-1.4, and C1 is: 1.0-1.9, 1.0-1.7, 1.2-1.7, 1.2-1.6, and 1.3-1.6; for RN7SL4P: C is: 0.7-1.4, 0.7-1.3, 0.8-1.3, 0.8-1.2, and 1.0-1.2, and C1 is: 1.1-2.0, 1.1-1.9, 1.3-1.9, 1.3-1.7, and 1.4-1.7; for SNHG15, C is: 1.0-1.8, 1.0-1.7, 1.1-1.7, 1.1-1.5, and 1.3-1.5, and C1 is: 1.0-1.8, 1.0-1.6, 1.1-1.6, 1.1-1.5, and 1.2-1.5.
In further exemplary embodiments, the blood sample is plasma, the target blood component is PSA_total, the predictive gene is: HNRNPA3P3, GTF3A, RP1l-342M1.6, HNRNPLP2, and RPS1 P5. In a further specific exemplary embodiment, for HNRNPA3P3: C is: 0.15-0.27, 0.15-0.25, 0.17-0.25, 0.17-0.23, and 0.19-0.23, and C1 is: 0.33-0.61, 0.33-0.56, 0.38-0.56, 0.38-0.52, and 0.42-0.52; for GTF3A: C is: between −0.48 and −0.26, between −0.45 and −0.26, between −0.45 and −0.30, between −0.41 and −0.30, and between −0.41 and −0.34, C1 is: 0.7-1.3, 0.7-1.2, 0.8-1.2, 0.8-1.1, and 0.9-1.1; for RP11-342M1.6: C is: 0.28-0.52, 0.28-0.48, 0.32-0.48, 0.32-0.44, and 0.36-0.44; and C1 is: 0.23-0.42, 0.23-0.39, 0.26-0.39, 0.26-0.36, and 0.29-0.36. In further aspects, for HNRNPLP2: C is: 0.23-0.43, 0.23-0.39, 0.26-0.39, 0.26-0.36, and 0.30-0.36; and C1 is: 0.22-0.41, 0.22-0.38, 0.25-0.38, 0.25-0.35, and 0.29-0.35. In yet further aspects, for RPS11P5: C is: 0.17-0.32, 0.17-0.29, 0.20-0.29, 0.20-0.27, and 0.22-0.27; and C1 is: 0.34-0.64, 0.34-0.59, 0.39-0.59, 0.39-0.54, and 0.44-0.54.
In yet further embodiments, the blood sample is plasma, the blood test is Red Blood Cell count (RBC_m.mm3), the predictive gene is: UTY, DDX3Y, ZFY, TXLNGY, and RPS4Y1. In a further specific exemplary embodiment, for UTY: C is: 3.1-5.8, 3.1-5.4, 3.6-5.4, 3.6-4.9, and 4.0-4.9, and C1 is: 0.24-0.45, 0.24-0.41, 0.28-0.41, 0.28-0.38, and 0.31-0.38; for DDX3Y: C is: 3.1-5.8, 3.1-5.4, 3.6-5.4, 3.6-4.9, and 4.0-4.9, and C1 is: 0.23-0.43, 0.23-0.40, 0.27-0.40, 0.27-0.37, and 0.30-0.37; for ZFY: C is: 3.1-5.8, 3.1-5.4, 3.6-5.4, 3.6-4.9, and 4.0-4.9, and C1 is: 0.23-0.43, 0.23-0.40, 0.26-0.40, 0.26-0.36, and 0.30-0.36; for TXLNGY: C is: 3.2-5.9, 3.2-5.4, 3.6-5.4, 3.6-5.0, and 4.1-5.0; and C1 is: 0.23-0.42, 0.23-0.39, 0.26-0.39, 0.26-0.36, and 0.29-0.36; for RPS4Y1: C is: 3.2-5.9, 3.2-5.4, 3.6-5.4, 3.6-5.0, and 4.1-5.0, and C1 is: 0.22-0.42, 0.22-0.38, 0.26-0.38, 0.26-0.35, and 0.29-0.35.
In yet another example, converting the mRNA level into the blood test result uses the following formula: blood test result=C+C1*(gene1)+C2*(gene2)+ . . . +Cn*(genen), n is 1, 2, 3, 4, or 5, C, C1, C2, . . . and Cn are constants, and (gene1), (gene2), . . . , and (genen) represent the mRNA level of gene1, gene2, . . . , and genen. In particular embodiments, the mRNA level is the normalized gene count.
In a particular exemplary embodiment, the blood sample is whole blood, the target blood component is Segmented Neutrophils, gene1 is RNF24, gene2 is MNDA, and gene3 is WIPF1. In some aspects, C is: 19.7-36.6, 19.7-33.8, 22.5-33.8, 22.5-31.0, and 25.4-31.0; C1 is: 4.6-8.6, 4.6-7.9, 5.3-7.9, 5.3-7.3, and 5.9-7.3; C2 is: 7.4-13.8, 7.4-12.7, 8.5-12.7, 8.5-11.7, and 9.5-11.7; and C3 is: 11.6-21.5, 11.6-19.8, 13.2-19.8, 13.2-18.2, and 14.9-18.2.
In yet another embodiment, the blood sample is whole blood, the target blood component is Lymphocytes, gene1 is GRB2, gene2 is MNDA, and gene3 is NFAM1, C is: 43.0-79.8, 43.0-73.6, 49.1-73.6, 49.1-67.5, and 55.2-67.5; C1 is: between −18.8 and −10.1, between −17.3 and −10.1, between −17.3 and −11.5, between −15.9 and −11.5, and between −15.9 and −13.0; C2 is: between −11.1 and −6.0, between −10.2 and −6.0, between −10.2 and −6.8, between −9.4 and −6.8, and between −9.4 and −7.7; and C3 is: between −13.0 and −7.0, between −12.0 and −7.0, between −12.0 and −8.0, between −11.0 and −8.0, and between −11.0 and −9.0.
In further embodiments, the blood sample is whole blood, the target blood component is Monocytes, gene1 is NAGA, gene2 is RIN2, gene3 is ADA2, gene4 is PLXNB2, and gene5 is ANXA2, C is: between −1.9 and −1.0, between −1.8 and −1.0, between −1.8 and −1.2, between −1.6 and −1.2, and between −1.6 and −1.3, etc; C1 is: 1.8-3.4, 1.8-3.2, 2.1-3.2, 2.1-2.9, and 2.4-2.9; C2 is: 2.2-4.2, 2.2-3.8, 2.6-3.8, 2.6-3.5, and 2.9-3.5; C3 is: 2.9-5.5, 2.9-5.0, 3.4-5.0, 3.4-4.6, and 3.8-4.6; C4 is: between −3.9 and −2.1, between −3.6 and −2.1, between −3.6 and −2.4, between −3.3 and −2.4, and between −3.3 and −2.7; and C5 is: 1.2-2.2, 1.2-2.0, 1.4-2.0, 1.4-1.9, and 1.5-1.9.
In still further embodiments, the blood sample is plasma, the target blood component is Absolute Eosinophil, gene, is CLC, gene2 is ADAT1, gene3 is SNRPEP4, and gene4 is GPC6, C is: 0.0021-0.0039, 0.0021-0.0036, 0.0024-0.0036, 0.0024-0.0033, and 0.0027-0.0033; C1 is: 0.041-0.075, 0.041-0.070, 0.046-0.070, 0.046-0.064, and 0.052-0.064; C2 is: 0.078-0.144, 0.078-0.133, 0.089-0.133, 0.089-0.122, and 0.100-0.122; C3 is: between −0.035 and −0.019, between −0.032 and −0.019, between −0.032 and −0.022, between −0.030 and −0.022, and between −0.030 and −0.024; and C4 is: 0.012-0.022, 0.012-0.020, 0.014-0.020, 0.014-0.019, and 0.015-0.019.
In another embodiment, the blood sample is plasma, the blood test is Anion Gap (Anion.Gap, AG), gene1 is DHX40, gene2 is SLC1A4, gene3 is IMPA2, gene4 is KATNA1, and gene5 is MEIS3P1, C is: 5.9-11.0, 5.9-10.2, 6.8-10.2, 6.8-9.3, and 7.6-9.3; C1 is: 1.7-3.2, 1.7-2.9, 1.9-2.9, 1.9-2.7, and 2.2-2.7; C2 is: between −1.3 and −0.7, between −1.2 and −0.7, between −1.2 and −0.8, between −1.1 and −0.8, and between −1.1 and −0.9; C3 is: 0.9-1.6, 0.9-1.5, 1.0-1.5, 1.0-1.4, and 1.1-1.4; C4 is: 1.2-2.2, 1.2-2.0, 1.3-2.0, 1.3-1.8, and 1.5-1.8; and C5 is: 0.35-0.66, 0.35-0.61, 0.40-0.61, 0.40-0.56, and 0.46-0.56.
In some embodiments, the blood sample is plasma, the target blood component is Segmented Neutrophils, gene, is RXFP1, gene2 is POLR3GL, gene3 is FOXK2, and gene4 is LAMB1, C is: 41.0-76.1, 41.0-70.2, 46.8-70.2, 46.8-64.4, and 52.7-64.4; C1 is: 1.5-2.8, 1.5-2.5, 1.7-2.5, 1.7-2.3, and 1.9-2.3; C2 is: between −7.1 and −3.8, between −6.5 and −3.8, between −6.5 and −4.4, between −6.0 and −4.4, and between −6.0 and −4.9; C3 is: 3.6-6.6, 3.6-6.1, 4.1-6.1, 4.1-5.6, and 4.6-5.6; and C4 is: 1.6-2.9, 1.6-2.7, 1.8-2.7, 1.8-2.4, and 2.0-2.4.
In other embodiments, the blood sample is whole blood or plasma, the blood test is red blood cell distribution width (RDW_sd), gene1 is CHCHD2P6 from plasma, gene2 is SEC63P1 from plasma, gene3 is DNAL1 from whole blood, and gene4 is ENSG00000197262 from whole blood, C is: 26.2-48.7, 26.2-44.9, 30.0-44.9, 30.0-41.2, and 33.7-41.2; C1 is: 1.0-1.9, 1.0-1.8, 1.2-1.8, 1.2-1.6, and 1.3-1.6; C2 is: 1.0-1.9, 1.0-1.8, 1.2-1.8, 1.2-1.6, and 1.3-1.6; C3 is: 2.3-4.2, 2.3-3.9, 2.6-3.9, 2.6-3.6, and 2.9-3.6; and C4 is: 0.8-1.6, 0.8-1.5, 1.0-1.5, 1.0-1.3, and 1.1-1.3.
In yet other embodiments, the blood sample is whole blood or plasma, the blood test is Thyroid Index (T7.Index), gene1 is IGHV3-33 from whole blood, gene2 is ZNF266 from whole blood, gene3 is CCDC183-AS1 from whole blood, gene4 is ENSG00000232745 from plasma, C is: 1.9-3.5, 1.9-3.2, 2.2-3.2, 2.2-3.0, and 2.4-3.0; C1 is: between −0.20 and −0.11, between −0.18 and −0.11, between −0.18 and −0.12, between −0.17 and −0.12, and between −0.17 and −0.14; C2 is: between −0.99 and −0.53, between −0.91 and −0.53, between −0.91 and −0.61, between −0.84 and −0.61, and between −0.84 and −0.69; C3 is: 0.21-0.38, 0.21-0.36, 0.24-0.36, 0.24-0.33, and 0.27-0.33; and C4 is: between −0.16 and −0.09, between −0.15 and −0.09, between −0.15 and −0.10, between −0.14 and −0.10, and between −0.14 and −0.11.
In further embodiments, the blood sample is dried blood spot, the target blood component is Alaine Aminotransferase, gene1 is EIF1AY, gene2 is SRXN1, gene3 is NDUFAF2, and gene4 is TBCE, C is: 13.3-24.7, 13.3-22.8, 15.2-22.8, 15.2-20.9, and 17.1-20.9; C1 is: 2.6-4.8, 2.6-4.5, 3.0-4.5, 3.0-4.1, and 3.3-4.1; C2 is: 2.1-3.9, 2.1-3.6, 2.4-3.6, 2.4-3.3, and 2.7-3.3; C3 is: 3.0-5.6, 3.0-5.2, 3.4-5.2, 3.4-4.7, and 3.9-4.7; and C4 is: between −7.2 and −3.9, between −6.6 and −3.9, between −6.6 and −4.4, between −6.1 and −4.4, and between −6.1 and −5.0.
In some embodiments, the blood sample is dried blood spot, the target blood component is Eosinophils, gene1 is SCARNA22, and gene2 is TET3, C is: 0.60-1.11, 0.60-1.02, 0.68-1.02, 0.68-0.94, and 0.77-0.94; C1 is: 0.66-1.22, 0.66-1.13, 0.75-1.13, 0.75-1.04, and 0.85-1.04; and C2 is: 0.61-1.13, 0.61-1.04, 0.69-1.04, 0.69-0.95, and 0.78-0.95.
In other embodiments, the blood sample is dried blood spot, the target blood component is Segmented Neutrophils, gene1 is HMGB1P1, gene2 is CSRNP1, and gene3 is CCNJL, C is: 39.1-72.5, 39.1-67.0, 44.6-67.0, 44.6-61.4, and 50.2-61.4; C1 is: 2.0-3.7, 2.0-3.4, 2.3-3.4, 2.3-3.1, and 2.5-3.1; C2 is: 2.0-3.7, 2.0-3.4, 2.3-3.4, 2.3-3.1, and 2.5-3.1; and C3 is: 1.7-3.2, 1.7-2.9, 2.0-2.9, 2.0-2.7, and 2.2-2.7.
In yet other embodiments, the blood sample is high-quality dried blood spot, the target blood component is non-HDL cholesterol, gene1 is BMT2, gene2 is PKD1P5, and gene3 is ARIH1, C is: 133-247, 133-228, 152-228, 152-209, and 171-209; C1 is: between −52 and −28, or any number range in between, e.g., between −48 and −28, between −48 and −32, between −44 and −32, and between −44 and −36; C2 is: 17.4-32.2, 17.4-29.8, 19.8-29.8, 19.8-27.3, and 22.3-27.3; and C3 is: between −47 and −25, between −44 and −25, between −44 and −29, between −40 and −29, and between −40 and −33.
In further embodiments, the blood sample is high-quality dried blood spot, the target blood component is Eosinophils, gene1 is NDUFA5, and gene2 is MCM8, C is: 1.1-2.1, 1.1-2.0, 1.3-2.0, 1.3-1.8, and 1.5-1.8; C1 is: 0.46-0.85, 0.46-0.78, 0.52-0.78, 0.52-0.72, and 0.59-0.72; and C2 is: between −1.2 and −0.6, between −1.1 and −0.6, between −1.1 and −0.7, between −1.0 and −0.7, and between −1.0 and −0.8.
In yet further embodiments, the blood sample is high-quality dried blood spot, the target blood component is Segmented Neutrophils, gene1 is AKAP12, and gene2 is APP, C is: 2.4-4.5, 2.4-4.2, 2.8-4.2, 2.8-3.8, and 3.1-3.8; C1 is: 1.0-1.9, 1.0-1.7, 1.1-1.7, 1.1-1.6, and 1.3-1.6; and C2 is: 1.6-3.0, 1.6-2.8, 1.9-2.8, 1.9-2.6, and 2.1-2.6.
In some embodiments, the blood sample is whole blood, the target blood component is Lymphocytes, gene1 is EVI2B, and gene2 is NFAM1, C is: 39.7-73.7, 39.7-68.1, 45.4-68.1, 45.4-62.4, and 51.1-62.4; C1 is: between −20.6 and −11.1, between −19.1 and −11.1, between −19.1 and −12.7, between −17.5 and −12.7, and between −17.5 and −14.3; and C2 is: between −16.1 and −8.7, between −14.8 and −8.7, between −14.8 and −9.9, between −13.6 and −9.9, and between −13.6 and −11.1.
In other embodiments, the blood sample is whole blood, the target blood component is Monocytes, gene1 is RIN2, and gene2 is ADA2, C is: between −0.21 and −0.11, between −0.19 and −0.11, between −0.19 and −0.13, between −0.17 and −0.13, and between −0.17 and −0.14; C1 is: 2.8-5.1, 2.8-4.7, 3.1-4.7, 3.1-4.3, and 3.5-4.3; and C2 is: 2.5-4.6, 2.5-4.3, 2.8-4.3, 2.8-3.9, and 3.2-3.9.
In yet other embodiments, the blood sample is whole blood, the target blood component is Segmented Neutrophils, gene1 is RNF24, gene2 is MNDA, and gene3 is TLR1, C is: 25.0-46.4, 25.0-42.8, 28.6-42.8, 28.6-39.3, and 32.1-39.3; C1 is: 6.2-11.5, 6.2-10.6, 7.1-10.6, 7.1-9.7, and 8.0-9.7; C2 is: 6.8-12.7, 6.8-11.7, 7.8-11.7, 7.8-10.7, and 8.8-10.7; and C3 is: 5.2-9.7, 5.2-9.0, 6.0-9.0, 6.0-8.2, and 6.7-8.2.
Herein the inventors also disclose a blood test. Typically, the blood test comprises a positive control plasmid, a first reagent, and a second reagent. The positive control plasmid comprising an exon of a predictive gene selected from Tables 1-9, wherein an mRNA level of the predictive gene in the blood sample relates to a blood test result of a target blood component. The first reagent detects the mRNA level of the predictive gene, comprises at least a primer or a probe hybridizing to the exon of the predictive gene. The second reagent detects an mRNA level of a housekeeping gene, for example, a primer or a probe hybridizing to the exon of the housekeeping gene.
Non-limiting examples of the housekeeping genes include glyceraldehyde-3-phosphate dehydrogenase (GAPDH), ACTB actin, beta2-microglobulin (B2M), Porphobilinogen deaminase (HMBS), or Peptidylprolyl Isomerase B (PPIB), etc.
Non-limiting examples of the target blood component include Segmented Neutrophils, Eosinophils, Prostate-Specific Antigen (PSA_total), Red Blood Cell count (RBC_m.mm3), Monocytes, Creatinine, Lymphocytes, Absolute Eosinophil, Anion Gap (AG), red cell distribution width (RDW_sd), Thyroid Index (T7), Alanine Aminotransferase, or non-HDL cholesterol, etc.
In some embodiments, the target blood component is Segmented Neutrophils, and the predictive gene is: MNDA, STX3, TNFRSF1A, MSL1, TLR1, RNF24, WIPF1, RXFP1, POLR3GL, FOXK2, LAMB, HMGB1P1, CSRNP1, CCNJL, AKAP12, or APP. In other embodiments, the target blood component is Eosinophils, and the predictive gene is: SLC29A1, SIGLEC8, IL5RA, TMIGD3, SMPD3, SCARNA22, SNORA36C, SNORA11, RN7SL4P, SNHG15, TET3, NDUFA5, or MCM8. In yet other embodiments, the target blood component is PSA_total, and the predictive gene is: CTC-265F19.1, ADAM9, RABllFIP5, SNAPC4, LMNA, HNRNPA3P3, GTF3A, RP11-342M1.6, HNRNPLP2, or RPSllP5. In further embodiments, the target blood component is Red Blood Cell count (RBC_m.mm3), and the predictive gene is: UTY, DDX3Y, ZFY, TXLNGY, or RPS4Y1. In yet further embodiments, the target blood component is Lymphocytes, and the predictive gene is: GRB2, MNDA, NFAM1, or EVI2B.
In some aspects, the target blood component is Monocytes, and the predictive gene is: NAGA, RIN2, ADA2, PLXNB2, or ANXA2. In other aspects, the target blood component is Absolute Eosinophil, and the predictive gene is: CLC, ADAT1, SNRPEP4, or GPC6. In yet other aspects, target blood component is Anion Gap (AG), and the predictive gene is: DHX40, SLC1A4, IMPA2, KATNA1, or MEIS3P1. In further aspects, the target blood component is red blood cell distribution width (RDW_sd), and the predictive gene is: CHCHD2P6, SEC63P1, DNAL1, or ENSG00000197262. In yet further aspects, the target blood component is Thyroid Index (T7.Index), and the predictive gene is: IGHV3-33, ZNF266, CCDC183-AS1, or ENSG00000232745.
In some embodiments, the target blood component is Alaine Aminotransferase, and the predictive gene is: EIF1AY, SRXN1, NDUFAF2, or TBCE. In other embodiments, the target blood component is non-HDL cholesterol, and the predictive gene is: BMT2, PKD1P5, or ARIH1.
Additional objectives, advantages and novel features will be set forth in the description which follows or will become apparent to those skilled in the art upon examination of the drawings and detailed description which follows.
The headings used in the figures should not be interpreted to limit the scope of the claims.
The disclosure is directed to methods of using biomarker proxies (predictive gene(s)) in predicting the results of standard blood tests based on hematology or chemistry, for example, the results from a complete blood count panel, a comprehensive metabolic panel, a chemistry panel, or an endocrine panel (such as levels of thyroxine, T3, and TSH). Instead of collecting multiple tubes of blood for conducting a variety of tests, a simple blood sample collection, for example of whole blood, plasma, or a dried spot, will enable a determination that correlates to the results of a standard blood test. Accordingly, some embodiments are directed to blood tests for measuring the RNA expression of the biomarker proxies, while other embodiments are directed to methods for determining a blood test result based on the RNA expression of the biomarkers.
In the following description, and for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the various aspects of the invention. It will be understood, however, by those skilled in the relevant arts, that the present invention may be practiced without these specific details. It should be noted that there are many different and alternative configurations, devices and technologies to which the disclosed inventions may be applied. The full scope of the disclosure is not limited to the examples that are described below.
The singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, a reference to “a step” includes reference to one or more of such steps. Unless specifically noted, it is intended that the words and phrases in the specification and the claims be given their plain, ordinary, and accustomed meaning to those of ordinary skill in the applicable arts.
As used herein, the term “subject” refers to any mammal, for example, mice, rats, primates, or humans.
The present disclosure is directed to the discovery of a predictive gene (biomarkers), the expression of which relates to a result of a standard blood test, for example, results for a complete blood count with differential and platelet, a basic chemistry panel, a lipid panel, thyroid tests (such as the levels of thyroxine, T3, and thyroid-stimulating hormone (TSH)), or a prostate-specific antigen (PSA) test.
The inventors disclose a method of performing a blood test. The method typically comprises the steps of: extracting an RNA (total RNA or mRNA) from a blood sample; quantifying a mRNA level of the predictive gene in the blood sample from the extracted RNA; and converting the mRNA level of the predictive gene in the blood sample into a blood test result. In some aspects, the method further comprising selecting a predictive gene or a set of predictive genes, for example, from Tables 1-9. In some implementations, the mRNA level of the predictive gene relates to a target blood component.
As used herein, the term “blood test” or “standard blood tests” refers to tests conducted that directly measure chemical or hematological components found in blood. The chemical components include T3, T3 uptake, Thyroxine (T4), T7 Index, TSH, PSA, cholesterol (HDL, non-HDL, LDL, and VLDL), cholesterol/HDL ratio, triglyceride, glucose, blood urea nitrogen (BUN), creatinine, BUN/creatine ratio, uric acid, sodium, potassium, chloride, CO2, anion gap, osmolality, total protein, albumin, globulin, albumin/globulin ratio, calcium, phosphorus (inorganic), alkaline phosphatase, gamma-glutamyl transferase (GGT), alanine aminotransferase, aspartate aminotransferase, lactic dehydrogenase, and bilirubin. The hematological components include white blood cell (WBC), red blood cell (RBC), hemoglobin, hematocrit, mean corpuscular volume (MCV), mean corpuscular hemoglobin (MCH), mean corpuscular hemoglobin concentration (MCHC), platelet count, mean platelet volume, segmented neutrophils, lymphocytes, monocytes, eosinophils, basophils, absolute neutrophil, absolute lymphocyte, absolute monocyte, absolute eosinophil, absolute basophil, immature granulocyte, and absolute granulocyte. Table 10 lists some of the standard blood tests and how they may belong in blood test panels.
The term “blood test result,” as used herein, refers to the results from conducting the blood test or standard blood test. The third and fourth columns in Table 10 list the specific blood test and the units of the results of the specific blood test.
In some aspects, the blood test is reported as: an amount of the target blood component; a concentration of the target blood component; a volume of the target blood component; a distribution of the target blood component; a ratio of the target blood component to a second blood component; or combinations thereof. In other aspects, the blood test is reported as a volume ratio of red blood cells to total blood (hematocrit level). In other aspects, the blood test is reported as a volume ratio of mean corpuscular hemoglobin (MCH) to mean corpuscle (cell) (MCV) (mean corpuscular hemoglobin concentration (MCHC)).
Non-limiting examples of the blood tests or target blood components include: Absolute Basophils, Absolute Eosinophil, Absolute Lymphocyte, Absolute Monocyte, Absolute Neutrophil, Alanine Aminotransferase, Albumin, Alkaline Phosphatase, Anion Gap, Aspartate Aminotransferase, Total Bilirubin, Blood Urea Nitrogen (BUN), Calcium, Chloride, Cholesterol, CO2, Creatinine, Eosinophils, Gamma-Glutamyl Transferase (GGT), Globulin, Glucose, HDL Cholesterol, Hemoglobin, Immature Granulocyte, Lactic Dehydrogenase, LDL Cholesterol, Lymphocytes, mean corpuscular hemoglobin (MCH), mean corpuscle (cell) volume (MCV), Monocytes, mean platelet volume (MPV), Non-HDL Cholesterol, Osmolality, Inorganic Phosphorus, Platelet Count, Potassium, Total Protein, Red Blood Cell (RBC), red cell distribution width (RDW), Segmented Neutrophils, Sodium, Total T3, T3 Uptake, T7 Index, Thyroxine (T4), Triglycerides, Thyroid Stimulating Hormone (TSH), Uric Acid, VLDL Cholesterol, and White Blood Cell (WBC).
In preferred embodiments, the blood sample is whole blood, plasma, dried blood spot, or combinations thereof. Non-limiting examples of target blood component include: Segmented Neutrophils, Eosinophils, Prostate-Specific Antigen, red blood cells, monocytes, creatinine, lymphocytes, eosinophil, alanine aminotransferase, electrolytes, or non-HDL cholesterol, etc. Non-limiting examples of blood test include: red blood Cell count (RBC_m.mm3), Absolute Eosinophil, red cell distribution width (RDW_sd), Thyroid Index (T7), or Anion Gap (AG), etc.
In one aspect, the present disclosure is directed to a method of determining a blood test result, e.g., an amount of a target blood component, a concentration of a target blood component, a volume of a target blood component, a distribution of a target blood component, and a ratio between a target blood component and a second target blood component.
The present disclosure is also directed to methods of quantifying a target blood component in a blood sample. Typically, the methods comprising the steps of: extracting an RNA from a blood sample; selecting a predictive gene from Tables 1-9; measuring an mRNA level of the predictive gene (from the extracted RNA of the blood sample) in the blood sample; and converting the mRNA level of the predictive gene in the blood sample into an amount or ratio of the target blood component in the blood sample. In some embodiments, the target blood component is a chemical component, while in other embodiments, the target blood component is a hematological component.
As used herein, the term “blood sample” refers to a sample collected using blood, for example, a whole blood sample, a plasma sample, or a dried blood spot (DBS). The methodologies of the present invention can be used in conjunction with a small quantity of a blood sample. In some implementations, the volume of the blood sample is less than 1 ml (cubic centimeter, cc). In preferred implementations, the volume of the blood sample is less than 0.1 ml (cc), e.g., about 30 μl.
Not all dried blood spots are quality samples for providing predictive RNA expression levels (see
In some aspects, the quality of the dried blood spot is determined by assessing the quality of the extracted RNA, for example, by capillary electrophoresis (e.g., using an Agilent Bioanalyzer). In some aspects, the RNA quality is quantified as a RIN, wherein the RIN is calculated by an algorithmic assessment of the number of various RNAs presented within the extracted RNA. High-quality cellular RNA generally exhibits an RNA value approaching 10. In yet further aspects, the predictive gene is selected based on the quality of the blood sample. For example, if a dried blood sample is determined to be of high-quality, the predictive gene can be selected from Table 8.
The term “extraction” as used herein refers to any method for separating or isolating the nucleic acids from a sample, more particularly from a biological sample, such as a blood sample. Nucleic acids such as RNA or DNA may be released, for example, by cell lysis. Moreover, in some aspects, extraction may encompass the separation or isolation of coding RNA (mRNA).
Some embodiments of the invention include the extraction of one or more forms of nucleic acids from one or more samples. In some aspects, the extraction of the nucleic acids can be provided using one or more techniques known in the art. For example, in some aspects, the extraction steps can be accomplished using the QIAAMP® RNA Blood Kit from QIAGEN® (e.g., for the isolation of total RNA) or EXORNEASY® Serum/Plasma Kit from QIAGEN® (e.g., for the isolation of intracellular and/or extracellular RNA). In other embodiments, methodologies of the invention can use any other conventional methodology and/or product intended for the isolation of intracellular and/or extracellular nucleic acids (e.g., RNA).
The term “nucleic acid” or “polynucleotide” as referred to herein comprises all forms of RNA (mRNA, miRNA, rRNA, tRNA, piRNA, ncRNA), DNA (genomic DNA or mtDNA), as well as recombinant RNA and DNA molecules or analogs of DNA or RNA generated using nucleotide analogues. The nucleic acids may be single-stranded or double-stranded. The nucleic acids may include the coding or non-coding strands. The term also comprises fragments of nucleic acids, such as naturally occurring RNA or DNA which may be recovered using one or more extraction methods disclosed herein. “Fragment” refers to a portion of nucleic acid (e.g., RNA or DNA).
The term “library,” as used herein refers to a library of genome/transcriptome-derived sequences. The library may also have sequences allowing amplification of the “library” by the polymerase chain reaction or other in vitro amplification methods well known to those skilled in the art. In various embodiments, the library may have sequences that are compatible with next-generation high throughput sequencing platforms. In some embodiments, as a part of the sample preparation process. “barcodes” may be associated with each sample. In this process, short oligonucleotides are added to primers, where each different sample uses a different oligo in addition to a primer.
In certain embodiments, primers and barcodes are ligated to each sample as part of the library generation process. Thus during the amplification process associated with generating the ion amplicon library, the primer and the short oligo are also amplified. As the association of the barcode is done as part of the library preparation process, it is possible to use more than one library, and thus more than one sample. Synthetic nucleic acid barcodes may be included as part of the primer, where a different synthetic nucleic acid barcode may be used for each library. In some embodiments, different libraries may be mixed as they are introduced to a flow cell, and the identity of each sample may be determined as part of the sequencing process.
The term “expression” or “expression level” is used broadly to include a genomic expression profile, e.g., an expression profile of nucleic acids. Profiles may be generated by any convenient means for determining a level of a nucleic acid sequence, e.g., quantitative hybridization of nucleic acid, labeled nucleic acid, amplified nucleic acid, cDNA, etc., quantitative PCR, ELISA for quantitation, sequencing (e.g., RNA sequencing) and the like. According to some embodiments, the term “expression level” means measuring the abundance of the nucleic acid in the measured samples.
Expression level or other determinable traits regarding nucleic acids may function as one or more markers or biomarkers. As described herein, the expression level of the one or more biomarkers may be correlated with a blood test result and may be indicative of or predictive of a presence or stage of a disease, condition, or medical state. As such, embodiments of the invention can be employed in medically related analyses to diagnose, assess, provide prognostic information, and make therapeutic decisions regarding any biologically related state.
The expression of these RNA markers from a blood sample determine blood test results with an accuracy of at least 80% when comparing the predicted blood test result based on the RNA markers to the actual blood test result. In particular, these RNA markers determine results in a complete blood count, a comprehensive metabolic panel, and a chemistry panel, and the levels of thyroxine, T3, and TSH an accuracy of at least 80%. In some aspects, accuracy is determined based on regression analysis from the R2-value.
The mRNA level is determined, for example, using RNA sequencing, quantitative PCR (e.g., real-time RT-PCR), or hybridization (e.g., DNA microarray), etc. In preferred embodiments, the mRNA level is determined using next-generation sequencing. The methods of determining the expression of RNA from a dried blood spot is explained in PCT Application No. PCT/US2016/038243, the contents of which are incorporated herein.
In some implementations, the methods further comprise standardizing the level of RNA expression of the predictive gene.
In other implementations, the methods further comprise normalizing the mRNA level of the predictive gene. In some embodiments, the mRNA level of the predictive gene is normalized according to a method of differential analysis. In some aspects, the count data from next-generation sequencing is normalized using an algorithm. Any normalization algorithm normalization that normalizes library size may be used to normalize the mRNA level of the predictive gene. Non-limiting examples include a DESeq2 algorithm, or edgeR algorithm, etc. In some aspects, the mRNA level of the predictive gene is expressed as a normalized gene count. In these aspects, the normalized gene count is used to report the blood test result (e.g., an amount of the target component in the blood sample).
In some embodiments, the methods encompass converting a mRNA level of a single predictive gene in a blood sample into a blood test result using the formula: blood test result=C+C1*(gene). C and C1 are constants, and (gene) represents the mRNA level of the predictive gene. In some aspects, (gene) represents normalized gene count. In other aspects, a normalized gene count of a single predictive gene in a blood sample is converted into a blood test result according to a formula set forth in Tables 1-3. In some embodiments, the range of C and C1 are ±30% of the disclosed value. For example, for formula 0.153698762623272+2.5434273948207*SMPD3, C is between 0.11 and 0.20, and C1 is between 1.8 and 3.3. In preferred embodiments, the range of C and C1 are ±20% of the disclosed value. For the same formula, C is between 0.12 and 0.18, and C1 is between 2.0 and 3.1. In the most preferred embodiments, the range of C and C1 are ±10% of the disclosed value. For the same formula, C is between 0.14 and 0.17, and C1 is between 2.3 and 2.8.
In other embodiments, the methods encompass converting a mRNA level of each of a set of predictive genes in a blood sample into a blood test result using the formula: blood test result=C+C1*(gene1)+C2*(gene2)+ . . . +Cn*(genen), n is 1, 2, 3, 4, or 5, C, C1, C2, . . . and Cn are constants, and (gene1), (gene2), . . . , and (genen) represent the mRNA level of gene1, gene2, . . . , and genen. In some aspects, (gene1), (gene2), . . . , and (genen) represents the normalized gene count for each predictive gene within the set. C and C, may be positive or negative. In certain non-limiting aspects, the blood sample is a dried blood spot, and n is 1, 2, or 3. In some aspects, a set of normalized gene counts of a set of predictive genes in a blood sample is converted into a blood test result according to a formula set forth in Tables 4-9. In some aspects, C, C1, . . . C, is ±30% of the disclosed value. In other aspects, C, C1, . . . Cn is ±20% of the disclosed value. In further aspects, C, C1, . . . Cn is ±10% of the disclosed value.
In some implementations, a range in the mRNA level of the predictive gene corresponds to the normal range in the results of a blood test. Accordingly, detecting the mRNA level of genes listed in Tables 1-9 below replaces the need for conducting standard blood tests. Whereas conventional blood tests usually require a visit to a laboratory to get blood drawn as each blood test may have particular requirements for the blood collection process, the methods of the invention simplify the process of monitoring of a subject's state of health. One such benefit is that a single sample collection where a relatively small amount of blood is collected replaces the need to collect multiple tubes of blood by a visit to a laboratory. In the examples, a total of 1 cc of blood was collected for the whole blood sample and the generation of the plasma sample, whereas the typical collection volume for blood tests is 8 cc per tube of blood. In some implementations, less than 1 cc blood needs to be collected. In the case of the dried blood sample, a blood smear or the amount of blood released from a typical finger prick (for example, for blood sugar monitoring) is sufficient. Dried blood spot samples may also be easily kept in storage in case other blood tests analysis needs to be conducted on the sample, for example, if additional analysis is needed weeks, months, or years after collection of the dried blood sample. Another exemplary benefit of the invention is that one can track health status without the need to visit a laboratory or blood collection site. Instead, the subject may collect his or her own sample and send the sample for analysis in a laboratory. This is particularly convenient for subjects who cannot make the required visits to a laboratory, for example, ailing house-bound subjects or those residing far from a laboratory. Often, the former group of subjects has the most need for careful monitoring of their health status.
Tables 1-3 list the blood test results and the single most predictive genes based on the gene's mRNA level in whole blood, dried blood spot, and plasma samples respectively. In some aspects, the mRNA level of one or more of the genes listed in Table 1 in a subject's whole blood sample is used to determine the amount of eosinophils, absolute eosinophils, segmented neutrophils, lymphocytes, monocytes, or prostate-specific antigen (PSA) in the subject. In other aspects, the mRNA level of one or more of the genes listed in Table 2 in a subject's dried blood spot sample is used to determine the amount of eosinophils, absolute eosinophils, or PSA in the subject. In yet other aspects, the mRNA level of one or more of the genes listed in Table 3 in a subject's plasma sample is used to determine the amount of creatinine, PSA, red blood cell (RBC), or the mean corpuscular hemoglobin concentration (MCHC) in the subject.
1Transmembrane proteins
2The myeloid cell nuclear differentiation antigen (MNDA) is detected only in nuclei of cells of the granulocyte-monocyte lineage. MNDA was correlated with the amount of both lymphocytes and neutrophils. However, for lymphocytes, the correlation is negative.
Tables 4-8 list the blood test results with the most predictive set of genes of based on the genes' mRNA level in whole blood samples, plasma samples, the combination of results from whole blood and plasma samples, all dried blood spot samples, and dried blood spot samples with RNA expression of a high number of genes detected (high-quality dried blood spot samples), respectively. Accordingly, some implementations of the disclosure are directed to kits comprising reagents to measuring the RNA expression of the specific sets of genes listings in Tables 1-8 in whole blood samples, plasma samples, the combination of results from whole blood and plasma samples, any dried blood spot samples, or high-quality dried blood spot samples. Other implementations of the disclosure are directed to methods of using the mRNA level of genes in the specific combinations listed in Tables 4-9 to predict corresponding blood test results. The formulas shown in Tables 1-9 transform the mRNA level into the typically presented blood test results.
In some implementations, the method comprises determining the subject's blood test result is in the normal range based on the RNA expression count of a gene, which may be determined from the conversion formula. Accordingly, the methods comprise quantifying the RNA expression of a set of genes, for example, the set of genes described listed Tables 1-8 for each combination of blood test and sample type, in the whole blood, plasma, or dried blood spot sample from a subject; and determining the subject has normal results for the corresponding blood test based on the RNA expression count of the set of genes.
For example, the subject is determined to have a normal percentage of segmented neutrophils if the subject's whole blood has gene counts of between 508 and 574 for RNF24, between 21829 and 22878 for MNDA, and between 9031 and 10757 for WIPF1. In another example, the subject is determined to have a normal percentage of lymphocytes if the subject's whole blood has gene counts of between 4345 and 4583 for GRB2, between 17569 and 19699 for MNDA, and between 3862 and 4492 for NFAM1. In still another example, the subject is determined to have a normal percentage of monocytes if the subject's whole blood has gene counts of between 1311 and 1642 for NAGA, between 629 and 828 for RIN2, between 2773 and 3436 for ADA2, between 3220 and 4087 for PLXNB2, and between 3907 and 5210 for ANXA2. Also from the whole blood sample, a subject may be determined to have a normal level of cholesterol if the subject's whole blood has gene counts of between 13 and 20 for RP5-1139B12.2, between 466 and 794 for GOLGA8A, between 83 and 99 for ENSG00000233280, and between 1186 and 1445 for SMC5. A subject may also be determined to have normal concentration of Aspartate Aminotransferase if the gene count in the whole blood sample for NEFM is between 9 and 52, for THUMPD1 is between 438 and 584, for LDLR is between 570 and 630, for CRTAM is between 66 and 97, and for CHCHD1 is between 35 and 37. Accordingly, if the gene counts for the set of the genes are not within the aforementioned range, the subject may be determined to have abnormal percentage of segmented neutrophils, lymphocytes, or monocytes, abnormal level of cholesterol, or abnormal concentration of Aspartate Aminotransferase.
It should be understood that while particular embodiments have been illustrated and described, various modifications can be made thereto without departing from the spirit and scope of the invention as will be apparent to those skilled in the art. Such changes and modifications are within the scope and teachings of this invention as defined in the claims appended hereto.
Whole blood, plasma, and dried blood spot (DBS) samples were collected from 50 non-fasting individuals. Two sets of blood samples were collected on the same day. The set to be sent for analysis by Sonora Quest Laboratories contained collections of whole blood and plasma according to standard procedure. The set for analysis of RNA expression contained collections of whole blood, plasma, and DBS. Instead of collecting 8 ccs of blood, the total amount of blood for the second section was 1 cc. Blood was collected in blood collection tubes with K2EDTA. Plasma samples were produced by centrifuging the whole blood collected in K2EDTA tubes according to standard procedure.
Dried blood spot samples may be obtained using a finger-puncture technique in which a single drop of blood from the subject's finger was applied to a sample collection apparatus (i.e., RNA collection paper from FORTIUSBIO®). The blood spot is allowed to dry on the FORTIUSBIO® sample collection apparatus. A portion of the sample that has dried on the sample collection apparatus is then removed for nucleic acid extraction.
RNA, including mRNA, may be extracted using commercially available kits. RNA was extracted from whole blood, plasma, and dried blood spot samples using exoRNeasy (QIAGEN®, Germantown, Md.) according to the manufacturer's instructions. The extracted RNA or mRNA was sequenced using the ILLUMINA® system (San Diego, Calif.) to determine the RNA or mRNA expression level of each predictive gene. In various embodiments, mRNA may be sequenced using next-generation sequencing (NGS) to obtain raw sequencing data.
After the mRNA from the blood sample is sequenced, some embodiments provide methods of analyzing the data. For example, the analyzing steps of the methodology include steps such as processing the raw sequencing data/reads to remove information related to barcodes and adapters using technologies provided by Cutadapt and AlienTrimmer. Thereafter, the sequences can be aligned to a reference sequence using technologies such as STAR or Tophat. After alignment, the data can be quantitated to generate numerical estimates of each gene's expression or “counts” provided by technologies like FeatureCounts or htseq-count. For example, a number of copies or reads of a predictive gene in the sequencing data can be quantified or counted to determine a gene count. A gene count represents a relative expression level of the predictive gene in the blood sample and is independent of the volume of the blood sample. The gene count is a value that can then be used as an input into one or more bioinformatic analysis steps used to correlate the gene count to an output value of a blood test result.
Gene counts were obtained and normalized within each sample type for sequencing depth and then standardized for performing linear regression.
The normalization of gene counts reduces the impact of different sequencing length on the gene count. For example, when the total gene count of sample A is 1 million counts, and the total gene count for sample B is 1.3 million counts, the difference may mainly be attributed to technical variation and not a true biological difference. Accordingly, normalization is applied to the total gene counts of these samples so that the sequencing results of sample A can be compared to the sequencing results of sample B. A variety of algorithms for normalizing library size exist in the prior art, for example, DESeq2, and they may all be used for normalization the gene count in the methods of the invention.
The standardization of the gene count is a mathematical correction applied to ensure the variables of comparison are on the same scale. This step helps stabilize the results of any kind of machine learning. While gene counts do not need to be standardized, the step increases the accuracy of the blood test result determination. Any method of standardizing variables may be used. In one implementation, the gene counts are standardized by dividing each value by the root mean square of all the samples values for the given gene.
The samples were sent to Sonora Quest for the analysis of the specific blood tests listed in Table 10.
Simple linear regression was performed after removing any outliers by regressing each gene individually on each blood test. The single genes whose expression levels are most highly correlated values of standard blood chemistry tests (as measured by R2 values) were noted in Tables 1-3.
Multiple linear regression was performed by considering up to 5 genes that could be used on the regression model. Outliers were imputed with the mean, and two rounds of feature selection were performed to identify genes of interest for each blood test. The first round selected the 5 highest (3 for DBS) scoring genes based on a univariate F-test followed by a second round where genes were potentially removed based on the Akaike information criterion. This process was performed for each sample type separately and in combination by considering whole blood with plasma as well as all three sample types together. A combination of sample types was created by allowing genes from any of the included sample types to be selected during the first round of feature selection.
This application claims the benefit of U.S. Provisional Patent Application No. 62/586,301, filed on Nov. 15, 2017, the contents of which are incorporated herein by reference in its entirety.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2018/061394 | 11/15/2018 | WO | 00 |
Number | Date | Country | |
---|---|---|---|
62586301 | Nov 2017 | US |