This application claims the benefit of priority of Israeli Patent Application No. 264581 filed Jan. 31, 2019, the contents of which are incorporated herein by reference in their entirety.
The ASCII file, entitled 80593 Sequence Listing.txt, created on 28 Jan. 2020, comprising 82,571,264 bytes, submitted concurrently with the filing of this application is incorporated herein by reference.
The present invention, in some embodiments thereof, relates to a non-invasive method of quantifying blood metabolites.
Blood serves as a liquid conveyor for molecules inside the body by delivering necessary substances to the cells and transporting metabolic waste products. Of particular importance are the thousands of circulating small molecules termed the serum metabolome, which are either naturally produced by the body or taken up from the environment. While the connection of most of these metabolites to human health is yet to be elucidated, some are known to be predictive diagnostic biomarkers or even causal agents in the development of disease. For example, high blood cholesterol leads to buildup of plaque in the blood vessels, termed atherosclerosis, which in turn increases the risk for a major cardiovascular event such as heart attack, stroke, and peripheral artery disease. As a result, blood cholesterol level serves as both a diagnostic biomarker and a therapeutic target for drugs such as statins. As another example, type II diabetes, which impacts around 10% of the population, is diagnosed in part by measurements of blood glucose levels, with a recent study suggesting that a new set of metabolites significantly improves diagnosis. These are only examples of the wealth of potential biomarkers and therapeutic targets that could be found in the blood, making blood an attractive source in which to search for novel biomarkers for early detection and treatment of disease.
Mass spectrometry can accurately identify thousands of metabolites from different biofluids. While some of the identified compounds are well studied and characterized, the determinants of most serum metabolites are still unknown. Studies focusing on human genetics estimated a median heritability of 6.9% for serum metabolites, thereby leaving much of the variation in metabolite levels unaccounted for and suggesting major contributions from environmental factors. Other studies have suggested that the gut microbiome is actively involved in the metabolism of many metabolites which are detectable in human serum, including a diverse set of biochemicals such as branched-chain and aromatic amino acids. A notable example is the metabolite trimethylamine N-oxide (TMAO), which is derived from gut microbial metabolism of choline and carnitine, and was reported to act as a marker for cardiovascular disease in humans, with further evidence indicating proatherogenicity and prothromboticity in mouse models. The effect of nutrition on serum metabolites has long been established, as dietary patterns such as the intake of red meat, whole-grain bread, tea and coffee have been linked to changes in a wide range of compounds. Smoking has also been suggested to affect serum metabolites, with some of these smoking-related changes being reversible after smoking cessation. However, no study to date has incorporated all of the above potential determinants within a single human cohort and quantified their relative contribution in explaining serum metabolites.
According to an aspect of some embodiments of the present invention there is provided a method of predicting the quantity of a metabolite in the blood of a subject. The method comprises: accessing a computer readable medium storing a library of trained machine learning procedures, each being associated with a different metabolite; searching the library for a trained machine learning procedure associated with the metabolite; feeding the selected procedure with an amount of a plurality of microbes of a microbiome of the subject; and receiving from the selected procedure an output indicative of the quantity of the metabolite in the blood.
According to some embodiments of the invention the method comprises measuring the amount of microbes of the microbiome of the subject prior to the analyzing.
According to some embodiments of the invention the microbiome is a fecal microbiome.
According to some embodiments of the invention the plurality of microbes comprises more than 20 microbes.
According to some embodiments of the invention the metabolite is set forth in Table 2.
According to some embodiments of the invention the metabolite is other than glucose and other than cholesterol.
According to some embodiments of the invention at least some of the trained machine learning procedures in the library comprise a set of decision trees.
According to some embodiments of the invention the selected machine learning procedure comprises a set of decision trees, each decision tree comprises a plurality of nodes associated with a respective plurality of decision rules, each decision rule relating to at least one microbe of the microbiome, and wherein a number of decision rules relating to microbes listed in Table 1 is larger than a number of decision rules relating to other microbes of the microbiome.
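By way of non-limiting illustration only, the library lookup and prediction steps described above can be sketched in Python. This is a minimal sketch under stated assumptions and not the implementation of the embodiments: the names `Node`, `TrainedProcedure` and `predict_metabolite`, the microbe identifiers `microbe_A` and `microbe_B`, and every threshold and leaf value are hypothetical, and the way the tree outputs are combined (a base value plus the sum of the leaf values) is one common choice that the text does not prescribe.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    """Decision-tree node; it is a leaf when `microbe` is None."""
    microbe: Optional[str] = None   # microbe tested by this node's decision rule
    threshold: float = 0.0          # rule: amount <= threshold -> left branch
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: float = 0.0              # leaf contribution to the predicted quantity


def tree_output(node: Node, amounts: Dict[str, float]) -> float:
    """Walk one tree from its root to a leaf using the microbe amounts."""
    while node.microbe is not None:
        amount = amounts.get(node.microbe, 0.0)  # assumption: absent microbe = 0
        node = node.left if amount <= node.threshold else node.right
    return node.value


@dataclass
class TrainedProcedure:
    """A set of decision trees associated with one metabolite (hypothetical)."""
    base: float
    trees: List[Node]

    def predict(self, amounts: Dict[str, float]) -> float:
        return self.base + sum(tree_output(t, amounts) for t in self.trees)


def predict_metabolite(library: Dict[str, TrainedProcedure],
                       metabolite: str,
                       amounts: Dict[str, float]) -> float:
    """Search the library for the metabolite's procedure and feed it the amounts."""
    procedure = library.get(metabolite)
    if procedure is None:
        raise KeyError(f"no trained procedure stored for {metabolite!r}")
    return procedure.predict(amounts)


# Toy library: all microbes, thresholds and leaf values are invented.
library = {
    "indolepropionate": TrainedProcedure(
        base=1.0,
        trees=[
            Node("microbe_A", 0.10, Node(value=-0.2), Node(value=0.3)),
            Node("microbe_B", 0.05, Node(value=0.0), Node(value=0.1)),
        ],
    )
}
amounts = {"microbe_A": 0.20, "microbe_B": 0.01}
print(predict_metabolite(library, "indolepropionate", amounts))
```

Keying the library by metabolite name keeps the search step a single dictionary lookup; a stored library on a computer readable medium could equally be a directory of serialized models, which this sketch does not attempt to show.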
According to an aspect of some embodiments of the present invention there is provided a method of predicting the quantity of a metabolite set forth in Table 1. The method comprises: accessing a computer readable medium storing a trained machine learning procedure associated with the metabolite; feeding the trained procedure with an amount of N of the corresponding microbes set forth in Table 1, the N being at most 50; and receiving from the procedure an output indicative of the quantity of the metabolite in the blood, thereby predicting the quantity of the metabolite in the blood.
According to some embodiments of the invention the method comprises measuring the amount of microbes of the fecal microbiome of the subject prior to the analyzing.
According to some embodiments of the invention the metabolite is other than glucose and other than cholesterol.
According to an aspect of some embodiments of the present invention there is provided a method of predicting the quantity of a metabolite in the blood of a subject that consumes a diet of a plurality of food types. The method comprises: accessing a computer readable medium storing a library of trained machine learning procedures, each being associated with a different metabolite; searching the library for a trained machine learning procedure associated with the metabolite; feeding the selected procedure with a frequency of consumption of at least 5 of the food types over at least one month and/or a daily mean consumption of at least 5 of the food types; and receiving from the selected procedure an output indicative of the quantity of the metabolite in the blood.
According to some embodiments of the invention the metabolite is other than glucose and other than cholesterol.
According to some embodiments of the invention at least some of the trained machine learning procedures in the library comprise a set of decision trees.
According to some embodiments of the invention each set of decision trees comprises at least 1000 decision trees.
According to some embodiments of the invention the selected machine learning procedure comprises a set of decision trees, each decision tree comprises a plurality of nodes associated with a respective plurality of decision rules, each decision rule relating to at least one food type, and wherein a number of decision rules relating to food types listed in Table 3 is larger than a number of decision rules relating to other food types.
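For illustration only, the comparison between the number of decision rules relating to listed food types and the number relating to other food types can be sketched as a simple count over the nodes of the trees. The tuple representation of a tree, the function names, and the set of "listed" food types below are assumptions made for the example; the actual contents of Table 3 are not reproduced here.

```python
from collections import Counter

# Assumed representation: a leaf is a float; an internal node is a tuple
# (food_type, threshold, left_subtree, right_subtree).


def count_rules(tree, counts):
    """Add one count per decision rule (internal node) to `counts`."""
    if isinstance(tree, tuple):
        food_type, _threshold, left, right = tree
        counts[food_type] += 1
        count_rules(left, counts)
        count_rules(right, counts)


def rules_by_group(trees, listed_food_types):
    """Return (rules on listed food types, rules on all other food types)."""
    counts = Counter()
    for tree in trees:
        count_rules(tree, counts)
    n_listed = sum(n for food, n in counts.items() if food in listed_food_types)
    n_other = sum(counts.values()) - n_listed
    return n_listed, n_other


# Toy ensemble of two trees; thresholds and leaf values are invented.
trees = [
    ("coffee", 1.0, ("whole-grain bread", 2.0, 0.1, 0.2), 0.3),
    ("red meat", 0.5, 0.0, ("coffee", 3.0, 0.1, 0.4)),
]
listed = {"coffee", "whole-grain bread"}  # hypothetical stand-in for a table row
n_listed, n_other = rules_by_group(trees, listed)
print(n_listed, n_other, n_listed > n_other)
```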
According to an aspect of some embodiments of the present invention there is provided a method of predicting the quantity of a metabolite set forth in Table 3. The method comprises: accessing a computer readable medium storing a trained machine learning procedure associated with the metabolite; feeding the selected procedure with a daily mean consumption and/or frequency of consumption over at least one month of N of the corresponding food types set forth in Table 3 of the subject; and receiving from the selected procedure an output indicative of the quantity of the metabolite in the blood, thereby predicting the quantity of the metabolite in the blood.
According to some embodiments of the invention the N is at most 50.
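As a hedged illustration of the inputs named above, the following sketch derives a daily mean consumption and a frequency of consumption for a chosen set of food types from a dated food log. The log format, the function name `consumption_features`, the definition of frequency as the fraction of days on which the food type was consumed, and the 30-day reading of "at least one month" are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date


def consumption_features(log, start, end, food_types, max_food_types=50):
    """Per food type: daily mean (grams/day) and frequency (fraction of days).

    `log` is an iterable of (day, food_type, grams) entries.
    """
    if len(food_types) > max_food_types:
        raise ValueError("more food types than the allowed maximum N")
    n_days = (end - start).days + 1
    if n_days < 30:  # assumed reading of "at least one month"
        raise ValueError("observation window shorter than one month")

    wanted = set(food_types)
    total_grams = defaultdict(float)
    days_eaten = defaultdict(set)
    for day, food, grams in log:
        if food in wanted and start <= day <= end:
            total_grams[food] += grams
            days_eaten[food].add(day)

    return {
        food: {
            "daily_mean_g": total_grams[food] / n_days,
            "frequency": len(days_eaten[food]) / n_days,
        }
        for food in food_types
    }


# Toy log; quantities are invented.
log = [
    (date(2020, 1, 1), "whole-grain bread", 60.0),
    (date(2020, 1, 2), "whole-grain bread", 90.0),
    (date(2020, 1, 1), "coffee", 240.0),
    (date(2020, 1, 1), "coffee", 240.0),
    (date(2020, 2, 15), "coffee", 240.0),  # outside the window, ignored
]
features = consumption_features(
    log, date(2020, 1, 1), date(2020, 1, 30),
    ["whole-grain bread", "coffee", "tea"],
)
print(features)
```

The resulting dictionary could then be flattened into the feature vector fed to the trained procedure; food types that never appear in the log are kept with zero values so the vector has a fixed layout.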
According to some embodiments of the invention the metabolite is other than glucose and other than cholesterol.
According to some embodiments of the invention the method comprises corroborating the quantity of the metabolite by measuring the amount of the metabolite in a blood sample of the subject.
According to an aspect of some embodiments of the present invention there is provided a method of diagnosing a disease of a subject. The method comprises predicting the quantity of at least one metabolite which is indicative of the disease, wherein the predicting is carried out according to any one of claims 1-21, thereby diagnosing the disease.
According to some embodiments of the invention the disease is selected from the group consisting of a metabolic disease, a cardiovascular disease and kidney disease.
According to an aspect of some embodiments of the present invention there is provided a method of altering the quantity of a metabolite in the blood of the subject. The method comprises: predicting the quantity of the metabolite; and administering to the subject at least one agent which specifically increases or decreases at least one microbe, wherein the agent is selected based on the quantity of the metabolite; wherein the predicting the quantity of the metabolite comprises: accessing a computer readable medium storing a library of trained machine learning procedures, each being associated with a different metabolite; searching the library for a trained machine learning procedure associated with the metabolite; feeding the selected procedure with an amount of a plurality of microbes; and receiving from the selected procedure an output indicative of the quantity of the metabolite in the blood.
According to an aspect of some embodiments of the present invention there is provided a method of altering the amount of a metabolite in the blood of the subject. The method comprises: accessing a computer readable medium storing a library of trained machine learning procedures, each being associated with a different metabolite; searching the library for a trained machine learning procedure associated with the metabolite; feeding the selected procedure with a predetermined quantity of the metabolite; receiving from the selected procedure an output indicative of at least one microbe; and administering to the subject at least one agent which specifically increases or decreases the amount of the at least one microbe, thereby altering the amount of the metabolite in the blood of the subject.
According to some embodiments of the invention the agent which increases the microbe is a probiotic.
According to some embodiments of the invention the agent which decreases the microbe is an antibiotic or a phage directed to the microbe.
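The selection of an agent based on the predicted quantity can be sketched, purely as an illustration, as a small decision function. The target range, the function name `choose_agent`, and the `microbe_effect` sign (whether the microbe is assumed to raise or lower the metabolite) are hypothetical inputs that the text does not specify; the sketch only encodes the stated mapping of a probiotic to increasing a microbe and an antibiotic or phage to decreasing it.

```python
def choose_agent(predicted, target_low, target_high, microbe_effect):
    """Pick an agent class for one microbe, or None if no change is needed.

    microbe_effect: +1 if the microbe is assumed to raise the metabolite,
                    -1 if it is assumed to lower it (hypothetical input).
    """
    if microbe_effect not in (1, -1):
        raise ValueError("microbe_effect must be +1 or -1")
    if target_low <= predicted <= target_high:
        return None  # predicted quantity already within the target range

    metabolite_should_rise = predicted < target_low
    # Increase the microbe when its assumed effect points the way we need.
    increase_microbe = (microbe_effect > 0) == metabolite_should_rise
    return "probiotic" if increase_microbe else "antibiotic or phage"


# Toy calls with invented numbers.
print(choose_agent(5.0, 1.0, 3.0, +1))
print(choose_agent(0.5, 1.0, 3.0, +1))
print(choose_agent(2.0, 1.0, 3.0, -1))
```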
According to an aspect of some embodiments of the present invention there is provided a method of providing dietary advice to a subject. The method comprises predicting the quantity of a metabolite in the blood by carrying out the method according to any one of claims 14-22, wherein when the metabolite is above or below the recommended quantity of the metabolite, recommending consumption of at least one food type that alters the quantity of the metabolite.
According to some embodiments of the invention the metabolite is set forth in Table 4.
According to some embodiments of the invention the food type is the corresponding food type set forth in Table 4.
According to an aspect of some embodiments of the present invention there is provided a method of altering the amount of a metabolite set forth in Table 3 in the blood of the subject. The method comprises: accessing a computer readable medium storing a library of trained machine learning procedures, each being associated with a different metabolite; searching the library for a trained machine learning procedure associated with the metabolite; feeding the selected procedure with a predetermined quantity of the metabolite; receiving from the selected procedure an output indicative of a list of food types; and providing dietary advice to the subject, based on the output.
According to some embodiments of the invention the method comprises predicting the amount of the metabolite using another trained machine learning procedure.
Unless otherwise defined, all technical and/or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the invention pertains. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of embodiments of the invention, exemplary methods and/or materials are described below. In case of conflict, the patent specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and are not intended to be necessarily limiting.
Some embodiments of the invention are herein described, by way of example only, with reference to the accompanying drawings. With specific reference now to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of embodiments of the invention. In this regard, the description taken with the drawings makes apparent to those skilled in the art how embodiments of the invention may be practiced.
In the drawings:
The present invention, in some embodiments thereof, relates to a non-invasive method of quantifying blood metabolites.
Before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not necessarily limited in its application to the details set forth in the following description or exemplified by the Examples. The invention is capable of other embodiments or of being practiced or carried out in various ways.
The collection of metabolites circulating in the human blood, termed the serum metabolome, contains a plethora of biomarkers and causative agents. Although the origin of specific compounds is known, the understanding of the key determinants of most metabolites is poor.
The present inventors have now measured the levels of 1251 circulating metabolites in 521 serum samples from a healthy cohort, and devised machine learning algorithms to predict their levels in held-out subjects based on a comprehensive profile consisting of gut microbiome, clinical parameters, diet, lifestyle, anthropometric measurements and medication data. Notably, they obtained significant predictions for over 92% of the profiled metabolites, with diet and microbiome each explaining hundreds of metabolites, and with 64% of the variance of some metabolites explained using only gut microbiome data. To corroborate the causality of these predictions, the present inventors showed that some metabolites that were predicted to be positively associated with bread increased in levels following a randomized clinical trial of bread intervention. Overall, the present results unravel the potential determinants of over 1000 metabolites, paving the way towards mechanistic understanding of the alterations in metabolites under different conditions and to designing interventions for manipulating metabolite levels.
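The notion of explained variance on held-out subjects can be illustrated with a short sketch. It is not the inventors' pipeline: a one-feature least-squares line is swapped in for the trained machine learning procedures so that the example stays dependency-free, and the fold assignment, function names and toy data are assumptions. What it shows is only the evaluation pattern, in which each subject's level is predicted by a model fitted without that subject and the coefficient of determination is then computed over all held-out predictions.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, my - slope * mx


def held_out_predictions(xs, ys, k=5):
    """Predict every subject with a model fitted on the other folds only."""
    n = len(xs)
    preds = [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        for i in test:
            preds[i] = slope * xs[i] + intercept
    return preds


def explained_variance(ys, preds):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    my = mean(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Toy data: one invented feature and a metabolite level with alternating noise.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]
r2 = explained_variance(ys, held_out_predictions(xs, ys))
print(round(r2, 3))
```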
Thus, according to a first aspect of the present invention there is provided a method of predicting the quantity of a metabolite in the blood of a subject, the method comprising analyzing the amount of a plurality of microbes of a microbiome of the subject so as to reach a confidence level of at least 95% in the significance of the predictions, thereby predicting the quantity of the metabolite in the blood.
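One possible way to assess whether predictions are significant at the 95% confidence level is a permutation test, sketched below for illustration only. The text does not state which statistical test is used, so the choice of a one-sided permutation test on the Pearson correlation between measured and predicted levels, the function names, the number of permutations and the toy data are all assumptions.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(measured, predicted, n_permutations=999, seed=0):
    """One-sided p-value: how often shuffled predictions correlate as well."""
    rng = random.Random(seed)
    observed = pearson(measured, predicted)
    shuffled = list(predicted)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if pearson(measured, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


def is_significant(measured, predicted, confidence=0.95):
    """True when the p-value does not exceed 1 - confidence."""
    return permutation_p_value(measured, predicted) <= 1.0 - confidence


# Toy data: invented measured and predicted metabolite levels.
measured = [1.0, 1.4, 2.1, 2.9, 3.2, 4.1, 4.8, 5.5, 6.3, 7.0, 7.6, 8.4]
predicted = [1.2, 1.1, 2.4, 2.7, 3.5, 3.9, 5.1, 5.2, 6.6, 6.8, 7.9, 8.1]
print(is_significant(measured, predicted))
```

Because the test is one-sided, predictions that are strongly anti-correlated with the measurements are not reported as significant.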
The methods described herein are preferably non-invasive methods. Thus, in one embodiment, the methods described herein are carried out without blood sampling.
As used herein the term “subject” refers to a mammalian subject (e.g. mouse, cow, dog, cat, horse, monkey, human), preferably human.
In one embodiment, the subject is a healthy subject.
As used herein, a “metabolite” is an intermediate or product of metabolism. The term metabolite is generally restricted to small molecules and does not include polymeric compounds such as DNA or proteins greater than 100 amino acids in length. A metabolite may serve as a substrate for an enzyme of a metabolic pathway, an intermediate of such a pathway or the product obtained by the metabolic pathway.
In preferred embodiments, metabolites include but are not limited to sugars, organic acids, amino acids, fatty acids, hormones, vitamins, as well as ionic fragments thereof. In another embodiment, the metabolite is an oligopeptide (less than about 100 amino acids in length). In still another embodiment, the metabolite is not a peptide or a nucleic acid.
In particular, the metabolites are less than about 3000 Daltons in molecular weight, and more particularly from about 50 to about 3000 Daltons.
The metabolite of this aspect of the present invention may be a primary metabolite (i.e. essential to the microbe for growth) or a secondary metabolite (one that does not play a role in growth, development or reproduction, and is formed during the end or near the stationary phase of growth).
Representative examples of metabolic pathways in which the metabolites of the present invention are involved include, without limitation, citric acid cycle, respiratory chain, photosynthesis, photorespiration, glycolysis, gluconeogenesis, hexose monophosphate pathway, oxidative pentose phosphate pathway, production and β-oxidation of fatty acids, urea cycle, amino acid biosynthesis pathways, protein degradation pathways such as proteasomal degradation, amino acid degrading pathways, biosynthesis or degradation of: lipids, polyketides (including, e.g., flavonoids and isoflavonoids), isoprenoids (including, e.g., terpenes, sterols, steroids, carotenoids, xanthophylls), carbohydrates, phenylpropanoids and derivatives, alkaloids, benzenoids, indoles, indole-sulfur compounds, porphyrines, anthocyans, hormones, vitamins, cofactors such as prosthetic groups or electron carriers, lignin, glucosinolates, purines, pyrimidines, nucleosides, nucleotides and related molecules such as tRNAs, microRNAs (miRNA) or mRNAs.
Preferably, the metabolite is set forth in the Human Metabolome Database which is available online at wwwdothmdb.ca/metabolites.
Exemplary metabolites that may be analyzed include, but are not limited to: (N(1)+N(8))-acetylspermidine, “1,2,3-benzenetriol sulfate (1)”, “1,2,3-benzenetriol sulfate (2)”, “1,2-dilinoleoyl-GPC (18:2/18:2)”, “1,2-dilinoleoyl-GPE (18:2/18:2)*”, “1,2-dipalmitoyl-GPC (16:0/16:0)”, “1,3,7-trimethylurate”, “1,3-dimethylurate”, “1,5-anhydroglucitol (1,5-AG)”, “1,7-dimethylurate”, 1-(1-enyl-oleoyl)-GPE (P-18:1)*, 1-(1-enyl-palmitoyl)-2-arachidonoyl-GPC (P-16:0/20:4)*, 1-(1-enyl-palmitoyl)-2-arachidonoyl-GPE (P-16:0/20:4)*, 1-(1-enyl-palmitoyl)-2-linoleoyl-GPC (P-16:0/18:2)*, 1-(1-enyl-palmitoyl)-2-linoleoyl-GPE (P-16:0/18:2)*, 1-(1-enyl-palmitoyl)-2-oleoyl-GPC (P-16:0/18:1)*, 1-(1-enyl-palmitoyl)-2-oleoyl-GPE (P-16:0/18:1)*, 1-(1-enyl-palmitoyl)-2-palmitoleoyl-GPC (P-16:0/16:1)*, 1-(1-enyl-palmitoyl)-2-palmitoyl-GPC (P-16:0/16:0)*, 1-(1-enyl-palmitoyl)-GPC (P-16:0)*, 1-(1-enyl-palmitoyl)-GPE (P-16:0)*, 1-(1-enyl-stearoyl)-2-arachidonoyl-GPE (P-18:0/20:4)*, 1-(1-enyl-stearoyl)-2-linoleoyl-GPE (P-18:0/18:2)*, 1-(1-enyl-stearoyl)-2-oleoyl-GPE (P-18:0/18:1), 1-(1-enyl-stearoyl)-GPE (P-18:0)*, 1-arachidonoyl-GPA (20:4), 1-arachidonoyl-GPC (20:4n6)*, 1-arachidonoyl-GPE (20:4n6)*, 1-arachidonoyl-GPI (20:4)*, 1-arachidonylglycerol (20:4), 1-dihomo-linolenylglycerol (20:3), 1-dihomo-linoleoylglycerol (20:2), 1-docosahexaenoylglycerol (22:6), 1-lignoceroyl-GPC (24:0), 1-linolenoyl-GPC (18:3)*, 1-linolenoylglycerol (18:3), 1-linoleoyl-2-arachidonoyl-GPC (18:2/20:4n6)*, 1-linoleoyl-2-linolenoyl-GPC (18:2/18:3)*, 1-linoleoyl-GPA (18:2)*, 1-linoleoyl-GPC (18:2), 1-linoleoyl-GPE (18:2)*, 1-linoleoyl-GPG (18:2)*, 1-linoleoyl-GPI (18:2)*, 1-linoleoylglycerol (18:2), 1-methylhistidine, 1-methylimidazoleacetate, 1-methylnicotinamide, 1-methylurate, 1-methylxanthine, 1-myristoyl-2-arachidonoyl-GPC (14:0/20:4)*, 1-myristoyl-2-palmitoyl-GPC (14:0/16:0), 1-myristoylglycerol (14:0), 1-oleoyl-2-docosahexaenoyl-GPC (18:1/22:6)*, 1-oleoyl-2-docosahexaenoyl-GPE (18:1/22:6)*, 1-oleoyl-GPC (18:1), 
1-oleoyl-GPE (18:1), 1-oleoyl-GPG (18:1)*, 1-oleoyl-GPI (18:1)*, 1-oleoylglycerol (18:1), 1-palmitoleoyl-2-linolenoyl-GPC (16:1/18:3)*, 1-palmitoleoyl-2-linoleoyl-GPC (16:1/18:2)*, 1-palmitoleoyl-GPC (16:1)*, 1-palmitoleoylglycerol (16:1)*, 1-palmitoyl-2-arachidonoyl-GPC (16:0/20:4n6), 1-palmitoyl-2-arachidonoyl-GPE (16:0/20:4)*, 1-palmitoyl-2-arachidonoyl-GPI (16:0/20:4)*, 1-palmitoyl-2-docosahexaenoyl-GPC (16:0/22:6), 1-palmitoyl-2-docosahexaenoyl-GPE (16:0/22:6)*, 1-palmitoyl-2-gamma-linolenoyl-GPC (16:0/18:3n6)*, 1-palmitoyl-2-linoleoyl-GPC (16:0/18:2), 1-palmitoyl-2-linoleoyl-GPE (16:0/18:2), 1-palmitoyl-2-linoleoyl-GPI (16:0/18:2), 1-palmitoyl-2-oleoyl-GPC (16:0/18:1), 1-palmitoyl-2-oleoyl-GPE (16:0/18:1), 1-palmitoyl-2-oleoyl-GPI (16:0/18:1)*, 1-palmitoyl-2-palmitoleoyl-GPC (16:0/16:1)*, 1-palmitoyl-GPA (16:0), 1-palmitoyl-GPC (16:0), 1-palmitoyl-GPE (16:0), 1-palmitoyl-GPG (16:0)*, 1-palmitoyl-GPI (16:0), 1-palmitoylglycerol (16:0), 1-stearoyl-2-arachidonoyl-GPC (18:0/20:4), 1-stearoyl-2-arachidonoyl-GPE (18:0/20:4), 1-stearoyl-2-arachidonoyl-GPI (18:0/20:4), 1-stearoyl-2-docosahexaenoyl-GPC (18:0/22:6), 1-stearoyl-2-docosahexaenoyl-GPE (18:0/22:6)*, 1-stearoyl-2-linoleoyl-GPC (18:0/18:2)*, 1-stearoyl-2-linoleoyl-GPE (18:0/18:2)*, 1-stearoyl-2-linoleoyl-GPI (18:0/18:2), 1-stearoyl-2-oleoyl-GPC (18:0/18:1), 1-stearoyl-2-oleoyl-GPE (18:0/18:1), 1-stearoyl-2-oleoyl-GPI (18:0/18:1)*, 1-stearoyl-2-oleoyl-GPS (18:0/18:1), 1-stearoyl-GPC (18:0), 1-stearoyl-GPE (18:0), 1-stearoyl-GPG (18:0), 1-stearoyl-GPI (18:0), 1-stearoyl-GPS (18:0)*, 10-heptadecenoate (17:1n7), 10-nonadecenoate (19:1n9), 10-undecenoate (11:1n1), “12,13-DiHOME”, 12-HETE, 12-HHTrE, 13-HODE+9-HODE, 13-methylmyristate, 14-HDoHE/17-HDoHE, 15-methylpalmitate, 16a-hydroxy DHEA 3-sulfate, 17-methylstearate, 17alpha-hydroxypregnanolone glucuronide, 17alpha-hydroxypregnenolone 3-sulfate, 1H-indole-7-acetic acid, 2′-deoxyuridine, 2′-O-methylcytidine, 2′-O-methyluridine, “2,3-dihydroxy-2-methylbutyrate”, 
“2,3-dihydroxyisovalerate”, “2,3-dihydroxypyridine”, 2-acetamidophenol sulfate, 2-aminoadipate, 2-aminobutyrate, 2-aminoheptanoate, 2-aminooctanoate, 2-aminophenol sulfate, 2-arachidonoylglycerol (20:4), 2-docosahexaenoylglycerol (22:6)*, 2-hydroxy-3-methylvalerate, 2-hydroxyacetaminophen sulfate*, 2-hydroxyadipate, 2-hydroxybehenate, 2-hydroxybutyrate/2-hydroxyisobutyrate, 2-hydroxydecanoate, 2-hydroxyglutarate, 2-hydroxyhippurate (salicylurate), 2-hydroxyibuprofen, 2-hydroxylaurate, 2-hydroxynervonate*, 2-hydroxyoctanoate, 2-hydroxypalmitate, 2-hydroxyphenylacetate, 2-hydroxystearate, 2-keto-3-deoxy-gluconate, 2-linoleoylglycerol (18:2), 2-methoxyacetaminophen glucuronide*, 2-methoxyacetaminophen sulfate*, 2-methoxyresorcinol sulfate, 2-methylbutyrylcarnitine (C5), 2-methylcitrate/homocitrate, 2-methylserine, 2-oleoylglycerol (18:1), 2-oxoarginine*, 2-palmitoleoyl-GPC (16:1)*, 2-palmitoyl-GPC (16:0)*, 2-palmitoylglycerol (16:0), 2-piperidinone, 2-pyrrolidinone, 2-stearoyl-GPE (18:0)*, 21-hydroxypregnenolone disulfate, “3,4-methyleneheptanoate”, “3,7-dimethylurate”, 3-(3-hydroxyphenyl)propionate, 3-(3-hydroxyphenyl)propionate sulfate, 3-(4-hydroxyphenyl)lactate, 3-(cystein-S-yl)acetaminophen*, 3-(N-acetyl-L-cystein-S-yl) acetaminophen, 3-acetylphenol sulfate, 3-aminoisobutyrate, 3-carboxy-4-methyl-5-propyl-2-furanpropanoate (CMPF), 3-hydroxy-2-ethylpropionate, 3-hydroxy-3-methylglutarate, 3-hydroxybutyrate (BHBA), 3-hydroxybutyrylcarnitine (1), 3-hydroxybutyrylcarnitine (2), 3-hydroxycotinine glucuronide, 3-hydroxydecanoate, 3-hydroxyhexanoate, 3-hydroxyhippurate, 3-hydroxyisobutyrate, 3-hydroxylaurate, 3-hydroxyoctanoate, 3-hydroxypyridine sulfate, 3-hydroxyquinine, 3-indoxyl sulfate, 3-methoxycatechol sulfate (1), 3-methoxycatechol sulfate (2), 3-methoxytyramine sulfate, 3-methoxytyrosine, 3-methyl catechol sulfate (1), 3-methyl catechol sulfate (2), 3-methyl-2-oxobutyrate, 3-methyl-2-oxovalerate, 3-methyladipate, 3-methylcytidine, 3-methylglutaconate, 
3-methylglutarylcarnitine (2), 3-methylhistidine, 3-methylxanthine, 3-phenylpropionate (hydrocinnamate), 3-sulfo-L-alanine, 3-ureidopropionate, 3b-hydroxy-5-cholenoic acid, 3beta-hydroxy-5-cholestenoate, 4-acetamidobenzoate, 4-acetamidobutanoate, 4-acetamidophenol, 4-acetamidophenylglucuronide, 4-acetaminophen sulfate, 4-acetylphenol sulfate, 4-allylphenol sulfate, 4-ethylphenylsulfate, 4-guanidinobutanoate, 4-hydroxybenzoate, 4-hydroxychlorothalonil, 4-hydroxycinnamate sulfate, 4-hydroxycoumarin, 4-hydroxyhippurate, 4-hydroxyphenylacetate, 4-hydroxyphenylpyruvate, 4-imidazoleacetate, 4-methyl-2-oxopentanoate, 4-methylcatechol sulfate, 4-vinylguaiacol sulfate, 4-vinylphenol sulfate, “5,6-dihydrothymine”, 5-(galactosylhydroxy)-L-lysine, 5-acetylamino-6-amino-3-methyluracil, 5-acetylamino-6-formylamino-3-methyluracil, 5-bromotryptophan, 5-dodecenoate (12:1n7), 5-hydroxyhexanoate, 5-hydroxyindoleacetate, 5-hydroxylysine, 5-hydroxymethyl-2-furoic acid, 5-methylthioadenosine (MTA), 5-methyluridine (ribothymidine), 5-oxoproline, “5alpha-androstan-3alpha,17alpha-diol monosulfate”, “5alpha-androstan-3alpha,17beta-diol disulfate”, “5alpha-androstan-3alpha,17beta-diol monosulfate (1)”, “5alpha-androstan-3alpha,17beta-diol monosulfate (2)”, “5alpha-androstan-3beta,17alpha-diol disulfate”, “5alpha-androstan-3beta,17beta-diol disulfate”, “5alpha-androstan-3beta,17beta-diol monosulfate (2)”, “5alpha-pregnan-3 (alpha or beta),20beta-diol disulfate”, “5alpha-pregnan-3beta,20alpha-diol disulfate”, “5alpha-pregnan-3beta,20alpha-diol monosulfate (1)”, “5alpha-pregnan-3beta,20alpha-diol monosulfate (2)”, “5alpha-pregnan-3beta,20beta-diol monosulfate (1)”, “5alpha-pregnan-3beta-ol,20-one sulfate”, 6-hydroxyindole sulfate, 6-oxopiperidine-2-carboxylate, 7-alpha-hydroxy-3-oxo-4-cholestenoate (7-Hoca), 7-methylguanine, 7-methylurate, 7-methylxanthine, “9,10-DiHOME”, 9-hydroxystearate, acesulfame, acetoacetate, acetylcarnitine (C2), acisoga, aconitate [cis or trans], adenine, adenosine, 
adenosine 5′-monophosphate (AMP), adipate, adipoylcarnitine (C6-DC), ADpSGEGDFXAEGGGVR*, adrenate (22:4n6), ADSGEGDFXAEGGGVR*, alanine, allantoin, alliin, alpha-hydroxyisocaproate, alpha-hydroxyisovalerate, alpha-ketobutyrate, alpha-ketoglutarate, alpha-tocopherol, andro steroid monosulfate C19H28O6S (1)*, “androstenediol (3alpha, 17alpha) monosulfate (2)”, “androstenediol (3alpha, 17alpha) monosulfate (3)”, “androstenediol (3beta,17beta) disulfate (1)”, “androstenediol (3beta,17beta) disulfate (2)”, “androstenediol (3beta,17beta) monosulfate (1)”, “androstenediol (3beta,17beta) monosulfate (2)”, androsterone sulfate, anthranilate, arabinose, arabitol/xylitol, arabonate/xylonate, arachidate (20:0), arachidonate (20:4n6), arachidonoylcarnitine (C20:4), arachidonoylcholine, arachidoylcarnitine (C20)*, argininate*, arginine, asparagine, aspartate, atenolol, azelate (nonanedioate), behenoyl dihydrosphingomyelin (d18:0/22:0)*, behenoyl sphingomyelin (d18:1/22:0)*, benzoate, benzoylcarnitine*, beta-alanine, beta-citrylglutamate, beta-cryptoxanthin, beta-hydroxyisovalerate, betaine, “bilirubin (E,E)*”, “bilirubin (E,Z or Z,E)*”, “bilirubin (Z,Z)”, biliverdin, “bradykinin, des-arg(9)”, butyrylcarnitine (C4), C-glycosyltryptophan, caffeic acid sulfate, caffeine, caprate (10:0), caproate (6:0), caprylate (8:0), carboxyethyl-GABA, carboxyibuprofen, carnitine, carotene diol (1), carotene diol (2), carotene diol (3), catechol glucuronide, catechol sulfate, “ceramide (d16:1/24:1, d18:1/22:1)*”, “ceramide (d18:1/14:0, d16:1/16:0)*”, “ceramide (d18:1/20:0, d16:1/22:0, d20:1/18:0)*”, “ceramide (d18:2/24:1, d18:1/24:2)*”, cerotoylcarnitine (C26)*, cetirizine, chenodeoxycholate, chiro-inositol, cholate, cholesterol, choline, choline phosphate, cinnamoylglycine, cis-4-decenoylcarnitine (C10:1), citraconate/glutaconate, citrate, citrulline, corticosterone, cortisol, cortisone, cotinine, cotinine N-oxide, creatine, creatinine, “cys-gly, oxidized”, cystathionine, cysteine, cysteine 
s-sulfate, cysteine sulfinic acid, cysteine-glutathione disulfide, cysteinylglycine, cystine, cytidine, cytosine, daidzein sulfate (2), decanoylcarnitine (C10), dehydroisoandrosterone sulfate (DHEA-S), deoxycarnitine, deoxycholate, desmethylnaproxen sulfate, dexlansoprazole, dihomo-linoleate (20:2n6), dihomo-linolenate (20:3n3 or n6), dihomo-linolenoyl-choline, dihomo-linolenoylcarnitine (20:3n3 or 6)*, dihomo-linoleoylcarnitine (C20:2)*, dihydroferulic acid, dihydroorotate, dimethyl sulfone, dimethyl sulfoxide (DMSO), dimethylarginine (SDMA+ADMA), dimethylglycine, docosadienoate (22:2n6), docosadioate, docosahexaenoate (DHA; 22:6n3), docosahexaenoylcarnitine (C22:6)*, docosahexaenoylcholine, docosapentaenoate (n3 DPA; 22:5n3), docosapentaenoate (n6 DPA; 22:5n6), docosatrienoate (22:3n3), dodecanedioate, dopamine 3-O-sulfate, dopamine 4-sulfate, DSGEGDFXAEGGGVR*, ectoine, eicosanodioate, eicosapentaenoate (EPA; 20:5n3), eicosapentaenoylcholine, eicosenoate (20:1), eicosenoylcarnitine (C20:1)*, epiandrosterone sulfate, ergothioneine, erucate (22:1n9), erythritol, erythronate*, escitalopram, estrone 3-sulfate, ethyl glucuronide, ethylmalonate, etiocholanolone glucuronide, eugenol sulfate, ferulic acid 4-sulfate, ferulylglycine (1), fexofenadine, fluoxetine, formiminoglutamate, fructose, fumarate, furaneol sulfate, gabapentin, galactonate, gamma-CEHC, gamma-CEHC glucuronide*, gamma-glutamyl-2-aminobutyrate, gamma-glutamyl-alpha-lysine, gamma-glutamyl-epsilon-lysine, gamma-glutamylalanine, gamma-glutamylglutamate, gamma-glutamylglutamine, gamma-glutamylglycine, gamma-glutamylhistidine, gamma-glutamylisoleucine*, gamma-glutamylleucine, gamma-glutamylmethionine, gamma-glutamylphenylalanine, gamma-glutamylthreonine, gamma-glutamyltryptophan, gamma-glutamyltyrosine, gamma-glutamylvaline, gamma-tocopherol/beta-tocopherol, gentisate, gentisic acid-5-glucoside, gluconate, glucose, glucuronate, glutamate, glutamine, glutarate (pentanedioate), glutarylcarnitine (C5-DC), 
glycerate, glycerol, glycerol 3-phosphate, glycerophosphoethanolamine, glycerophosphoinositol*, glycerophosphorylcholine (GPC), glycine, glycochenodeoxycholate, glycochenodeoxycholate glucuronide (1), glycochenodeoxycholate sulfate, glycocholate, glycocholate glucuronide (1), glycocholenate sulfate*, glycodeoxycholate, glycodeoxycholate glucuronide (1), glycodeoxycholate sulfate, glycohyocholate, glycolithocholate, glycolithocholate sulfate*, “glycosyl ceramide (d18:1/20:0, d16:1/22:0)*”, “glycosyl ceramide (d18:2/24:1, d18:1/24:2)*”, glycosyl-N-(2-hydroxynervonoyl)-sphingosine (d18:1/24:1(2OH))*, glycosyl-N-behenoyl-sphingadienine (d18:2/22:0)*, glycosyl-N-palmitoyl-sphingosine (d18:1/16:0), glycosyl-N-stearoyl-sphingosine (d18:1/18:0), glycoursodeoxycholate, glycylvaline, guanidinoacetate, guanidinosuccinate, guanosine, gulonate*, heneicosapentaenoate (21:5n3), HEPES, heptanoate (7:0), hexadecadienoate (16:2n6), hexadecanedioate, hexanoylcarnitine (C6), hexanoylglutamine, hippurate, histidine, histidylalanine, homoarginine, homocitrulline, homostachydrine*, HWESASXX*, hydantoin-5-propionic acid, hydrochlorothiazide, hydroquinone sulfate, hydroxybupropion, hydroxycotinine, hypotaurine, hypoxanthine, I-urobilinogen, ibuprofen, ibuprofen acyl glucuronide, imidazole lactate, imidazole propionate, indole-3-carboxylic acid, indoleacetate, indoleacetylglutamine, indolelactate, indolepropionate, indolin-2-one, inosine, isobutyrylcarnitine (C4), isocitrate, isoeugenol sulfate, isoleucine, isoursodeoxycholate, isovalerate, isovalerylcarnitine (C5), isovalerylglycine, kynurenate, kynurenine, L-urobilin, lactate, lactose, lactosyl-N-behenoyl-sphingosine (d18:1/22:0)*, lactosyl-N-nervonoyl-sphingosine (d18:1/24:1)*, lactosyl-N-palmitoyl-sphingosine (d18:1/16:0), lanthionine, laurate (12:0), laurylcarnitine (C12), leucine, leucylalanine, leucylglycine, lignoceroyl sphingomyelin (d18:1/24:0), lignoceroylcarnitine (C24)*, linoleamide (18:2n6), linoleate (18:2n6), linolenate 
[alpha or gamma; (18:3n3 or 6)], linolenoylcarnitine (C18:3)*, linoleoyl ethanolamide, linoleoyl-arachidonoyl-glycerol (18:2/20:4) [1]*, linoleoyl-arachidonoyl-glycerol (18:2/20:4) [2]*, linoleoyl-linoleoyl-glycerol (18:2/18:2) [1]*, linoleoylcarnitine (C18:2)*, linoleoylcholine*, lysine, malate, maleate, malonate, mannitol/sorbitol, mannose, margarate (17:0), margaroylcarnitine*, metformin, methionine, methionine sulfone, methionine sulfoxide, methyl glucopyranoside (alpha+beta),methyl-4-hydroxybenzoate sulfate, methylphosphate, methylsuccinate, methylsuccinoylcarnitine (1), myo-inositol, myristate (14:0), myristoleate (14:1n5), myristoleoylcarnitine (C14:1)*, myristoyl dihydrosphingomyelin (d18:0/14:0)*, myristoylcarnitine (C14), “N,O-didesmethylvenlafaxine glucuronide”, N-(2-furoyl)glycine, N-acetyl-1-methylhistidine*, N-acetyl-3-methylhistidine*, N-acetyl-aspartyl-glutamate (NAAG), N-acetyl-beta-alanine, N-acetyl-cadaverine, N-acetyl-S-allyl-L-cysteine, N-acetylalanine, N-acetylalliin, N-acetylarginine, N-acetylasparagine, N-acetylaspartate (NAA), N-acetylcarnosine, N-acetylcitrulline, N-acetylglucosamine/N-acetylgalactosamine, N-acetylglucosaminylasparagine, N-acetylglutamate, N-acetylglutamine, N-acetylglycine, N-acetylhistidine, N-acetylisoleucine, N-acetylkynurenine (2), N-acetylleucine, N-acetylmethionine, N-acetylmethionine sulfoxide, N-acetylneuraminate, N-acetylphenylalanine, N-acetylproline, N-acetylputrescine, N-acetylserine, N-acetyltaurine, N-acetylthreonine, N-acetyltryptophan, N-acetyltyrosine, N-acetylvaline, N-behenoyl-sphingadienine (d18:2/22:0)*, N-delta-acetylornithine, N-formylanthranilic acid, N-formylmethionine, N-formylphenylalanine, N-methylpipecolate, N-methylproline, N-methyltaurine, N-oleoylserine, N-oleoyltaurine, N-palmitoyl-heptadecasphingosine (d17:1/16:0)*, N-palmitoyl-sphingadienine (d18:2/16:0)*, N-palmitoyl-sphinganine (d18:0/16:0), N-palmitoyl-sphingosine (d18:1/16:0), N-palmitoylglycine, N-palmitoylserine, 
N-palmitoyltaurine, N-stearoyl-sphingosine (d18:1/18:0)*, N-stearoyltaurine, N-trimethyl 5-aminovalerate, N1-Methyl-2-pyridone-5-carboxamide, N1-methyladenosine, N1-methylinosine, “N2,N2-dimethylguanosine”, “N2,N5-diacetylornithine”, N2-acetyllysine, N4-acetylcytidine, “N6,N6,N6-trimethyllysine”, N6-acetyllysine, N6-carbamoylthreonyladenosine, N6-succinyladenosine, naproxen, naringenin, naringenin 7-glucuronide, nervonoylcarnitine (C24:1)*, nicotinamide, nisinate (24:6n3), nonadecanoate (19:0), norcotinine, norfluoxetine, o-cresol sulfate, O-desmethylvenlafaxine, O-methylcatechol sulfate, O-sulfo-L-tyrosine, octadecanedioate, octanoylcarnitine (C8), oleamide, oleate/vaccenate (18:1), oleoyl ethanolamide, oleoyl-linoleoyl-glycerol (18:1/18:2) [1], oleoyl-linoleoyl-glycerol (18:1/18:2) [2], oleoylcarnitine (C18:1), oleoylcholine, omeprazole, ornithine, orotate, orotidine, oxalate (ethanedioate), oxypurinol, p-cresol sulfate, p-cresol-glucuronide*, palmitate (16:0), palmitic amide, palmitoleate (16:1n7), palmitoleoylcarnitine (C16:1)*, palmitoloelycholine, palmitoyl dihydrosphingomyelin (d18:0/16:0)*, palmitoyl sphingomyelin (d18:1/16:0), palmitoylcarnitine (C16), palmitoylcholine, pantoprazole, pantothenate, paraxanthine, paroxetine, pentadecanoate (15:0), perfluorooctanesulfonic acid (PFOS), phenol glucuronide, phenol sulfate, phenylacetate, phenylacetylcarnitine, phenylacetylglutamine, phenylalanine, phenylalanylglycine, phenyllactate (PLA), phenylpyruvate, phosphate, phosphoethanolamine, phytanate, picolinate, pimeloylcarnitine/3-methyladipoylcarnitine (C7-DC), pipecolate, piperine, pivaloylcarnitine (C5), pregn steroid monosulfate C21H34O5S*, pregnanediol-3-glucuronide, pregnanolone/allopregnanolone sulfate, pregnen-diol disulfate C21H34O8S2*, pregnenolone sulfate, pristanate, pro-hydroxy-pro, proline, prolylglycine, propionylcarnitine (C3), propionylglycine, propyl 4-hydroxybenzoate, propyl 4-hydroxybenzoate sulfate, pseudoephedrine, pseudouridine, 
pyridostigmine, pyridoxate, pyroglutamine*, pyrraline, pyruvate, quetiapine, quinate, quinine, quinolinate, retinol (Vitamin A), ribitol, riboflavin (Vitamin B2), ribonate, ribose, riluzole, S-1-pyrroline-5-carboxylate, S-adenosylhomocysteine (SAH), S-allylcysteine, S-carboxymethyl-L-cysteine, S-methylcysteine, S-methylcysteine sulfoxide, S-methylmethionine, saccharin, salicylate, salicyluric glucuronide*, sarcosine, sebacate (decanedioate), serine, serotonin, silibinin, sitagliptin, spermidine, sphinganine-1-phosphate, “sphingomyelin (d17:1/16:0, d18:1/15:0, d16:1/17:0)*”, “sphingomyelin (d17:2/16:0, d18:2/15:0)*”, “sphingomyelin (d18:0/18:0, d19:0/17:0)*”, “sphingomyelin (d18:0/20:0, d16:0/22:0)*”, “sphingomyelin (d18:1/14:0, d16:1/16:0)*”, “sphingomyelin (d18:1/17:0, d17:1/18:0, d19:1/16:0)”, “sphingomyelin (d18:1/18:1, d18:2/18:0)”, “sphingomyelin (d18:1/19:0, d19:1/18:0)*”, “sphingomyelin (d18:1/20:0, d16:1/22:0)*”, “sphingomyelin (d18:1/20:1, d18:2/20:0)*”, “sphingomyelin (d18:1/20:2, d18:2/20:1, d16:1/22:2)*”, “sphingomyelin (d18:1/21:0, d17:1/22:0, d16:1/23:0)*”, “sphingomyelin (d18:1/22:1, d18:2/22:0, d16:1/24:1)*”, “sphingomyelin (d18:1/22:2, d18:2/22:1, d16:1/24:2)*”, “sphingomyelin (d18:1/24:1, d18:2/24:0)*”, “sphingomyelin (d18:1/25:0, d19:0/24:1, d20:1/23:0, d19:1/24:0)*”, “sphingomyelin (d18:2/14:0, d18:1/14:1)*”, “sphingomyelin (d18:2/16:0, d18:1/16:1)*”, sphingomyelin (d18:2/18:1)*, “sphingomyelin (d18:2/21:0, d16:2/23:0)*”, “sphingomyelin (d18:2/23:0, d18:1/23:1, d17:1/24:1)*”, sphingomyelin (d18:2/23:1)*, “sphingomyelin (d18:2/24:1, d18:1/24:2)*”, sphingomyelin (d18:2/24:2)*, sphingosine, sphingosine 1-phosphate, stachydrine, stearate (18:0), stearidonate (18:4n3), stearoyl sphingomyelin (d18:1/18:0), stearoylcarnitine (C18), stearoylcholine*, suberate (octanedioate), suberoylcarnitine (C8-DC), succinate, succinylcarnitine (C4-DC), sucrose, sulfate*, syringol sulfate, tartarate, tartronate (hydroxymalonate), taurine, tauro-beta-muricholate, 
taurochenodeoxycholate, taurocholate, taurocholenate sulfate, taurodeoxycholate, taurolithocholate 3-sulfate, tauroursodeoxycholate, tetradecanedioate, theanine, theobromine, theophylline, thioproline, threonate, threonine, threonylphenylalanine, thymol sulfate, thyroxine, tiglylcarnitine (C5:1-DC), trans-4-hydroxyproline, trans-urocanate, tricosanoyl sphingomyelin (d18:1/23:0)*, triethanolamine, trigonelline (N′-methylnicotinate), trimethylamine N-oxide, tryptophan, tryptophan betaine, tyramine O-sulfate, tyrosine, umbelliferone sulfate, undecanedioate, uracil, urate, urea, uridine, ursodeoxycholate, valerate, valine, valsartan, vanillactate, vanillic alcohol sulfate, vanillylmandelate (VMA), venlafaxine, warfarin, xanthine, xanthosine, xanthurenate, ximenoylcarnitine (C26:1)*, xylose, X-01911, X-07765, X-11261, X-11299, X-11308, X-11315, X-11372, X-11378, X-11381, X-11407, X-11441, X-11442, X-11444, X-11470, X-11478, X-11483, X-11485, X-11491, X-11522, X-11530, X-11593, X-11640, X-11787, X-11795, X-11843, X-11847, X-11849, X-11850, X-11852, X-11858, X-11880, X-12007, X-12013, X-12015, X-12026, X-12063, X-12096, X-12100, X-12101, X-12104, X-12112, X-12117, X-12126, X-12127, X-12193, X-12206, X-12212, X-12216, X-12221, 4-ethylcatechol sulfate, X-12261, X-12263, X-12283, X-12306, X-12329, X-12407, X-12410, X-12411, X-12456, X-12462, X-12472, X-12524, X-12543, X-12544, X-12565, X-12680, X-12701, X-12712, X-12714, X-12718, X-12726, X-12729, X-12730, X-12731, X-12738, X-12739, X-12740, X-12753, X-12798, X-12812, X-12816, X-12818, X-12820, X-12822, X-12830, X-12831, X-12837, X-12839, X-12844, X-12846, X-12847, X-12849, X-12851, X-12879, X-12906, X-13007, X-13255, X-13431, X-13435, X-13553, X-13658, X-13684, X-13703, X-13723, X-13728, X-13729, X-13737, X-13835, X-13844, X-13846, X-13866, X-14056, X-14082, X-14095, X-14096, X-14314, X-14364, X-14662, X-14904, X-14939, X-15220, X-15245, X-15461, X-15469, X-15486, X-15492, X-15503, X-15666, X-15674, X-15728, X-16087, 
X-16124, X-16132, X-16397, X-16570, X-16576, X-16580, X-16654, X-16935, X-16938, X-16944, X-16946, X-16964, X-17010, X-17145, X-17146, X-17185, X-17325, X-17327, X-17328, X-17335, X-17337, X-17340, X-17343, X-17348, X-17351, X-17353, X-17354, X-17357, X-17359, X-17367, X-17438, X-17469, X-17612, X-17653, X-17654, X-17655, X-17673, X-17676, X-17677, X-17685, X-17690, X-17704, X-17765, X-18240, X-18249, X-18345, X-18606, X-18779, X-18886, X-18887, X-18899, X-18901, X-18913, X-18914, X-18921, X-18922, X-19141, X-19183, X-19434, X-19438, X-19561, X-21258, X-21285, X-21286, X-21295, X-21310, X-21312, X-21319, X-21327, X-21339, X-21341, X-21342, X-21353, X-21364, X-21383, X-21410, X-21411, X-21441, X-21442, X-21444, X-21448, X-21467, X-21470, X-21474, X-21607, X-21628, X-21657, X-21659, X-21661, X-21729, X-21736, X-21737, X-21742, X-21752, X-21792, X-21796, X-21803, X-21807, X-21815, X-21816, X-21821, X-21829, X-21834, X-21838, X-21839, X-21842, X-21845, X-21851, X-22143, X-22162, X-22475, X-22509, X-22520, X-22716, X-22764, X-22771, X-22775, X-22834, X-23276, X-23291, X-23294, X-23295, X-23297, X-23314, X-23369, X-23583, X-23585, X-23587, X-23588, X-23593, X-23637, X-23639, X-23644, X-23649, X-23652, X-23654, X-23655, X-23659, X-23666, X-23680, X-23739, X-23780, X-23782, X-23787, X-23974, X-23997, X-24106, X-24243, X-24293, X-24295, X-24309, X-24328, X-24329, X-24337, X-24348, X-24352, X-24410, X-24411, X-24422, X-24425, X-24432, X-24435, X-24455, X-24456, X-24473, X-24475, X-24498, X-24512, X-24518, X-24519, X-24527, X-24542, X-24544, X-24546, X-24549, X-24550, X-24551, X-24552, X-24554, X-24555, X-24556, X-24557, X-24558, X-24560, X-24571, X-24588, X-24637, X-24655, X-24686, X-24693, X-24699, X-24706, X-24728, X-24736, X-24747, X-24748, X-24757, X-24760, X-24765, X-24801, X-24809, X-24811, X-24812, X-24813, X-24831, X-24832, X-24849, X-24932, X-24947, X-24948, X-24949, X-24951, X-24952, X-24972, X-24983, X-25116, 1-carboxyethylisoleucine, 1-carboxyethylleucine, 
1-carboxyethylphenylalanine, 1-carboxyethylvaline, 1-methyl-5-imidazoleacetate, 1-ribosyl-imidazoleacetate*, “2,2′-Methylenebis(6-tert-butyl-p-cresol)”, “2,3-dihydroxy-5-methylthio-4-pentenoate (DMTPA)*”, “2,6-dihydroxybenzoic acid”, 2-naphthol sulfate, 3-(methylthio)acetaminophen sulfate*, 3-amino-2-piperidone, 3-carboxy-4-methyl-5-pentyl-2-furanpropionate (3-CMPFP)**, 3-formylindole, 3-hydroxyhippurate sulfate, 3-hydroxystachydrine*, “5,6-dihydrouridine”, 5-dodecenoylcarnitine (C12:1), 5-methylthioribose**, androsterone glucuronide, cis-4-decenoate (10:1n6)*, cysteinylglycine disulfide*, dihydrocaffeate sulfate (2), dodecadienoate (12:2)*, dodecenedioate (C12:1-DC)*, eicosenedioate (C20:1-DC)*, Fibrinopeptide A (2-15)**, Fibrinopeptide A (3-15)**, Fibrinopeptide A (3-16)**, Fibrinopeptide A (4-15)**, Fibrinopeptide A (5-16)*, Fibrinopeptide A (7-16)*, Fibrinopeptide B (1-11)**, Fibrinopeptide B (1-12)**, Fibrinopeptide B (1-13)**, gamma-glutamylcitrulline*, glu-gly-asn-val**, glucuronide of C10H18O2 (1)*, glucuronide of C10H18O2 (7)*, glucuronide of C10H18O2 (8)*, glycine conjugate of C10H14O2 (1)*, glyco-beta-muricholate**, hexadecenedioate (C16:1-DC)*, hydroxy-CMPF*, “hydroxy-N6,N6,N6-trimethyllysine*”, hydroxyasparagine**, hydroxypalmitoyl sphingomyelin (d18:1/16:0(OH))**, “N,N,N-trimethyl-alanylproline betaine (TMAP)”, “N,N-dimethyl-5-aminovalerate”, N-acetyl-2-aminooctanoate*, N-acetyl-isoputreanine*, N-methylhydroxyproline**, nonanoylcarnitine (C9), octadecadienedioate (C18:2-DC)*, octadecenedioate (C18:1-DC)*, octadecenedioylcarnitine (C18:1-DC)*, perfluorooctanoate (PFOA), picolinoylglycine, pregnenetriol disulfate*, sulfate of piperine metabolite C16H19NO3 (2)*, sulfate of piperine metabolite C16H19NO3 (3)*, taurochenodeoxycholic acid 3-sulfate, taurodeoxycholic acid 3-sulfate, tetradecadienoate (14:2)*, tridecenedioate (C13:1-DC)*
According to a particular embodiment, the metabolite is not glucose and not cholesterol. According to a particular embodiment the metabolite is set forth in Table 1 and more preferably in Table 2. Sequence identifiers for the metagenomic sequences of the unknown bacteria recited in Tables 1 and 2 are provided in Table 10.
As used herein, the term “microbiome” refers to the totality of microbes (bacteria, fungi, protists) and their genetic elements (genomes) in a defined environment.
According to a particular embodiment, the microbiome is a gut microbiome (i.e. microbiota of the digestive tract). In one embodiment, the environment is the small intestine. In another embodiment the environment is the large intestine. The microbiome may be of the lumen or the mucosa of the small intestine or large intestine. In still another embodiment, the gut microbiome is a fecal microbiome.
In some embodiments, a microbiota sample is collected by any means that allows recovery of the microbes and without disturbing the relative amounts of microbes or components or products thereof of a microbiome. In some embodiments, the microbiota sample is a fecal sample. In other embodiments, the microbiota sample is retrieved directly from the gut—e.g. by endoscopy from the lower gastrointestinal (GI) tract or from the upper GI tract. The microbiota sample may be of the lumen of the GI tract or the mucosa of the GI tract.
According to one embodiment the microbiome sample (e.g. fecal sample) is frozen and/or lyophilized prior to analysis. According to another embodiment, the sample may be subjected to solid phase extraction methods.
In some embodiments, the presence, level, and/or activity of between 5 and 10 species of microbes are measured. In some embodiments, the presence, level, and/or activity of between 5 and 20 species of microbes are measured. In some embodiments, the presence, level, and/or activity of between 5 and 50 species of microbes are measured. In some embodiments, the presence, level, and/or activity of between 5 and 100 species of microbes are measured. In some embodiments, the presence, level, and/or activity of between 5 and 500 species of microbes are measured. In some embodiments, the presence, level, and/or activity of between 5 and 1000 species of microbes are measured. In some embodiments, the presence, level, and/or activity of between 50 and 500 species of microbes (e.g. bacteria) are measured. In some embodiments, the presence, level, and/or activity of substantially all species/classes/families of bacteria within the microbiome are measured. In still more embodiments, the presence, level, and/or activity of substantially all the bacteria within the microbiome are measured.
Measuring a level or presence of a microbe may be effected by analyzing for the presence of a microbial component or a microbial by-product. Thus, for example, the level or presence of a microbe may be determined by measuring the level of a DNA sequence. In some embodiments, the level or presence of a microbe may be determined by measuring 16S rRNA gene sequences or 18S rRNA gene sequences. In other embodiments, the level or presence of a microbe may be determined by measuring RNA transcripts. In still other embodiments, the level or presence of a microbe may be determined by measuring proteins. In still other embodiments, the level or presence of a microbe may be determined by measuring metabolites present in the microbiome sample.
Quantifying Microbial Levels:
It will be appreciated that determining the abundance of microbes may be effected by taking into account any feature of the microbiome. Thus, the abundance of microbes may be determined by taking into account the abundance at different phylogenetic levels; gene abundance; metabolic pathway abundances; sub-species strain identification; SNPs, insertions and deletions in specific bacterial regions; growth rates of bacteria; and the diversity of the microbes of the microbiome, as further described herein below.
In some embodiments, determining a level or set of levels of one or more types of microbes or components or products thereof comprises determining a level or set of levels of one or more DNA sequences. In some embodiments, one or more DNA sequences comprises any DNA sequence that can be used to differentiate between different microbial types. In certain embodiments, one or more DNA sequences comprises 16S rRNA gene sequences. In certain embodiments, one or more DNA sequences comprises 18S rRNA gene sequences. In some embodiments, 1, 2, 3, 4, 5, 10, 15, 20, 25, 50, 100, 1,000, 5,000 or more sequences are amplified.
16S and 18S rRNA gene sequences encode small subunit components of prokaryotic and eukaryotic ribosomes respectively. rRNA genes are particularly useful in distinguishing between types of microbes because, although the sequences of these genes differ between microbial species, the genes have highly conserved regions for primer binding. This combination of variable sequence flanked by conserved primer-binding regions allows the rRNA genes of many different types of microbes to be amplified with a single set of primers and then to be distinguished by their amplified sequences.
In some embodiments, a microbiota sample (e.g. fecal sample) is directly assayed for a level or set of levels of one or more DNA sequences. In some embodiments, DNA is isolated from a microbiota sample and isolated DNA is assayed for a level or set of levels of one or more DNA sequences. Methods of isolating microbial DNA are well known in the art. Examples include but are not limited to phenol-chloroform extraction and a wide variety of commercially available kits, including QIAamp DNA Stool Mini Kit (Qiagen, Valencia, Calif.).
In some embodiments, a level or set of levels of one or more DNA sequences is determined by amplifying DNA sequences using PCR (e.g., standard PCR, semi-quantitative, or quantitative PCR) and then sequencing. In some embodiments, a level or set of levels of one or more DNA sequences is determined by amplifying DNA sequences using quantitative PCR. These and other basic DNA amplification procedures are well known to practitioners in the art and are described in Ausubel et al. (Ausubel F M, Brent R, Kingston R E, Moore D, Seidman J G, Smith J A, Struhl K (eds). 1998. Current Protocols in Molecular Biology. Wiley: New York).
In some embodiments, DNA sequences are amplified using primers specific for one or more sequences that differentiate individual microbial types from other, different microbial types. In some embodiments, 16S rRNA gene sequences or fragments thereof are amplified using primers specific for 16S rRNA gene sequences. In some embodiments, 18S rRNA gene sequences are amplified using primers specific for 18S rRNA gene sequences.
In some embodiments, a level or set of levels of one or more 16S rRNA gene sequences is determined using phylochip technology. Use of phylochips is well known in the art and is described in Hazen et al. (“Deep-sea oil plume enriches indigenous oil-degrading bacteria.” Science, 330, 204-208, 2010), the entirety of which is incorporated by reference. Briefly, 16S rRNA gene sequences are amplified and labeled from DNA extracted from a microbiota sample. Amplified DNA is then hybridized to an array containing probes for microbial 16S rRNA genes. The level of binding to each probe is then quantified, providing a sample level of the microbial type corresponding to the 16S rRNA gene sequence probed. In some embodiments, phylochip analysis is performed by a commercial vendor. Examples include but are not limited to Second Genome Inc. (San Francisco, Calif.).
In some embodiments, determining a level or set of levels of one or more types of microbes comprises determining a level or set of levels of one or more microbial RNA molecules (e.g., transcripts). Methods of quantifying levels of RNA transcripts are well known in the art and include but are not limited to northern analysis, semi-quantitative reverse transcriptase PCR, quantitative reverse transcriptase PCR, and microarray analysis.
Methods for sequence determination are generally known to the person skilled in the art. Preferred sequencing methods are next generation sequencing methods or parallel high throughput sequencing methods. For example, a bacterial genomic sequence may be obtained by using Massively Parallel Signature Sequencing (MPSS). An example of an envisaged sequencing method is pyrosequencing, in particular 454 pyrosequencing, e.g. based on the Roche 454 Genome Sequencer. This method amplifies DNA inside water droplets in an oil solution with each droplet containing a single DNA template attached to a single primer-coated bead that then forms a clonal colony. Pyrosequencing uses luciferase to generate light for detection of the individual nucleotides added to the nascent DNA, and the combined data are used to generate sequence read-outs. Yet another envisaged example is Illumina or Solexa sequencing, e.g. by using the Illumina Genome Analyzer technology, which is based on reversible dye-terminators. DNA molecules are typically attached to primers on a slide and amplified so that local clonal colonies are formed. Subsequently one type of nucleotide at a time may be added, and non-incorporated nucleotides are washed away. Subsequently, images of the fluorescently labeled nucleotides may be taken and the dye is chemically removed from the DNA, allowing a next cycle. Yet another example is the use of Applied Biosystems' SOLiD technology, which employs sequencing by ligation. This method is based on the use of a pool of all possible oligonucleotides of a fixed length, which are labeled according to the sequenced position. Such oligonucleotides are annealed and ligated. Subsequently, the preferential ligation by DNA ligase for matching sequences typically results in a signal informative of the nucleotide at that position. 
Since the DNA is typically amplified by emulsion PCR, the resulting beads, each containing only copies of the same DNA molecule, can be deposited on a glass slide resulting in sequences of quantities and lengths comparable to Illumina sequencing. A further method is based on Helicos' Heliscope technology, wherein fragments are captured by polyT oligomers tethered to an array. At each sequencing cycle, polymerase and single fluorescently labeled nucleotides are added and the array is imaged. The fluorescent tag is subsequently removed and the cycle is repeated. Further examples of sequencing techniques encompassed within the methods of the present invention are sequencing by hybridization, sequencing by use of nanopores, microscopy-based sequencing techniques, microfluidic Sanger sequencing, or microchip-based sequencing methods.
According to one embodiment, the sequencing method allows for quantitating the amount of microbe—e.g. by deep sequencing such as Illumina deep sequencing.
As used herein, the term “deep sequencing” refers to a sequencing method wherein the target sequence is read multiple times in a single test. A single deep sequencing run is composed of a multitude of sequencing reactions run on the same target sequence, each generating an independent sequence readout.
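By way of illustration only, the following minimal Python sketch shows one way in which independent sequence readouts could be converted into relative abundances per microbe. It assumes that each read has already been assigned to a taxon by some upstream step; the function name and the example read assignments are hypothetical and are not part of the described method.

```python
from collections import Counter

def relative_abundance(read_assignments):
    """Convert per-read taxon assignments into fractions that sum to 1."""
    counts = Counter(read_assignments)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: n / total for taxon, n in counts.items()}

# Hypothetical input: one entry per sequencing read, naming its assigned taxon.
reads = (["Dorea longicatena"] * 6
         + ["Coprococcus comes"] * 3
         + ["Unknown"] * 1)
abundances = relative_abundance(reads)
```

Because the result is normalized by the total number of assigned reads, samples sequenced to different depths can be compared on the same scale.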
In some embodiments, determining a level or set of levels of one or more types of microbes comprises determining a level or set of levels of one or more microbial polypeptides. Methods of quantifying polypeptide levels are well known in the art and include but are not limited to Western analysis and mass spectrometry.
It will be appreciated that although the abundance of any number of microbes may be measured, a limited number are preferably used in the prediction analysis.
The present inventors have shown that the number of microbes whose abundance should be analyzed in order to predict the amount of a blood metabolite may be particular to that metabolite. Preferably, the abundance of at least 5 bacterial species, at least 10 bacterial species, at least 15 bacterial species, at least 20 bacterial species, at least 25 bacterial species or more than 25 bacterial species is analyzed.
According to another embodiment, in order to classify a microbe as belonging to a particular genus, family, order, class or phylum, it must comprise at least 90% sequence homology, at least 91% sequence homology, at least 92% sequence homology, at least 93% sequence homology, at least 94% sequence homology, at least 95% sequence homology, at least 96% sequence homology, at least 97% sequence homology, at least 98% sequence homology, or at least 99% sequence homology to a reference microbe known to belong to the particular genus, family, order, class or phylum. According to a particular embodiment, the sequence homology is at least 95%.
According to another embodiment, in order to classify a microbe as belonging to a particular species, it must comprise at least 90% sequence homology, at least 91% sequence homology, at least 92% sequence homology, at least 93% sequence homology, at least 94% sequence homology, at least 95% sequence homology, at least 96% sequence homology, at least 97% sequence homology, at least 98% sequence homology, or at least 99% sequence homology to a reference microbe known to belong to the particular species. According to a particular embodiment, the sequence homology is at least 97%.
In determining whether a nucleic acid or protein is substantially homologous or shares a certain percentage of sequence identity with a sequence of the invention, sequence similarity may be defined by conventional algorithms, which typically allow introduction of a small number of gaps in order to achieve the best fit. In particular, “percent identity” of two polypeptides or two nucleic acid sequences is determined using the algorithm of Karlin and Altschul (Proc. Natl. Acad. Sci. USA 87:2264-2268, 1990). Such an algorithm is incorporated into the BLASTN and BLASTX programs of Altschul et al. (J. Mol. Biol. 215:403-410, 1990). BLAST nucleotide searches may be performed with the BLASTN program to obtain nucleotide sequences homologous to a nucleic acid molecule of the invention. Equally, BLAST protein searches may be performed with the BLASTX program to obtain amino acid sequences that are homologous to a polypeptide of the invention. To obtain gapped alignments for comparison purposes, Gapped BLAST is utilized as described in Altschul et al. (Nucleic Acids Res. 25:3389-3402, 1997). When utilizing BLAST and Gapped BLAST programs, the default parameters of the respective programs (e.g., BLASTX and BLASTN) are employed. See www(dot)ncbi(dot)nlm(dot)nih(dot)gov for more details.
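As a non-limiting illustration, the following Python sketch computes a simple percent identity over two sequences that have already been aligned (for example by a BLAST-type program) and then applies the 97% species and 95% higher-rank thresholds of the particular embodiments above. It is not an implementation of the Karlin and Altschul statistics or of BLAST; the function names, the convention of skipping gap columns, and the example sequences are assumptions made for the sketch.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of non-gap alignment columns in which the two residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # assumption: gap columns are excluded from the count
        compared += 1
        if a.upper() == b.upper():
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

def assign_rank(identity, species_cutoff=97.0, higher_cutoff=95.0):
    """Map a percent identity to the finest rank it supports (illustrative)."""
    if identity >= species_cutoff:
        return "species"
    if identity >= higher_cutoff:
        return "genus or higher rank"
    return "unclassified"

# Hypothetical pre-aligned sequences: 9 of 10 columns match.
identity = percent_identity("ACGTACGTAC", "ACGTACGTAT")
rank = assign_rank(identity)
```

Other conventions, such as counting gap columns as mismatches, give different percentages for the same alignment, so the convention used should be stated alongside any threshold.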
In one embodiment, the abundance of no more than 30 bacterial species, no more than 40 bacterial species or no more than 50 bacterial species is analyzed.
Preferably, at least one of the bacteria that is analyzed belongs to the Clostridiales order.
Preferably at least one of the bacteria that is analyzed belongs to the phylum Firmicutes.
Preferably, at least 20% of the bacteria that are analyzed for the prediction of a single metabolite belong to the phylum Firmicutes. Preferably, at least 30% of the bacteria that are analyzed for the prediction of a single metabolite belong to the phylum Firmicutes. Preferably, at least 40% of the bacteria that are analyzed for the prediction of a single metabolite belong to the phylum Firmicutes. Preferably, at least 50% of the bacteria that are analyzed for the prediction of a single metabolite belong to the phylum Firmicutes. Preferably, at least 60% of the bacteria that are analyzed for the prediction of a single metabolite belong to the phylum Firmicutes. Preferably, at least 70% of the bacteria that are analyzed for the prediction of a single metabolite belong to the phylum Firmicutes.
In another embodiment, the bacteria that are analyzed do not belong to the Bacteroidetes phylum. Preferably, less than 50% of the bacteria that are analyzed for the prediction of a single metabolite belong to the Bacteroidetes phylum. Preferably, less than 40% of the bacteria that are analyzed for the prediction of a single metabolite belong to the Bacteroidetes phylum. Preferably, less than 30% of the bacteria that are analyzed for the prediction of a single metabolite belong to the Bacteroidetes phylum. Preferably, less than 20% of the bacteria that are analyzed for the prediction of a single metabolite belong to the Bacteroidetes phylum. Preferably, less than 10% of the bacteria that are analyzed for the prediction of a single metabolite belong to the Bacteroidetes phylum.
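The phylum-level composition criteria above can be checked programmatically. The following Python sketch is illustrative only: it assumes each analyzed bacterial feature carries a phylum annotation, and the panel contents, the dictionary keys and the chosen cutoffs (at least 50% Firmicutes, less than 30% Bacteroidetes) are hypothetical examples rather than a required configuration.

```python
def phylum_fraction(features, phylum):
    """Fraction of the analyzed features annotated with the given phylum."""
    if not features:
        return 0.0
    hits = sum(1 for f in features if f["phylum"] == phylum)
    return hits / len(features)

def meets_composition(features, min_firmicutes=0.5, max_bacteroidetes=0.3):
    """True if the panel has at least min_firmicutes Firmicutes and
    strictly less than max_bacteroidetes Bacteroidetes."""
    return (phylum_fraction(features, "Firmicutes") >= min_firmicutes
            and phylum_fraction(features, "Bacteroidetes") < max_bacteroidetes)

# Hypothetical panel of features used to predict a single metabolite.
panel = [
    {"name": "Dorea longicatena", "phylum": "Firmicutes"},
    {"name": "Coprococcus comes", "phylum": "Firmicutes"},
    {"name": "Streptococcus thermophilus", "phylum": "Firmicutes"},
    {"name": "Collinsella sp CAG 289", "phylum": "Actinobacteria"},
    {"name": "hypothetical Bacteroides sp", "phylum": "Bacteroidetes"},
]
firmicutes_share = phylum_fraction(panel, "Firmicutes")
panel_ok = meets_composition(panel)
```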
According to a particular embodiment at least one of the bacterial features whose abundance is analyzed includes: (8002) S: Streptococcus thermophilus; (4810) S: Blautia sp CAG 237; (4961) G: Eubacterium; (3957) F: Lachnospiraceae; (4960) G: Eubacterium; (4581) S: Dorea longicatena; (4782) U: Unknown; (14322) S: Eggerthella sp CAG 209; (5190) S: Firmicutes bacterium CAG 102; (4577) S: Coprococcus comes; (6359) F: Clostridiaceae; (14861) U: Unknown; (3926) U: Unknown; (15073) G: Oscillibacter; (4749) S: Clostridium sp CAG 7; (6148) F: Peptostreptococcaceae; (4705) S: Clostridium sp CAG 43; (14397) S: Collinsella sp CAG 289; (15119) F: Clostridiales unclassified; (15041) F: Clostridiales unclassified; (5843) S: Allisonella histaminiformans; (14921) U: Unknown; (14306) S: Clostridium sp CAG 138; (15154) F: Clostridiales unclassified; (14816) F: Eggerthellaceae.
Table 1 provides a list of preferred bacteria whose abundance may be measured for the quantitative prediction per metabolite.
According to a particular embodiment, the metabolite which is analyzed is set forth in Table 1 and more preferably in Table 2.
The analysis of the amounts of the microbes of the microbiome is optionally and preferably performed by executing a machine learning procedure.
As used herein the term “machine learning” refers to a procedure embodied as a computer program configured to induce patterns, regularities, or rules from previously collected data to develop an appropriate response to future data, or describe the data in some meaningful way.
Representative examples of machine learning procedures suitable for the present embodiments include, without limitation, clustering, association rule algorithms, feature evaluation algorithms, subset selection algorithms, support vector machines, classification rules, cost-sensitive classifiers, vote algorithms, stacking algorithms, Bayesian networks, decision trees, neural networks, instance-based algorithms, linear modeling algorithms, k-nearest neighbors (KNN) analysis, ensemble learning algorithms, probabilistic models, graphical models, logistic regression methods (including multinomial logistic regression methods), gradient ascent methods, singular value decomposition methods and principal component analysis.
Following is an overview of some machine learning procedures suitable for the present embodiments.
Support vector machines are algorithms that are based on statistical learning theory. A support vector machine (SVM) according to some embodiments of the present invention can be used for classification purposes and/or for numeric prediction. A support vector machine for classification is referred to herein as a “support vector classifier,” and a support vector machine for numeric prediction is referred to herein as a “support vector regression”.
An SVM is typically characterized by a kernel function, the selection of which determines whether the resulting SVM provides classification, regression or other functions. Through application of the kernel function, the SVM maps input vectors into high dimensional feature space, in which a decision hyper-surface (also known as a separator) can be constructed to provide classification, regression or other decision functions. In the simplest case, the surface is a hyper-plane (also known as linear separator), but more complex separators are also contemplated and can be applied using kernel functions. The data points that define the hyper-surface are referred to as support vectors.
The support vector classifier selects a separator where the distance of the separator from the closest data points is as large as possible, thereby separating feature vector points associated with objects in a given class from feature vector points associated with objects outside the class. For support vector regression, a high-dimensional tube with a radius of acceptable error is constructed which minimizes the error of the data set while also maximizing the flatness of the associated curve or function. In other words, the tube is an envelope around the fit curve, defined by a collection of data points nearest the curve or surface.
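The trade-off described for support vector regression can be illustrated with the standard epsilon-insensitive loss, under which errors inside the tube of radius epsilon cost nothing. The following Python sketch assumes a linear model f(x) = w·x + b and a regularization weight C; these names, the function names and the numeric values are assumptions for illustration and do not represent a trained model.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Per-sample loss: zero inside the tube, linear outside it."""
    return [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]

def svr_objective(w, b, X, y, C=1.0, epsilon=0.1):
    """Flatness term (half the squared norm of w) plus C times the tube violations."""
    preds = [sum(wi * xi for wi, xi in zip(w, x)) + b for x in X]
    flatness = 0.5 * sum(wi * wi for wi in w)
    return flatness + C * sum(epsilon_insensitive_loss(y, preds, epsilon))

# Hypothetical targets and predictions: only the last two fall outside the tube.
losses = epsilon_insensitive_loss([1.0, 2.0, 3.0], [1.05, 2.5, 2.0], epsilon=0.1)

# Hypothetical one-feature data fitted exactly by w = [2.0], b = 0.0,
# so the objective reduces to the flatness term alone.
objective = svr_objective([2.0], 0.0, [[1.0], [2.0]], [2.0, 4.0])
```

Minimizing this objective favors small weights (a flat function) while penalizing only those points that lie outside the tube.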
An advantage of a support vector machine is that once the support vectors have been identified, the remaining observations can be removed from the calculations, thus greatly reducing the computational complexity of the problem. An SVM typically operates in two phases: a training phase and a testing phase. During the training phase, a set of support vectors is generated for use in executing the decision rule. During the testing phase, decisions are made using the decision rule. A support vector algorithm is a method for training an SVM. By execution of the algorithm, a training set of parameters is generated, including the support vectors that characterize the SVM. A representative example of a support vector algorithm suitable for the present embodiments includes, without limitation, sequential minimal optimization.
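As a non-limiting illustration of the testing phase described above, the following Python sketch evaluates a decision rule that depends only on the support vectors. The support vectors, the coefficients and the bias are hand-picked hypothetical values (they are not the output of a training run), and the function names `linear_kernel`, `decision_value` and `classify` are illustrative assumptions.

```python
def linear_kernel(u, v):
    """Dot product of two feature vectors (the simplest kernel)."""
    return sum(a * b for a, b in zip(u, v))

def decision_value(x, support_vectors, coefficients, bias, kernel=linear_kernel):
    """Signed score of x; only the support vectors enter the sum."""
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, coefficients)) + bias

def classify(x, support_vectors, coefficients, bias):
    """Return +1 or -1 according to the side of the separator on which x lies."""
    return 1 if decision_value(x, support_vectors, coefficients, bias) >= 0 else -1

# Hypothetical values: two support vectors, one per class.
# Each coefficient is the product of a multiplier and the class label (+1 or -1).
support_vectors = [(1.0, 1.0), (-1.0, -1.0)]
coefficients = [0.25, -0.25]
bias = 0.0

label_a = classify((2.0, 3.0), support_vectors, coefficients, bias)
label_b = classify((-2.0, -0.5), support_vectors, coefficients, bias)
```

With these hand-picked values the separator is the line through the origin perpendicular to (1, 1), and each support vector lies at a decision value of magnitude 1.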
In KNN analysis, the affinity or closeness of objects is determined. The affinity is also known as distance in a feature space between objects. Based on the determined distances, the objects are clustered and an outlier is detected. Thus, the KNN analysis is a technique to find distance-based outliers based on the distance of an object from its kth-nearest neighbors in the feature space. Specifically, each object is ranked on the basis of its distance to its kth-nearest neighbors. The farthest away object is declared the outlier. In some cases the farthest objects are declared outliers. That is, an object is an outlier with respect to parameters, such as a number k of neighbors and a specified distance, if no more than k objects are at the specified distance or less from the object. The KNN analysis can also be used as a classification technique that uses supervised learning. An item is presented and compared to a training set with two or more classes. The item is assigned to the class that is most common amongst its k-nearest neighbors. That is, the distance from the item to each of the items in the training set is computed to find the k nearest items, and the majority class among those k items is assigned to the item.
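As a non-limiting illustration, the classification use of KNN analysis may be sketched in Python as follows. The training records, the class labels and the function name `knn_classify` are hypothetical, and the Euclidean distance is one assumed choice of distance in the feature space.

```python
import math
from collections import Counter

def knn_classify(item, training_set, k=3):
    """Assign `item` to the majority class among its k nearest training records.

    training_set is a list of (feature_vector, class_label) pairs.
    """
    # Distance from the item to every record in the training set.
    distances = [(math.dist(item, features), label) for features, label in training_set]
    # Keep the k closest records.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the labels of those k neighbors.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical two-feature training data with two classes.
training = [
    ((0.1, 0.2), "low"),
    ((0.2, 0.1), "low"),
    ((0.3, 0.3), "low"),
    ((0.9, 0.8), "high"),
    ((0.8, 0.9), "high"),
    ((1.0, 1.0), "high"),
]
predicted = knn_classify((0.85, 0.95), training, k=3)
```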
An association rule algorithm is a technique for extracting meaningful association patterns among features.
The term “association”, in the context of machine learning, refers to any interrelation among features, not just ones that predict a particular class or numeric value. Association includes, but it is not limited to, finding association rules, finding patterns, performing feature evaluation, performing feature subset selection, developing predictive models, and understanding interactions between features.
The term “association rules” refers to elements that co-occur frequently within the datasets. It includes, but is not limited to association patterns, discriminative patterns, frequent patterns, closed patterns, and colossal patterns.
A usual first step of an association rule algorithm is to find a set of items or features that are most frequent among all the observations. Once the set is obtained, rules can be extracted from it.
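As a non-limiting illustration, the two steps described above (finding frequent sets of items, then extracting rules from them) may be sketched in Python as follows. The sketch counts itemsets by brute force rather than with any particular published algorithm, and the item names, the thresholds and the function names `frequent_itemsets` and `extract_rules` are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(observations, min_support, max_size=2):
    """Return {itemset: support} for itemsets of up to max_size items
    that occur in at least a fraction min_support of the observations."""
    counts = Counter()
    for observation in observations:
        items = sorted(set(observation))
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    n = len(observations)
    return {itemset: c / n for itemset, c in counts.items() if c / n >= min_support}

def extract_rules(itemsets, min_confidence):
    """Derive rules 'antecedent -> consequent' from the frequent pairs."""
    found = []
    for itemset, support in itemsets.items():
        if len(itemset) != 2:
            continue
        for antecedent, consequent in (itemset, itemset[::-1]):
            # Confidence = support of the pair / support of the antecedent.
            confidence = support / itemsets[(antecedent,)]
            if confidence >= min_confidence:
                found.append((antecedent, consequent, confidence))
    return found

# Hypothetical observations, each listing the features present in it.
observations = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b"}]
itemsets = frequent_itemsets(observations, min_support=0.5)
rules = extract_rules(itemsets, min_confidence=0.9)
```

In this toy data the only rule that reaches the assumed confidence threshold is "c implies a", since every observation containing "c" also contains "a".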
The aforementioned self-organizing map is an unsupervised learning technique often used for visualization and analysis of high-dimensional data. Typical applications are focused on the visualization of the central dependencies within the data on the map. The map generated by the algorithm can be used to speed up the identification of association rules by other algorithms. The algorithm typically includes a grid of processing units, referred to as “neurons”. Each neuron is associated with a feature vector referred to as observation. The map attempts to represent all the available observations with optimal accuracy using a restricted set of models. At the same time the models become ordered on the grid so that similar models are close to each other and dissimilar models far from each other. This procedure enables the identification as well as the visualization of dependencies or associations between the features in the data.
Feature evaluation algorithms are directed to the ranking of features or to the ranking followed by the selection of features based on their impact.
Information gain is one of the machine learning methods suitable for feature evaluation. The definition of information gain requires the definition of entropy, which is a measure of impurity in a collection of training instances. The reduction in entropy of the target feature that occurs by knowing the values of a certain feature is called information gain. Information gain may be used as a parameter to determine the effectiveness of a feature in explaining the response to the treatment. Symmetrical uncertainty is an algorithm that can be used by a feature selection algorithm, according to some embodiments of the present invention. Symmetrical uncertainty compensates for information gain's bias towards features with more values by normalizing features to a [0,1] range.
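As a non-limiting illustration, entropy, information gain and symmetrical uncertainty for discrete features may be computed as in the following Python sketch. The sketch assumes the commonly used definition of symmetrical uncertainty, namely twice the information gain divided by the sum of the entropies of the feature and of the target. The feature values and function names are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())

def information_gain(feature, target):
    """Reduction in entropy of `target` obtained by knowing `feature`."""
    total = len(target)
    conditional = 0.0
    for value in set(feature):
        subset = [t for f, t in zip(feature, target) if f == value]
        conditional += (len(subset) / total) * entropy(subset)
    return entropy(target) - conditional

def symmetrical_uncertainty(feature, target):
    """Information gain normalized to the range [0, 1]."""
    denominator = entropy(feature) + entropy(target)
    if denominator == 0:
        return 0.0
    return 2.0 * information_gain(feature, target) / denominator

# Hypothetical data: feature_a determines the target, feature_b is unrelated to it.
feature_a = ["present", "present", "absent", "absent"]
feature_b = ["present", "absent", "present", "absent"]
target = ["high", "high", "low", "low"]

gain_a = information_gain(feature_a, target)
gain_b = information_gain(feature_b, target)
su_a = symmetrical_uncertainty(feature_a, target)
```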
Subset selection algorithms rely on a combination of an evaluation algorithm and a search algorithm. Similarly to feature evaluation algorithms, subset selection algorithms rank subsets of features. Unlike feature evaluation algorithms, however, a subset selection algorithm suitable for the present embodiments aims at selecting the subset of features with the highest impact on the metabolite of interest, while accounting for the degree of redundancy between the features included in the subset. The benefits from feature subset selection include facilitating data visualization and understanding, reducing measurement and storage requirements, reducing training and utilization times, and eliminating distracting features to improve classification.
Two basic approaches to subset selection algorithms are the process of adding features to a working subset (forward selection) and deleting from the current subset of features (backward elimination). In machine learning, forward selection is done differently than the statistical procedure with the same name. The feature to be added to the current subset in machine learning is found by evaluating the performance of the current subset augmented by one new feature using cross-validation. In forward selection, subsets are built up by adding each remaining feature in turn to the current subset while evaluating the expected performance of each new subset using cross-validation. The feature that leads to the best performance when added to the current subset is retained and the process continues. The search ends when none of the remaining available features improves the predictive ability of the current subset. This process finds a local optimum set of features.
Backward elimination is implemented in a similar fashion. With backward elimination, the search ends when further reduction in the feature set does not improve the predictive ability of the subset. The present embodiments contemplate search algorithms that search forward, backward or in both directions. Representative examples of search algorithms suitable for the present embodiments include, without limitation, exhaustive search, greedy hill-climbing, random perturbations of subsets, wrapper algorithms, probabilistic race search, schemata search, rank race search, and Bayesian classifier.
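As a non-limiting illustration, the forward selection procedure described above may be sketched in Python as follows. The sketch assumes leave-one-out cross-validation and a 1-nearest-neighbor predictor as the model whose performance is evaluated; both are illustrative choices, and any other model or cross-validation scheme could be substituted. The data and the function names `loo_error` and `forward_selection` are hypothetical.

```python
import math

def loo_error(rows, targets, subset):
    """Leave-one-out mean squared error of a 1-nearest-neighbor predictor
    that sees only the features whose indices are listed in `subset`."""
    total = 0.0
    for i, row in enumerate(rows):
        query = [row[j] for j in subset]
        best_distance, prediction = math.inf, 0.0
        for k, other in enumerate(rows):
            if k == i:
                continue  # hold out the record being predicted
            distance = math.dist(query, [other[j] for j in subset])
            if distance < best_distance:
                best_distance, prediction = distance, targets[k]
        total += (prediction - targets[i]) ** 2
    return total / len(rows)

def forward_selection(rows, targets):
    """Greedily add the feature that most lowers the cross-validated error."""
    remaining = list(range(len(rows[0])))
    selected, best_error = [], math.inf
    while remaining:
        scored = [(loo_error(rows, targets, selected + [j]), j) for j in remaining]
        error, feature = min(scored)
        if error >= best_error:
            break  # no remaining feature improves the current subset
        selected.append(feature)
        remaining.remove(feature)
        best_error = error
    return selected, best_error

# Hypothetical data: feature 0 tracks the target, feature 1 does not.
rows = [(1.0, 5.0), (2.0, 1.0), (3.0, 4.0), (4.0, 2.0), (5.0, 3.0), (6.0, 6.0)]
targets = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
selected, best_error = forward_selection(rows, targets)
```

Because the search stops at the first step that brings no improvement, the returned subset is a local optimum, consistent with the description above.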
A decision tree is a decision support algorithm that forms a logical pathway of steps involved in considering the input to make a decision.
The term “decision tree” refers to any type of tree-based learning algorithms, including, but not limited to, model trees, classification trees, and regression trees.
A decision tree can be used to classify the datasets or their relation hierarchically. The decision tree has a tree structure that includes branch nodes and leaf nodes. Each branch node specifies an attribute (splitting attribute) and a test (splitting test) to be carried out on the value of the splitting attribute, and branches out to other nodes for all possible outcomes of the splitting test. The branch node that is the root of the decision tree is called the root node. Each leaf node can represent a classification (e.g., whether a particular input dataset corresponds to a particular metabolite in the subject's blood) or a value (e.g., the predicted quantity of the particular metabolite in the subject's blood). The leaf nodes can also contain additional information about the represented classification, such as a confidence score that measures a confidence level in the represented classification (i.e., the likelihood of the classification being accurate). For example, the confidence score can be a continuous value ranging from 0 to 1, in which a score of 0 indicates a very low confidence (e.g., the indication value of the represented classification is very low) and a score of 1 indicates a very high confidence (e.g., the represented classification is almost certainly accurate).
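As a non-limiting illustration, a decision tree with branch nodes, leaf nodes and per-leaf confidence scores may be represented and traversed as in the following Python sketch. The sketch assumes binary splitting tests of the form "attribute value is at most a threshold"; the attribute names, thresholds, leaf values and confidence scores are hypothetical.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    value: float        # represented value (e.g., a predicted quantity)
    confidence: float   # confidence score between 0 and 1

@dataclass
class Branch:
    attribute: str      # splitting attribute
    threshold: float    # splitting test: attribute value <= threshold
    left: "Node"        # followed when the splitting test is true
    right: "Node"       # followed when the splitting test is false

Node = Union[Leaf, Branch]

def evaluate(node, sample):
    """Follow the splitting tests from the root node down to a leaf node."""
    while isinstance(node, Branch):
        node = node.left if sample[node.attribute] <= node.threshold else node.right
    return node

# Hypothetical tree with two splitting attributes.
tree = Branch(
    "microbe_A", 0.5,
    Leaf(1.0, 0.9),
    Branch("microbe_B", 0.25, Leaf(2.0, 0.7), Leaf(3.0, 0.8)),
)
leaf = evaluate(tree, {"microbe_A": 0.75, "microbe_B": 0.125})
```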
Regression techniques which may be used in accordance with some embodiments of the present invention include, but are not limited to, linear regression, multiple regression, logistic regression, probit regression, ordinal logistic regression, ordinal probit regression, Poisson regression, negative binomial regression, multinomial logistic regression (MLR) and truncated regression.
A logistic regression or logit regression is a type of regression analysis used for predicting the outcome of a categorical dependent variable (a dependent variable that can take on a limited number of values, whose magnitudes are not meaningful but whose ordering of magnitudes may or may not be meaningful) based on one or more predictor variables. Logistic regression may also predict the probability of occurrence for each data point. Logistic regressions also include a multinomial variant. The multinomial logistic regression model is a regression model which generalizes logistic regression by allowing more than two discrete outcomes. That is, it is a model that is used to predict the probabilities of the different possible outcomes of a categorically distributed dependent variable, given a set of independent variables (which may be real-valued, binary-valued, categorical-valued, etc.). For binary-valued variables, a cutoff between the 0 and 1 associations is typically determined using the Youden Index.
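As a non-limiting illustration, a cutoff for a binary outcome may be selected by maximizing the Youden Index, J = sensitivity + specificity - 1, as in the following Python sketch. The predicted probabilities stand in for the output of a fitted logistic regression and, like the labels and the function name `youden_cutoff`, are hypothetical.

```python
def youden_cutoff(probabilities, labels):
    """Return (cutoff, J) for the cutoff that maximizes the Youden Index.

    A record is called positive when its probability is >= cutoff.
    labels are 1 for positive records and 0 for negative records.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(probabilities)):
        true_pos = sum(1 for p, y in zip(probabilities, labels) if p >= cutoff and y == 1)
        true_neg = sum(1 for p, y in zip(probabilities, labels) if p < cutoff and y == 0)
        j = true_pos / positives + true_neg / negatives - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j

# Hypothetical predicted probabilities and true binary labels.
probabilities = [0.1, 0.2, 0.3, 0.4, 0.45, 0.7, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1]
cutoff, j_value = youden_cutoff(probabilities, labels)
```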
A Bayesian network is a model that represents variables and conditional interdependencies between variables. In a Bayesian network variables are represented as nodes, and nodes may be connected to one another by one or more links. A link indicates a relationship between two nodes. Nodes typically have corresponding conditional probability tables that are used to determine the probability of a state of a node given the state of other nodes to which the node is connected. In some embodiments, a Bayes optimal classifier algorithm is employed to apply the maximum a posteriori hypothesis to a new record in order to predict the probability of its classification, as well as to calculate the probabilities from each of the other hypotheses obtained from a training set and to use these probabilities as weighting factors for future predictions of the subject's blood contents (particularly the metabolites and optionally and preferably their quantity). An algorithm suitable for a search for the best Bayesian network, includes, without limitation, global score metric-based algorithm. In an alternative approach to building the network, Markov blanket can be employed. The Markov blanket isolates a node from being affected by any node outside its boundary, which is composed of the node's parents, its children, and the parents of its children.
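As a non-limiting illustration, the Markov blanket of a node (its parents, its children, and the parents of its children) may be read off the structure of a Bayesian network as in the following Python sketch. The network, the node names and the function name `markov_blanket` are hypothetical.

```python
def markov_blanket(node, parents):
    """Return the Markov blanket of `node`.

    `parents` maps every node of the network to the set of its parents.
    """
    # Children are the nodes that list `node` among their parents.
    children = {child for child, ps in parents.items() if node in ps}
    blanket = set(parents.get(node, set())) | children
    # Add the other parents of each child.
    for child in children:
        blanket |= set(parents[child])
    blanket.discard(node)
    return blanket

# Hypothetical network: A -> C, B -> C, C -> D, E -> D, D -> F.
parents = {
    "A": set(),
    "B": set(),
    "C": {"A", "B"},
    "D": {"C", "E"},
    "E": set(),
    "F": {"D"},
}
blanket_c = markov_blanket("C", parents)
```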
Instance-based techniques generate a new model for each instance, instead of basing predictions on trees or networks generated (once) from a training set.
The term “instance”, in the context of machine learning, refers to an example from a dataset.
Instance-based techniques typically store the entire dataset in memory and build a model from a set of records similar to those being tested. This similarity can be evaluated, for example, through nearest-neighbor or locally weighted methods, e.g., using Euclidean distances. Once a set of records is selected, the final model may be built using several different techniques, such as naive Bayes.
Neural networks are a class of algorithms based on a concept of inter-connected “neurons.” In a typical neural network, neurons contain data values, each of which affects the value of a connected neuron according to connections with pre-defined strengths, and whether the sum of connections to each particular neuron meets a pre-defined threshold. By determining proper connection strengths and threshold values (a process also referred to as training), a neural network can achieve efficient recognition of images and characters. Oftentimes, these neurons are grouped into layers in order to make connections between groups more obvious and to ease the computation of values. Each layer of the network may have differing numbers of neurons, and these may or may not be related to particular qualities of the input data.
In one implementation, called a fully-connected neural network, each of the neurons in a particular layer is connected to and provides input value to those in the next layer. These input values are then summed and this sum compared to a bias, or threshold. If the value exceeds the threshold for a particular neuron, that neuron then holds a positive value which can be used as input to neurons in the next layer of neurons. This computation continues through the various layers of the neural network, until it reaches a final layer. At this point, the output of the neural network routine can be read from the values in the final layer. Unlike fully-connected neural networks, convolutional neural networks operate by associating an array of values with each neuron, rather than a single value. The transformation of a neuron value for the subsequent layer is generalized from multiplication to convolution.
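As a non-limiting illustration, the layer-by-layer computation of a fully-connected network of threshold neurons may be sketched in Python as follows. The connection strengths and thresholds are hand-picked hypothetical values that realize an exclusive-or of two binary inputs; they are not the result of training, and the function names are illustrative.

```python
def layer_forward(inputs, weights, thresholds):
    """One fully-connected layer: each neuron sums its weighted inputs
    and holds 1.0 if the sum exceeds its threshold, otherwise 0.0."""
    outputs = []
    for neuron_weights, threshold in zip(weights, thresholds):
        total = sum(w * x for w, x in zip(neuron_weights, inputs))
        outputs.append(1.0 if total > threshold else 0.0)
    return outputs

def network_forward(inputs, layers):
    """Propagate the input values through the layers up to the final layer."""
    values = list(inputs)
    for weights, thresholds in layers:
        values = layer_forward(values, weights, thresholds)
    return values

# Hypothetical two-layer network computing exclusive-or.
xor_layers = [
    ([[1.0, 1.0], [1.0, 1.0]], [0.5, 1.5]),  # hidden layer: an OR neuron and an AND neuron
    ([[1.0, -1.0]], [0.5]),                  # output neuron: OR and not AND
]
outputs = [network_forward(pair, xor_layers)[0]
           for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```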
The machine learning procedure used according to some embodiments of the present invention is a trained machine learning procedure. A machine learning procedure can be trained according to some embodiments of the present invention by feeding a machine learning training program with microbiome data of a cohort of subjects from which the quantities of the metabolite have been determined by blood tests. Once the data are fed, the machine learning training program generates a trained machine learning procedure of a selected type which can then be used without the need to re-train it.
For example, when it is desired to employ decision trees, the machine learning training program learns the structure of each tree in a plurality of decision trees (e.g., how many nodes there are in each tree, and how these are connected to one another), and also selects the decision rules for split nodes of each tree. At least a portion of the decision rules relate to one or more microbes in the microbiome. A simple decision rule may be a threshold for the amount of a particular microbe, but more complex rules, relating to more than one microbe, are also contemplated. The machine learning training program also accumulates data at the leaves of the trees. The structures of the trees, the decision rules for the split nodes, and the data at the leaves are all selected by the machine learning training program, automatically and typically without user intervention, such that the microbiome data at the root of the trees provide the quantities of the metabolite as determined by blood tests at the leaves of the trees. The final result of the machine learning training program in this case is a set of trees for each metabolite, where the structures, the decision rules for split nodes, and the leaf data for each tree are defined by the machine learning training program.
The Examples section that follows describes machine learning training that was used to generate a set of trees for each of a plurality of metabolites, using training data including metabolite quantities and microbiome data collected from a cohort of about 500 subjects.
While the embodiments below are described with a particular emphasis on decision trees, it is to be understood that other types of machine learning procedures can be employed. The skilled person, provided with training data and the description provided herein, would know how to train a different type of machine learning procedure to predict the quantity of the metabolite once fed with data describing a plurality of microbes of the microbiome of the subject.
A schematic illustration of the analysis technique according to some embodiments of the present invention is illustrated in
The library is accessed and searched for a trained machine learning procedure associated with the metabolite.
When machine learning procedure 112 includes a set of decision trees, each of the trees receives amounts of microbes, processes these amounts by the split node decision rules that were defined during the training phase, and provides output values in accordance with the data at the leaves that were also defined during the training phase. The output of all trees is optionally and preferably combined (e.g., summed) to provide the quantity of the respective metabolite.
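As a non-limiting illustration, combining the outputs of a set of trained trees by summation may be sketched in Python as follows. The tree encoding, the microbe names, the thresholds and the leaf values are hypothetical and are not taken from the trained trees of the Examples section. Treating a microbe that is absent from the input as having an amount of zero is likewise an assumption of this sketch.

```python
def tree_output(tree, abundances):
    """Walk one tree down to a leaf and return the leaf value.

    A tree is either a leaf value (a float) or a tuple
    (microbe, threshold, left_subtree, right_subtree).
    """
    while isinstance(tree, tuple):
        microbe, threshold, left, right = tree
        # Split node decision rule: amount of the microbe versus a threshold.
        tree = left if abundances.get(microbe, 0.0) <= threshold else right
    return tree

def predict_quantity(trees, abundances):
    """Sum the outputs of all trees to obtain the predicted quantity."""
    return sum(tree_output(tree, abundances) for tree in trees)

# Hypothetical set of three small trees.
trees = [
    ("microbe_A", 0.5, 1.0, 2.0),
    ("microbe_B", 0.25, 0.5, ("microbe_A", 0.75, 0.25, 0.75)),
    ("microbe_C", 0.1, -0.5, 0.5),
]
abundances = {"microbe_A": 0.6, "microbe_B": 0.3}
quantity = predict_quantity(trees, abundances)
```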
Preferably, the number of trees in the set is at least 1000 or at least 2000 or more. It was found by the inventors that the microbes listed in Table 1 dominate the predicting ability of the decision trees. Thus, in some embodiments of the present invention the number of decision rules relating to microbes listed in Table 1 for the respective metabolite is larger than the number of decision rules relating to other microbes of the microbiome.
According to another aspect of the present invention, there is provided a method of predicting the quantity of a metabolite set forth in Table 1, comprising analyzing the amount of each of the corresponding microbes set forth in Table 1 in the fecal microbiome of the subject, wherein the predicting does not comprise analyzing more than 50 microbes, thereby predicting the quantity of the metabolite in the blood.
For each metabolite, Table 1 provides the top five microbes whose abundance should be analyzed in order to predict the quantity of that metabolite.
It will be appreciated that in some cases, additional microbes may be analyzed for each metabolite so that the outputted quantities reach a level of confidence that is of clinical relevance, e.g., a confidence level of at least 90%, and more preferably at least 95%.
As well as using microbial levels to predict the quantity of a blood metabolite, the present inventors further propose using dietary data of the subjects as a proxy for predicting the quantity of a blood metabolite.
Thus, according to another aspect of the present invention there is provided a method of predicting the quantity of a metabolite in the blood of a subject that consumes a diet of a plurality of food types, the method comprising analyzing the frequency of consumption of at least 5 of said food types over at least one month and/or the daily mean consumption of at least 5 of said food types, wherein said frequency and/or said daily mean consumption is predictive, within a confidence level of at least 95% in the significance of the predictions, of the quantity of the metabolite in the blood of the subject consuming said diet.
It will be appreciated that for this aspect of the present invention, the level of a particular metabolite can be predicted in a subject so long as he/she has not significantly changed his/her dietary habits at the time of prediction.
The term “food type” as used herein refers to either a general classification of a food or a particular food product.
In some embodiments of the present invention the food is a food product (e.g., a specific food product marketed as such by a specific manufacturer, or by two or more manufacturers manufacturing the same food product). In some embodiments of the present invention the food is a food type (e.g., a food which exhibits different modifications, for example, white rice, that may have different species, all of which are referred to as “white rice”, or whole wheat bread that may be baked from various mixtures, etc.). In some embodiments of the present invention the food is a family of food types. The family can be categorized according to the main ingredient of the food type, for example, sweets, dairies, fruits, herbs, vegetables, fish, meat, etc. In some embodiments of the present invention the family of food types is a food group, such as, but not limited to, carbohydrates, which is a family encompassing food types rich in carbohydrates, proteins, which is a family encompassing food types rich in protein, fats, which is a family encompassing food types rich in fats, minerals, which is a family encompassing food types rich in minerals, vitamins, which is a family encompassing food types rich in vitamins, etc. In some embodiments of the present invention the food is a food combination which comprises a plurality of different food products, and/or different food types and/or different food families. Such a combination is referred to as “a complex meal.” The complex meal can be provided as a list of the food products, food types and/or families of food types that form the combination. The list may or may not include the particular amount of each food product, food type and/or family of food types in the combination.
Depending on the particular metabolite being predicted, only the long-term consumption (e.g., over the period of one month) of a particular food type is measured. In another embodiment, only the average daily consumption of a particular food type is measured for predicting the amount of particular metabolites. In other embodiments both the long-term consumption and the average daily consumption are measured.
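As a non-limiting illustration, the two kinds of dietary input described above may be derived from a food log as in the following Python sketch. The sketch assumes that the frequency of consumption is expressed as the fraction of days in the logging period on which a food type was consumed, and that the daily mean consumption is expressed in grams per day; both definitions, as well as the log entries and the function name `diet_features`, are hypothetical.

```python
from collections import defaultdict

def diet_features(food_log, n_days):
    """Compute per-food-type frequency and daily mean consumption.

    food_log is a list of (day_index, food_type, grams) entries
    collected over a logging period of n_days days.
    """
    days_eaten = defaultdict(set)
    total_grams = defaultdict(float)
    for day, food_type, grams in food_log:
        days_eaten[food_type].add(day)
        total_grams[food_type] += grams
    return {
        food: {
            "frequency": len(days_eaten[food]) / n_days,   # fraction of days consumed
            "daily_mean": total_grams[food] / n_days,      # grams per day
        }
        for food in total_grams
    }

# Hypothetical one-month log with two food types.
food_log = [
    (0, "coffee", 200.0),
    (1, "coffee", 200.0),
    (2, "coffee", 200.0),
    (0, "white rice", 150.0),
    (0, "white rice", 150.0),
]
features = diet_features(food_log, n_days=30)
```

The resulting per-food-type values could then serve as the input features of a trained machine learning procedure of any of the types described above.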
The information about the subject's food consumption may be obtained by providing the subject with a food questionnaire. The questionnaire may be tailored according to the particular metabolite (or metabolites) which are being investigated. In a particular embodiment, a full survey is obtained from the subject in which the subject is asked to divulge a complete set of food intake per month/per day.
Irrespective of the level of detail the subject is asked to provide with respect to his/her food intake, at least 5 food types are used to predict the level of metabolite. In a particular embodiment, at least 10 food types are used to predict the level of metabolite, at least 15 food types are used to predict the level of metabolite, at least 20 food types are used to predict the level of metabolite, at least 25 food types are used to predict the level of metabolite, at least 30 food types are used to predict the level of metabolite, at least 40 food types are used to predict the level of metabolite, at least 50 food types are used to predict the level of metabolite, or even more than 50 food types are used to predict the level of metabolite. In one embodiment, no more than 50, 60, 70, 80, 90 or 100 food types are used to predict the quantity of a particular metabolite.
The number of food types that are used in the prediction is also dependent on the level of confidence required in the prediction. According to a particular embodiment, the level of confidence is such that the predicted level is clinically relevant. In one embodiment, the prediction is within a confidence level of at least 90%. In another embodiment, the prediction is within a confidence level of at least 95%.
Table 3 herein below provides exemplary food types that can be used to predict particular metabolites.
Clostridium sp
Firmicutes
bacterium
Clostridium sp
Firmicutes
bacterium
Clostridium sp
Clostridium sp
Firmicutes
bacterium
Clostridium sp
Ruthenibacterium
lactatiformans
Roseburia
Faecalibacterium sp
intestinalis
Ruthenibacterium
Clostridium sp
Firmicutes
lactatiformans
Clostridium sp
Firmicutes
Eggerthella sp
Eubacterium
hallii
Dorea
longicatena
Eggerthella sp
Eubacterium
hallii
Dorea
longicatena
Eubacterium
hallii
Firmicutes
Intestinibacter
bartlettii
Faecalibacterium
Firmicutes
Intestinibacter
bacterium
bartlettii
Bacteroides
thetaiotaomicron
Eubacterium
hallii
Eggerthella sp
Gemmiger
Eubacterium
hallii
Faecalibacterium
Phascolarctobacterium
Eubacterium
Eubacterium
hallii
Faecalibacterium
Firmicutes
bacterium
Allisonella
Bifidobacterium
Firmicutes
histaminiformans
longum
bacterium
Firmicutes
Ruminococcus
Ruminococcus
gnavus
Clostridium sp
Clostridium
Firmicutes
Oscillibacter
Firmicutes
bacterium
Firmicutes
bacterium
Bacteroides
Bacteroides
xylanisolvens
salyersiae
Ruminococcus
Roseburia
torques
inulinivorans
Eubacterium
hallii
Roseburia
inulinivorans
Eubacterium
hallii
Eubacterium
Ruminococcus
Roseburia
Anaerostipes
torques
inulinivorans
hadrus
Ruthenibacterium
Roseburia
lactatiformans
intestinalis
Eubacterium
hallii
Firmicutes
Dorea sp
bacterium
Intestinibacter
bartlettii
Ruminococcus
Firmicutes
bacterium
Eubacterium
Collinsella sp
Blautia sp
Eubacterium
Clostridium sp
Roseburia
Clostridium
faecis
Coprococcus
Eubacterium
comes
Dorea
Clostridium sp
Firmicutes
longicatena
bacterium
Eggerthella sp
Firmicutes
bacterium
Ruminococcus
Roseburia
torques
intestinalis
Eubacterium
Firmicutes
hallii
bacterium
Faecalibacterium
Clostridium sp
Clostridium sp
Clostridium sp
Coprococcus
comes
Firmicutes
Dorea
bacterium
longicatena
Blautia sp
Eubacterium
Clostridium sp
Firmicutes
Dorea
bacterium
formicigenerans
Eggerthella sp
Blautia sp
Faecalibacterium
Blautia sp
prausnitzii
Dorea
Clostridium
longicatena
Firmicutes
Dorea
bacterium
longicatena
Eubacterium
Ruminococcus
Firmicutes
bacterium
Clostridium sp
Firmicutes
bacterium
Blautia sp
Blautia sp
Clostridium
Firmicutes
bacterium
Intestinibacter
Intestinibacter
bartlettii
Intestinibacter
Alistipes
bartlettii
indistinctus
Ruminococcus
Ruminococcus
torques
torques
Clostridium sp
Blautia sp
Blautia sp
Eggerthella sp
Eubacterium
Clostridium sp
Clostridium sp
Megamonas
funiformis
Roseburia sp
Firmicutes
bacterium
Oscillibacter
Blautia sp
Eubacterium
Faecalibacterium
Firmicutes
bacterium
Clostridium sp
Faecalibacterium
Alistipes
shahii
Bifidobacterium
Roseburia
Anaerostipes
adolescentis
inulinivorans
hadrus
Firmicutes
Dorea
formicigenerans
Blautia sp
Clostridium
Clostridium
Eubacterium
Flavonifractor
Clostridium
rectale
plautii
Oscillibacter
Gemmiger
Dorea
longicatena
Eubacterium
Streptococcus
thermophilus
Eubacterium
Ruminococcus
Roseburia
torques
Bifidobacterium
adolescentis
Anaerostipes
Acidaminococcus
Dorea
hadrus
intestini
longicatena
Eubacterium
Oscillibacter
Gemmiger
hallii
Roseburia
inulinivorans
Blautia sp
Flavonifractor
Clostridium
plautii
Clostridium sp
Firmicutes
Dorea
bacterium
longicatena
Bifidobacterium
Acidaminococcus
Roseburia
adolescentis
intestini
inulinivorans
Gordonibacter
Bacteroides
pamelaeae
ovatus
Coprococcus
Dorea
comes
longicatena
Oscillibacter
Bacteroides
plebeius CAG
Eubacterium
Streptococcus
thermophilus
Clostridium sp
Dorea
Firmicutes
formicigenerans
bacterium
Firmicutes
Bifidobacterium
longum
Clostridium sp
Roseburia
inulinivorans
Clostridium sp
Clostridium sp
Escherichia
coli
Eubacterium
Eubacterium
hallii
Blautia
Roseburia
Streptococcus
Clostridium
inulinivorans
thermophilus
Coprococcus
Firmicutes
comes
bacterium
Clostridium sp
Blautia sp
Blautia sp
Faecalibacterium
Roseburia
inulinivorans
Bifidobacterium
Anaerostipes
adolescentis
hadrus
Oscillibacter
Clostridium sp
Ruminococcus
torques
Clostridium sp
Flavonifractor
Dorea
plautii
formicigenerans
Streptococcus
Ruthenibacterium
thermophilus
lactatiformans
Roseburia
Bifidobacterium
inulinivorans
longum
Eubacterium
Eggerthella
hallii
Gemmiger
Clostridium sp
Acidaminococcus
intestini
Coprococcus
Firmicutes
comes
bacterium
Coprococcus
Dorea
comes
longicatena
Dorea
Clostridium
formicigenerans
Eubacterium
hallii
Alistipes
shahii
Eubacterium
Eubacterium
Roseburia
hallii
Bifidobacterium
Clostridium sp
adolescentis
Ruthenibacterium
Anaerostipes
Dialister sp
lactatiformans
hadrus
Flavonifractor
plautii
Eubacterium
Eubacterium
Sutterella
wadsworthensis
Roseburia
Eggerthella
Eubacterium
Clostridium sp
Clostridium sp
Streptococcus
Clostridium sp
Eubacterium
thermophilus
Clostridium sp
Blautia
Bacteroides
caccae
Clostridium sp
Clostridium sp
Clostridium
Roseburia
Roseburia
Bacteroides
inulinivorans
intestinalis
vulgatus
Dorea
Blautia sp
Streptococcus
longicatena
thermophilus
Eubacterium
Blautia sp
Blautia sp
Akkermansia
Anaerostipes
Clostridium
muciniphila
hadrus
Clostridium sp
Phascolarctobacterium
Ruminococcus
Acidaminococcus
torques
intestini
Clostridium
Ruminococcus
Roseburia
spiroforme
inulinivorans
Faecalibacterium
prausnitzii
Clostridium sp
Dorea
Blautia sp
longicatena
Allisonella
Dorea
Bacteroides
histaminiformans
longicatena
ovatus
Holdemanella
Holdemanella
Ruminococcus
biformis
Clostridium sp
Blautia sp
Collinsella
Burkholderiales
Blautia
Blautia sp
bacterium
Ruminococcus
Clostridium sp
torques
Alistipes
Eubacterium
Firmicutes
putredinis
siraeum
bacterium
Clostridium sp
Streptococcus
thermophilus
Blautia
Blautia sp
Bifidobacterium
adolescentis
Bifidobacterium
Dorea
adolescentis
longicatena
Clostridium sp
Ruthenibacterium
lactatiformans
Roseburia
Butyricicoccus
inulinivorans
Bacteroides
Holdemanella
uniformis
Blautia sp
Ruminiclostridium
Bacteroides
Gemmiger
Clostridium
finegoldii
Anaerostipes
Dorea
Bifidobacterium
hadrus
longicatena
adolescentis
Roseburia
Streptococcus
hominis
thermophilus
Anaerostipes
Bifidobacterium
hadrus
adolescentis
Clostridium sp
Clostridium sp
Roseburia
Bacteroides
Ruminococcus
inulinivorans
vulgatus
torques
Flavonifractor
Roseburia sp
Blautia
plautii
obeum
Eubacterium
rectale
Streptococcus
Faecalibacterium
thermophilus
Firmicutes
Dorea
Clostridium
bacterium
longicatena
Eubacterium
eligens
Coprococcus
Dorea
comes
longicatena
Eggerthella sp
Bacteroides
ovatus
Flavonifractor
Oscillibacter sp
Firmicutes
Clostridium
Ruthenibacterium
lactatiformans
Clostridium
Alistipes
Clostridium sp
finegoldii
Clostridium
Flavonifractor
Lachnospira
plautii
pectinoschiza
Firmicutes
Intestinibacter
bacterium
bartlettii
Dorea
Roseburia
Acidaminococcus
longicatena
inulinivorans
intestini
Clostridium sp
Clostridium sp
Roseburia sp
Clostridium sp
Bacteroides
Escherichia
massiliensis
coli
Clostridium sp
Faecalibacterium
Bifidobacterium
Blautia sp
prausnitzii
Clostridium
Ruminococcus
Blautia sp
Blautia
torques
obeum
Eubacterium
Clostridium
Eubacterium
Eggerthella sp
Eubacterium
Oscillibacter
Eubacterium
hallii
Clostridium sp
Anaeromassili
Alistipes
bacillus sp
finegoldii
Eubacterium
Dorea
Dorea
longicatena
longicatena
Coprococcus
Firmicutes
comes
bacterium
Acidaminococcus
Firmicutes
intestini
Bifidobacterium
Anaerostipes
adolescentis
hadrus
Streptococcus
Clostridium
Coprococcus
thermophilus
comes
Roseburia
Faecalibacterium
inulinivorans
Oscillibacter
Allisonella
Collinsella
histaminiformans
Bifidobacterium
pseudocatenulatum
Butyricimonas
Faecalibacterium
synergistica
prausnitzii
Flavonifractor
plautii
Roseburia
Ruminococcus
inulinivorans
torques
Blautia
Collinsella
Faecalibacterium
prausnitzii
Clostridium sp
Intestinibacter
Ruminococcus
gnavus
Eubacterium
Bilophila sp 4
Clostridium sp
Eubacterium
Clostridium sp
Bacteroides
ovatus
Sutterella
Eubacterium
wadsworthensis
rectale
Coprococcus
Firmicutes
comes
bacterium
Phascolarctobacterium
Gemmiger
formicilis
Bacteroides
Sutterella
intestinalis
wadsworthensis
Intestinibacter
Alistipes
bartlettii
finegoldii
Bacteroides
vulgatus
Flavonifractor
Bacteroides
Faecalibacterium
plautii
vulgatus
prausnitzii
Oscillibacter
Akkermansia
muciniphila
Streptococcus
Faecalibacterium
Clostridium
thermophilus
prausnitzii
Acidaminococcus
Anaerostipes
intestini
hadrus
Lachnospira
Anaerostipes
pectinoschiza
hadrus
Clostridium sp
Eubacterium
hallii
Clostridium sp
Roseburia
Odoribacter
Clostridium
intestinalis
splanchnicus
Clostridium
Faecalibacterium
Clostridium
prausnitzii
Ruminococcus
Veillonella
torques
parvula
Subdoligranulum
Clostridium
Clostridium sp
Faecalibacterium
Clostridium sp
Clostridium sp
Roseburia
Eubacterium
Clostridium sp
Eubacterium
hallii
hallii
Coprococcus
Dorea
Eubacterium
comes
longicatena
rectale
Bifidobacterium
Clostridium sp
Parabacteroides
longum
distasonis
Bifidobacterium
Dorea
Faecalibacterium
adolescentis
longicatena
prausnitzii
Bacteroides
Faecalibacterium
Butyricimonas
plebeius CAG
synergistica
Allisonella
Clostridium sp
Coprococcus
histaminiformans
comes
Streptococcus
Coprococcus
thermophilus
comes
Faecalibacterium
Roseburia
prausnitzii
inulinivorans
Lachnospira
Parabacteroides
Akkermansia
pectinoschiza
merdae
muciniphila
Eubacterium
hallii
Ruminococcus
Eubacterium
rectale
Eggerthella sp
Bacteroides
ovatus
Intestinibacter
Streptococcus
Butyricimonas
bartlettii
thermophilus
synergistica
Faecalibacterium
Azospirillum
Bifidobacterium
Blautia sp
pseudocatenulatum
Firmicutes
Faecalibacterium
Roseburia
bacterium
intestinalis
Blautia sp
Clostridium sp
Phascolarctobacterium
Clostridium sp
Allisonella
histaminiformans
Eubacterium
Eggerthella sp
hallii
Bifidobacterium
Ruthenibacterium
Streptococcus
adolescentis
lactatiformans
thermophilus
Streptococcus
Roseburia
Clostridium
thermophilus
inulinivorans
Coprococcus
Bifidobacterium
comes
adolescentis
Alistipes
Dorea
Eubacterium
finegoldii
formicigenerans
rectale
Blautia sp
Megamonas
funiformis
Eubacterium
Streptococcus
eligens
parasanguinis
Eubacterium
Clostridium sp
Firmicutes
Clostridium
bacterium
Butyricicoccus
Butyricicoccus
Clostridium
bolteae
Clostridium sp
Clostridium
Faecalibacterium
Roseburia
Streptococcus
Clostridium
inulinivorans
thermophilus
Clostridium sp
Clostridium
Roseburia sp
Clostridium sp
Eubacterium
rectale
Alistipes
Streptococcus
finegoldii
thermophilus
Faecalibacterium
Dorea sp CAG
Firmicutes
bacterium
Streptococcus
Dorea
parasanguinis
formicigenerans
Coprococcus
catus
Roseburia sp
Clostridium
Roseburia
Firmicutes
Blautia sp
inulinivorans
Clostridium sp
Firmicutes
bacterium
Alistipes
Dorea
finegoldii
longicatena
Clostridium sp
Roseburia
Faecalibacterium
Streptococcus
inulinivorans
thermophilus
Dorea
Clostridiales
Streptococcus
longicatena
bacterium
salivarius
Clostridium sp
Flavonifractor
plautii
Faecalibacterium
Ruminococcus
Bilophila
prausnitzii
Roseburia
Clostridium
hominis
leptum
Roseburia
Faecalibacterium
Clostridium
inulinivorans
Bifidobacterium
Ruminococcus
Eubacterium
bifidum
Faecalibacterium
Azospirillum
Streptococcus
thermophilus
Blautia
Eubacterium
obeum
Collinsella
Ruminococcus
Clostridium sp
Dorea
Clostridium sp
Coprococcus
longicatena
comes
Eubacterium
Blautia
obeum
Clostridium
Bifidobacterium
Firmicutes
adolescentis
Clostridium
Anaerostipes
Clostridium sp
Clostridium sp
hadrus
Bifidobacterium
catenulatum
Blautia sp
Oscillibacter
Firmicutes
bacterium
Streptococcus
Faecalibacterium
Clostridium sp
thermophilus
Faecalibacterium
Clostridium sp
Eubacterium
Faecalibacterium
Intestinimonas
Ruminococcus
prausnitzii
butyriciproducens
Clostridium sp
Clostridium sp
Bifidobacterium
catenulatum
Clostridium
Butyricimonas
Eubacterium
synergistica
hallii
Akkermansia
Roseburia
muciniphila
hominis
Clostridium sp
Coprococcus
catus
Roseburia
Faecalibacterium
Eubacterium
inulinivorans
prausnitzii
rectale
Ruthenibacterium
lactatiformans
Blautia sp
Roseburia
Clostridium
inulinivorans
Eubacterium
Bacteroides
Parabacteroides
ovatus
merdae
Blautia
Dorea
Dorea sp
longicatena
Allisonella
Bifidobacterium
Clostridium
histaminiformans
longum
Ruminococcus
Bacteroides
Clostridium sp
Eggerthella sp
Eubacterium
Firmicutes
hallii
bacterium
Bifidobacterium
Parabacteroides
bifidum
distasonis
Clostridium sp
Blautia sp
Dorea
Eubacterium
Dorea
formicigenerans
rectale
longicatena
Firmicutes
Clostridium
Anaerostipes
Parabacteroides
Roseburia
hadrus
distasonis
Clostridium
Blautia sp
Firmicutes
Coprococcus
Eubacterium
comes
rectale
Clostridium sp
Ruminococcus
Bacteroides
Firmicutes
torques
ovatus
bacterium
Oscillibacter
Gemmiger
Eggerthella
Roseburia
Firmicutes
inulinivorans
bacterium
Coprococcus
Clostridium sp
Faecalibacterium
comes
prausnitzii
Bacteroides
Roseburia sp
Bilophila
massiliensis
Eggerthella sp
Firmicutes
Streptococcus
bacterium
thermophilus
Clostridium sp
Clostridium sp
Ruminococcus
Clostridium sp
torques
Blautia sp
Roseburia
inulinivorans
Clostridium sp
Eubacterium
Clostridium
Faecalibacterium
Roseburia
Eubacterium
inulinivorans
siraeum
Blautia sp
Eubacterium
Coprococcus
ramulus
catus
Candidatus
Bacteroides
Lachnospiraceae
Gastranaerophilales
pectinophilus
bacterium
Streptococcus
Intestinibacter
Faecalibacterium
thermophilus
bartlettii
prausnitzii
Akkermansia
Coprococcus
muciniphila
comes
Butyricimonas
Clostridium sp
Eubacterium
synergistica
Eubacterium
rectale
Bacteroides
Fusicatenibacter
Dorea
saccharivorans
formicigenerans
Akkermansia
Ruminococcus
muciniphila
torques
Dorea
Alistipes
Eubacterium
longicatena
putredinis
eligens
Oscillibacter
Clostridium
Faecalibacterium
Coprococcus
Clostridium sp
Phascolarctobacterium
comes
Coprococcus
Eubacterium
comes
Coprococcus
Bacteroides
comes
finegoldii
Faecalibacterium
Clostridium sp
Clostridium sp
Eubacterium
Anaerostipes
hadrus
Faecalibacterium
Eubacterium
rectale
Dorea
Eubacterium
Blautia sp
longicatena
eligens
Bacteroides
Firmicutes
Roseburia
dorei
bacterium
Lachnospira
Eubacterium
pectinoschiza
rectale
Bifidobacterium
Clostridium sp
Desulfovibrio
longum
piger
Clostridium
Dorea
Lactococcus
longicatena
lactis
Firmicutes
Oscillibacter
Alistipes
bacterium
finegoldii
Oscillibacter
Flavonifractor
plautii
Faecalibacterium
Eubacterium
prausnitzii
hallii
Lactococcus
Clostridium sp
Streptococcus
lactis
thermophilus
Coprococcus
Firmicutes
comes
bacterium
Roseburia
Butyricicoccus
inulinivorans
Clostridium sp
Clostridium sp
Clostridium sp
Ruminococcus
Bifidobacterium
Butyricimonas
catenulatum
synergistica
Ruminococcus
Firmicutes
bacterium
Eggerthella sp
Bacteroides
ovatus
Eubacterium
Blautia sp
Clostridium
Eubacterium
Clostridiales
Blautia sp
bacterium
Bacteroides
Adlercreutzia
Phascolarctobacterium
uniformis
Roseburia
Firmicutes
Dorea
inulinivorans
longicatena
Bifidobacterium
catenulatum
Oscillibacter
Dorea
longicatena
Mycoplasma
Blautia
Clostridium sp
Roseburia
Firmicutes
intestinalis
bacterium
Oscillibacter
Blautia
Collinsella
obeum
Roseburia
Streptococcus
hominis
thermophilus
Butyricicoccus
Eubacterium
Eubacterium
Dorea
longicatena
Butyricicoccus
Oscillibacter
Clostridium sp
Clostridium sp
Clostridium sp
Clostridium
Dorea
Ruminococcus
longicatena
torques
Eubacterium
Eubacterium
hallii
Coprococcus
comes
Bacteroides
Coprococcus
massiliensis
comes
Anaerostipes
Phascolarctobacterium
hadrus
Faecalibacterium
Odoribacter
Coprococcus
splanchnicus
catus
Bilophila sp 4
Firmicutes
Faecalibacterium
bacterium
prausnitzii
Coprococcus
Bacteroides
comes
massiliensis
Eubacterium
Eubacterium
Ruminococcus
rectale
Clostridium sp
Bifidobacterium
Clostridium
pseudocatenulatum
Ruminiclostridium
Paraprevotella
xylaniphila
Roseburia
Ruminococcus
Gemmiger
inulinivorans
torques
Clostridium sp
Butyricicoccus
Roseburia
inulinivorans
Butyricimonas
Bifidobacterium
synergistica
catenulatum
Blautia sp
Bacteroides
Clostridium sp
plebeius CAG
Eubacterium
Akkermansia
Clostridium
rectale
muciniphila
Allisonella
Anaeromassili
Bacteroides
histaminiformans
bacillus sp
dorei
Bilophila
Parabacteroides
Bifidobacterium
Clostridium
distasonis
adolescentis
Megamonas
Eubacterium
funiformis
Clostridium sp
Roseburia sp
Streptococcus
thermophilus
Roseburia
Bifidobacterium
inulinivorans
longum
Eggerthella sp
Firmicutes
bacterium
Blautia sp
Clostridium sp
Lachnospira
Roseburia
pectinoschiza
inulinivorans
Paraprevotella
Dialister sp
clara
Clostridium sp
Firmicutes
Faecalibacterium
Clostridium sp
Roseburia
intestinalis
Gemmiger
Anaerostipes
Clostridium sp
Clostridium
hadrus
Flavonifractor
Intestinibacter
bartlettii
Bacteroides
Flavonifractor
plautii
Ruminococcus
Clostridium sp
Bifidobacterium
torques
adolescentis
Clostridium
Clostridium sp
Bacteroides
Bifidobacterium
ovatus
longum
Clostridium
Eubacterium
siraeum
Blautia
Roseburia
Eubacterium
Clostridium
Firmicutes
Akkermansia
muciniphila
Bifidobacterium
Faecalibacterium
Clostridium
animalis
prausnitzii
Ruminococcus
Streptococcus
Clostridium
thermophilus
Roseburia
Faecalibacterium
inulinivorans
Clostridium sp
Blautia sp
Anaerostipes
hadrus
Bacteroides
Parabacteroides
finegoldii
distasonis
Roseburia sp
Faecalibacterium
prausnitzii
Firmicutes
Oscillibacter
bacterium
Faecalibacterium
Streptococcus
Clostridium sp
thermophilus
Haemophilus
Mycoplasma
Eubacterium
parainfluenzae
hallii
Streptococcus
Oscillibacter
Clostridium
thermophilus
Flavonifractor
Clostridium sp
Bacteroides
plautii
vulgatus
Firmicutes
Clostridium sp
Oscillibacter
bacterium
Roseburia
Firmicutes
Odoribacter
hominis
splanchnicus
Dorea
Faecalibacterium
longicatena
prausnitzii
Clostridium sp
Blautia sp
Bacteroides
Butyricimonas
Parabacteroides
finegoldii
synergistica
johnsonii
Eubacterium
Bifidobacterium
ventriosum
catenulatum
Blautia
Clostridium sp
Allisonella
Faecalibacterium
Clostridium
histaminiformans
Clostridium sp
Clostridium sp
Odoribacter
splanchnicus
Clostridium sp
Odoribacter
Streptococcus
splanchnicus
thermophilus
Collinsella
Eubacterium
Bifidobacterium
Ruminococcus sp
bifidum
Clostridium
Eubacterium
Phascolarctobacterium
Sutterella
wadsworthensis
Roseburia
Eubacterium
hallii
Clostridium sp
Eubacterium
Blautia
ventriosum
Coprococcus
Oscillibacter sp
comes
Bifidobacterium
Firmicutes
Bifidobacterium
adolescentis
pseudocatenulatum
Eggerthella sp
Eubacterium
Bacteroides
hallii
caccae
Faecalibacterium
Faecalibacterium
Flavonifractor
prausnitzii
plautii
Akkermansia
Clostridium
muciniphila
Oscillibacter
Blautia sp
Streptococcus
Faecalibacterium
Blautia sp
salivarius
prausnitzii
Blautia
Bacteroides
Collinsella
stercoris
Roseburia sp
Bacteroides
Faecalibacterium
massiliensis
Ruminococcus
Odoribacter
callidus
splanchnicus
Bacteroides
Dialister sp
Eubacterium
Coprococcus
catus
Blautia sp
Intestinibacter
Faecalibacterium
Faecalibacterium
prausnitzii
prausnitzii
Roseburia sp
Coprococcus
Bacteroides
comes
massiliensis
Eubacterium
Eubacterium
Clostridium sp
Eubacterium
Roseburia sp
Eubacterium
eligens
ventriosum
Firmicutes
Prevotella
Streptococcus
Bifidobacterium
copri
thermophilus
animalis
Blautia sp
Anaerostipes
Bacteroides
hadrus
xylanisolvens
Anaeromassili
Allisonella
Eubacterium
bacillus sp
histaminiformans
eligens
Clostridium sp
Anaeromassili
Bifidobacterium
bacillus sp
pseudocatenulatum
Roseburia
Butyricicoccus
Ruthenibacterium
inulinivorans
lactatiformans
Oscillibacter
Butyricicoccus
Firmicutes
bacterium
Butyricimonas
Firmicutes
Eubacterium
bacterium
Bifidobacterium
Bifidobacterium
Bifidobacterium
adolescentis
bifidum
longum
Faecalibacterium
Blautia sp
Sutterella
prausnitzii
wadsworthensis
Firmicutes
Akkermansia
Candidatus
Faecalibacterium
muciniphila
Gastranaerophilales
prausnitzii
bacterium
Dorea
Anaerostipes
longicatena
hadrus
Anaeromassili
Clostridium
bacillus sp
Blautia sp
Clostridium
Ruminococcus
Butyricimonas
Allisonella
Anaerostipes
histaminiformans
hadrus
Bacteroides
Bacteroides
Parabacteroides
finegoldii
stercoris
distasonis
Firmicutes
Coprococcus
Faecalibacterium
comes
Clostridium
Clostridium sp
Blautia sp
Coprococcus
Bacteroides
comes
massiliensis
Faecalibacterium
prausnitzii
Clostridium sp
Clostridium sp
Eubacterium
Escherichia
Gemmiger
hallii
coli
Anaerotruncus
Faecalibacterium
colihominis
Clostridium sp
Blautia sp
Bifidobacterium
bifidum
Lactobacillus
Clostridium
acidophilus
Eubacterium
Blautia sp
Clostridium
Akkermansia
Eubacterium
muciniphila
Clostridium sp
Blautia sp
Collinsella
Clostridium sp
Dorea
Clostridium
longicatena
Eubacterium
Eisenbergiella
siraeum
tayi
Adlercreutzia
Blautia sp
Bacteroides
Bifidobacterium
Clostridium
thetaiotaomicron
adolescentis
Clostridium sp
Clostridium
Roseburia
inulinivorans
Alistipes
putredinis
Bacteroides
dorei
Oscillibacter
Blautia sp
Parabacteroides
Clostridium
distasonis
leptum
Firmicutes
Firmicutes
bacterium
bacterium
Blautia sp
Anaerostipes
Ruminococcus
hadrus
torques
Clostridium sp
Lactobacillus
acidophilus
Gordonibacter
Bacteroides
pamelaeae
clarus
Oscillibacter
Oscillibacter
Firmicutes
bacterium
Oscillibacter
Firmicutes
Ruthenibacterium
Firmicutes
lactatiformans
Oscillibacter
Firmicutes
bacterium
Ruthenibacterium
lactatiformans
Roseburia
Eubacterium
rectale
Gemmiger
Eggerthella sp
Bacteroides
thetaiotaomicron
Eubacterium
Eubacterium
hallii
Eggerthella sp
Clostridium
Eggerthella sp
Butyricicoccus
Clostridium
Faecalibacterium
Firmicutes
prausnitzii
bacterium
Firmicutes
Clostridium
bacterium
Gemmiger
Eubacterium
formicilis
Bacteroides
Eubacterium
thetaiotaomicron
Dorea
Eubacterium
longicatena
hallii
Blautia sp
Gemmiger
Bacteroides
thetaiotaomicron
Catenibacterium
Clostridium
Ruthenibacterium
lactatiformans
Clostridiales
Alistipes
bacterium
finegoldii
Firmicutes
bacterium
Alistipes
Gordonibacter
putredinis
pamelaeae
Clostridium sp
Butyrivibrio
crossotus
Roseburia sp
Collinsella
Bacteroides
Ruminococcus
Anaerostipes
torques
hadrus
Butyricicoccus
Faecalibacterium
Gemmiger
Eubacterium
Eubacterium
Gemmiger
hallii
Ruminococcus
torques
Oscillibacter
Faecalibacterium
Gemmiger
Coprococcus
eutactus
Alistipes
indistinctus
Ruminococcus
Clostridium sp
Roseburia
Oscillibacter
Eubacterium
hallii
Akkermansia
Ruthenibacterium
muciniphila
lactatiformans
Clostridium sp
Ruminococcus
gnavus
Eubacterium
Eggerthella
hallii
Desulfovibrio
piger
Collinsella
Bacteroides
xylanisolvens
Roseburia
inulinivorans
Eubacterium
hallii
Firmicutes
Sutterella
wadsworthensis
Eubacterium
Eubacterium
rectale
Eubacterium
Dorea
Firmicutes
formicigenerans
Clostridium sp
Flavonifractor
Eggerthella sp
Megamonas
funiformis
Sutterella
wadsworthensis
Clostridium
Dorea
spiroforme
longicatena
Bifidobacterium
adolescentis
Clostridium sp
Eubacterium
hallii
Bacteroides
Roseburia
pectinophilus
Clostridium sp
Anaerostipes
hadrus
Clostridium
leptum
Gordonibacter
Gemmiger
pamelaeae
Clostridium
Firmicutes
Intestinibacter
bacterium
bartlettii
Ruthenibacterium
Firmicutes
lactatiformans
bacterium
Clostridium sp
Dorea
longicatena
Dorea
Clostridium
longicatena
Blautia sp
Roseburia
faecis
Ruminococcus
Akkermansia
muciniphila
Roseburia
inulinivorans
Clostridium sp
Bacteroides
thetaiotaomicron
Azospirillum
Faecalibacterium
prausnitzii
Faecalibacterium
Eubacterium
Eubacterium
hallii
Clostridium sp
Dorea
longicatena
Bacteroides
vulgatus
Firmicutes
Intestinibacter
bacterium
bartlettii
Clostridium
Dorea
Ruthenibacterium
longicatena
lactatiformans
Faecalibacterium
Faecalibacterium
prausnitzii
Collinsella
Clostridium
aerofaciens
Eubacterium
hallii
Ruminococcus
torques
Blautia
Blautia sp
Coprobacter
Flavonifractor
fastidiosus
plautii
Faecalibacterium
Alistipes
prausnitzii
shahii
Bifidobacterium
Akkermansia
adolescentis
muciniphila
Clostridium sp
Flavonifractor
plautii
Blautia sp
Ruminococcus
torques
Faecalibacterium
Acidaminococcus
prausnitzii
intestini
Dorea
longicatena
Eubacterium
Roseburia
Alistipes
faecis
finegoldii
Paraprevotella
Intestinibacter
clara
bartlettii
Intestinibacter
Lachnospira
pectinoschiza
Megamonas
Alistipes
funiformis
shahii
Clostridium
Coprococcus
Streptococcus
salivarius
Ruminococcus
torques
Oscillibacter
Bacteroides
Clostridium
intestinalis
Faecalibacterium
Ruminococcus
lactaris
Butyricicoccus
Faecalibacterium
Dorea
longicatena
Acidaminococcus
Dorea
intestini
longicatena
Clostridium sp
Streptococcus
thermophilus
Clostridium sp
Clostridium
Clostridium sp
Flavonifractor
Bilophila
plautii
Faecalibacterium
Eubacterium
Oscillibacter
hallii
Ruthenibacterium
lactatiformans
Roseburia
Clostridium
faecis
Alistipes
Clostridium
finegoldii
Alistipes
Prevotella
finegoldii
copri
Firmicutes
Eggerthella
Faecalibacterium sp
Gordonibacter
pamelaeae
Blautia sp
Eubacterium
Eubacterium
eligens
Parabacteroides
Dorea
merdae
longicatena
Dorea
formicigenerans
Blautia sp
Eubacterium
ramulus
Clostridium
Escherichia
coli
Roseburia
faecis
Clostridium sp
Bacteroides
massiliensis
Clostridium sp
Dorea
formicigenerans
Oscillibacter
Ruminococcus
torques
Bacteroides
Bacteroides
vulgatus
dorei
Clostridium sp
Flavonifractor
Dorea
Ruminococcus
longicatena
Clostridium sp
Intestinibacter
bartlettii
Blautia sp
Eubacterium
eligens
Faecalibacterium
Blautia sp
prausnitzii
Firmicutes
bacterium
Parabacteroides
Blautia sp
distasonis
Faecalibacterium
Roseburia
prausnitzii
Faecalibacterium
Roseburia
Ruthenibacterium
lactatiformans
Bifidobacterium
Clostridium
bifidum
Enterobacter
Acidaminococcus
cloacae
intestini
Coprococcus
catus
Butyricicoccus
Firmicutes
Blautia sp
Catenibacterium
Bifidobacterium
Eubacterium
adolescentis
Faecalibacterium
prausnitzii
Roseburia
Roseburia
intestinalis
inulinivorans
Akkermansia
Clostridium
muciniphila
leptum
Roseburia
Blautia
intestinalis
Roseburia
Roseburia
intestinalis
Blautia sp
Collinsella
Clostridium sp
Coprococcus
Bacteroides
catus
finegoldii
Clostridium sp
Roseburia
inulinivorans
Eubacterium
Clostridium sp
Faecalibacterium
Acetobacter
Phascolarctobacterium
Oscillibacter
Firmicutes
bacterium
Acidaminococcus
intestini
Faecalibacterium
Allisonella
histaminiformans
Collinsella
Bacteroides
Clostridium
finegoldii
Dorea
Bifidobacterium
longicatena
longum
Lachnospiraceae
Clostridium
Eubacterium
Bacteroides
ovatus
Odoribacter
Clostridium
splanchnicus
Eggerthella sp
Bacteroides
ovatus
Coprococcus
Ruthenibacterium
catus
lactatiformans
Paraprevotella
clara
Oscillibacter
Butyricicoccus sp
Paraprevotella
clara
Eubacterium
Roseburia
Gemmiger
Paraprevotella
clara
Eggerthella
Bacteroides
finegoldii
Eubacterium
Veillonella
Clostridium
atypica
Coprococcus
catus
Lactobacillus
Dorea
ruminis
longicatena
Veillonella
parvula
Blautia sp
Firmicutes
Faecalibacterium
Eubacterium
ventriosum
Butyricimonas
Ruthenibacterium
synergistica
lactatiformans
Bacteroides
Bifidobacterium
finegoldii
bifidum
Bacteroides
Eubacterium
Dorea
eligens
longicatena
Bifidobacterium
bifidum
Eubacterium
Bacteroides
ovatus
Faecalibacterium
Ruminococcus
torques
Firmicutes
Butyricimonas
bacterium
Butyricicoccus
Dorea
Faecalibacterium
longicatena
prausnitzii
Clostridium sp
Bacteroides
Flavonifractor
plebeius CAG
plautii
Roseburia
Clostridium
inulinivorans
Catenibacterium
Blautia
obeum
Clostridium sp
Clostridium sp
Roseburia
Clostridium sp
Bacteroides
caccae
Dorea
Intestinibacter
longicatena
Firmicutes
Butyricicoccus
bacterium
Bacteroides
clarus
Eubacterium
Bacteroides
hallii
clarus
Alistipes
Clostridium
indistinctus
Eubacterium
Lactococcus
rectale
lactis
Clostridium sp
Acidaminococcus
intestini
Alistipes
Butyricicoccus
finegoldii
Bacteroides
clarus
Clostridium sp
Akkermansia
muciniphila
Coprococcus
comes
Akkermansia
Ruminococcus
muciniphila
Blautia sp
Clostridium
Anaerostipes
Oscillibacter
hadrus
Clostridium sp
Faecalibacterium
prausnitzii
Clostridium sp
Clostridium
Lactococcus
Roseburia
lactis
inulinivorans
Bacteroides
Eubacterium
thetaiotaomicron
eligens
Oscillibacter
Clostridium
Clostridium
spiroforme
Butyricicoccus
Roseburia sp
Streptococcus
parasanguinis
Bacteroides
Roseburia
finegoldii
inulinivorans
Dorea sp CAG
Faecalibacterium
Eubacterium
Bacteroides
caccae
Ruthenibacterium
Anaerostipes
lactatiformans
hadrus
Blautia sp
Faecalibacterium
Akkermansia
muciniphila
Ruminococcus
Clostridium
torques
Dialister sp
Bacteroides
Clostridium
stercoris
Clostridium sp
Clostridium
Flavonifractor
Blautia sp
plautii
Bacteroides
vulgatus
Butyricimonas
Clostridium sp
synergistica
Clostridium sp
Faecalibacterium
Ruminococcus
torques
Dorea
Faecalibacterium
longicatena
Ruminococcus
Clostridium
Alistipes
Ruminococcus
Clostridium sp
Faecalibacterium
Eubacterium
Streptococcus
salivarius
Eubacterium
Lactococcus
lactis
Firmicutes
Anaeromassilibacillus
Bifidobacterium
Blautia sp
adolescentis
Bacteroides
Clostridium sp
massiliensis
Clostridium
Bifidobacterium
Ruminococcus
bifidum
lactaris
Alistipes
putredinis
Firmicutes
Firmicutes
bacterium
Ruminococcus
Butyricimonas
torques
synergistica
Clostridium
Ruminococcus
innocuum
torques
Flavonifractor
Roseburia
plautii
inulinivorans
Anaerostipes
Clostridium sp
hadrus
Streptococcus
thermophilus
Eubacterium
rectale
Dorea
Roseburia
longicatena
inulinivorans
Faecalibacterium
Butyricicoccus
Lachnospira
pectinoschiza
Clostridium
Eubacterium
rectale
Eubacterium
Clostridium
Blautia
Phascolarctobacterium
obeum
Faecalibacterium
Clostridium
prausnitzii
Butyricimonas
Eubacterium
synergistica
ventriosum
Clostridium sp
Eubacterium
hallii
Eubacterium
Dorea sp
rectale
Roseburia
Clostridium
inulinivorans
Acidaminococcus
Anaerostipes
intestini
hadrus
Dorea
longicatena
Candidatus
Blautia sp
Gastranaerophilales
bacterium
Eubacterium
Prevotella copri
Anaeromassili
Clostridium
bacillus sp
Lactobacillus
ruminis
Parabacteroides
distasonis
Bacteroides
Clostridium sp
massiliensis
Butyricicoccus
Butyricicoccus sp
Coprococcus
catus
Firmicutes
bacterium
Butyricimonas
Collinsella
synergistica
Bilophila sp 4
Eubacterium
hallii
Streptococcus
Blautia sp
thermophilus
Roseburia
inulinivorans
Gemmiger
Alistipes
Faecalibacterium
putredinis
prausnitzii
Faecalibacterium
Eubacterium
Bacteroides
caccae
Eubacterium
Ruminococcus
torques
Clostridium sp
Oscillibacter
Firmicutes
bacterium
Eubacterium
Clostridium
Burkholderiales
Faecalibacterium
bacterium
prausnitzii
Bifidobacterium
Eubacterium
Ruminococcus
torques
Clostridium
Lactococcus
lactis
Butyricimonas
Clostridium sp
synergistica
Clostridium
Ruminococcus
Alistipes
finegoldii
Oscillibacter
Roseburia
inulinivorans
Collinsella
Bilophila sp 4
Roseburia
Faecalibacterium
Firmicutes
bacterium
Faecalibacterium
Eubacterium
eligens
Anaerotruncus
Blautia sp
colihominis
Bifidobacterium
Clostridium
pseudocatenulatum
Gordonibacter
Bacteroides
pamelaeae
uniformis
Clostridium
Clostridium sp
Clostridium
Blautia
obeum
Clostridium sp
Blautia sp
Parabacteroides
merdae
Blautia sp
Faecalibacterium
prausnitzii
Fusicatenibacter
saccharivorans
Firmicutes
bacterium
Clostridium
Firmicutes
Collinsella
Bifidobacterium
Flavonifractor sp
catenulatum
Bacteroides
Akkermansia
clarus
muciniphila
Clostridium sp
Butyricimonas sp
Clostridium sp
Ruthenibacterium
lactatiformans
Roseburia sp
Oscillibacter
Faecalibacterium
prausnitzii
Bifidobacterium
Butyricicoccus
longum
Dorea
Clostridium
longicatena
Butyricicoccus
Allisonella
Eggerthella
histaminiformans
Veillonella
Clostridium
atypica
Blautia sp
Collinsella
Parabacteroides
Faecalibacterium sp
merdae
Roseburia sp
Ruminococcus
lactaris
Alistipes
finegoldii
Eggerthella sp
Bacteroides
xylanisolvens
Blautia
Blautia sp
Blautia sp
Butyricimonas
Megamonas
Ruminococcus
funiformis
Clostridium sp
Sutterella
wadsworthensis
Butyricimonas
Eubacterium
synergistica
ventriosum
Blautia
Collinsella
obeum
Subdoligranulum
Ruthenibacterium
Dorea
lactatiformans
longicatena
Coprobacter
secundus
Ruthenibacterium
lactatiformans
Eubacterium
Lachnospira
pectinoschiza
Roseburia sp
Faecalibacterium
Streptococcus
Roseburia
parasanguinis
intestinalis
Acetobacter
Streptococcus
Dialister sp
thermophilus
Roseburia
Roseburia
Clostridium sp
Parabacteroides
distasonis
Bacteroides
Clostridium
uniformis
Odoribacter
Bacteroides
splanchnicus
finegoldii
Firmicutes
Adlercreutzia
bacterium
Butyricicoccus
Dorea
longicatena
Bifidobacterium
Bacteroides
massiliensis
Roseburia
Clostridium
inulinivorans
Faecalibacterium
Bacteroides
uniformis
Eggerthella sp
Dialister sp
Coprobacter
Anaeromassilibacillus
fastidiosus
Faecalibacterium
Bifidobacterium
pseudocatenulatum
Bacteroides
Blautia
Roseburia
inulinivorans
Burkholderiales
bacterium
Eubacterium
Faecalibacterium
eligens
prausnitzii
Odoribacter
Clostridium
splanchnicus
Bifidobacterium
Bacteroides
adolescentis
stercoris
Ruminococcus
torques
Butyricicoccus
Fusicatenibacter
saccharivorans
Eubacterium
hallii
Flavonifractor
Clostridium
plautii
Oscillibacter
Gemmiger
Roseburia
Clostridium sp
Blautia sp
Clostridium sp
Blautia
Eubacterium
Blautia
ramulus
obeum
Clostridium sp
Dialister sp
Ruminococcus
torques
Eubacterium
Bilophila
rectale
Eubacterium
Anaerostipes
Coprococcus
hadrus
catus
Allisonella
Oscillibacter
histaminiformans
Ruminococcus
Faecalibacterium
torques
Streptococcus
Bacteroides
thermophilus
stercoris
Lactobacillus
acidophilus
Firmicutes
Streptococcus
thermophilus
Clostridium sp
Oscillibacter sp
Bifidobacterium
Bilophila
adolescentis
Bacteroides
ovatus
Roseburia
Clostridium
hominis
Firmicutes
bacterium
Veillonella
Dorea sp
parvula
Blautia sp
Blautia
obeum
Streptococcus
thermophilus
Roseburia
Phascolarctobacterium
hominis
Firmicutes
Coprobacter
bacterium
secundus
Bacteroides
Megamonas
finegoldii
funiformis
Faecalibacterium
prausnitzii
Firmicutes
Faecalibacterium
bacterium
prausnitzii
Bacteroides
Bifidobacterium
xylanisolvens
longum
Firmicutes
Clostridium
bacterium
Roseburia
Dorea
faecis
formicigenerans
Alistipes
Clostridium
finegoldii
Bifidobacterium
catenulatum
Dialister sp
Roseburia
Parabacteroides
Blautia sp
distasonis
Anaerostipes
Odoribacter
hadrus
splanchnicus
Firmicutes
Faecalibacterium
Dorea
longicatena
Clostridium
spiroforme
Gemmiger
Faecalibacterium
prausnitzii
Clostridium
Clostridium sp
Butyricicoccus
Blautia sp
Eubacterium
Firmicutes
Blautia sp
bacterium
Roseburia sp
Bifidobacterium
Butyricicoccus
animalis
Clostridium sp
Anaerostipes
hadrus
Blautia sp
Clostridium sp
Ruminococcus
torques
Clostridium sp
Oscillibacter
Cryptobacterium
Methanobrevibacter
Bifidobacterium
Eubacterium hallii
adolescentis
Ruthenibacterium
lactatiformans
Clostridium sp
Faecalibacterium
Dorea
longicatena
Roseburia
hominis
Clostridium sp
Megamonas
funiformis
Bilophila sp 4
Clostridium
Blautia
Parabacteroides
distasonis
Alistipes
Bacteroides
finegoldii
uniformis
Streptococcus
Blautia
parasanguinis
Bifidobacterium
longum
Bifidobacterium
catenulatum
Bacteroides
Lachnospira
plebeius CAG
pectinoschiza
Clostridium
Odoribacter
splanchnicus
Collinsella
Faecalibacterium
Bacteroides
prausnitzii
Roseburia
faecis
Faecalibacterium
Clostridium
prausnitzii
Roseburia
inulinivorans
Faecalibacterium
prausnitzii
Coprococcus
Streptococcus
thermophilus
Faecalibacterium
Roseburia
prausnitzii
Eubacterium
Eubacterium
ramulus
Haemophilus
Akkermansia
muciniphila
Bacteroides
vulgatus
Butyricicoccus
Collinsella
Eubacterium
Clostridium sp
hallii
Ruminococcus
torques
Faecalibacterium
Streptococcus
Oscillibacter
thermophilus
Clostridium sp
Blautia sp
Bacteroides
Eubacterium
clarus
hallii
According to a particular embodiment, the metabolite which is predicted is set forth in Table 4.
Food types that can be used for predicting the corresponding metabolite are also recited in Tables 3 and 4.
The analysis of the frequency of consumption of the food types and/or the daily mean consumption of the food types is optionally and preferably performed by executing a machine learning procedure. Any of the aforementioned types of machine learning procedures can be used for predicting the quantity of the metabolite based on the food types and/or the daily mean consumption of the food types.
When the metabolite is predicted based on the frequency of consumption and/or the daily mean consumption of the food types, the machine learning procedure used is a trained machine learning procedure. A machine learning procedure can be trained according to some embodiments of the present invention by feeding a machine learning training program with the frequency and/or the daily mean of food types consumed by a cohort of subjects from which the quantities of the metabolite have been determined by blood tests. Once the data are fed, the machine learning training program generates a trained machine learning procedure of a selected type which can then be used without the need to re-train it.
For example, when it is desired to employ decision trees, the machine learning training program learns the structure of each tree in a plurality of decision trees (e.g., how many nodes there are in each tree, and how these are connected to one another), and also selects the decision rules for split nodes of each tree. At least a portion of the decision rules relate to one or more food types. A simple decision rule may be a threshold for the frequency of consumption and/or the daily mean consumption of a particular food type, but more complex rules, relating to more than one food type, are also contemplated. The machine learning training program also accumulates data at the leaves of the trees.
The structures of the trees, the decision rules for the split nodes, and the data at the leaves are all selected by the machine learning training program, automatically and typically without user intervention, such that the frequency of consumption and/or the daily mean consumption of the food types at the root of the trees provide the quantities of the metabolite as determined by blood tests at the leaves of the trees. The final result of the machine learning training program in this case is a set of trees for each metabolite, where the structures, the decision rules for split nodes, and the leaf data for each tree are defined by the machine learning training program.
The Examples section that follows describes machine learning training that was used to generate a set of trees for each of a plurality of metabolites, using training data including metabolite quantities and diet data collected from a cohort of about 500 subjects.
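The selection of a simple threshold decision rule for a single split node can be illustrated by the following minimal sketch. It is not the training program of the Examples section: the function name fit_split_rule, the squared-error criterion, and the toy data are all assumptions introduced here for illustration only.

```python
from statistics import mean


def fit_split_rule(consumption, metabolite):
    """Choose one threshold on a single food type (hypothetical helper).

    The threshold is the midpoint between two neighbouring consumption
    values that minimises the summed squared error of predicting each
    side by its mean metabolite quantity. Returns None when all
    consumption values are equal, since no split is then possible.
    """
    pairs = sorted(zip(consumption, metabolite))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal values cannot be separated by a threshold
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean, right_mean = mean(left), mean(right)
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best["sse"]:
            best = {
                "threshold": threshold,
                "left": left_mean,    # leaf data for consumption <= threshold
                "right": right_mean,  # leaf data for consumption > threshold
                "sse": sse,
            }
    return best


# Toy, made-up data: daily mean consumption of one food type and the
# metabolite quantity measured by blood test for six subjects.
daily_mean_consumption = [0, 0, 1, 3, 4, 5]
measured_quantity = [1.0, 1.2, 1.1, 3.0, 3.2, 2.8]
rule = fit_split_rule(daily_mean_consumption, measured_quantity)
```

A full training program would apply such a selection recursively and over many food types to build each tree; this sketch covers only one split node.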
In various exemplary embodiments of the invention a library of machine learning procedures is accessed and searched for a trained machine learning procedure associated with the metabolite. It was found by the inventors that different libraries of machine learning procedures are suitable for microbiome data and for diet data. Thus, when the metabolite is predicted based on the frequency of consumption and/or the daily mean consumption of the food types, the library on medium 110 that is used is preferably not the same as the library used for predicting the metabolite based on the microbiome.
When the metabolite is predicted based on the frequency of consumption and/or the daily mean consumption of the food types, the library can include a machine learning procedure for each of the aforementioned metabolites (in which case N equals the number of the aforementioned metabolites), or a machine learning procedure for each of the metabolites set forth in Table 3 (in which case N equals the number of the metabolites set forth in Table 3), or a machine learning procedure for each of the metabolites set forth in Table 4 (in which case N equals the number of the metabolites set forth in Table 4). Also contemplated are embodiments in which the library includes a machine learning procedure for each of a subset of the aforementioned metabolites, or of the metabolites set forth in Table 3, or of the metabolites set forth in Table 4.
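The library search described above can be sketched as a lookup keyed by metabolite, with one library per type of input data. The names diet_library, microbiome_library, find_procedure, "metabolite_A", "coffee" and "species_X", as well as the stand-in procedures, are hypothetical and serve only to illustrate the lookup.

```python
from typing import Callable, Dict, Mapping

# A trained procedure maps input features to a predicted metabolite quantity.
Procedure = Callable[[Mapping[str, float]], float]

# Hypothetical libraries, one per input data type. The lambdas stand in
# for trained machine learning procedures and carry made-up coefficients.
diet_library: Dict[str, Procedure] = {
    "metabolite_A": lambda food: 0.5 + 0.2 * food.get("coffee", 0.0),
}
microbiome_library: Dict[str, Procedure] = {
    "metabolite_A": lambda abundance: 1.0 + 3.0 * abundance.get("species_X", 0.0),
}


def find_procedure(library: Dict[str, Procedure], metabolite: str) -> Procedure:
    """Search a library for the trained procedure associated with a metabolite."""
    try:
        return library[metabolite]
    except KeyError:
        raise LookupError(f"no trained procedure for {metabolite!r}") from None


# Diet-based prediction uses the diet library, not the microbiome library.
predict_from_diet = find_procedure(diet_library, "metabolite_A")
predicted_quantity = predict_from_diet({"coffee": 2.0})
```

In this sketch the number of entries in each dictionary plays the role of N.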
When machine learning procedure 114 includes a set of decision trees, each of the trees receives food consumption data (typically frequency of consumption and/or the daily mean consumption of the food types), processes the received food consumption data by the split node decision rules that were defined during the training phase, and provides output values in accordance with the data at the leaves that were also defined during the training phase. The output of all trees is optionally and preferably combined (e.g., summed) to provide the quantity of the respective metabolite.
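The evaluation of a set of trained trees can be sketched as follows. The nested-dictionary tree representation, the feature names, the leaf values, and the base_value offset are assumptions made for illustration; they are not taken from the trained trees of the Examples section.

```python
def evaluate_tree(node, features):
    """Walk one tree from its root to a leaf using the split-node rules."""
    while "leaf" not in node:
        goes_left = features[node["feature"]] <= node["threshold"]
        node = node["left"] if goes_left else node["right"]
    return node["leaf"]


def predict_metabolite(trees, features, base_value=0.0):
    """Combine the outputs of all trees by summation.

    base_value is a hypothetical constant offset, not part of the source.
    """
    return base_value + sum(evaluate_tree(tree, features) for tree in trees)


# Two hand-written trees standing in for a trained set (made-up values).
trees = [
    {
        "feature": "coffee_daily_mean", "threshold": 2.0,
        "left": {"leaf": -0.5},
        "right": {"leaf": 0.75},
    },
    {
        "feature": "red_meat_frequency", "threshold": 3.0,
        "left": {"leaf": 0.0},
        "right": {
            "feature": "coffee_daily_mean", "threshold": 4.0,
            "left": {"leaf": 0.25},
            "right": {"leaf": 0.5},
        },
    },
]

subject = {"coffee_daily_mean": 3.0, "red_meat_frequency": 5.0}
quantity = predict_metabolite(trees, subject, base_value=1.0)
```

For this subject the first tree contributes 0.75 and the second contributes 0.25, so the summed prediction including the offset is 2.0.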
Preferably, the number of trees in the set is at least 1000 or at least 2000 or more. It was found by the inventors that the food types listed in Table 3 dominate the predicting ability of the decision trees. Thus, in some embodiments of the present invention the number of decision rules relating to the food types listed in Table 3 for the respective metabolite is larger than the number of decision rules relating to other food types.
The Inventors found that the machine learning procedures, particularly, but not exclusively, the decision trees, can also be used for solving the inverse problem, wherein the machine learning procedure can recommend one or more amounts of microbiomes of an individual, or recommend consumption of one or more food types.
These embodiments are illustrated in
With reference to
With reference to
It was surprisingly found by the Inventors that a trained machine learning procedure that solves the forward problem, wherein the procedure provides a metabolite quantity after being fed with microbiome data (
It will be appreciated that additional features may be used together with the information regarding bacterial abundance and/or food intake to raise the confidence level of the prediction. Such features include for example a macronutrients feature group which can include the daily mean consumption of macronutrients (lipids, proteins, carbohydrates), calories and water, calculated from real-time logging; an anthropometrics feature group which can include weight, BMI, waist and hips circumference, and waist to hips ratio (WHR); a cardiometabolic feature group which can include systolic and diastolic blood pressure, heart rate in beats per minute and a glycemic status; a lifestyle feature group which can include smoking status (current, past) from questionnaires, and the daily mean sleeping time, exercise time and midday sleep time based on the real-time logging; a “drugs” feature group which can include binary features representing the reported medication intake of common drugs from questionnaires, and medication groups; a “time of day” feature which is a binary feature indicating whether the sample was taken during the first half of the day; a “seasonal effects” feature which is the month in which the sample was taken, and in which the months may also be grouped by season (Winter: December-February; Spring: March-May; Summer: June-August; Fall: September-November).
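As a non-limiting illustration, the “time of day” and “seasonal effects” features described above can be derived as sketched below. The function names are hypothetical, and the sketch assumes that “the first half of the day” means a sampling hour before 12:00; the grouping of months into seasons follows the grouping recited above.

```python
# Month-to-season grouping as recited in the description.
SEASON_BY_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}


def seasonal_features(month: int) -> dict:
    """Return the sampling month and its season grouping."""
    if month not in SEASON_BY_MONTH:
        raise ValueError("month must be an integer from 1 to 12")
    return {"month": month, "season": SEASON_BY_MONTH[month]}


def time_of_day_feature(hour: int) -> int:
    """Binary feature: 1 if the sample was taken before 12:00, else 0.

    Treating 'first half of the day' as hours 0-11 is an assumption.
    """
    if not 0 <= hour <= 23:
        raise ValueError("hour must be an integer from 0 to 23")
    return 1 if hour < 12 else 0
```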
Once the prediction has been made about the metabolite, the present inventors contemplate corroborating the quantity of the metabolite by directly analyzing the amount of that metabolite in the blood of the subject. It is to be understood, however, that while such corroboration is contemplated in some embodiments of the present invention, the corroboration is not necessary for the prediction itself. As demonstrated in the Example section that follows, the present inventors were able to train a machine learning procedure such that, when fed with the input data (e.g., microbiome data, food consumption data), the machine learning procedure, once trained, is capable of predicting the quantity of the metabolite in the blood of the subject even without performing direct analysis of the quantity of the metabolite in the blood of the subject.
Direct analysis of the quantity of the metabolite in the blood of the subject can be performed, for example, during or after the training of the machine learning procedure in order to determine whether the quantity of the metabolite that the machine learning procedure predicts is of clinical relevance, e.g. with a confidence level of at least 90% or at least 95%.
The confidence level of the metabolite quantity can be affirmed by conducting a hypothesis test as known in the art. Typically, the hypothesis test includes selecting the null and alternative hypotheses, and also selecting decision criteria, which are factors upon which a decision to reject or fail to reject the null hypothesis is based. Typical decision criteria include a choice of a test statistic and significance level (denoted algebraically as “alpha”) to be applied to the analysis. Many different test statistics can be used in hypothesis testing, including mean, variance and the like. A p-value can be calculated and compared to the significance level. The p-value is a quantitative assessment of the probability of observing, under the null hypothesis, a value of the test statistic that is either as extreme as or more extreme than the calculated value of the test statistic.
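The comparison of a p-value to the significance level can be illustrated by the following minimal sketch. It assumes, purely for illustration, a two-sided z-test in which the test statistic is the sample mean and the standard deviation is known; the function name and its parameters are hypothetical, and other test statistics may equally be used.

```python
import math
from statistics import mean


def two_sided_z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided z-test of the null hypothesis 'population mean == mu0'.

    sigma is the (assumed known) population standard deviation.
    Returns (z, p_value, reject_null).
    """
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / math.sqrt(n))
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value, p_value < alpha
```

The null hypothesis is rejected when the p-value falls below alpha, and is not rejected otherwise.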
Once it is established that a particular trained machine learning procedure is capable of providing clinically relevant predictions for a particular metabolite, the trained machine learning procedure can execute without performing direct analysis of the quantity of the metabolite in the blood of the subject.
Following is a description of techniques suitable for corroborating the quantity of the metabolite in the blood of the subject by direct analysis.
In one embodiment, metabolites are identified using a physical separation method.
The term “physical separation method” as used herein refers to any method known to those with skill in the art sufficient to produce a profile of changes and differences in small molecules produced in hSLCs, contacted with a toxic, teratogenic or test chemical compound according to the methods of this invention. In a preferred embodiment, physical separation methods permit detection of cellular metabolites including but not limited to sugars, organic acids, amino acids, fatty acids, hormones, vitamins, and oligopeptides, as well as ionic fragments thereof and low molecular weight compounds (preferably with a molecular weight less than 3000 Daltons, and more particularly between 50 and 3000 Daltons). For example, mass spectrometry can be used. In particular embodiments, this analysis is performed by liquid chromatography/electrospray ionization time of flight mass spectrometry (LC/ESI-TOF-MS), however it will be understood that metabolites as set forth herein can be detected using alternative spectrometry methods or other methods known in the art for analyzing these types of compounds in this size range.
Certain metabolites can be identified by, for example, gene expression analysis, including real-time PCR, RT-PCR, Northern analysis, and in situ hybridization.
In addition, metabolites can be identified using Mass Spectrometry such as MALDI/TOF (time-of-flight), SELDI/TOF, liquid chromatography-mass spectrometry (LC-MS), gas chromatography-mass spectrometry (GC-MS), high performance liquid chromatography-mass spectrometry (HPLC-MS), capillary electrophoresis-mass spectrometry, nuclear magnetic resonance spectrometry, tandem mass spectrometry (e.g., MS/MS, MS/MS/MS, ESI-MS/MS etc.), secondary ion mass spectrometry (SIMS), or ion mobility spectrometry (e.g. GC-IMS, IMS-MS, LC-IMS, LC-IMS-MS etc.).
Mass spectrometry methods are well known in the art and have been used to quantify and/or identify biomolecules, such as proteins and other cellular metabolites (see, e.g., Li et al., 2000; Rowley et al., 2000; and Kuster and Mann, 1998).
In certain embodiments, a gas phase ion spectrophotometer is used. In other embodiments, laser-desorption/ionization mass spectrometry is used to identify metabolites. Modern laser desorption/ionization mass spectrometry (“LDI-MS”) can be practiced in two main variations: matrix assisted laser desorption/ionization (“MALDI”) mass spectrometry and surface-enhanced laser desorption/ionization (“SELDI”).
In MALDI, the metabolite is mixed with a solution containing a matrix, and a drop of the liquid is placed on the surface of a substrate. The matrix solution then co-crystallizes with the biomarkers. The substrate is inserted into the mass spectrometer. Laser energy is directed to the substrate surface where it desorbs and ionizes the proteins without significantly fragmenting them. However, MALDI has limitations as an analytical tool. It does not provide means for fractionating the biological fluid, and the matrix material can interfere with detection, especially for low molecular weight analytes.
In SELDI, the substrate surface is modified so that it is an active participant in the desorption process. In one variant, the surface is derivatized with adsorbent and/or capture reagents that selectively bind the biomarker of interest. In another variant, the surface is derivatized with energy absorbing molecules that are not desorbed when struck with the laser. In another variant, the surface is derivatized with molecules that bind the biomarker of interest and that contain a photolytic bond that is broken upon application of the laser. In each of these methods, the derivatizing agent generally is localized to a specific location on the substrate surface where the sample is applied. The two methods can be combined by, for example, using a SELDI affinity surface to capture an analyte (e.g. biomarker) and adding matrix-containing liquid to the captured analyte to provide the energy absorbing material.
For additional information regarding mass spectrometers, see, e.g., Principles of Instrumental Analysis, 3rd edition, Skoog, Saunders College Publishing, Philadelphia, 1985; and Kirk-Othmer Encyclopedia of Chemical Technology, 4th ed., Vol. 15 (John Wiley & Sons, New York 1995), pp. 1071-1094.
In some embodiments, the data from mass spectrometry is represented as a mass chromatogram. A “mass chromatogram” is a representation of mass spectrometry data as a chromatogram, where the x-axis represents time and the y-axis represents signal intensity. In one aspect the mass chromatogram is a total ion current (TIC) chromatogram. In another aspect, the mass chromatogram is a base peak chromatogram. In other embodiments, the mass chromatogram is a selected ion monitoring (SIM) chromatogram. In yet another embodiment, the mass chromatogram is a selected reaction monitoring (SRM) chromatogram. In one embodiment, the mass chromatogram is an extracted ion chromatogram (EIC).
In an EIC, a single feature is monitored throughout the entire run. The total intensity or base peak intensity within a mass tolerance window around a particular analyte's mass-to-charge ratio is plotted at every point in the analysis. The size of the mass tolerance window typically depends on the mass accuracy and mass resolution of the instrument collecting the data. As used herein, the term “feature” refers to a single small metabolite, or a fragment of a metabolite. In some embodiments, a feature may, upon further investigation, turn out to be noise.
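By way of illustration only, the construction of an EIC from a series of scans can be sketched as follows. The data layout (a list of retention times, each paired with a list of mass-to-charge and intensity pairs), the function name and the numerical values are hypothetical; the sketch merely shows the total intensity or the base peak intensity within the mass tolerance window being evaluated at every point of the run.

```python
def extracted_ion_chromatogram(scans, target_mz, tol, mode="sum"):
    """Build an EIC as a list of (retention_time, intensity) points.

    scans     : list of (retention_time, [(mz, intensity), ...])
    target_mz : mass-to-charge ratio of the analyte of interest
    tol       : half-width of the mass tolerance window
    mode      : "sum" for total intensity, "base_peak" for the
                most intense peak within the window
    """
    eic = []
    for retention_time, peaks in scans:
        in_window = [inten for mz, inten in peaks
                     if abs(mz - target_mz) <= tol]
        if mode == "sum":
            y = sum(in_window)
        elif mode == "base_peak":
            y = max(in_window, default=0.0)
        else:
            raise ValueError("mode must be 'sum' or 'base_peak'")
        eic.append((retention_time, y))
    return eic


# Toy run of two scans (hypothetical values).
toy_scans = [
    (0.0, [(180.06, 10.0), (180.07, 5.0), (200.0, 99.0)]),
    (1.0, [(150.0, 3.0)]),
]
eic_total = extracted_ion_chromatogram(toy_scans, 180.063, 0.01)
eic_base = extracted_ion_chromatogram(toy_scans, 180.063, 0.01, "base_peak")
```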
Detection of the presence of a metabolite will typically involve detection of signal intensity. This, in turn, can reflect the quantity and character of a biomarker bound to the substrate. For example, in certain embodiments, the signal strength of peak values from spectra of a first sample and a second sample can be compared (e.g., visually, by computer analysis etc.) to determine the relative amounts of particular metabolites. Software programs such as the Biomarker Wizard program (Ciphergen Biosystems, Inc., Fremont, Calif.) can be used to aid in analyzing mass spectra. The mass spectrometers and their techniques are well known.
A person skilled in the art understands that any of the components of a mass spectrometer, e.g., desorption source, mass analyzer, detector, etc., and varied sample preparations can be combined with other suitable components or preparations described herein, or to those known in the art. For example, in some embodiments a control sample may contain heavy atoms, e.g. 13C, thereby permitting the test sample to be mixed with the known control sample in the same mass spectrometry run. Good stable isotopic labeling is included.
In one embodiment, a laser desorption time-of-flight (TOF) mass spectrometer is used. In laser desorption mass spectrometry, a substrate with a bound marker is introduced into an inlet system. The marker is desorbed and ionized into the gas phase by laser from the ionization source. The ions generated are collected by an ion optic assembly, and then in a time-of-flight mass analyzer, ions are accelerated through a short high voltage field and let drift into a high vacuum chamber. At the far end of the high vacuum chamber, the accelerated ions strike a sensitive detector surface at different times. Since the time-of-flight is a function of the mass of the ions, the elapsed time between ion formation and ion detector impact can be used to identify the presence or absence of molecules of specific mass to charge ratio.
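The dependence of the time-of-flight on the mass to charge ratio can be illustrated by the following minimal sketch, which assumes an idealized linear analyzer in which an ion acquires all of its kinetic energy from the accelerating voltage and then drifts over a field-free length, the time spent in the accelerating field being neglected. The function names, the voltage and the drift length are hypothetical.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, coulombs
DALTON = 1.66053906660e-27   # unified atomic mass unit, kilograms


def flight_time(mz, voltage, length):
    """Drift time (s) of an ion with mass-to-charge ratio mz (Da per charge).

    Energy balance: z*e*V = (1/2)*m*v**2, hence
    t = length / v = length * sqrt(m / (2*z*e*V)).
    """
    return length * math.sqrt(mz * DALTON / (2.0 * E_CHARGE * voltage))


def mz_from_flight_time(t, voltage, length):
    """Invert flight_time: recover m/z (Da per charge) from the drift time."""
    return 2.0 * E_CHARGE * voltage * (t / length) ** 2 / DALTON


# Hypothetical instrument: 20 kV accelerating voltage, 1 m drift length.
t_light = flight_time(100.0, 20000.0, 1.0)
t_heavy = flight_time(400.0, 20000.0, 1.0)
```

Under these assumptions the drift time scales with the square root of the mass to charge ratio, so that an ion having four times the mass to charge ratio arrives after twice the time.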
In one embodiment of the invention, levels of metabolites are detected by MALDI-TOF mass spectrometry.
Methods of detecting metabolites also include the use of surface plasmon resonance (SPR). The SPR biosensing technology has been combined with MALDI-TOF mass spectrometry for the desorption and identification of metabolites.
Data for statistical analysis can be extracted from chromatograms (spectra of mass signals) using software for statistical methods known in the art. “Statistics” is the science of making effective use of numerical data relating to groups of individuals or experiments. Methods for statistical analysis are well-known in the art.
In one embodiment a computer is used for statistical analysis.
In one embodiment, the Agilent MassProfiler or MassProfilerProfessional software is used for statistical analysis. In another embodiment, the Agilent MassHunter Qual software is used for statistical analysis. In other embodiments, alternative statistical analysis methods can be used. Such other statistical methods include the Analysis of Variance (ANOVA) test, Chi-square test, Correlation test, Factor analysis test, Mann-Whitney U test, Mean square weighted deviation (MSWD), Pearson product-moment correlation coefficient, Regression analysis, Spearman's rank correlation coefficient, Student's T test, Welch's T-test, Tukey's test, and Time series analysis.
In different embodiments signals from mass spectrometry can be transformed in different ways to improve the performance of the method. Either individual signals or summaries of the distributions of signals (such as mean, median or variance) can be so transformed. Possible transformations include taking the logarithm, taking some positive or negative power, for example the square root or inverse, or taking the arcsin (Myers, Classical and Modern Regression with Applications, 2nd edition, Duxbury Press, 1990).
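The transformations listed above can be illustrated by the following minimal sketch. The function name and the keys identifying each transformation are hypothetical, and each transformation is applied only where it is mathematically defined (for example, the logarithm and the inverse require positive and non-zero signals, respectively, and the arcsin requires values between -1 and 1).

```python
import math

# Hypothetical lookup of the transformations named in the description.
TRANSFORMS = {
    "log": math.log,             # requires signal > 0
    "sqrt": math.sqrt,           # positive power 1/2, requires signal >= 0
    "inverse": lambda s: 1.0 / s,  # negative power -1, requires signal != 0
    "arcsin": math.asin,         # requires -1 <= signal <= 1
}


def transform_signals(signals, name):
    """Apply the named transformation to each signal in the sequence."""
    if name not in TRANSFORMS:
        raise ValueError(f"unknown transformation: {name}")
    func = TRANSFORMS[name]
    return [func(s) for s in signals]
```

The same function can be applied either to individual signals or to summaries of their distribution, such as a list of per-sample medians.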
The ability to quantitate the amount of a metabolite allows for the diagnosis of diseases which are known to be associated with an up- or down-regulation of that metabolite.
Thus, according to another aspect of the present invention there is provided a method of diagnosing a disease of a subject comprising predicting the quantity of at least one metabolite which is indicative of the disease, wherein the predicting is carried out as described herein, thereby diagnosing the disease.
As used herein the term “diagnosing” refers to determining presence or absence of a pathology (e.g., a disease, disorder, condition or syndrome), classifying a pathology or a symptom, determining a severity of the pathology, monitoring pathology progression, forecasting an outcome of a pathology and/or prospects of recovery and screening of a subject for a specific disease.
Once the level of the metabolite is measured, it is typically compared to a level of that metabolite in a control subject who is known not to be suffering from said disease. If the amount of the metabolite is significantly up- or down-regulated (e.g. by as much as 1.5 fold, 2 fold, 5 fold, 10 fold or more), then it is indicative that the subject has the disease.
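The fold comparison described above can be sketched as follows. This is a simplified illustration only: the function names are hypothetical, the control level is assumed to be the mean over a plurality of control subjects, and the sketch applies only the fold threshold, without any assessment of statistical significance.

```python
from statistics import mean


def fold_change(subject_level, control_levels):
    """Ratio of the subject's metabolite level to the mean control level."""
    return subject_level / mean(control_levels)


def is_altered(subject_level, control_levels, min_fold=1.5):
    """True if the level is up- or down-regulated by at least min_fold."""
    fc = fold_change(subject_level, control_levels)
    return fc >= min_fold or fc <= 1.0 / min_fold
```

For example, a subject level of 3.0 against control levels averaging 1.0 corresponds to a 3 fold up-regulation, whereas a level of 0.5 corresponds to a 2 fold down-regulation.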
Measuring the amount of the metabolite in the control subject may be carried out prior to, at the same time as, or following measuring the amount of the metabolite of the test subject. Preferably, the abundance of said metabolite is measured in a plurality of control subjects. The data from such measurements may be stored in a database, as further described herein below.
Examples of metabolites whose levels are indicative of diseases include cholesterol (for diagnosis of atherosclerosis, cardiovascular disease (CVD)), and glucose (for diagnosis of diabetes). Particular embodiments of the present invention contemplate a metabolite that is not glucose and is also not cholesterol.
Additional examples of metabolites whose levels are indicative of diseases include trimethylamine N-oxide (TMAO) (for diagnosis of CVD); 3-Carboxy-4-methyl-5-propyl-2-furanpropionic acid (CMPF) (for diagnosis of chronic kidney disease (CKD)); indoxyl sulfate (for diagnosis of CKD, CVD); and phenylacetylglutamine (for diagnosis of CKD, CVD, overall mortality). Additional metabolites which are indicative of disease are listed in Man Lam et al., Journal of Genetics and Genomics 44 (2017) 127-138, the contents of which are incorporated herein by reference.
Examples of diseases that may be diagnosed according to this aspect of the present invention include, but are not limited to, atherosclerosis, cardiovascular disease (CVD), metabolic diseases such as diabetes, chronic kidney disease and cancer.
According to some embodiments of the invention, screening of the subject for a specific disease is followed by substantiation of the screen results using gold standard methods. Furthermore, once the disease has been diagnosed, the disease may be treated using methods known in the art, particular to each disease.
It will be appreciated that since the methods described herein pinpoint particular bacterial functions (e.g. species, genus, families etc.) that contribute to the amount of blood metabolites, the present invention can be used for determining which microbes should be altered in order to bring about a particular effect on a particular blood metabolite.
Thus, according to yet another aspect of the present invention there is provided a method of altering the amount of a metabolite. The method optionally and preferably comprises predicting the amount of the metabolite, and administering to the subject one or more agents which specifically increase or decrease the microbe(s), wherein the agent is selected based on the quantity of the metabolite. The prediction of the metabolite can be done using a machine learning procedure, as described above with respect to
The microbe(s) of the microbiome to be specifically increased or decreased can be selected, according to some embodiments of the present invention, using machine learning. This can be done by operating the trained machine learning procedure to solve the aforementioned inverse problem (
Suppose, for example, that a biological microbiota sample is taken from the body of the subject and is analyzed by biological assays. Suppose that the results of the assays show that the biological microbiota sample contains a set of microbes present at a respective set of amounts in the biological microbiota sample. Suppose further that the amounts of microbes found by the biological assays are fed to a machine learning procedure that has been trained using microbiome data and that is associated with a particular metabolite. Suppose further that the machine learning procedure predicts (
The recommended amounts of microbes found by the machine learning procedure can then be compared to the amounts of microbes found by the biological assays, and the agents that are administered are selected based on this comparison. For example, when for a particular microbe, the recommended amount is more than the amount found by the biological assays, the subject is administered with an agent that increases the amount of that particular microbe. Conversely, when for a particular microbe, the recommended amount is less than the amount found by the biological assays, the subject is administered with an agent that decreases the amount of that particular microbe. Also, when for a particular microbe, the recommended amount is the same or approximately the same (with tolerance of up to 10%) as the amount found by the biological assays, no agent is administered for this microbe.
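The comparison between recommended and measured amounts can be illustrated by the following minimal sketch. It assumes that an agent increasing a microbe is selected when the recommended amount exceeds the measured amount, that an agent decreasing the microbe is selected in the opposite case, and that the tolerance of up to 10% is taken relative to the measured amount. The function name and the microbe labels are hypothetical.

```python
def select_agents(measured, recommended, tolerance=0.10):
    """Decide, per microbe, whether to increase, decrease or leave it.

    measured    : {microbe: amount found by the biological assays}
    recommended : {microbe: amount recommended by the trained procedure}
    tolerance   : relative tolerance, taken here with respect to the
                  measured amount (an assumption of this sketch)
    """
    plan = {}
    for microbe, target in recommended.items():
        current = measured.get(microbe, 0.0)
        if abs(target - current) <= tolerance * current:
            plan[microbe] = "none"       # approximately the same amount
        elif target > current:
            plan[microbe] = "increase"   # agent that increases the microbe
        else:
            plan[microbe] = "decrease"   # agent that decreases the microbe
    return plan


# Hypothetical microbes and amounts.
measured_amounts = {"microbe_A": 1.0, "microbe_B": 2.0, "microbe_C": 1.0}
recommended_amounts = {"microbe_A": 2.0, "microbe_B": 1.0, "microbe_C": 1.05}
plan = select_agents(measured_amounts, recommended_amounts)
```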
According to one particular embodiment, the altering is carried out by increasing a bacterial population whose level is predicted to be below the level in a healthy subject. Table 1 provides examples of bacterial populations which positively and negatively correlate with a particular metabolite, predictor 1 being of the most significance and predictor 5 being of the least significance.
For example, according to Table 1, a positive number represents a positive correlation of that microbe with the corresponding metabolite and a negative number represents an inverse correlation of that microbe with the corresponding metabolite. Therefore in order to increase the level of X-16124 for example, agents may be provided which increase the level of F: Eggerthellaceae; and decrease the level of S: Gordonibacter pamelaeae.
Altering the amount of particular metabolites may be beneficial to the health of the subject.
According to a particular embodiment, altering the amount of a metabolite is beneficial for the treatment and/or prevention of a disease. Exemplary diseases include, but are not limited to those described herein above.
The term “treating” refers to inhibiting, preventing or arresting the development of a pathology (disease, disorder or condition) and/or causing the reduction, remission, or regression of a pathology. Those of skill in the art will understand that various methodologies and assays can be used to assess the development of a pathology, and similarly, various methodologies and assays may be used to assess the reduction, remission or regression of a pathology.
As used herein, the term “preventing” refers to keeping a disease, disorder or condition from occurring in a subject who may be at risk for the disease, but has not yet been diagnosed as having the disease.
Upregulation:
An agent which increases the amount of a particular bacteria includes that particular bacteria itself (i.e. a probiotic composition).
The term “probiotic” as used herein, refers to one or more microorganisms which, when administered appropriately, can confer a health benefit on the host or subject and/or reduction of risk and/or symptoms of a disease, disorder, condition, or event in a host organism.
The present invention contemplates an agent which up-regulates at least one strain, 10 strains, 20 strains, 30 strains, 40 strains, 50 strains, 60 strains, 70 strains, 80 strains, 90 strains or all of the strains of the above disclosed species.
In one embodiment, the agent specifically upregulates the specified species of bacteria.
Thus, for example, the agent may increase the amount of the specified bacterial species as compared to at least one other bacterial species of the microbiome of the subject, by at least 2 fold. According to a particular embodiment, the agent upregulates the particular bacterial species by at least 5 fold, 10 fold or more as compared to at least one other bacterial species of the microbiome.
In another embodiment, the agent increases the amount of the specified bacterial species as compared to at least 10% of the total bacterial species of the microbiome of the subject, by at least 2 fold. According to a particular embodiment, the agent upregulates the specified bacterial species by at least 5 fold, 10 fold or more as compared to at least 10% of the total bacterial species of the microbiome of the subject.
In another embodiment, the agent increases the amount of the specified bacterial species as compared to at least 20% of the total bacterial species of the microbiome of the subject, by at least 2 fold. According to a particular embodiment, the agent upregulates the specified bacterial species by at least 5 fold, 10 fold or more as compared to at least 20% of the total bacterial species of the microbiome of the subject.
In another embodiment, the agent increases the amount of the specified bacterial species as compared to at least 30% of the total bacterial species of the microbiome of the subject, by at least 2 fold. According to a particular embodiment, the agent upregulates the specified bacterial species by at least 5 fold, 10 fold or more as compared to at least 30% of the total bacterial species of the microbiome of the subject.
In another embodiment, the agent increases the amount of the specified bacterial species as compared to at least 40% of the total bacterial species of the microbiome of the subject, by at least 2 fold. According to a particular embodiment, the agent upregulates the specified bacterial species by at least 5 fold, 10 fold or more as compared to at least 40% of the total bacterial species of the microbiome of the subject.
In another embodiment, the agent increases the amount of the specified bacterial species as compared to at least 50% of the total bacterial species of the microbiome of the subject, by at least 2 fold. According to a particular embodiment, the agent upregulates the specified bacterial species by at least 5 fold, 10 fold or more as compared to at least 50% of the total bacterial species of the microbiome of the subject.
In another embodiment, the agent increases the amount of the specified bacterial species as compared to at least 60% of the total bacterial species of the microbiome of the subject, by at least 2 fold. According to a particular embodiment, the agent upregulates the specified bacterial species by at least 5 fold, 10 fold or more as compared to at least 60% of the total bacterial species of the microbiome of the subject.
In another embodiment, the agent increases the amount of the specified bacterial species as compared to at least 70% of the total bacterial species of the microbiome of the subject, by at least 2 fold. According to a particular embodiment, the agent upregulates the specified bacterial species by at least 5 fold, 10 fold or more as compared to at least 70% of the total bacterial species of the microbiome of the subject.
In another embodiment, the agent increases the amount of the specified bacterial species as compared to at least 80% of the total bacterial species of the microbiome of the subject, by at least 2 fold. According to a particular embodiment, the agent upregulates the specified bacterial species by at least 5 fold, 10 fold or more as compared to at least 80% of the total bacterial species of the microbiome of the subject.
In another embodiment, the agent increases the amount of the specified bacterial species as compared to at least 90% of the total bacterial species of the microbiome of the subject, by at least 2 fold. According to a particular embodiment, the agent upregulates the specified bacterial species by at least 5 fold, 10 fold or more as compared to at least 90% of the total bacterial species of the microbiome of the subject.
According to an embodiment of this aspect of the present invention, the agent increases the species of bacteria by at least 2 fold as compared to at least one other species of bacteria that belongs to a different genus present in the microbiome.
According to a particular embodiment the agent increases the species of bacteria by at least 5 fold, 10 fold or more as compared to at least one other species of bacteria that belongs to a different genus present in the microbiome.
According to one embodiment, the agent increases the species of bacteria by at least 2 fold as compared to at least one other species of bacteria that belongs to the same genus present in the microbiome.
According to a particular embodiment the agent increases the species of bacteria by at least 5 fold, 10 fold or more as compared to at least one other species of bacteria that belongs to the same genus present in the microbiome.
Preferably, the agents of this aspect of the present invention are capable of increasing the growth and/or colonization of the bacterial species.
Exemplary agents that are capable of increasing the specified species include microbial compositions. Such microbial compositions typically do not comprise more than 100 bacterial species, more than 90 bacterial species, more than 80 bacterial species, more than 70 bacterial species, more than 60 bacterial species, more than 50 bacterial species, more than 40 bacterial species, more than 30 bacterial species, more than 20 bacterial species, more than 10 bacterial species, or even more than 5 bacterial species.
The microbial compositions of the present invention are not fecal transplants derived from a healthy subject.
The bacterial compositions can comprise more than one strain of a bacterial species, more than 2 strains of a bacterial species, more than 3 strains of a bacterial species, more than 4 strains of a bacterial species, more than 5 strains of a bacterial species, more than 6 strains of a bacterial species, more than 7 strains of a bacterial species, more than 8 strains of a bacterial species, more than 9 strains of a bacterial species, more than 10 strains of a bacterial species, more than 11 strains of a bacterial species, more than 12 strains of a bacterial species, more than 13 strains of a bacterial species, more than 14 strains of a bacterial species, more than 15 strains of a bacterial species, more than 16 strains of a bacterial species, more than 17 strains of a bacterial species, more than 18 strains of a bacterial species, more than 19 strains of a bacterial species, more than 20 strains of a bacterial species or more.
The present inventors contemplate microbial compositions where more than 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90% or even 100%, of the bacteria of the composition is bacteria of the specified bacterial species.
The present inventors contemplate any formulation for the microbial compositions so long as the bacterial population within is capable of propagating when administered to the subject.
The compositions of the present invention may be formulated as a food supplement, an enema, a tablet, a capsule or a syringe.
The compositions of the invention can be formulated as a slurry, saline or buffered suspensions (e.g., for an enema, suspended in a buffer or a saline), in a drink (e.g., a milk, yoghurt, a shake, a flavoured drink or equivalent) for oral delivery, and the like.
In alternative embodiments, compositions of the invention can be formulated as an enema product, a spray dried product, reconstituted enema, a small capsule product, a small capsule product suitable for administration to children, a bulb syringe, a bulb syringe suitable for a home enema with a saline addition, a powder product, a powder product in oxygen deprived sachets, a powder product in oxygen deprived sachets that can be added to, for example, a bulb syringe or enema, or a spray dried product in a device that can be attached to a container with an appropriate carrier medium such as yoghurt or milk and that can be directly incorporated and given as a dosing for example for children.
In one embodiment, compositions of the invention can be delivered directly in a carrier medium via a screw-top lid wherein the bacterial material is suspended in the lid and released on twisting the lid straight into the carrier medium.
In alternative embodiments methods of delivery of compositions of the invention include use of bacterial slurries into the bowel, via an enema suspended in saline or a buffer, via a small bowel infusion via a nasoduodenal tube, via a gastrostomy, or by using a colonoscope.
According to still another embodiment, the microbial composition of any of the aspects of the present invention is devoid (or comprises only trace quantities) of fecal material (e.g., fiber).
The probiotic bacteria may be in any suitable form, for example in a powdered dry form. In addition, the probiotic microorganism may have undergone processing in order for it to increase its survival. For example, the microorganism may be coated or encapsulated in a polysaccharide, fat, starch, protein or in a sugar matrix. Standard encapsulation techniques known in the art can be used. For example, techniques discussed in U.S. Pat. No. 6,190,591, which is hereby incorporated by reference in its entirety, may be used.
According to a particular embodiment, the probiotic microorganism composition is formulated in a food product, functional food or nutraceutical.
In some embodiments, a food product, functional food or nutraceutical is or comprises a dairy product. In some embodiments, a dairy product is or comprises a yogurt product. In some embodiments, a dairy product is or comprises a milk product. In some embodiments, a dairy product is or comprises a cheese product. In some embodiments, a food product, functional food or nutraceutical is or comprises a juice or other product derived from fruit. In some embodiments, a food product, functional food or nutraceutical is or comprises a product derived from vegetables. In some embodiments, a food product, functional food or nutraceutical is or comprises a grain product, including but not limited to cereal, crackers, bread, and/or oatmeal. In some embodiments, a food product, functional food or nutraceutical is or comprises a rice product. In some embodiments, a food product, functional food or nutraceutical is or comprises a meat product.
Prior to administration, the subject may be pretreated with an agent which reduces the number of naturally occurring microbes in the microbiome (e.g. by antibiotic treatment). According to a particular embodiment, the treatment reduces the naturally occurring gut microflora by at least 20%, 30%, 40%, 50%, 60%, 70%, 80% or even 90%.
Downregulation:
The present invention contemplates an agent which down-regulates at least one strain, 10% of the strains, 20% of the strains, 30% of the strains, 40% of the strains, 50% of the strains, 60% of the strains, 70% of the strains, 80% of the strains, 90% of the strains or all of the strains of any of the uncovered species recited in Table 1.
Thus, for example, the agent may reduce the amount of the specified bacterial species as compared to at least one other bacterial species of the microbiome of the subject, by at least 2 fold. According to a particular embodiment, the agent downregulates the particular bacterial species by at least 5 fold, 10 fold or more as compared to at least one other bacterial species of the microbiome.
In another embodiment, the agent reduces the amount of the specified bacterial species as compared to at least 10% of the total bacterial species of the microbiome of the subject, by at least 2 fold. According to a particular embodiment, the agent downregulates the specified bacterial species by at least 5 fold, 10 fold or more as compared to at least 10% of the total bacterial species of the microbiome of the subject.
In another embodiment, the agent reduces the amount of the specified bacterial species as compared to at least 20% of the total bacterial species of the microbiome of the subject, by at least 2 fold. According to a particular embodiment, the agent downregulates the specified bacterial species by at least 5 fold, 10 fold or more as compared to at least 20% of the total bacterial species of the microbiome of the subject.
In another embodiment, the agent reduces the amount of the specified bacterial species as compared to at least 30% of the total bacterial species of the microbiome of the subject, by at least 2 fold. According to a particular embodiment, the agent downregulates the specified bacterial species by at least 5 fold, 10 fold or more as compared to at least 30% of the total bacterial species of the microbiome of the subject.
In another embodiment, the agent reduces the amount of the specified bacterial species as compared to at least 40% of the total bacterial species of the microbiome of the subject, by at least 2 fold. According to a particular embodiment, the agent downregulates the specified bacterial species by at least 5 fold, 10 fold or more as compared to at least 40% of the total bacterial species of the microbiome of the subject.
In another embodiment, the agent reduces the amount of the specified bacterial species as compared to at least 50% of the total bacterial species of the microbiome of the subject, by at least 2 fold. According to a particular embodiment, the agent downregulates the specified bacterial species by at least 5 fold, 10 fold or more as compared to at least 50% of the total bacterial species of the microbiome of the subject.
In another embodiment, the agent reduces the amount of the specified bacterial species as compared to at least 60% of the total bacterial species of the microbiome of the subject, by at least 2 fold. According to a particular embodiment, the agent downregulates the specified bacterial species by at least 5 fold, 10 fold or more as compared to at least 60% of the total bacterial species of the microbiome of the subject.
In another embodiment, the agent reduces the amount of the specified bacterial species as compared to at least 70% of the total bacterial species of the microbiome of the subject, by at least 2 fold. According to a particular embodiment, the agent downregulates the specified bacterial species by at least 5 fold, 10 fold or more as compared to at least 70% of the total bacterial species of the microbiome of the subject.
In another embodiment, the agent reduces the amount of the specified bacterial species as compared to at least 80% of the total bacterial species of the microbiome of the subject, by at least 2 fold. According to a particular embodiment, the agent downregulates the specified bacterial species by at least 5 fold, 10 fold or more as compared to at least 80% of the total bacterial species of the microbiome of the subject.
In another embodiment, the agent reduces the amount of the specified bacterial species as compared to at least 90% of the total bacterial species of the microbiome of the subject, by at least 2 fold. According to a particular embodiment, the agent downregulates the specified bacterial species by at least 5 fold, 10 fold or more as compared to at least 90% of the total bacterial species of the microbiome of the subject.
According to an embodiment of this aspect of the present invention, the agent reduces the species of bacteria by at least 2 fold as compared to at least one other species of bacteria that belongs to a different genus present in the microbiome.
According to a particular embodiment the agent reduces the species of bacteria by at least 5 fold, 10 fold or more as compared to at least one other species of bacteria that belongs to a different genus present in the microbiome.
According to one embodiment, the agent reduces the species of bacteria by at least 2 fold as compared to at least one other species of bacteria that belongs to the same genus present in the microbiome.
According to a particular embodiment the agent reduces the species of bacteria by at least 5 fold, 10 fold or more as compared to at least one other species of bacteria that belongs to the same genus present in the microbiome.
Preferably, the agents of this aspect of the present invention are capable of decreasing the growth and/or colonization of the bacterial species.
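By way of a non-limiting illustration, the relative fold reductions described above can be computed from abundance measurements taken before and after administration of the agent. The following is a minimal sketch in Python under stated assumptions: the function names, the species labels and the abundance values are hypothetical and do not appear elsewhere in this document, and the fold reduction is read here as the ratio of the target-to-comparator abundance ratio before treatment to the same ratio after treatment, which is only one possible reading.

```python
def relative_fold_reduction(before, after, target, other):
    """Fold reduction of `target` relative to `other`.

    Computed as (target/other before treatment) / (target/other after treatment).
    A value of 2.0 means the target dropped 2 fold relative to the comparator.
    """
    if before[target] <= 0:
        raise ValueError("target must be present before treatment")
    if after[target] == 0:
        return float("inf")  # target no longer detected
    ratio_before = before[target] / before[other]
    ratio_after = after[target] / after[other]
    return ratio_before / ratio_after


def fraction_of_species_meeting_fold(before, after, target, min_fold=2.0):
    """Fraction of the other detected species against which the target is
    reduced by at least `min_fold`."""
    others = [s for s in before
              if s != target and before[s] > 0 and after.get(s, 0) > 0]
    if not others:
        raise ValueError("no comparator species detected in both samples")
    hits = sum(1 for s in others
               if relative_fold_reduction(before, after, target, s) >= min_fold)
    return hits / len(others)


# Hypothetical abundances (arbitrary units), for illustration only.
before = {"species_X": 100.0, "species_A": 100.0, "species_B": 50.0, "species_C": 200.0}
after = {"species_X": 20.0, "species_A": 100.0, "species_B": 50.0, "species_C": 20.0}

fraction = fraction_of_species_meeting_fold(before, after, "species_X")
```

In this hypothetical data set, species_X is reduced at least 2 fold relative to two of the three comparator species, so the embodiments reciting a comparison to at least 10% through at least 60% of the total bacterial species would be met under this reading.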
The agent which downregulates the bacteria that is recited in Tables 1 or 2 may be able to reduce the amount (either absolute or relative amount) and/or activity (either absolute or relative activity) of a particular strain of bacteria.
According to a particular embodiment, the agent specifically downregulates the specified strain.
Thus, in one embodiment, the agent reduces the amount of the specified bacterial strain as compared to at least one other bacterial strain of the microbiome of the subject, by at least 2 fold. According to a particular embodiment, the agent downregulates the particular bacterial strain by at least 5 fold, 10 fold or more as compared to at least one other bacterial strain of the microbiome.
In another embodiment, the agent reduces the amount of the specified bacterial strain as compared to at least 10% of the total bacterial strains of the microbiome of the subject, by at least 2 fold. According to a particular embodiment, the agent downregulates the specified bacterial strain by at least 5 fold, 10 fold or more as compared to at least 10% of the total bacterial strains of the microbiome of the subject.
In another embodiment, the agent reduces the amount of the specified bacterial strain as compared to at least 20% of the total bacterial strains of the microbiome of the subject, by at least 2 fold. According to a particular embodiment, the agent downregulates the specified bacterial strain by at least 5 fold, 10 fold or more as compared to at least 20% of the total bacterial strains of the microbiome of the subject.
In another embodiment, the agent reduces the amount of the specified bacterial strain as compared to at least 30% of the total bacterial strains of the microbiome of the subject, by at least 2 fold. According to a particular embodiment, the agent downregulates the specified bacterial strain by at least 5 fold, 10 fold or more as compared to at least 30% of the total bacterial strains of the microbiome of the subject.
In another embodiment, the agent reduces the amount of the specified bacterial strain as compared to at least 40% of the total bacterial strains of the microbiome of the subject, by at least 2 fold. According to a particular embodiment, the agent downregulates the specified bacterial strain by at least 5 fold, 10 fold or more as compared to at least 40% of the total bacterial strains of the microbiome of the subject.
In another embodiment, the agent reduces the amount of the specified bacterial strain as compared to at least 50% of the total bacterial strains of the microbiome of the subject, by at least 2 fold. According to a particular embodiment, the agent downregulates the specified bacterial strain by at least 5 fold, 10 fold or more as compared to at least 50% of the total bacterial strains of the microbiome of the subject.
In another embodiment, the agent reduces the amount of the specified bacterial strain as compared to at least 60% of the total bacterial strains of the microbiome of the subject, by at least 2 fold. According to a particular embodiment, the agent downregulates the specified bacterial strain by at least 5 fold, 10 fold or more as compared to at least 60% of the total bacterial strains of the microbiome of the subject.
In another embodiment, the agent reduces the amount of the specified bacterial strain as compared to at least 70% of the total bacterial strains of the microbiome of the subject, by at least 2 fold. According to a particular embodiment, the agent downregulates the specified bacterial strain by at least 5 fold, 10 fold or more as compared to at least 70% of the total bacterial strains of the microbiome of the subject.
In another embodiment, the agent reduces the amount of the specified bacterial strain as compared to at least 80% of the total bacterial strains of the microbiome of the subject, by at least 2 fold. According to a particular embodiment, the agent downregulates the specified bacterial strain by at least 5 fold, 10 fold or more as compared to at least 80% of the total bacterial strains of the microbiome of the subject.
In another embodiment, the agent reduces the amount of the specified bacterial strain as compared to at least 90% of the total bacterial strains of the microbiome of the subject, by at least 2 fold. According to a particular embodiment, the agent downregulates the specified bacterial strain by at least 5 fold, 10 fold or more as compared to at least 90% of the total bacterial strains of the microbiome of the subject.
According to an embodiment of this aspect of the present invention, the agent reduces the strain of bacteria by at least 2 fold as compared to at least one other strain of bacteria that belongs to a different species present in the microbiome.
According to a particular embodiment the agent reduces the strain of bacteria by at least 5 fold, 10 fold or more as compared to at least one other strain of bacteria that belongs to a different species present in the microbiome.
According to one embodiment, the agent reduces the strain of bacteria by at least 2 fold as compared to at least one other strain of bacteria that belongs to the same species present in the microbiome.
According to a particular embodiment the agent reduces the strain of bacteria by at least 5 fold, 10 fold or more as compared to at least one other strain of bacteria that belongs to the same species present in the microbiome.
Preferably, the agents of this aspect of the present invention are capable of decreasing the growth and/or colonization of the bacterial strain.
An exemplary agent which is capable of reducing a particular bacterial species or strain is an antibiotic.
As used herein, the term “antibiotic agent” refers to a group of chemical substances, isolated from natural sources or derived from antibiotic agents isolated from natural sources, having a capacity to inhibit growth of, or to destroy bacteria, and other microorganisms, used chiefly in treatment of infectious diseases.
Examples of antibiotics contemplated by the present invention include, but are not limited to Daptomycin; Gemifloxacin; Telavancin; Ceftaroline; Fidaxomicin; Amoxicillin; Ampicillin; Bacampicillin; Carbenicillin; Cloxacillin; Dicloxacillin; Flucloxacillin; Mezlocillin; Nafcillin; Oxacillin; Penicillin G; Penicillin V; Piperacillin; Pivampicillin; Pivmecillinam; Ticarcillin; Aztreonam; Imipenem; Doripenem; Meropenem; Ertapenem; Clindamycin; Lincomycin; Pristinamycin; Quinupristin; Cefacetrile (cephacetrile); Cefadroxil (cefadroxyl); Cefalexin (cephalexin); Cefaloglycin (cephaloglycin); Cefalonium (cephalonium); Cefaloridine (cephaloridine); Cefalotin (cephalothin); Cefapirin (cephapirin); Cefatrizine; Cefazaflur; Cefazedone; Cefazolin (cephazolin); Cefradine (cephradine); Cefroxadine; Ceftezole; Cefaclor; Cefamandole; Cefmetazole; Cefonicid; Cefotetan; Cefoxitin; Cefprozil (cefproxil); Cefuroxime; Cefuzonam; Cefcapene; Cefdaloxime; Cefdinir; Cefditoren; Cefetamet; Cefixime; Cefmenoxime; Cefodizime; Cefotaxime; Cefpimizole; Cefpodoxime; Cefteram; Ceftibuten; Ceftiofur; Ceftiolene; Ceftizoxime; Ceftriaxone; Cefoperazone; Ceftazidime; Cefclidine; Cefepime; Cefluprenam; Cefoselis; Cefozopran; Cefpirome; Cefquinome; Fifth Generation; Ceftobiprole; Ceftaroline; Not Classified; Cefaclomezine; Cefaloram; Cefaparole; Cefcanel; Cefedrolor; Cefempidone; Cefetrizole; Cefivitril; Cefmatilen; Cefmepidium; Cefovecin; Cefoxazole; Cefrotil; Cefsumide; Cefuracetime; Ceftioxide; Azithromycin; Erythromycin; Clarithromycin; Dirithromycin; Roxithromycin; Telithromycin; Amikacin; Gentamicin; Kanamycin; Neomycin; Netilmicin; Paromomycin; Streptomycin; Tobramycin; Flumequine; Nalidixic acid; Oxolinic acid; Piromidic acid; Pipemidic acid; Rosoxacin; Ciprofloxacin; Enoxacin; Lomefloxacin; Nadifloxacin; Norfloxacin; Ofloxacin; Pefloxacin; Rufloxacin; Balofloxacin; Gatifloxacin; Grepafloxacin; Levofloxacin; Moxifloxacin; Pazufloxacin; Sparfloxacin; Temafloxacin; Tosufloxacin; Besifloxacin; 
Clinafloxacin; Gemifloxacin; Sitafloxacin; Trovafloxacin; Prulifloxacin; Sulfamethizole; Sulfamethoxazole; Sulfisoxazole; Trimethoprim-Sulfamethoxazole; Demeclocycline; Doxycycline; Minocycline; Oxytetracycline; Tetracycline; Tigecycline; Chloramphenicol; Metronidazole; Tinidazole; Nitrofurantoin; Vancomycin; Teicoplanin; Telavancin; Linezolid; Cycloserine; Rifampin; Rifabutin; Rifapentine; Bacitracin; Polymyxin B; Viomycin; Capreomycin.
Antibacterial agents also include antibacterial peptides. Examples include but are not limited to abaecin; andropin; apidaecins; bombinin; brevinins; buforin II; CAP18; cecropins; ceratotoxin; defensins; dermaseptin; dermcidin; drosomycin; esculentins; indolicidin; LL37; magainin; maximin H5; melittin; moricin; prophenin; protegrin; and/or tachyplesins.
According to a particular embodiment, the antibiotic is a non-absorbable antibiotic.
Other agents which are not antibiotics are also contemplated by the present inventors.
Thus the present inventors contemplate the use of bacteriophages to downregulate the disclosed bacterial species/strains.
As used herein, the term “bacteriophage” refers to a virus that infects and replicates within bacteria. Bacteriophages are composed of proteins that encapsulate a genome comprising either DNA or RNA. Bacteriophages replicate within bacteria following the injection of their genome into the bacterial cytoplasm.
In one embodiment, the bacteriophage is a lytic bacteriophage. In another embodiment, the bacteriophage is lysogenic.
In some embodiments, the bacteriophages are used in combination with one or more other bacteriophages. The combinations of bacteriophages can target the same detrimental microorganism or different detrimental microorganisms. Preferably, the combination of bacteriophages targets the same detrimental microorganism.
In some embodiments, the bacteriophage or combination of bacteriophages are used in combination with one or more probiotic microorganisms—such as those described herein below.
In other embodiments, the bacteriophages or combination of bacteriophages are used in combination with one or more antibiotics, as disclosed herein.
In some embodiments, the bacteriophage is administered orally at a dose ranging from 10^5 to 10^10 plaque-forming units (PFU)/g, preferably 10^7 to 10^8 PFU/g. In some embodiments, the bacteriophages are administered at a dose of 10^5 to 10^10 PFU/day, preferably 10^7 to 10^8 PFU/day.
According to another embodiment, the agent is a bacteriophage protein such as an isolated phage protein, e.g., a lysin protein, tail protein, or active fragment.
In one embodiment, the agent which is capable of down-regulating a particular bacterial species/strain is a bacterial population that competes with the bacterial species/strain for essential resources. Bacterial compositions are further described herein below.
In still another embodiment, the agent which is capable of down-regulating a particular bacterial species/strain is a metabolite of a competing bacterial population (or even from the same species/strain) that serves to decrease the relative amount of the bacterial species/strain.
Additional agents that can specifically reduce a particular bacterial species or strain are known in the art and include polynucleotide silencing agents.
Preferably, the polynucleotide silencing agent of this aspect of the present invention targets a sequence that encodes at least one essential gene (i.e., required for life) in the bacteria. The sequence which is targeted should be specific to the particular bacterial species that it is desired to down-regulate. Such genes include ribosomal RNA genes (16S and 23S), ribosomal protein genes, tRNA-synthetases, as well as additional genes shown to be essential such as dnaB, fabI, folA, gyrB, murA, pyrH, metG, and tufA(B).
According to an embodiment of the invention, the polynucleotide silencing agent is specific to the target RNA and does not cross inhibit or silence other targets or a splice variant which exhibits 99% or less global homology to the target gene, e.g., less than 98%, 97%, 96%, 95%, 94%, 93%, 92%, 91%, 90%, 89%, 88%, 87%, 86%, 85%, 84%, 83%, 82%, 81% global homology to the target gene; as determined by PCR, Western blot, Immunohistochemistry and/or flow cytometry.
One agent capable of downregulating an essential bacterial gene is an RNA-guided endonuclease technology, e.g. a CRISPR system. In one embodiment, the CRISPR system is expressed in a bacteriophage.
As used herein, the term “CRISPR system” also known as Clustered Regularly Interspaced Short Palindromic Repeats refers collectively to transcripts and other elements involved in the expression of or directing the activity of CRISPR-associated genes, including sequences encoding a Cas gene (e.g. CRISPR-associated endonuclease 9), a tracr (trans-activating CRISPR) sequence (e.g. tracrRNA or an active partial tracrRNA), a tracr-mate sequence (encompassing a “direct repeat” and a tracrRNA-processed partial direct repeat) or a guide sequence (also referred to as a “spacer”) including but not limited to a crRNA sequence (i.e. an endogenous bacterial RNA that confers target specificity yet requires tracrRNA to bind to Cas) or a sgRNA sequence (i.e. single guide RNA).
In some embodiments, one or more elements of a CRISPR system is derived from a type I, type II, or type III CRISPR system. In some embodiments, one or more elements of a CRISPR system (e.g. Cas) is derived from a particular organism comprising an endogenous CRISPR system, such as Streptococcus pyogenes, Neisseria meningitidis, Streptococcus thermophilus or Treponema denticola.
In general, a CRISPR system is characterized by elements that promote the formation of a CRISPR complex at the site of a target sequence (also referred to as a protospacer in the context of an endogenous CRISPR system).
In the context of formation of a CRISPR complex, “target sequence” refers to a sequence to which a guide sequence (i.e. guide RNA e.g. sgRNA or crRNA) is designed to have complementarity, where hybridization between a target sequence and a guide sequence promotes the formation of a CRISPR complex. Full complementarity is not necessarily required, provided there is sufficient complementarity to cause hybridization and promote formation of a CRISPR complex. Thus, according to some embodiments, global homology to the target sequence may be of 50%, 60%, 70%, 75%, 80%, 85%, 90%, 95% or 99%. A target sequence may comprise any polynucleotide, such as DNA or RNA polynucleotides. In some embodiments, a target sequence is located in the nucleus or cytoplasm of a cell.
Thus, the CRISPR system comprises two distinct components, a guide RNA (gRNA) that hybridizes with the target sequence, and a nuclease (e.g. Type-II Cas9 protein), wherein the gRNA targets the target sequence and the nuclease (e.g. Cas9 protein) cleaves the target sequence. The guide RNA may comprise a combination of an endogenous bacterial crRNA and tracrRNA, i.e. the gRNA combines the targeting specificity of the crRNA with the scaffolding properties of the tracrRNA (required for Cas9 binding). Alternatively, the guide RNA may be a single guide RNA capable of directly binding Cas.
Typically, in the context of an endogenous CRISPR system, formation of a CRISPR complex (comprising a guide sequence hybridized to a target sequence and complexed with one or more Cas proteins) results in cleavage of one or both strands in or near (e.g. within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, or more base pairs from) the target sequence. Without wishing to be bound by theory, the tracr sequence, which may comprise or consist of all or a portion of a wild-type tracr sequence (e.g. about or more than about 20, 26, 32, 45, 48, 54, 63, 67, 85, or more nucleotides of a wild-type tracr sequence), may also form part of a CRISPR complex, such as by hybridization along at least a portion of the tracr sequence to all or a portion of a tracr mate sequence that is operably linked to the guide sequence.
In some embodiments, the tracr sequence has sufficient complementarity to a tracr mate sequence to hybridize and participate in formation of a CRISPR complex. As with the target sequence, complete complementarity is not needed, provided there is sufficient complementarity to be functional. In some embodiments, the tracr sequence has at least 50%, 60%, 70%, 80%, 90%, 95% or 99% of sequence complementarity along the length of the tracr mate sequence when optimally aligned.
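By way of a non-limiting illustration, the percentage of complementarity between a guide sequence and a target sequence (or between a tracr sequence and a tracr mate sequence) can be estimated as sketched below in Python. The function name and the example sequences are hypothetical. The sketch assumes two sequences of equal length compared position by position in antiparallel orientation without gaps, which is a simplification of an optimal alignment.

```python
# Watson-Crick pairs; RNA uracil (U) is treated as thymine (T) for comparison.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def percent_complementarity(guide, target):
    """Percentage of positions at which `guide` (5'->3') base-pairs with
    `target` (5'->3') when the two strands are aligned antiparallel."""
    g = guide.upper().replace("U", "T")
    t = target.upper().replace("U", "T")
    if not g or len(g) != len(t):
        raise ValueError("sequences must be non-empty and of equal length")
    # Position i of the guide pairs with position (n - 1 - i) of the target.
    matches = sum(1 for gb, tb in zip(g, reversed(t)) if PAIR.get(gb) == tb)
    return 100.0 * matches / len(g)


# Hypothetical 10-nucleotide target and two candidate guide sequences.
target_seq = "AAGGCCTTAC"
full_guide = "GUAAGGCCUU"      # fully complementary to target_seq
mismatch_guide = "GUAAGGCCUA"  # one mismatch at the last position

full_score = percent_complementarity(full_guide, target_seq)
mismatch_score = percent_complementarity(mismatch_guide, target_seq)
```

A score computed in this manner may be compared against any of the thresholds recited above (e.g. 50%, 60%, 70%, 80%, 90%, 95% or 99%).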
Introducing CRISPR/Cas into a cell may be effected using one or more vectors driving expression of one or more elements of a CRISPR system such that expression of the elements of the CRISPR system directs formation of a CRISPR complex at one or more target sites. For example, a Cas enzyme, a guide sequence linked to a tracr-mate sequence, and a tracr sequence could each be operably linked to separate regulatory elements on separate vectors. Alternatively, two or more of the elements, expressed from the same or different regulatory elements, may be combined in a single vector, with one or more additional vectors providing any components of the CRISPR system not included in the first vector. CRISPR system elements that are combined in a single vector may be arranged in any suitable orientation, such as one element located 5′ with respect to (“upstream” of) or 3′ with respect to (“downstream” of) a second element. The coding sequence of one element may be located on the same or opposite strand of the coding sequence of a second element, and oriented in the same or opposite direction. A single promoter may drive expression of a transcript encoding a CRISPR enzyme and one or more of the guide sequence, tracr mate sequence (optionally operably linked to the guide sequence), and a tracr sequence embedded within one or more intron sequences (e.g. each in a different intron, two or more in at least one intron, or all in a single intron).
As well as altering the bacterial composition of the microbiome of the subject, the present inventors also contemplate altering food intake to control the level of a metabolite.
Thus, according to a particular aspect of the present invention there is provided a method of providing dietary advice to a subject, the method comprising predicting the level of a metabolite in the blood by carrying out the methods described herein, wherein when said metabolite is above or below the recommended level of said metabolite, recommending consumption of at least one food type that alters the level of said metabolite.
The dietary advice can be provided, according to some embodiments of the present invention, using machine learning. This can be done by operating the trained machine learning procedure to solve the aforementioned inverse problem (
Suppose, for example, that for a particular subject it was found that a certain quantity Q1 of a particular metabolite is clinically unsatisfactory, and that it is desired to alter the quantity of the particular metabolite to a new, desired, quantity Q2. The quantity Q1 can be found by performing a blood test or, more preferably, by feeding a machine learning procedure that has been trained using food consumption data and that is associated with a particular metabolite, with the frequency and/or the daily mean consumption of several food types (
The desired quantity Q2 of the particular metabolite can be fed to a machine learning procedure (that has been trained using food consumption data and that is associated with the particular metabolite) in a manner that the machine learning procedure propagates backwards to solve the inverse problem and to provide a recommended food consumption (
In one embodiment, the metabolite is set forth in Table 3 and more preferably in Table 4.
The dietary advice provided to the subject could include a list of foods that may help in increasing or decreasing that metabolite.
According to one particular embodiment, the altering is carried out by increasing intake of a food whose level is predicted to be below the level in a healthy subject. Table 3 provides examples of food types which positively correlate with a particular metabolite.
For example, according to Table 3, in order to increase the level of 1-methylxanthine, the amount of coffee intake should be increased.
Tables 3 and 4 list the most preferred foods that can be altered in order to alter the level of the corresponding metabolite, predictor 1 being of the most significance and predictor 5 being of the least significance. Of note, the abbreviation “wt” which appears in the Tables refers to the daily mean consumption of specific food types in grams.
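By way of a non-limiting illustration, the inverse problem described above, in which a desired quantity Q2 of a metabolite is mapped back to a recommended food consumption, can be sketched in Python as follows. The sketch is not the trained machine learning procedure of the present embodiments: a simple linear model with hypothetical weights stands in for the trained procedure, the food type names, weights and consumption values are assumptions made only for illustration, and the inverse problem is solved by gradient descent on the inputs while the model is held fixed.

```python
# Hypothetical stand-in for a trained model: a linear map from the daily mean
# consumption in grams ("wt") of three food types to a metabolite level.
# These weights are illustrative only and are not taken from Tables 3 or 4.
WEIGHTS = {"coffee_wt": 0.02, "red_meat_wt": 0.005, "bread_wt": -0.001}
BIAS = 1.0


def predict(consumption):
    """Forward problem: predict the metabolite level from food consumption."""
    return BIAS + sum(WEIGHTS[k] * consumption[k] for k in WEIGHTS)


def recommend(consumption, target_level, lr=1000.0, steps=500, tol=1e-6):
    """Inverse problem: adjust the inputs so that the prediction approaches
    `target_level`, keeping every consumption value non-negative."""
    x = dict(consumption)
    for _ in range(steps):
        error = predict(x) - target_level
        if abs(error) < tol:
            break
        for k in WEIGHTS:
            # Gradient of 0.5 * error**2 with respect to x[k] is error * weight.
            x[k] = max(0.0, x[k] - lr * error * WEIGHTS[k])
    return x


# Hypothetical current consumption; Q1 is the predicted current level.
current = {"coffee_wt": 100.0, "red_meat_wt": 50.0, "bread_wt": 200.0}
q1 = predict(current)
q2 = 4.0  # desired level, arbitrary units
recommended = recommend(current, q2)
```

In this sketch the recommended consumption raises the intake of the food types with positive weights and lowers the intake of the food type with a negative weight until the predicted level reaches Q2. For a non-linear trained procedure the same input-gradient approach may be applied, although the solution is then generally not unique.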
As used herein the term “about” refers to ±10%.
The terms “comprises”, “comprising”, “includes”, “including”, “having” and their conjugates mean “including but not limited to”.
The term “consisting of” means “including and limited to”.
The term “consisting essentially of” means that the composition, method or structure may include additional ingredients, steps and/or parts, but only if the additional ingredients, steps and/or parts do not materially alter the basic and novel characteristics of the claimed composition, method or structure.
As used herein, the singular form “a”, “an” and “the” include plural references unless the context clearly dictates otherwise. For example, the term “a compound” or “at least one compound” may include a plurality of compounds, including mixtures thereof.
Throughout this application, various embodiments of this invention may be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.
Whenever a numerical range is indicated herein, it is meant to include any cited numeral (fractional or integral) within the indicated range. The phrases “ranging/ranges between” a first indicated number and a second indicated number and “ranging/ranges from” a first indicated number “to” a second indicated number are used herein interchangeably and are meant to include the first and second indicated numbers and all the fractional and integral numerals therebetween.
As used herein the term “method” refers to manners, means, techniques and procedures for accomplishing a given task including, but not limited to, those manners, means, techniques and procedures either known to, or readily developed from known manners, means, techniques and procedures by practitioners of the chemical, pharmacological, biological, biochemical and medical arts.
As used herein, the term “treating” includes abrogating, substantially inhibiting, slowing or reversing the progression of a condition, substantially ameliorating clinical or aesthetical symptoms of a condition or substantially preventing the appearance of clinical or aesthetical symptoms of a condition.
When reference is made to particular sequence listings, such reference is to be understood to also encompass sequences that substantially correspond to its complementary sequence as including minor sequence variations, resulting from, e.g., sequencing errors, cloning errors, or other alterations resulting in base substitution, base deletion or base addition, provided that the frequency of such variations is less than 1 in 50 nucleotides, alternatively, less than 1 in 100 nucleotides, alternatively, less than 1 in 200 nucleotides, alternatively, less than 1 in 500 nucleotides, alternatively, less than 1 in 1000 nucleotides, alternatively, less than 1 in 5,000 nucleotides, alternatively, less than 1 in 10,000 nucleotides.
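By way of a non-limiting illustration, the frequency of variations between a reference sequence and an observed sequence can be checked against the above limits as sketched below in Python. The function names and example sequences are hypothetical, and the sketch counts base substitutions only, between sequences of equal length; base deletions and additions would require an alignment step that is not shown.

```python
def count_substitutions(reference, observed):
    """Number of positions at which two equal-length sequences differ."""
    if not reference or len(reference) != len(observed):
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(1 for a, b in zip(reference.upper(), observed.upper()) if a != b)


def within_variation_limit(reference, observed, one_in=50):
    """True if the substitution frequency is less than 1 in `one_in` nucleotides."""
    # Integer comparison: substitutions / length < 1 / one_in
    return count_substitutions(reference, observed) * one_in < len(reference)


# Hypothetical 100-nucleotide reference with one and with two substitutions.
reference_seq = "ACGT" * 25
one_change = "T" + reference_seq[1:]
two_changes = "TT" + reference_seq[2:]

ok_1_in_50 = within_variation_limit(reference_seq, one_change, one_in=50)
ok_two_changes = within_variation_limit(reference_seq, two_changes, one_in=50)
```

Here a single substitution in 100 nucleotides satisfies the limit of less than 1 in 50, whereas two substitutions in 100 nucleotides do not.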
It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination or as suitable in any other described embodiment of the invention. Certain features described in the context of various embodiments are not to be considered essential features of those embodiments, unless the embodiment is inoperative without those elements.
Various embodiments and aspects of the present invention as delineated hereinabove and as claimed in the claims section below find experimental support in the following examples.
Reference is now made to the following examples, which together with the above descriptions illustrate some embodiments of the invention in a non-limiting fashion.
Generally, the nomenclature used herein and the laboratory procedures utilized in the present invention include molecular, biochemical, microbiological and recombinant DNA techniques. Such techniques are thoroughly explained in the literature. See, for example, “Molecular Cloning: A laboratory Manual” Sambrook et al., (1989); “Current Protocols in Molecular Biology” Volumes I-III Ausubel, R. M., ed. (1994); Ausubel et al., “Current Protocols in Molecular Biology”, John Wiley and Sons, Baltimore, Md. (1989); Perbal, “A Practical Guide to Molecular Cloning”, John Wiley & Sons, New York (1988); Watson et al., “Recombinant DNA”, Scientific American Books, New York; Birren et al. (eds) “Genome Analysis: A Laboratory Manual Series”, Vols. 1-4, Cold Spring Harbor Laboratory Press, New York (1998); methodologies as set forth in U.S. Pat. Nos. 4,666,828; 4,683,202; 4,801,531; 5,192,659 and 5,272,057; “Cell Biology: A Laboratory Handbook”, Volumes I-III Cellis, J. E., ed. (1994); “Culture of Animal Cells—A Manual of Basic Technique” by Freshney, Wiley-Liss, N. Y. (1994), Third Edition; “Current Protocols in Immunology” Volumes I-III Coligan J. E., ed. (1994); Stites et al. (eds), “Basic and Clinical Immunology” (8th Edition), Appleton & Lange, Norwalk, Conn. (1994); Mishell and Shiigi (eds), “Selected Methods in Cellular Immunology”, W. H. Freeman and Co., New York (1980); available immunoassays are extensively described in the patent and scientific literature, see, for example, U.S. Pat. Nos. 3,791,932; 3,839,153; 3,850,752; 3,850,578; 3,853,987; 3,867,517; 3,879,262; 3,901,654; 3,935,074; 3,984,533; 3,996,345; 4,034,074; 4,098,876; 4,879,219; 5,011,771 and 5,281,521; “Oligonucleotide Synthesis” Gait, M. J., ed. (1984); “Nucleic Acid Hybridization” Hames, B. D., and Higgins S. J., eds. (1985); “Transcription and Translation” Hames, B. D., and Higgins S. J., eds. (1984); “Animal Cell Culture” Freshney, R. I., ed. 
(1986); “Immobilized Cells and Enzymes” IRL Press, (1986); “A Practical Guide to Molecular Cloning” Perbal, B., (1984) and “Methods in Enzymology” Vol. 1-317, Academic Press; “PCR Protocols: A Guide To Methods And Applications”, Academic Press, San Diego, Calif. (1990); Marshak et al., “Strategies for Protein Purification and Characterization—A Laboratory Course Manual” CSHL Press (1996); all of which are incorporated by reference as if fully set forth herein. Other general references are provided throughout this document. The procedures therein are believed to be well known in the art and are provided for the convenience of the reader. All the information contained therein is incorporated herein by reference.
This Example examines the relationship between levels of serum metabolites and a rich resource of clinical parameters, dietary intake patterns, lifestyle measurements, human genetics and gut microbiota composition across a large healthy cohort. This Example demonstrates that, using these features, highly accurate out-of-sample predictions can be obtained for over 1000 circulating serum metabolites, with diet and gut microbiome having the highest predictive power and being particularly predictive of unknown compounds. The inventors uncovered a list of associations between genetic loci and circulating blood metabolites and replicated several known links between specific SNPs and metabolites. By applying the prediction models of the present embodiments to an independent cohort of 31 participants, the inventors validated many of the associations. Using feature attribution analysis on the resulting predictive models, the inventors uncovered both known and novel associations between diet, gut microbiome and the levels of blood metabolites.
This Example demonstrates that many metabolites are exclusively explained by gut microbiome composition, highlighting its potential as their key determinant, and reveals the identities and predicted candidate structures of many unknown compounds which are highly predictable by the microbiome.
This Example also demonstrates that the uncovered associations are causal, as levels of metabolites that were predicted to be positively associated with bread consumption increased following a randomized clinical trial of bread intervention.
This Example concentrates on estimates computed via out-of-sample predictions, since evaluating performance only on unseen samples provides the strictest and most conservative estimate of performance. As such, the results presented herein constitute a lower bound for the amount of variance in metabolite levels that may be explained by the various features examined.
The heterogeneity of the data is advantageous since its estimates do not depend on modeling assumptions.
All statistical and machine learning analyses were performed using Python (version 2.7.8).
Description of Cohorts
We analyzed banked samples from two previously collected cohorts25,48, for a total of 522 Israeli individuals. Studies were approved by Tel Aviv Sourasky Medical Center Institutional Review Board (IRB), approval numbers TLV-0658-12, TLV-0050-13 and TLV-0522-10; Kfar Shaul Hospital IRB, approval number 0-73. All participants signed written informed consent forms. Full study designs, including inclusion and exclusion criteria were described elsewhere25,48. In brief, participants in both studies were healthy individuals aged between 18 and 70. All participants answered detailed medical, lifestyle and nutritional questionnaires, provided stool and serum samples for metagenomic sequencing and metabolomics, were genotyped, underwent a comprehensive blood test, and for a period of at least one week, recorded all of their daily activities and nutritional intake in real-time using their smartphones with a specialized app provided to them48.
Feature Groups
The “diet” feature group includes answers to a detailed food frequency questionnaire (FFQ) aimed at capturing long term dietary habits, and the daily mean consumption of different food types, computed over a week based on real-time logging. In both cases we kept only items which were reported to be consumed at least once by at least 1% of our participants, resulting in 670 different food types from logging, and 141 different items from the FFQ.
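By way of a non-limiting illustration, the 1% prevalence filter described above may be sketched as follows. The function name, the dictionary layout and the use of a strictly positive amount as the indicator of "consumed at least once" are assumptions made for this sketch and are not taken from the study code.

```python
def filter_by_prevalence(consumption, min_fraction=0.01):
    """Keep items consumed at least once by at least `min_fraction` of participants.

    consumption: dict mapping an item name to a list with one reported amount
    per participant (hypothetical layout; 0 means the item was never reported).
    """
    kept = {}
    for item, amounts in consumption.items():
        if not amounts:
            continue
        # Count participants who reported any consumption of this item.
        n_consumers = sum(1 for amount in amounts if amount > 0)
        if n_consumers / len(amounts) >= min_fraction:
            kept[item] = amounts
    return kept
```

The same filter, with `min_fraction=0.05`, would correspond to the 5% threshold used for the feature attribution analysis.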
The “macronutrients” feature group includes the daily mean consumption of macronutrients (lipids, proteins, carbohydrates), calories and water, calculated from real-time logging.
The “anthropometrics” feature group includes weight, BMI, waist and hips circumference, and waist to hips ratio (WHR).
The “cardiometabolic” feature group includes systolic and diastolic blood pressure, heart rate in beats per minute and a glycemic status as previously described30.
The “drugs” feature group includes 30 binary features representing the intake of 20 common medications as reported in questionnaires, in addition to 10 medication groups as previously described30. We included only drugs reported to be used by at least 1% of our participants.
The “clinical data” feature group includes the age and sex of the participants, and the following feature groups described above: anthropometrics, cardiometabolic, and drugs.
The “lifestyle” feature group includes smoking status (current, past), stress levels obtained from questionnaires, and the daily mean sleeping time, exercise time and midday sleep time based on real time logging.
The “time of day” feature is a binary feature indicating whether the sample was taken during the first half of the day.
The “seasonal effects” feature is the month in which the sample was taken. In some analyses we also grouped months by season (Winter: December-February; Spring: March-May; Summer: June-August; Fall: September-November).
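The grouping of months into seasons may be expressed, as a minimal sketch, by the following helper. The function name and the 1-12 month encoding are assumptions of this illustration.

```python
def month_to_season(month):
    """Map a calendar month (1-12) to the season grouping given in the text."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    seasons = ["Winter", "Spring", "Summer", "Fall"]
    # month % 12 maps December to 0, so December-February share bucket 0,
    # March-May bucket 1, June-August bucket 2 and September-November bucket 3.
    return seasons[(month % 12) // 3]
```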
The “microbiome” feature group includes bacterial relative abundance calculated both by considering coverage (see below), and by MetaPhlAn255, as well as the first 10 principal components computed over the log transformed relative abundance of a bacterial gene catalog56 as previously described30,57. Preprocessing steps are described below.
We further defined a full model that included all of the above.
Metabolomics Profiling and Preprocessing
Metabolite concentrations were measured in serum samples by Metabolon, Inc., Durham, N.C., USA, by using an untargeted LC/MS platform as previously described6,58,59. A total of 540 serum samples were profiled, 19 of which were control samples (technical replicate) pooled from several individuals. The other 521 serum samples belonged to 491 participants.
We removed from further analysis 27 metabolites with fewer than 10 measurements across our cohort, and 54 metabolites that we found to have significantly different distributions in samples collected in two different recruitment centers (Mann-Whitney U p<0.05/1251; Bonferroni corrected). For the remaining 1170 metabolites, we performed robust standardization (subtracting the median and dividing by the standard deviation) over the log (base 10) transformed levels, followed by clipping outlier samples which were farther than 5 standard deviations. We next used two separate normalization schemes, one for single metabolites, which we subsequently used in the feature attribution analysis, and the second for metabolite groups, which we used for global and enrichment analyses.
For single metabolites, we regressed metabolite levels against storage times (only for metabolites present in at least 50 samples), and finally, imputed missing values as the minimum value per metabolite. For the second scheme, metabolites were grouped by correlation with a Spearman rho threshold of 0.85. This is done in order to handle possible bias resulting from uncertainty of metabolite assignments and a high rate of highly correlated mass spectrometry peaks, and resulted in 1067 metabolite groups, 982 of which are singletons. The value of the metabolite group was set to the mean. The category of each metabolite group was assigned based on majority vote, where unknown compounds were excluded from the vote unless all metabolites in the group were unknown.
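The single-metabolite preprocessing steps (log transformation, robust standardization, clipping and minimum-value imputation) may be illustrated by the following sketch. It deliberately omits the regression against storage time, and it assumes the sample standard deviation and a `None` marker for missing measurements; these choices, and the function name, are assumptions of the illustration rather than details given in the text.

```python
import math
import statistics


def preprocess_metabolite(raw_levels, clip_sd=5.0):
    """Standardize one metabolite; raw_levels holds positive floats or None.

    Requires at least two non-missing measurements.
    """
    logged = [math.log10(v) for v in raw_levels if v is not None]
    median = statistics.median(logged)
    sd = statistics.stdev(logged)  # sample SD (assumption of this sketch)

    standardized = []
    for v in raw_levels:
        if v is None:
            standardized.append(None)
            continue
        z = (math.log10(v) - median) / sd
        # Clip values lying farther than clip_sd standard deviations away.
        standardized.append(max(-clip_sd, min(clip_sd, z)))

    # Impute missing values with the minimum observed value of this metabolite.
    floor = min(z for z in standardized if z is not None)
    return [floor if z is None else z for z in standardized]
```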
Microbiome Preprocessing
Sample collection, DNA extraction, and sequencing of the samples in this study were described previously25,30,48. Briefly, we used only samples which were collected using swabs, filtered metagenomic reads containing Illumina adapters, filtered low-quality reads and trimmed low-quality read edges. We detected host DNA by mapping with GEM60 to the human genome (hg19) with inclusive parameters, and removed human reads. We subsampled all samples to have 10 million reads.
Bacterial relative abundance estimation was performed by mapping bacterial reads to species-level genome bins (SGB) representative genomes33. We selected all SGB representatives with at least 5 genomes in the group, and for these representative genomes kept only unique regions as a reference data set. Mapping was performed using bowtie261 and abundance was estimated by calculating the mean coverage of unique genomic regions across the 50 percent most densely covered areas as previously described57,62. Feature names include the lowest taxonomy level identified.
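As a simplified illustration of the coverage-based estimate, the following sketch averages coverage over the most densely covered half of a genome's unique regions. It assumes that coverage has already been summarized per equally sized window; the windowing and the function name are hypothetical and are not specified in the text.

```python
def dense_mean_coverage(window_coverages, top_fraction=0.5):
    """Mean coverage over the most densely covered fraction of windows."""
    if not window_coverages:
        return 0.0
    ranked = sorted(window_coverages, reverse=True)
    # Keep at least one window so that very short genomes still get a value.
    n_top = max(1, int(len(ranked) * top_fraction))
    top = ranked[:n_top]
    return sum(top) / len(top)
```

Restricting the mean to the densest windows makes the estimate less sensitive to regions that are poorly covered or absent in a given strain.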
Comparing Metabolomics to Lab Tests
We compared the levels of both creatinine and cholesterol which we previously obtained via standard lab tests25 with their metabolomic levels. Since the lab tests were performed by two different labs, we centered the tests by subtracting from the value of each sample the mean of all tests taken in the lab in which it was performed. We then performed a standardization of the resulting measurements. The metabolomic profiling and the lab tests were performed on two samples taken at the same blood draw.
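The per-lab centering followed by standardization may be sketched as below. The function name and argument layout are hypothetical, and the use of the population standard deviation is an assumption of the sketch.

```python
import statistics


def center_by_lab_then_standardize(values, labs):
    """Subtract each lab's mean from its own tests, then z-score all tests."""
    per_lab = {}
    for value, lab in zip(values, labs):
        per_lab.setdefault(lab, []).append(value)
    lab_means = {lab: statistics.mean(vals) for lab, vals in per_lab.items()}

    # Centering removes any constant offset between the two labs.
    centered = [value - lab_means[lab] for value, lab in zip(values, labs)]

    mean = statistics.mean(centered)
    sd = statistics.pstdev(centered)  # population SD (assumption)
    return [(c - mean) / sd for c in centered]
```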
Correlation of Metabolic Profiles within and Between Individuals
Predictive Models of Metabolite Groups
We used gradient boosting decision trees from the LightGBM (version 2.1.2) package27, in order to predict the levels of 1067 metabolite groups based on 7 feature groups in held-out subjects. In order to estimate the EV of each metabolite group we ran a 5-fold cross validation (CV) model using each feature group as input, and evaluated the results using Pearson correlation. For all prediction results we computed 95% confidence intervals and p-values via 1000 iterations of bootstrapping63. In each bootstrap iteration, we performed a random 5-fold cross validation, where in each fold we randomly sampled (with replacement) a group of subjects from the training set to have the same size as the current training set. We next used this set in order to train our model and evaluated the model's performance on the set of subjects in the remaining fold. Finally, we computed the Pearson correlation between the measured values of the metabolite and the concatenation of the CV's predicted values as obtained from the bootstrapping iteration. We applied the Fisher transformation to the Pearson correlations we got from bootstrapping in order to induce normality64, and then computed a standard error, and estimated the p-values via the normal CDF using the Wald test65, such that our null hypothesis is that the correlations should distribute normally with zero mean. Confidence intervals were computed empirically from the bootstrapping correlations. We corrected p-values of predictions for multiple hypotheses using the Bonferroni procedure within each feature group (p<0.05/1067). In all CV and bootstrapping runs we used a fixed and predetermined set of hyperparameters (Table 5).
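The final statistical step, turning a set of bootstrap Pearson correlations into a p-value and an empirical confidence interval, may be sketched as follows. The sketch assumes that the standard error is taken as the standard deviation of the Fisher-transformed bootstrap correlations and that the test is two-sided; neither detail is stated in the text, and the function names are hypothetical.

```python
import math
import statistics


def bootstrap_wald_p_value(bootstrap_correlations):
    """Wald test of zero mean on Fisher-transformed bootstrap correlations."""
    z = [math.atanh(r) for r in bootstrap_correlations]  # Fisher transformation
    mean_z = statistics.mean(z)
    standard_error = statistics.stdev(z)  # assumption: spread across bootstraps
    wald = mean_z / standard_error
    # Two-sided p-value from the standard normal CDF: 2 * (1 - Phi(|wald|)).
    return math.erfc(abs(wald) / math.sqrt(2.0))


def empirical_confidence_interval(bootstrap_correlations, level=0.95):
    """Percentile interval read directly from the sorted bootstrap values."""
    ranked = sorted(bootstrap_correlations)
    n = len(ranked)
    lower = ranked[int(math.floor((1.0 - level) / 2.0 * (n - 1)))]
    upper = ranked[int(math.ceil((1.0 + level) / 2.0 * (n - 1)))]
    return lower, upper
```

A metabolite group would then be declared significantly predicted by a feature group if the resulting p-value falls below 0.05/1067.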
Testing for SNP Associations with Metabolites
Genotype processing and imputation of 413 individuals were described previously30. We performed genome wide associations for single metabolites (n=1170) and calculated the p-value and the estimated effect sizes using plink (v1.07). When declaring genome-wide significance for the SNP-metabolite associations we used a conservative Bonferroni adjustment procedure to control the family-wise error rate given the large number of SNPs and metabolites tested (p<(5×10−8)/1170). We performed all genome wide associations using imputed genotypes. Results presented in
For the replication of SNP-metabolite associations from a previous study6 we correlated the EV of each metabolite from a model based on top significantly associated SNPs in the TwinsUK, and the effect size of the single top significantly associated SNP in this study. Only 301 metabolites which were measured in both studies were considered for analysis.
Pathway Category Enrichment Analysis
For each pathway category we used a Mann-Whitney U test comparing the prediction accuracy of metabolites from that category to the prediction accuracy of metabolites from other categories. Direction of enrichment was determined by the sign of the Mann-Whitney U test statistic. We considered only metabolite groups for which at least one feature group had a significant prediction (after correcting for multiple hypotheses), resulting in 982 metabolite groups.
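As a minimal sketch of the enrichment direction, the following code computes the Mann-Whitney U statistic for one pathway category together with a centered version whose sign indicates the direction. Centering U by its expected value under the null is an assumption of this sketch, since the text does not specify how the sign was derived; p-value computation is omitted and the function names are hypothetical.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def mann_whitney_u(in_category, others):
    """Return (U, U minus its null expectation) for the in-category sample."""
    n1, n2 = len(in_category), len(others)
    ranks = average_ranks(list(in_category) + list(others))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    # Positive centered U: the category tends to be predicted more accurately.
    return u, u - n1 * n2 / 2.0
```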
Validation of Metabolite Predictions
For every feature group, we trained a prediction model based solely on the samples from the main cohort, and evaluated its performance on the independent validation cohort. In all validation analyses we only considered 877 metabolite groups which were present in both the main and the validation cohort. We did not validate the associations of metabolites with time of day as all of our samples in the validation cohort were taken during the same time of the day.
Feature Attribution Analysis
We used SHAP (SHapley Additive exPlanations)34, a recently introduced framework for interpreting predictions, which assigns each feature an importance value for a particular prediction. Briefly, for a specific prediction, a feature's SHAP value is defined as the change in the expected value of the model's output when this feature is observed vs when it is missing. It is computed using a sum that represents the impact of each feature being added to the model averaged over all possible orderings of features being introduced.
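The averaging over all feature orderings may be illustrated with a brute-force computation on a toy model. This sketch is not the TreeExplainer algorithm: it treats a "missing" feature as one set to a fixed baseline value, which is a simplifying assumption, and it is only practical for a handful of features. All names are hypothetical.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous_output = model(current)
        for i in order:
            # Reveal feature i and record how much the output changes.
            current[i] = x[i]
            new_output = model(current)
            totals[i] += new_output - previous_output
            previous_output = new_output
    return [total / len(orderings) for total in totals]
```

For a toy model `2*v[0] + v[1]*v[2]` evaluated at `[1, 1, 1]` with an all-zero baseline, the first feature receives its full additive effect, while the two interacting features split their joint effect equally; the values always sum to the difference between the model output at `x` and at the baseline.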
Individual SHAP values were computed for held-out subjects in 5-fold CV using the module TreeExplainer (version 0.24.0)35,66, based on models trained only on features from the respective feature group. Before training, we standardized the levels of target metabolites, so that SHAP values from different models would be comparable (they are measured in the same units as the target). In each CV fold we ran a random hyperparameter search consisting of 10 iterations using the module RandomizedSearchCV from sklearn (version 0.20.4), and chose the best model for predicting the held-out subjects and computing SHAP values. In all feature attribution analyses we used the ungrouped list of 1170 metabolites.
For every feature, we computed the mean absolute SHAP value across all instances in a specific model, reflecting the mean impact of each feature on the predictions and serving as a feature importance measure. We further used these values to compute directional mean absolute SHAP values, by multiplying them with the sign of the Spearman correlation between the population feature and the target. Here, positive values indicate that higher feature values lead, on average, to higher predicted values, while negative values indicate that higher feature values lead, on average, to lower predicted values.
When performing feature attribution analysis with gut microbiome data as input, we only included the relative abundance of SGB representative genomes as features, taking only features which were present in over 5% of the samples, resulting in 753 bacterial taxa. When using diet as input, we only considered features which were present in at least 5% of the samples, resulting in 398 food types from logging and items from the FFQ.
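The directional mean absolute SHAP value of a single feature may be sketched as follows. The sign of the Spearman correlation is obtained here from the covariance of the ranks, which has the same sign as Spearman's rho; the function names and the list-based inputs are assumptions of the illustration.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def directional_mean_abs_shap(shap_values, feature_values, target_values):
    """Mean |SHAP| of one feature, signed by its Spearman correlation sign."""
    mean_abs = sum(abs(s) for s in shap_values) / len(shap_values)
    rx = average_ranks(feature_values)
    ry = average_ranks(target_values)
    mx = sum(rx) / len(rx)
    my = sum(ry) / len(ry)
    # The rank covariance shares its sign with Spearman's rho.
    covariance = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sign = (covariance > 0) - (covariance < 0)
    return sign * mean_abs
```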
Comparing Gradient Boosting Decision Trees with a Linear Model
We compared the EV of every single metabolite obtained for a GBDT and a Lasso regression model. The EV of all models were calculated in 5-fold CV, where in each fold we ran a hyperparameter search consisting of 10 iterations as described above. We used LightGBM as the GBDT model, and Lasso regression (sklearn, version 0.20.4) as the linear model, since its regularization scheme is better suited for a large number of features, as in the case of diet and gut microbiome composition. Since GBDT handles missing values natively whereas Lasso does not, we first imputed all missing values with the median of each feature to ensure a fair comparison. When applying the models on the microbiome data, we used log10-transformed values.
Estimating Relative Predictive Power of Feature Groups
In order to estimate the relative predictive power of different feature groups we first applied a principal component analysis over the metabolite groups data to get the first 400 PCs which constitute >99% of the total variance in the data (
Identification of Unknown Metabolites by Metabolon
Identification of unknown metabolites was done as previously described29. Briefly, identification of tentative structural features for unknown biochemicals incorporates a detailed analysis of mass spec data, i.e., gathering information such as the accurate monoisotopic mass, the elution time and fragmentation pattern of the primary ion, and correlation to other molecules. The accurate monoisotopic mass is used to identify a likely structural formula for the unknown biochemical, which is then used to search against chemical structure databases. When a candidate structure fits the accurate monoisotopic mass and fragmentation data, an authentic standard is commercially purchased or synthesized (when possible). Confirmation of a proposed structure is based on a match to three primary criteria, including co-elution with the unknown molecule of interest, and a high degree match to both the accurate monoisotopic mass and fragmentation pattern.
Interaction Networks
We used a graphical layout in order to visualize the associations of features with the levels of metabolites. The nodes are either metabolites or features, and the edges are the directional mean absolute SHAP values computed from models trained only on features from the respective feature group as described above. All networks were constructed using Cytoscape67. The threshold for presenting SHAP values as edges was determined as 0.12, keeping the network sparse enough for convenience of visualization.
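The selection of edges for the network may be sketched as a simple threshold on the directional mean absolute SHAP values. Applying the 0.12 threshold to the absolute value, and including values equal to the threshold, are assumptions of this sketch; the function name and the example feature and metabolite names are hypothetical.

```python
def network_edges(directional_shap, threshold=0.12):
    """Return (feature, metabolite, weight) edges passing the threshold.

    directional_shap: dict mapping (feature, metabolite) pairs to their
    directional mean absolute SHAP value.
    """
    edges = []
    for (feature, metabolite), value in sorted(directional_shap.items()):
        # The sign is kept as the edge weight so that direction can be drawn.
        if abs(value) >= threshold:
            edges.append((feature, metabolite, value))
    return edges
```

The resulting edge list could then be exported, for example as a tab-separated file, for import into a network visualization tool.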
Analysis of Bread Intervention
In order to find the associations between metabolite levels and the consumption of both types of bread in the study cohort we computed the directional mean absolute SHAP values of the reported consumption of both white and whole-wheat bread for all metabolites. The SHAP values were computed in cross validation from models based only on the reported consumption of each type of bread. We ranked the metabolites according to their directional mean absolute SHAP value for each type of bread and used the top 5% positively and negatively driven metabolites for further analysis. The prediction models were constructed using 458 samples of distinct individuals, a subset of our cohort from which we excluded all samples of individuals which participated in the intervention study.
For each metabolite in every individual, we computed the fold change (FC) of metabolite levels between the samples taken at the end of the first week of intervention and the start of that week. Prior to computing FC we imputed missing values with the minimum per metabolite and standardized their log (base 10) transformed levels. Furthermore, for each intervention group, we computed the mean FC of every metabolite based on the 10 samples from that group. We then compared the mean FC of the top 5% positively and negatively driven metabolites mentioned above within each intervention group by performing a rank sum test (Mann-Whitney U) over the mean FC.
For comparing the FC of betaine and cytosine between the two intervention groups, we used a Mann-Whitney U test.
LMM-Based Estimates of the Explained Variance of Metabolites Using Gut Microbiome
For the in-sample estimation of EV for metabolites based on gut microbiome we used a linear mixed model framework that we had recently developed30. Briefly, we used GCTA68, a tool used in statistical genetics for estimating SNP-based genetic kinship. Instead of a matrix of host SNPs, as is commonly used in GCTA, we used a kinship matrix computed over the presence-absence of microbial species which were also used as features in the out-of-sample prediction models. We added the storage time as a covariate to the model. P-values were computed using RL-SKAT69.
Accurate and Reproducible Untargeted Serum Metabolomics from a Deeply Phenotyped Human Cohort
We used mass spectrometry to profile 521 serum samples from 491 healthy individuals for whom we previously collected extensive clinical data, anthropometrics measurements, cardiometabolic parameters, medication data, lifestyle, genetics, gut microbiome, dietary logging and answers to clinical and nutritional questionnaires25 (
To test whether our measurements accurately report metabolite levels, we compared the metabolomic levels of creatinine and cholesterol to measurements of these compounds using standardized lab tests (Methods) performed separately on different blood samples taken from the same individual on a single visit, and found excellent agreement (R=0.87, creatinine; R=0.79, cholesterol,
Diet, Microbiome, and Clinical Data Predict the Levels of Most Serum Metabolites
To estimate the extent to which metabolites can be predicted by the wealth of data we collected, we devised machine learning algorithms that predict the levels of each metabolite in held-out subjects (out-of-sample 5-fold cross validation prediction). One exception was human genetics, for which we considered the explained variance (EV) of each metabolite as that of the single most associated SNP (Methods). For prediction, we used gradient boosting decision trees27 (GBDT; Methods) as these are powerful models which perform well in many different settings and can capture nonlinear interactions which are likely to be present in such a heterogeneous feature space and within the high dimensionality of the diet and microbiome data. We found that GBDT systematically outperformed linear models (Lasso; Methods), with a median and maximum EV gain of 3.3 and 38%, respectively, for prediction with diet data and 4.3 and 13% for prediction with microbiome data. (
To understand whether specific feature groups better predict certain types of metabolites, we checked, for each feature group, whether any type of metabolites was enriched with superior predictions (
We next asked whether different feature groups predict metabolites with similar accuracy, by computing the correlation between the accuracy of metabolite predictions of every pair of input feature groups (
Taken together, our results show that we can devise statistically significant predictions for most serum metabolites using diet, gut microbiome, or other lifestyle and clinical parameters, with each feature group being especially informative with respect to a different set of metabolites. We next wished to estimate the general predictive power of each feature group across all measured serum metabolites. We built models predicting the principal components of the metabolomics data (
Metabolite Predictions Replicate in an Independent Cohort
To test the robustness and reproducibility of our associations, we used the following approaches.
Firstly, we asked whether our cohort replicates significant associations between metabolite levels and body mass index (BMI) that were recently reported28, and found that most of these associations replicated with high accuracy (Pearson R=0.85, p<10−10,
Secondly, we applied the same metabolomic profiling to an independent cohort of 31 individuals for which we also obtained identical measurements to those we had on the main cohort, including diet and gut microbiome data. Data from this additional cohort were not available to us while developing the prediction models. Notably, using our models, trained only on samples from our main cohort, for metabolites significantly predicted in our main cohort, we obtained predictions with similar accuracy on samples from this independent validation cohort. Specifically, for both diet and gut microbiome data, we found high agreement between the prediction accuracy and the overall predictive power of our models in the main cohort and in the replication cohort (Pearson R=0.59, p<10−18, microbiome; R=0.60, diet, p<10−20;
Thirdly, the model of the present embodiments was applied, without modification, to an independent cohort from the United Kingdom [UK Adult Twin Registry, www(dot)twinsuk(dot)ac(dot)uk].
Novel Associations Between Human Genetics and Circulating Blood Metabolites
Several studies found that human genetics affect serum metabolites6,7,29. In this study we measured hundreds of novel molecules which were not yet identified in previously published studies including both serum metabolomics and human genetics, and therefore set out to look for novel associations between single nucleotide polymorphisms (SNPs) and serum metabolite levels. Notably, we found 553 statistically significant associations with genetic variants for 67 metabolites (p<5×10−11), many of which are novel. This includes the unknown metabolite X-24809 which was associated with rs4539242 that alone explained 52% of its variance. To further validate our results, we set out to replicate previously reported associations between SNPs and the levels of circulating blood metabolites. Among the 529 metabolites analysed in a previous large study which included 7824 individuals6, 301 were also measured by us using the same MS platform (Metabolon, Inc.; Methods), and 111 of them were reported to have significant associations with SNPs. Due to the difference in cohort sizes, we were limited in terms of the statistical power needed for the replication of relatively small effect variants. Overall, we found a high correlation between the EV of a model based on top significantly associated SNPs in the previous study and a model based on the single top associated SNP in our study (Pearson R=0.73, p<10−20;
Diet and Gut Microbiome Data Independently Explain a Wide Range of Metabolites
Diet and gut microbiome had the largest predictive power and there is a significant correlation in the metabolites that they each predicted well (
We next sought to interpret the diet and gut microbiome models and ask which dietary features and bacterial taxa drive the predictions of each metabolite. Our diet data consists of both answers to food frequency questionnaires and one week of dietary logging collected in real-time via a mobile App we devised25, and thus allows us to address the predictive power of both long term and short term nutritional patterns. The gut microbiome composition is represented as relative abundance of bacterial species and we estimated it based on high depth metagenomic sequencing followed by mapping to a unique and comprehensive microbial database that was recently published33 (Methods). In order to explain the output of our machine learning models and find specific associations between features and metabolite levels we used SHAP (SHapley Additive exPlanations)34, a feature attribution analysis tool which assigns each feature an importance value (SHAP value) for a particular prediction35 (Methods). Shapley values based analysis in gut microbiome data was recently demonstrated to be useful, as it allowed for the estimation of complex contributions of gut microbiome taxa to functional shifts, while maintaining global community composition properties36.
We found dozens of diet features and bacterial taxa that were strongly predictive of blood metabolites in our models (
As a more global view, we next asked whether a few bacterial features are important for the prediction of many metabolites, or whether metabolite prediction is specific to several unique important taxa. To this end, for each metabolite we defined its main predictor as the bacterial taxon with the maximal mean absolute SHAP value. We found that 19 bacterial taxa were the main predictors for the top 50 predicted metabolites (Prediction R>0.4; Table 7). One bacterial feature from the Clostridiaceae family was the main predictor of 22 of these metabolites, which are also strongly associated with coffee consumption in diet-based models. Clostridium sp. CAG:138 was the main predictor of 5 metabolites, including 3 unknown compounds, phenylacetylcarnitine (R=0.47, p<10−20) and p-cresol-glucuronide (R=0.64, p<10−20), which was previously reported to be metabolized by Clostridium43. Furthermore, 6 bacterial features were the main predictors of 2 metabolites each, and each of the other 11 bacterial features was a main predictor of a single metabolite. Hence, in most cases many specific bacteria are required in order to accurately predict the levels of distinct metabolites, but in some cases a single bacterium might underlie the predictions of a broad metabolic pathway involving dozens of metabolites. In terms of higher bacterial taxonomy levels, among the bacterial features that best predicted the top 100 metabolites, 89 belonged to Firmicutes, 4 to Actinobacteria and 7 to an unknown phylum, showing the strong predictive power of Firmicutes. Interestingly, although Bacteroidetes is the second most abundant phylum in our cohort (
We next asked whether these single best predictors are sufficient for the accurate prediction of each metabolite or whether additional information regarding the composition of the gut microbiome is needed. To this end, for each metabolite we compared the results from a full model of the microbiome to a prediction model based only on the strongest predictor (
We also explored which metabolites were best explained by gut microbiome data. For each of the metabolite groups which were significantly predicted using the gut microbiome we computed a score between 0 and 1, representing the fraction of variance that the microbiome data model explains out of that explained by the sum of the microbiome model and the next best model from the feature groups except microbiome. For 80 microbiome predicted metabolite groups, the score was higher than 0.5, indicating that microbiome had the highest predictive power among all feature groups tested (Table 8).
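The score described above may be written as EV_microbiome / (EV_microbiome + EV_next_best), and sketched as follows. The dictionary layout, the function name and the clamping of negative EV estimates to zero are assumptions of this illustration.

```python
def microbiome_score(ev_by_group, focus="microbiome"):
    """Fraction of EV attributable to `focus` versus the next best group.

    ev_by_group: dict mapping a feature group name to the EV it achieves
    for one metabolite group; it must contain `focus` and one other group.
    """
    ev_focus = max(0.0, ev_by_group[focus])
    next_best = max(
        max(0.0, ev) for name, ev in ev_by_group.items() if name != focus
    )
    total = ev_focus + next_best
    return ev_focus / total if total > 0 else 0.0
```

Under this definition a score above 0.5 is equivalent to the microbiome model explaining more variance than any other single feature group.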
Identification and Candidate Structures of Microbiome-Related Unknown Compounds
Metabolites that are accurately predicted by the gut microbiome are of particular interest as they may be modulated by perturbing the bacterial community. Since many of the metabolites that were predicted by the gut microbiome with high accuracy are unknown, we sought their identification. Here we provide the chemical identification of 11 compounds and candidate structures for 19 other compounds previously tagged as unknown (Table 9). Among these metabolites are some of those that are predicted by the microbiome with the highest accuracy, including X-11850, X-12261 and X-11843. These were all predicted with R2>0.45 using the microbiome, and are likely to be derivatives of aromatic amino acids, a class of molecules known to be metabolized by the gut microbiome. This list constitutes a major step towards mapping the metabolic producing and modulating potential of the human gut microbiome.
In Table 9, names of unknown compounds as provided by Metabolon, Inc. along with their new identification and candidate structures are provided. Microbiome R2 is the EV of each metabolite as estimated by a prediction model based on gut microbiome data.
Networks of Interactions Between Features Explain Diverse Metabolites
As multiple metabolites were significantly predicted using more than one feature group, we next examined how different feature groups interact in explaining the levels of these metabolites. By building separate predictive models each based on a different feature group and using SHAP in order to estimate the impact of each specific feature on the output of the models, we uncovered a dense network of interactions between feature groups in explaining metabolite levels (
As mentioned above, we found that the reported consumption of coffee was linked to a large number of metabolites, most of which are unknown compounds and xenobiotics from the xanthine metabolism pathway. Notably, we found that a specific bacterial species from the Clostridiales order was linked to a large number of these metabolites (
We next focused on metabolites which were significantly explained using seasonal effects, and examined which dietary features interact with them (
Finally, we explored some known examples of associations between metabolites and features to further validate the quality of data in our cohort (
Metabolites Explained by Bread Increase Following a Bread Consumption Intervention
As a proof of concept examining whether some of the feature-metabolite interactions we uncovered may be causal, we profiled the serum metabolome of samples from a randomized cross-over trial that we previously conducted48, in which we compared the effects of consuming artisanal whole-grain sourdough bread (hereinafter, “sourdough bread”) to those of industrial white bread made from refined wheat (“white bread”). Twenty healthy subjects were randomly divided into two groups of 10, who then underwent a 1-week-long dietary intervention of increased bread consumption, where each group received a different type of bread. Following two weeks of washout, the intervention was performed again, switching bread types between the groups. (
We used the healthy cohort of 458 participants for which we had one week of logged normal diet, without any intervention (
Some of the metabolites whose levels increased following the sourdough bread intervention were previously reported to be linked to the consumption of whole-grain wheat flour. A notable example is betaine, an amino acid that has been shown to protect internal organs and improve vascular risk factors49, and that is highly abundant in a wide variety of foods, of which wheat bran and wheat germ are the highest naturally occurring sources50,51. We found that in the group that received sourdough bread the mean fold-change in betaine levels was 6.16, while the mean fold-change in the group that received white bread was 0.82 (Mann-Whitney U p<0.004;
We also performed a similar analysis using metabolites that were associated with white bread consumption in our cohort, but did not find significant changes in these metabolites in the bread intervention study, potentially stemming from high white wheat consumption in the typical diet before the intervention. Overall, these results suggest that some of the associations that we found between the consumption of whole-wheat bread and the levels of metabolites in our larger cohort might be causal, as their levels increase following a dietary intervention that increased the consumption of whole-wheat bread.
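The group comparison used for betaine above (a per-subject fold-change in each bread group, compared with a Mann-Whitney U test) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the analysis code used in the study: the fold-change is assumed to be the post-intervention level divided by the baseline level, the p-value uses the normal approximation without tie or continuity correction, and all function names and input values are hypothetical.

```python
import math


def fold_changes(before, after):
    """Per-subject fold-change (assumed: post-intervention / baseline)."""
    return [a / b for b, a in zip(before, after)]


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test, normal approximation.

    Returns (U, p), where U is the smaller of the two U statistics.
    No tie or continuity correction is applied (simplifying assumption).
    """
    n1, n2 = len(x), len(y)
    # U1 counts the pairs in which x exceeds y; ties count one half.
    u1 = sum(
        1.0 if xi > yj else 0.5 if xi == yj else 0.0
        for xi in x
        for yj in y
    )
    u2 = n1 * n2 - u1
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return min(u1, u2), p


# Made-up metabolite levels for illustration only, not study data.
sourdough_fc = fold_changes([1.0, 1.2, 0.9], [2.0, 3.6, 3.6])
white_fc = fold_changes([1.0, 1.0, 2.0], [0.5, 0.8, 2.0])
u_stat, p_value = mann_whitney_u(sourdough_fc, white_fc)
print(u_stat, p_value)
```

For small groups such as the two groups of 10 in the intervention, an exact test as offered by standard statistics packages would generally be preferable to the normal approximation shown here.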
Table 10 provides the sequence identifier for the metagenomic sequences of the unknown bacteria.
Although the invention has been described in conjunction with specific embodiments thereof, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the art. Accordingly, it is intended to embrace all such alternatives, modifications and variations that fall within the spirit and broad scope of the appended claims.
All publications, patents and patent applications mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present invention. To the extent that section headings are used, they should not be construed as necessarily limiting.
In addition, any priority document(s) of this application is/are hereby incorporated herein by reference in its/their entirety.
Number | Date | Country | Kind |
---|---|---|---|
264581 | Jan 2019 | IL | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/IL2020/050121 | 1/30/2020 | WO | 00 |