The present invention encompasses methods and arrays associated with body fat and/or weight loss.
A paper copy of the sequence listing and a computer readable form of the same sequence listing are appended below and herein incorporated by reference. Additionally, the sequence listing filed with the provisional application is also hereby incorporated by reference.
According to the Centers for Disease Control (CDC), over sixty percent of the United States population is overweight, and greater than thirty percent are obese. This translates into more than 50 million adults in the United States with a Body Mass Index (BMI) of 30 or above. Obesity is also a worldwide health problem with an estimated 500 million overweight adult humans [body mass index (BMI) of 25.0-29.9 kg/m²] and 250 million obese adults (Bouchard, C. (2000) N. Engl. J. Med. 343, 1888-9). This epidemic of obesity is leading to worldwide increases in the prevalence of obesity-related disorders, such as diabetes, hypertension, cardiac pathology, and non-alcoholic fatty liver disease (NAFLD; Wanless and Lentz (1990) Hepatology 12, 1106-1110; Silverman, et al. (1990) Am. J. Gastroenterol. 85, 1349-1355; Neuschwander-Tetri and Caldwell (2003) Hepatology 37, 1202-1219). According to the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), approximately 280,000 deaths annually are directly related to obesity. The NIDDK further estimated that the direct cost of healthcare in the U.S. associated with obesity is $51 billion. In addition, Americans spend $33 billion per year on weight loss products. In spite of this economic cost and consumer commitment, the prevalence of obesity continues to rise at alarming rates. From 1991 to 2000, obesity in the U.S. grew by 61%.
Although the physiologic mechanisms that support development of obesity are complex, the medical consensus is that the root cause relates to an excess intake of calories compared to caloric expenditure. While the treatment seems quite intuitive, dieting is not an adequate long-term solution for most people; about 90 to 95 percent of persons who lose weight subsequently regain it. Although surgical intervention has had some measured success, the various types of surgeries have relatively high rates of morbidity and mortality.
Pharmacotherapeutic options are limited. In addition, because of undesirable side effects, the FDA has had to recall several obesity drugs from the market. Those that are approved also have side effects. Currently, two FDA-approved anti-obesity drugs are orlistat, a lipase inhibitor, and sibutramine, a serotonin reuptake inhibitor. Orlistat acts by blocking the absorption of fat into the body. An unpleasant side effect with orlistat, however, is the passage of undigested oily fat from the body. Sibutramine is an appetite suppressant that acts by altering brain levels of serotonin. In the process, it also causes elevation of blood pressure and an increase in heart rate. Other appetite suppressants, such as amphetamine derivatives, are highly addictive and have the potential for abuse. Moreover, different subjects respond differently and unpredictably to weight-loss medications.
Because surgical and pharmacotherapy treatments are problematic, new non-cognitive strategies are needed to prevent and treat obesity and obesity-related disorders.
One aspect of the present invention encompasses an array comprising a substrate. The substrate has disposed thereon at least one nucleic acid indicative of, or modulated in, an obese host microbiome compared to a lean host microbiome. Alternatively, the substrate has disposed thereon at least one nucleic acid indicative of, or modulated in, a lean host microbiome compared to an obese host microbiome.
Another aspect of the present invention encompasses an array comprising a substrate. The substrate has disposed thereon at least one polypeptide indicative of, or modulated in, an obese host microbiome compared to a lean host microbiome. Alternatively, the substrate has disposed thereon at least one polypeptide indicative of, or modulated in, a lean host microbiome compared to an obese host microbiome.
Yet another aspect of the invention encompasses a method for modulating body fat or for modulating weight loss in a subject. The method typically comprises altering the microbiota population in the subject's gastrointestinal tract by modulating the relative abundance of Actinobacteria. In some embodiments, the relative abundance is increased; in other embodiments, the relative abundance is decreased.
Still another aspect of the invention encompasses a composition. The composition usually comprises an antibiotic having efficacy against Actinobacteria but not against Bacteroidetes; and a probiotic comprising Bacteroidetes.
Other aspects and iterations of the invention are described more thoroughly below.
The application file contains at least one photograph executed in color. Copies of this patent application publication with color photographs will be provided by the Office upon request and payment of the necessary fee.
It has been discovered, as demonstrated in the Examples, that there is a relationship between the human gut microbiota and obesity. In particular, an obese human subject typically has fewer Bacteroidetes and more Actinobacteria compared to a lean subject. In some embodiments, an obese human subject has proportionately fewer Bacteroidetes and more Actinobacteria and Firmicutes compared to a lean subject. Taking advantage of these discoveries, the present invention provides compositions and methods to regulate energy balance in a subject. In particular, the invention provides nucleic acid sequences that are associated with obesity in humans. These sequences may be used as diagnostic or prognostic biomarkers for obesity risk, biomarkers for drug discovery, biomarkers for the discovery of therapeutic targets involved in the regulation of energy balance, and biomarkers for the efficacy of a weight loss program.
The energy balance of a subject may be modulated by altering the subject's gut microbiota population. Generally speaking, to decrease energy harvesting, decrease body fat, or promote weight loss, the relative abundance of bacteria within the Bacteroidetes phylum (phylum is also known as a ‘division’) is increased and optionally, the relative abundance of bacteria within the Actinobacteria and/or Firmicutes phylum is decreased. Alternatively, to increase energy harvesting, to increase body fat, or promote weight gain, the relative abundance of Bacteroidetes is decreased and optionally, the relative abundance of Actinobacteria and/or Firmicutes is increased. Additional agents may also be utilized to achieve either weight loss or weight gain. Examples of these agents are detailed in section I(d).
The relative abundance of Bacteroidetes may be altered by increasing or decreasing the presence of one or more Bacteroidetes species that reside in the gut. Non-limiting examples of such species include B. thetaiotaomicron, B. vulgatus, B. ovatus, P. distasonis, B. uniformis, B. stercoris, B. eggerthii, B. merdae, and B. caccae. In one embodiment, the population of B. thetaiotaomicron is altered. In still another embodiment, the population of B. vulgatus is altered. In an additional embodiment, the population of B. ovatus is altered. In another embodiment, the population of P. distasonis is altered. In yet another embodiment, the population of B. uniformis is altered. In an additional embodiment, the population of B. stercoris is altered. In a further embodiment, the population of B. eggerthii is altered. In still another embodiment, the population of B. merdae is altered. In another embodiment, the population of B. caccae is altered. In a further embodiment, the species within the Bacteroidetes phylum may be as of yet unnamed.
The present invention also includes altering various combinations of Bacteroidetes species, such as at least two species, at least three species, at least four species, at least five species, at least six species, at least seven species, at least eight species, at least nine species, at least ten Bacteroidetes species, or more than ten species of Bacteroidetes. For example, the combination of B. thetaiotaomicron, B. vulgatus, B. ovatus, P. distasonis, and B. uniformis may be altered.
In an exemplary embodiment, the relative abundance of Bacteroidetes is increased to decrease energy harvesting, decrease body fat, or promote weight loss in a subject. Increased abundance of Bacteroidetes in the gut may be accomplished by several suitable means generally known in the art. In one embodiment, a food supplement that increases the abundance of Bacteroidetes may be administered to the subject. By way of example, one such food supplement is psyllium husks as described in U.S. Patent Application Publication No. 2006/0229905, which is hereby incorporated by reference in its entirety. In an exemplary embodiment, a probiotic comprising one or more Bacteroidetes species or strains may be administered to the subject. The amount of probiotic administered to the subject can and will vary depending upon the embodiment. The probiotic may comprise from about one thousand to about ten billion cfu/g (colony forming units per gram) of the total composition, or of the part of the composition comprising the probiotic. In one embodiment, the probiotic may comprise from about one hundred million to about 10 billion organisms. The probiotic microorganism may be in any suitable form, for example in a powdered dry form. In addition, the probiotic microorganism may have undergone processing in order for it to increase its survival. For example, the microorganism may be coated or encapsulated in a polysaccharide, fat, starch, protein or in a sugar matrix. Standard encapsulation techniques known in the art can be used. For example, techniques discussed in U.S. Pat. No. 6,190,591, which is hereby incorporated by reference in its entirety, may be used.
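By way of illustration only, the arithmetic relating a probiotic concentration expressed in cfu/g to the total number of organisms delivered may be sketched as follows. The function names, the 2 g serving size, and the 5e8 cfu/g concentration are hypothetical and do not appear in the specification; only the bounds of about one thousand to about ten billion cfu/g are taken from the text above, and they are treated here as hard cutoffs purely for simplicity.

```python
# Bounds taken from the text: "about one thousand to about ten billion cfu/g".
# Treating "about" as an exact cutoff is a simplifying assumption.
CFU_PER_G_MIN = 1e3
CFU_PER_G_MAX = 1e10


def total_cfu(cfu_per_gram: float, grams: float) -> float:
    """Total colony forming units contained in `grams` of composition."""
    if cfu_per_gram < 0 or grams < 0:
        raise ValueError("concentration and mass must be non-negative")
    return cfu_per_gram * grams


def within_stated_range(cfu_per_gram: float) -> bool:
    """True if the concentration lies inside the range recited above."""
    return CFU_PER_G_MIN <= cfu_per_gram <= CFU_PER_G_MAX


# Hypothetical formulation: 5e8 cfu/g delivered in a 2 g serving.
dose = total_cfu(5e8, 2.0)
in_range = within_stated_range(5e8)
```

In this hypothetical case the serving delivers one billion organisms, which also falls within the range of about one hundred million to about 10 billion organisms recited above.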
Alternatively, the relative abundance of Bacteroidetes is decreased to increase energy harvesting, increase body fat, or promote weight gain in a subject. Decreased abundance of Bacteroidetes in the gut may be accomplished by several suitable means generally known in the art. In one embodiment, an antibiotic having efficacy against Bacteroidetes may be administered. Generally speaking, antimicrobial agents may target several areas of bacterial physiology: protein translation, nucleic acid synthesis, cell wall synthesis or potentially, the polysaccharide acquisition machinery. In an exemplary embodiment, the antibiotic will have efficacy against Bacteroidetes but not against Firmicutes. The susceptibility of the targeted species to the selected antibiotics may be determined based on culture methods or genome screening.
It is contemplated that the abundance of gut Bacteroidetes within an individual subject may be altered (i.e., increased or decreased) from about a couple fold difference to about a hundred fold difference or more, depending on the desired result (i.e., increased energy harvesting (weight gain) or decreased energy harvesting (weight loss)) and the individual subject. A method for determining the relative abundance of gut Bacteroidetes is described in the examples. Alternatively, an array of the invention, described below, may be used to determine the relative abundance.
Stated another way, it is contemplated that the abundance of gut Bacteroidetes within an individual subject may be altered (i.e., increased or decreased) from about 1% to about 100% or more depending on the desired result (i.e., increased energy harvesting (weight gain) or decreased energy harvesting (weight loss)) and the individual subject. For weight loss, the abundance may be altered by an increase of from about 20% to about 100%, from about 30% to about 100%, from about 40% to about 100%, from about 50% to about 100%, from about 60% to about 100%, from about 70% to about 100%, from about 80% to about 100%, or from about 90% to 100%. A method for determining the relative abundance of gut Bacteroidetes is described in the examples. Alternatively, an array of the invention, described below, may be used to determine the relative abundance.
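By way of illustration only, the relationship between a fold difference and a percent change in relative abundance may be sketched as follows. This is a minimal sketch and not the method of the examples: it assumes that phylum-level counts (for instance, counts of classified sequences) are already available for a subject at two time points, and the counts, names, and functions shown are hypothetical.

```python
from typing import Dict


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Return each taxon's share of the total count (fractions sum to 1)."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("total count must be positive")
    return {taxon: n / total for taxon, n in counts.items()}


def fold_change(before: float, after: float) -> float:
    """Ratio of the later abundance to the earlier one (2.0 is two-fold)."""
    if before <= 0:
        raise ValueError("baseline abundance must be positive")
    return after / before


def percent_change(before: float, after: float) -> float:
    """Signed change relative to baseline, in percent (+100 means doubled)."""
    return (fold_change(before, after) - 1.0) * 100.0


# Hypothetical phylum-level counts for one subject at two time points.
baseline = {"Bacteroidetes": 50, "Firmicutes": 900, "Actinobacteria": 50}
followup = {"Bacteroidetes": 200, "Firmicutes": 760, "Actinobacteria": 40}

ra_before = relative_abundance(baseline)
ra_after = relative_abundance(followup)

for phylum in baseline:
    fc = fold_change(ra_before[phylum], ra_after[phylum])
    pc = percent_change(ra_before[phylum], ra_after[phylum])
    print(f"{phylum}: {fc:.2f}-fold, {pc:+.1f}%")
```

Under these hypothetical counts, the relative abundance of Bacteroidetes rises from 5% to 20% of the community, i.e., a four-fold difference or an increase of 300%, while the relative abundances of Firmicutes and Actinobacteria fall. The same arithmetic applies to any phylum, class, or species for which counts are available.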
The relative abundance of Actinobacteria may be altered by increasing or decreasing the presence of one or more species that reside in the gut. Representative, non-limiting species include B. longum, B. breve, B. catenulatum, B. dentium, B. gallicum, B. pseudocatenulatum, C. aerofaciens, C. stercoris, C. intestinalis, and S. variabile.
In an exemplary embodiment, the relative abundance of Actinobacteria is decreased to decrease energy harvesting, decrease body fat, or promote weight loss in a subject. Decreased abundance of Actinobacteria in the gut may be accomplished by several suitable means generally known in the art. In one embodiment, an antibiotic having efficacy against Actinobacteria may be administered. In an exemplary embodiment, the antibiotic will have efficacy against Actinobacteria but not against Bacteroidetes. The susceptibility of the targeted species to the selected antibiotics may be determined based on culture methods or genome screening.
Alternatively, the relative abundance of Actinobacteria is increased to increase energy harvesting, increase body fat, or promote weight gain in a subject. Increased abundance of Actinobacteria in the gut may be accomplished by several suitable means generally known in the art. In an exemplary embodiment, a probiotic comprising one or more Actinobacteria strains or species may be administered to the subject.
It is contemplated that the abundance of gut Actinobacteria may be altered (i.e., increased or decreased) from about a couple fold difference to about a hundred fold difference or more, depending on the desired result (i.e., increased energy harvesting (weight gain) or decreased energy harvesting (weight loss)). A method for determining the relative abundance of gut Actinobacteria is described in the examples.
Stated another way, it is contemplated that the abundance of gut Actinobacteria may be altered (i.e., increased or decreased) from about 1% to about 100% or more depending on the desired result (i.e., increased energy harvesting (weight gain) or decreased energy harvesting (weight loss)). For weight loss, the abundance may be altered by a decrease of from about 20% to about 100%, from about 30% to about 100%, from about 40% to about 100%, from about 50% to about 100%, from about 60% to about 100%, from about 70% to about 100%, from about 80% to about 100%, or from about 90% to 100%. A method for determining the relative abundance of gut Actinobacteria is described in the examples.
The relative abundance of Firmicutes may be altered by increasing or decreasing the presence of one or more species that reside in the gut. Representative species include species from Clostridia, Bacilli, and Mollicutes. In one embodiment, the relative abundance of one or more Clostridia species is altered. In another embodiment, the relative abundance of one or more Bacilli species is altered. In yet another embodiment, the relative abundance of one or more Mollicutes species is altered. It is also contemplated that the relative abundance of several species of Firmicutes may be altered without departing from the scope of the invention. By way of non-limiting examples, a combination of one or more Clostridia species, one or more Bacilli species, and one or more Mollicutes species may be altered. In a further embodiment, the species within the Firmicutes phylum may be as of yet unnamed.
In some embodiments, the Mollicutes class is altered. For instance, E. dolichum, E. cylindroides, E. biforme, or C. innocuum may be altered. In one embodiment, the species of the Mollicutes class may possess the genetic information to create a cell wall. In another embodiment, the species of the Mollicutes class may produce a cell wall. In a further embodiment, the species within the class Mollicutes may be as of yet unnamed.
In an exemplary embodiment, the relative abundance of Firmicutes is decreased to decrease energy harvesting, decrease body fat, or promote weight loss in a subject. Decreased abundance of Firmicutes in the gut may be accomplished by several suitable means generally known in the art. In one embodiment, an antibiotic having efficacy against Firmicutes may be administered. In an exemplary embodiment, the antibiotic will have efficacy against Firmicutes but not against Bacteroidetes. In another exemplary embodiment, the antibiotic will have efficacy against Mollicutes, but not Bacteroidetes. The susceptibility of the targeted species to the selected antibiotics may be determined based on culture methods or genome screening.
Alternatively, the relative abundance of Firmicutes is increased to increase energy harvesting, increase body fat, or promote weight gain in a subject. Increased abundance of Firmicutes in the gut may be accomplished by several suitable means generally known in the art. In an exemplary embodiment, a probiotic comprising Firmicutes may be administered to the subject.
It is contemplated that the abundance of gut Firmicutes may be altered (i.e., increased or decreased) from about a couple fold difference to about a hundred fold difference or more, depending on the desired result (i.e., increased energy harvesting (weight gain) or decreased energy harvesting (weight loss)). A method for determining the relative abundance of gut Firmicutes is described in the examples.
Stated another way, it is contemplated that the abundance of gut Firmicutes may be altered (i.e., increased or decreased) from about 1% to about 100% or more depending on the desired result (i.e., increased energy harvesting (weight gain) or decreased energy harvesting (weight loss)). For weight loss, the abundance may be altered by a decrease of from about 20% to about 100%, from about 30% to about 100%, from about 40% to about 100%, from about 50% to about 100%, from about 60% to about 100%, from about 70% to about 100%, from about 80% to about 100%, or from about 90% to 100%. A method for determining the relative abundance of gut Firmicutes is described in the examples.
Another aspect of the invention encompasses a combination therapy to regulate fat storage, energy harvesting, and/or weight loss or gain in a subject. In an exemplary embodiment, a combination for decreasing energy harvesting, decreasing body fat or for promoting weight loss is provided. For this embodiment, a composition comprising an antibiotic having efficacy against Firmicutes and/or Actinobacteria but not against Bacteroidetes; and a probiotic comprising Bacteroidetes may be administered to the subject. Additionally, an anti-archaeal compound may be included in the aforementioned composition to reduce the representation of gut methanogens and the efficiency of methanogenesis, thereby reducing the efficiency of fermentation of dietary polysaccharides by saccharolytic bacteria, such as Bacteroidetes. Other agents that may be included with the aforementioned composition are detailed below.
The compositions utilized in this invention may be administered by any number of routes including, but not limited to, oral, intravenous, intramuscular, intraarterial, intramedullary, intrathecal, intraventricular, pulmonary, transdermal, subcutaneous, intraperitoneal, intranasal, enteral, topical, sublingual, or rectal means. The actual effective amounts of compounds comprising a weight loss composition of the invention can and will vary according to the specific compounds being utilized, the mode of administration, and the age, weight and condition of the subject. Dosages for a particular individual subject can be determined by one of ordinary skill in the art using conventional considerations. Those skilled in the art will appreciate that dosages may also be determined with guidance from Goodman & Gilman's The Pharmacological Basis of Therapeutics, Ninth Edition (1996), Appendix II, pp. 1707-1711 and from Goodman & Gilman's The Pharmacological Basis of Therapeutics, Tenth Edition (2001), Appendix II, pp. 475-493.
i. Fiaf Polypeptide
A composition of the invention for promoting weight loss may optionally include either increasing the amount of a Fiaf polypeptide or the activity of a Fiaf polypeptide. Typically, a suitable Fiaf polypeptide is one that can substantially inhibit LPL when administered to the subject. Several Fiaf polypeptides known in the art are suitable for use in the present invention. Generally speaking, the Fiaf polypeptide is from a mammal. By way of non-limiting example, suitable Fiaf polypeptides and nucleotides are delineated in Table A.
TABLE A
Species
Homo sapiens
Mus musculus
Rattus norvegicus
Sus scrofa
Bos taurus
Pan troglodytes
In certain aspects, a polypeptide that is a homolog, ortholog, mimic or degenerative variant of a Fiaf polypeptide is also suitable for use in the present invention. In particular, the subject polypeptide will typically inhibit LPL when administered to the subject. A variety of methods may be employed to determine whether a particular homolog, mimic or degenerative variant possesses substantially similar biological activity relative to a Fiaf polypeptide. Specific activity or function may be determined by convenient in vitro, cell-based, or in vivo assays, such as measurement of LPL activity in white adipose tissue. In order to determine whether a particular Fiaf polypeptide inhibits LPL, the procedure detailed in the examples of U.S. Patent Application No. 20050239706, which is hereby incorporated by reference in its entirety, may be followed.
Fiaf polypeptides suitable for use in the invention are typically isolated or pure and are generally administered as a composition in conjunction with a suitable pharmaceutical carrier, as detailed below. A pure polypeptide constitutes at least about 90%, preferably, 95% and even more preferably, at least about 99% by weight of the total polypeptide in a given sample.
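By way of illustration only, the purity criterion stated above (the weight of the polypeptide of interest as a percentage of the total polypeptide in a sample) may be sketched as follows. The function names and the sample masses are hypothetical, and the text's "at least about" thresholds are treated as exact cutoffs solely to keep the sketch simple.

```python
def purity_percent(target_mass: float, total_polypeptide_mass: float) -> float:
    """Weight percent of the target polypeptide in the total polypeptide."""
    if total_polypeptide_mass <= 0:
        raise ValueError("total polypeptide mass must be positive")
    if not 0 <= target_mass <= total_polypeptide_mass:
        raise ValueError("target mass must lie between 0 and the total mass")
    return 100.0 * target_mass / total_polypeptide_mass


def purity_tier(percent: float) -> str:
    """Map a purity value onto the tiers recited in the text."""
    # Exact cutoffs are an assumption; the text says "at least about".
    if percent >= 99.0:
        return "pure (even more preferred, at least about 99%)"
    if percent >= 95.0:
        return "pure (preferred, 95%)"
    if percent >= 90.0:
        return "pure (at least about 90%)"
    return "below the stated purity threshold"


# Hypothetical sample: 96 mg of Fiaf polypeptide in 100 mg total polypeptide.
sample_purity = purity_percent(96.0, 100.0)
sample_tier = purity_tier(sample_purity)
```

Both masses must be expressed in the same unit; the unit itself cancels out of the ratio.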
The Fiaf polypeptide may be synthesized, produced by recombinant technology, or purified from cells using any of the molecular and biochemical methods known in the art that are available for biochemical synthesis, molecular expression and purification of the Fiaf polypeptides [see e.g., Molecular Cloning, A Laboratory Manual (Sambrook, et al. Cold Spring Harbor Laboratory), Current Protocols in Molecular Biology (Eds. Ausubel, et al., Greene Publ. Assoc., Wiley-Interscience, New York)].
The invention also contemplates use of an agent that increases Fiaf transcription or its activity. For example, an agent may be delivered that specifically activates Fiaf expression: this agent may be a natural or synthetic compound that directly activates Fiaf gene transcription, or indirectly activates expression through interactions with components of host regulatory networks that control Fiaf transcription. Suitable agents may be identified by methods generally known in the art, such as by screening natural product and/or chemical libraries using the gnotobiotic zebrafish model described in the examples of U.S. Patent Application No. 20050239706. In another embodiment, a chemical entity may be used that interacts with Fiaf targets, such as LPL, to reproduce the effects of Fiaf (e.g., in this case inhibition of LPL activity). In an alternative of this embodiment, administering a Fiaf agonist to the subject may increase Fiaf expression and/or activity. In one embodiment, the Fiaf agonist is a peroxisome proliferator-activated receptor (PPAR) agonist. Suitable PPARs include PPARα, PPARβ/δ, and PPARγ. Fenofibrate is another suitable example of a Fiaf agonist. Additional suitable Fiaf agonists and methods of administration are further described in Mandard, et al., J. Biol. Chem. 279, 34411 (2004), and U.S. Patent Publication No. 2003/0220373, which are both hereby incorporated by reference in their entirety.
ii. Other Compounds
The compositions of the invention that decrease energy harvesting, decrease body fat, or promote weight loss may also include several additional agents suitable for use in weight loss regimes. Generally speaking, exemplary combinations of therapeutic agents may act synergistically to decrease energy harvesting, decrease body fat, or promote weight loss. Using this approach, one may be able to achieve therapeutic efficacy with lower dosages of each agent, thus reducing the potential for adverse side effects. In one embodiment, acarbose may be administered with a composition of the invention. Acarbose is an inhibitor of α-glucosidases, which are required to break down carbohydrates into simple sugars within the gastrointestinal tract of the subject. In another embodiment, an appetite suppressant, such as an amphetamine, or a selective serotonin reuptake inhibitor, such as sibutramine, may be administered with a composition of the invention. In still another embodiment, a lipase inhibitor such as orlistat, or an inhibitor of lipid absorption such as Xenical, may be administered with a composition of the invention.
iii. Restricted Calorie Diet
Optionally, in addition to administration of a composition of the invention for weight loss, a subject may also be placed on a restricted calorie diet. Restricted calorie diets may be helpful for increasing the relative abundance of Bacteroidetes and decreasing the relative abundance of Firmicutes and/or Actinobacteria. Several restricted calorie diets known in the art are suitable for use in combination with the compositions of the invention. Representative diets include a reduced fat diet, a reduced protein diet, or a reduced carbohydrate diet.
iv. Alteration of the Gastrointestinal Archaeon Population
An anti-archaeal compound may be included in a composition of the invention to decrease energy harvesting, decrease fat storage, and/or decrease weight gain. To promote weight loss in a subject, the gut archaeon population is altered such that microbial-mediated carbohydrate metabolism or its efficiency is decreased in the subject, whereby decreasing microbial-mediated carbohydrate metabolism or its efficiency promotes weight loss in the subject.
Accordingly, in one embodiment, the subject's gastrointestinal archaeal population is altered so as to promote weight loss in the subject. Typically, the presence of at least one genus of archaeon that resides in the gastrointestinal tract of the subject is decreased. In most embodiments, the archaeon is generally a mesophilic methanogenic archaeon. In one alternative of this embodiment, the presence of at least one species from the genera Methanobrevibacter or Methanosphaera is decreased. In another alternative embodiment, the presence of Methanobrevibacter smithii is decreased. In still another embodiment, the presence of Methanosphaera stadtmanae is decreased. In yet another embodiment, the presence of a combination of archaeal genera or species is decreased. By way of non-limiting example, the presence of Methanobrevibacter smithii and Methanosphaera stadtmanae is decreased.
To decrease the presence of any of the archaea detailed above, methods generally known in the art may be utilized. In one embodiment, a compound having anti-microbial activities against the archaeon is administered to the subject. Non-limiting examples of suitable anti-microbial compounds include metronidazole, clindamycin, tinidazole, macrolides, and fluoroquinolones. In another embodiment, a compound that inhibits methanogenesis by the archaeon is administered to the subject. Non-limiting examples include 2-bromoethanesulfonate (inhibitor of methyl-coenzyme M reductase), N-alkyl derivatives of para-aminobenzoic acid (inhibitor of tetrahydromethanopterin biosynthesis), ionophore monensin, nitroethane, lumazine, propynoic acid and ethyl 2-butynoate. In yet another embodiment, a hydroxymethylglutaryl-CoA reductase inhibitor is administered to the subject. Non-limiting examples of suitable hydroxymethylglutaryl-CoA reductase inhibitors include lovastatin, atorvastatin, fluvastatin, pravastatin, simvastatin, and rosuvastatin. Alternatively, the diet of the subject may be formulated by changing the composition of glycans (e.g., polyfructose-containing oligosaccharides) in the diet that are preferred by polysaccharide degrading bacterial components of the microbiota (e.g., Bacteroides spp.) when in the presence of mesophilic methanogenic archaeal species such as Methanobrevibacter smithii.
Generally speaking, when the archaeal population in the subject's gastrointestinal tract is decreased in accordance with the methods described above, the polysaccharide degrading properties of the subject's gastrointestinal microbiota are altered such that microbial-mediated carbohydrate metabolism or its efficiency is decreased. Typically, depending upon the embodiment, the transcriptome and the metabolome of the gastrointestinal microbiota are altered. In one embodiment, the microbe is a saccharolytic bacterium. In one alternative of this embodiment, the saccharolytic bacterium is a Bacteroides species. In a further alternative embodiment, the bacterium is Bacteroides thetaiotaomicron. Typically, the carbohydrate will be a plant polysaccharide or dietary fiber. Plant polysaccharides may include starch, fructan, cellulose, hemicellulose, and pectin.
The compounds utilized in this invention to alter the archaeon population may be administered by any number of routes including, but not limited to, oral, intravenous, intramuscular, intra-arterial, intramedullary, intrathecal, intraventricular, pulmonary, transdermal, subcutaneous, intraperitoneal, intranasal, enteral, topical, sublingual, or rectal means.
The actual effective amounts of compound described herein can and will vary according to the specific composition being utilized, the mode of administration and the age, weight and condition of the subject. Dosages for a particular individual subject can be determined by one of ordinary skill in the art using conventional considerations. Those skilled in the art will appreciate that dosages may also be determined with guidance from Goodman & Gilman's The Pharmacological Basis of Therapeutics, Ninth Edition (1996), Appendix II, pp. 1707-1711 and from Goodman & Gilman's The Pharmacological Basis of Therapeutics, Tenth Edition (2001), Appendix II, pp. 475-493.
By way of non-limiting example, weight loss may be promoted by administering an HMG-CoA reductase inhibitor to a subject. In an exemplary embodiment, the inhibitor will selectively inhibit the HMG-CoA reductase expressed by M. smithii and not the HMG-CoA reductase expressed by the subject. In another embodiment, a second HMG-CoA reductase inhibitor may be administered that selectively inhibits the HMG-CoA reductase expressed by the subject in lieu of the HMG-CoA reductase expressed by M. smithii. In yet another embodiment, an HMG-CoA reductase inhibitor that selectively inhibits the HMG-CoA reductase expressed by the subject may be administered in combination with an HMG-CoA reductase inhibitor that selectively inhibits the HMG-CoA reductase expressed by M. smithii. One means that may be utilized to achieve such selectivity is via the use of time-release formulations as discussed below or by otherwise altering the properties of the compounds so that they will not, or will, be efficiently absorbed from the gastrointestinal tract. Alternatively, the compound that selectively inhibits the HMG-CoA reductase expressed by M. smithii may be poorly absorbed by the gastrointestinal tract of the subject. Compounds that inhibit HMG-CoA reductase are well known in the art. For instance, non-limiting examples include atorvastatin, pravastatin, rosuvastatin, and other statins.
These compounds, for example HMG-CoA reductase inhibitors, may be formulated into pharmaceutical compositions and administered to subjects to promote weight loss. According to the present invention, a pharmaceutical composition includes, but is not limited to, pharmaceutically acceptable salts, esters, salts of such esters, or any other adduct or derivative which upon administration to a subject in need is capable of providing, directly or indirectly, a composition as otherwise described herein, or a metabolite or residue thereof, e.g., a prodrug.
The pharmaceutical compositions may be administered by several different means that will deliver a therapeutically effective dose. Such compositions can be administered orally, parenterally, by inhalation spray, rectally, intradermally, intracisternally, intraperitoneally, transdermally, buccally, as an oral or nasal spray, or topically (e.g., powders, ointments, or drops) in dosage unit formulations containing conventional nontoxic pharmaceutically acceptable carriers, adjuvants, and vehicles as desired. Topical administration may also involve the use of transdermal administration such as transdermal patches or iontophoresis devices. The term parenteral as used herein includes subcutaneous, intravenous, intramuscular, or intrasternal injection, or infusion techniques. In an exemplary embodiment, the pharmaceutical composition will be administered in an oral dosage form. Formulation of drugs is discussed in, for example, Hoover, John E., Remington's Pharmaceutical Sciences, Mack Publishing Co., Easton, Pa. (1975), and Lieberman, H. A. and Lachman, L., Eds., Pharmaceutical Dosage Forms, Marcel Dekker, New York, N.Y. (1980).
The amount of an HMG-CoA reductase inhibitor that constitutes an “effective amount” can and will vary. The amount will depend upon a variety of factors, including whether the administration is in single or multiple doses, and individual subject parameters including age, physical condition, size, and weight. Those skilled in the art will appreciate that dosages may also be determined with guidance from Goodman & Gilman's The Pharmacological Basis of Therapeutics, Ninth Edition (1996), Appendix II, pp. 1707-1711 and from Goodman & Gilman's The Pharmacological Basis of Therapeutics, Tenth Edition (2001), Appendix II, pp. 475-493.
As described above, an HMG-CoA reductase inhibitor may be specific for the M. smithii enzyme, or for the subject's enzyme, depending, in part, on the selectivity of the particular inhibitor and the area the inhibitor is targeted for release in the subject. For example, an inhibitor may be targeted for release in the upper portion of the gastrointestinal tract of a subject to substantially inhibit the subject's enzyme. In contrast, if the inhibitor is targeted for release in the lower portion of the gastrointestinal tract of a subject, i.e., where M. smithii resides, then the inhibitor may substantially inhibit M. smithii's enzyme.
In order to selectively control the release of an inhibitor to a particular region of the gastrointestinal tract, the pharmaceutical compositions of the invention may be manufactured into one or several dosage forms for the controlled, sustained or timed release of one or more of the ingredients. In this context, typically one or more of the ingredients forming the pharmaceutical composition is microencapsulated or dry coated prior to being formulated into one of the above forms. By varying the amount and type of coating and its thickness, the timing and location of release of a given ingredient or several ingredients (in either the same dosage form, such as a multi-layered capsule, or different dosage forms) may be varied.
In an exemplary embodiment, the coating may be an enteric coating. The enteric coating generally will provide for controlled release of the ingredient, such that drug release can be accomplished at some generally predictable location in the lower intestinal tract below the point at which drug release would occur without the enteric coating. In certain embodiments, multiple enteric coatings may be utilized. Multiple enteric coatings, in certain embodiments, may be selected to release the ingredient or combination of ingredients at various regions in the lower gastrointestinal tract and at various times.
As will be appreciated by a skilled artisan, the encapsulation or coating method can and will vary depending upon the ingredients used to form the pharmaceutical composition and coating, and the desired physical characteristics of the microcapsules themselves. Additionally, more than one encapsulation method may be employed so as to create a multi-layered microcapsule, or the same encapsulation method may be employed sequentially so as to create a multi-layered microcapsule. Suitable methods of microencapsulation may include spray drying, spinning disk encapsulation (also known as rotational suspension separation encapsulation), supercritical fluid encapsulation, air suspension microencapsulation, fluidized bed encapsulation, spray cooling/chilling (including matrix encapsulation), extrusion encapsulation, centrifugal extrusion, coacervation, alginate beads, liposome encapsulation, inclusion encapsulation, colloidosome encapsulation, sol-gel microencapsulation, and other methods of microencapsulation known in the art. Detailed information concerning materials, equipment and processes for preparing coated dosage forms may be found in Pharmaceutical Dosage Forms: Tablets, eds. Lieberman et al. (New York: Marcel Dekker, Inc., 1989), and in Ansel et al., Pharmaceutical Dosage Forms and Drug Delivery Systems, 6th Ed. (Media, Pa.: Williams & Wilkins, 1995).
Another aspect of the invention encompasses use of the gut microbiome as a biomarker for obesity. The biomarker may be utilized to construct arrays that may be used for several applications including as a diagnostic or prognostic tool to determine obesity risk, judge the efficacy of existing weight loss regimes, aid in drug discovery, identify additional biomarkers involved in obesity or an obesity related disorder, and aid in the discovery of therapeutic targets involved in the regulation of energy balance, including but not limited to those that may directly affect the composition of the gut microbiome. Generally speaking, the array may comprise biomolecules modulated in an obese host microbiome or a lean host microbiome.
The array may be comprised of a substrate having disposed thereon at least one biomolecule that is modulated in an obese host microbiome compared to a lean host microbiome. Several substrates suitable for the construction of arrays are known in the art, and one skilled in the art will appreciate that other substrates may become available as the art progresses. The substrate may be a material that may be modified to contain discrete individual sites appropriate for the attachment or association of the biomolecules and is amenable to at least one detection method. Non-limiting examples of substrate materials include glass, modified or functionalized glass, plastics (including acrylics, polystyrene and copolymers of styrene and other materials, polypropylene, polyethylene, polybutylene, polyurethanes, Teflon, etc.), nylon or nitrocellulose, polysaccharides, resins, silica or silica-based materials including silicon and modified silicon, carbon, metals, inorganic glasses and plastics. In an exemplary embodiment, the substrates may allow optical detection without appreciably fluorescing.
A substrate may be planar, a substrate may be a well, e.g., a 364-well plate, or alternatively, a substrate may be a bead. Additionally, the substrate may be the inner surface of a tube for flow-through sample analysis to minimize sample volume. Similarly, the substrate may be flexible, such as a flexible foam, including closed cell foams made of particular plastics.
The biomolecule or biomolecules may be attached to the substrate in a wide variety of ways, as will be appreciated by those in the art. The biomolecule may either be synthesized first, with subsequent attachment to the substrate, or may be directly synthesized on the substrate. The substrate and the biomolecule may be derivatized with chemical functional groups for subsequent attachment of the two. For example, the substrate may be derivatized with a chemical functional group including, but not limited to, amino groups, carboxyl groups, oxo groups or thiol groups. Using these functional groups, the biomolecule may be attached using functional groups on the biomolecule either directly or indirectly using linkers.
The biomolecule may also be attached to the substrate non-covalently. For example, a biotinylated biomolecule can be prepared, which may bind to surfaces covalently coated with streptavidin, resulting in attachment. Alternatively, a biomolecule or biomolecules may be synthesized on the surface using techniques such as photopolymerization and photolithography. Additional methods of attaching biomolecules to arrays and methods of synthesizing biomolecules on substrates are well known in the art, e.g., VLSIPS technology from Affymetrix (see, e.g., U.S. Pat. No. 6,566,495, and Rockett and Dix, “DNA arrays: technology, options and toxicological applications,” Xenobiotica 30(2):155-177, all of which are hereby incorporated by reference in their entirety).
In one embodiment, the biomolecule or biomolecules attached to the substrate are located at a spatially defined address of the array. Arrays may comprise from about 1 to about several hundred thousand addresses or more. In one embodiment, the array may be comprised of less than 10,000 addresses. In another alternative embodiment, the array may be comprised of at least 10,000 addresses. In yet another alternative embodiment, the array may be comprised of less than 5,000 addresses. In still another alternative embodiment, the array may be comprised of at least 5,000 addresses. In a further embodiment, the array may be comprised of less than 500 addresses. In yet a further embodiment, the array may be comprised of at least 500 addresses.
A biomolecule may be represented more than once on a given array. In other words, more than one address of an array may be comprised of the same biomolecule. In some embodiments, two, three, or more than three addresses of the array may be comprised of the same biomolecule. In certain embodiments, the array may comprise control biomolecules and/or control addresses. The controls may be internal controls, positive controls, negative controls, or background controls.
The array may be comprised of biomolecules indicative of an obese host microbiome (e.g. the nucleic acid sequences listed in Table 13). Alternatively, the array may be comprised of biomolecules indicative of a lean host microbiome (e.g. the nucleic acid sequences listed in Table 14). A biomolecule is “indicative” of an obese or lean microbiome if it tends to appear more often in one type of microbiome compared to the other. Additionally, the array may be comprised of biomolecules that are modulated in the obese host microbiome compared to the lean host microbiome. As used herein, “modulated” may refer to a biomolecule whose representation or activity is different in an obese host microbiome compared to a lean host microbiome. For instance, modulated may refer to a biomolecule that is enriched, depleted, up-regulated, down-regulated, degraded, or stabilized in the obese host microbiome compared to a lean host microbiome. In one embodiment, the array may be comprised of a biomolecule enriched in the obese host microbiome compared to the lean host microbiome. In another embodiment, the array may be comprised of a biomolecule depleted in the obese host microbiome compared to the lean host microbiome. In yet another embodiment, the array may be comprised of a biomolecule up-regulated in the obese host microbiome compared to the lean host microbiome. In still another embodiment, the array may be comprised of a biomolecule down-regulated in the obese host microbiome compared to the lean host microbiome. In still yet another embodiment, the array may be comprised of a biomolecule degraded in the obese host microbiome compared to the lean host microbiome. In an alternative embodiment, the array may be comprised of a biomolecule stabilized in the obese host microbiome compared to the lean host microbiome.
Generally speaking, an array of the invention may comprise at least one biomolecule indicative of, or modulated in, an obese host microbiome compared to a lean host microbiome. In one embodiment, the array may comprise at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 255, 260, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350, 355, 360, 365, 370, 375, 380, 385, 390, 395, or 400 biomolecules indicative of, or modulated in, an obese host microbiome compared to a lean host microbiome. In another embodiment, the array may comprise at least 200, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800, or at least 900 biomolecules indicative of, or modulated in, an obese host microbiome compared to a lean host microbiome.
As used herein, “biomolecule” may refer to a nucleic acid, an oligonucleic acid, an amino acid, a peptide, a polypeptide, a protein, a lipid, a carbohydrate, a metabolite, or a fragment thereof. Nucleic acids may include RNA, DNA, and naturally occurring or synthetically created derivatives. A biomolecule may be present in, produced by, or modified by a microorganism within the gut.
In one embodiment, the biomolecules of the array may be selected from the biomolecules listed in Table 13. For instance, the biomolecules of the array may be selected from the group comprising nucleic acids corresponding to SEQ ID NO:1 through SEQ ID NO:273. In another embodiment, the biomolecules of the array may be selected from the biomolecules listed in Table 14. For instance, the biomolecules of the array may be selected from the group comprising nucleic acids corresponding to SEQ ID NO:274 through SEQ ID NO:383. In yet another embodiment, the biomolecules of the array may be selected from the biomolecules listed in Table 13 and Table 14, for instance, the nucleic acids corresponding to SEQ ID NO:1 through SEQ ID NO:383.
Additionally, the biomolecule may be at least 70, 75, 80, 85, 90, or 95% homologous to a biomolecule listed in Table 13 or Table 14 above. In one embodiment, the biomolecule may be at least 80, 81, 82, 83, 84, 85, 86, 87, 88, or 89% homologous to a biomolecule derived from an accession number detailed above. In another embodiment, the biomolecule may be at least 90, 91, 92, 93, 94, 95, 96, 97, 98, or 99% homologous to a biomolecule derived from an accession number detailed above.
In determining whether a biomolecule is substantially homologous or shares a certain percentage of sequence identity with a sequence of the invention, sequence similarity may be defined by conventional algorithms, which typically allow introduction of a small number of gaps in order to achieve the best fit. In particular, “percent identity” of two polypeptides or two nucleic acid sequences is determined using the algorithm of Karlin and Altschul (Proc. Natl. Acad. Sci. USA 87:2264-2268, 1993). Such an algorithm is incorporated into the BLASTN and BLASTX programs of Altschul et al. (J. Mol. Biol. 215:403-410, 1990). BLAST nucleotide searches may be performed with the BLASTN program to obtain nucleotide sequences homologous to a nucleic acid molecule of the invention. Equally, BLAST protein searches may be performed with the BLASTX program to obtain amino acid sequences that are homologous to a polypeptide of the invention. To obtain gapped alignments for comparison purposes, Gapped BLAST is utilized as described in Altschul et al. (Nucleic Acids Res. 25:3389-3402, 1997). When utilizing BLAST and Gapped BLAST programs, the default parameters of the respective programs (e.g., BLASTX and BLASTN) are employed. See http://www.ncbi.nlm.nih.gov for more details.
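As an illustration only, percent identity over an already-computed gapped alignment might be calculated as in the following sketch. This is not the BLAST algorithm and does not reproduce BLAST scoring; the function name `percent_identity` and the choice to divide by the total number of aligned columns (gaps included) are assumptions made for the example.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two equal-length, pre-aligned sequences.

    '-' marks a gap. Assumption for this sketch: identity is the number of
    matching non-gap columns divided by the total alignment length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)
```

A threshold such as "at least 90% homologous" could then be checked by comparing the returned value against 90.0, bearing in mind that different tools define the denominator differently.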
For each of the above embodiments, biomolecules that are indicative of, or modulated in, an obese host microbiome compared to a lean host microbiome may be identified using methods detailed in the Examples.
The arrays may be utilized in several suitable applications. For example, the arrays may be used in methods for detecting association between two or more biomolecules. This method typically comprises incubating a sample with the array under conditions such that the biomolecules comprising the sample may associate with the biomolecules attached to the array. The association is then detected, using means commonly known in the art, such as fluorescence. “Association,” as used in this context, may refer to hybridization, covalent binding, or ionic binding. A skilled artisan will appreciate that conditions under which association may occur will vary depending on the biomolecules, the substrate, and the detection method utilized. As such, suitable conditions may have to be optimized for each individual array created.
In yet another embodiment, the array may be used as a tool in a method to determine whether a compound has efficacy for treatment of obesity or an obesity-related disorder in a host. Alternatively, the array may be used as a tool in a method to determine whether a compound increases or decreases the relative abundance of Bacteroides, Actinobacteria, or Firmicutes in a subject. Typically, such methods comprise comparing a plurality of biomolecules of the host's microbiome before and after administration of a compound, such that if the abundance of biomolecules associated with obesity decreases after treatment, or the abundance of biomolecules indicative of Bacteroides increases, or the abundance of biomolecules indicative of Firmicutes and/or Actinobacteria decreases, the compound may be efficacious in treating obesity in a host.
The array may also be used to quantitate the plurality of biomolecules of the host microbiome before and after administration of a compound. The abundance of each biomolecule in the plurality may then be compared to determine if there is a decrease in the abundance of biomolecules associated with obesity after treatment.
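The before-and-after comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions: abundances are held in plain dictionaries keyed by a biomolecule identifier, and the function names `decreased_biomolecules` and `fraction_decreased` are hypothetical. No statistical test is applied, which a real analysis would likely require.

```python
def decreased_biomolecules(before, after, obesity_associated):
    """Return the obesity-associated biomolecules whose abundance fell after treatment.

    before, after: dicts mapping biomolecule id -> measured abundance.
    obesity_associated: iterable of biomolecule ids of interest.
    Biomolecules missing from either measurement are skipped.
    """
    return sorted(
        name
        for name in obesity_associated
        if name in before and name in after and after[name] < before[name]
    )


def fraction_decreased(before, after, obesity_associated):
    """Fraction of measured obesity-associated biomolecules that decreased."""
    measured = [n for n in obesity_associated if n in before and n in after]
    if not measured:
        return 0.0
    return len(decreased_biomolecules(before, after, measured)) / len(measured)
```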
In some embodiments, the array may be used as a diagnostic or prognostic tool to identify subjects that are susceptible to more efficient energy harvesting, and therefore, more susceptible to weight gain and/or obesity. Such a method may generally comprise incubating the array with biomolecules derived from the subject's gut microbiome to determine the relative abundance of nucleic acids or nucleic acid products associated with Bacteroidetes, Actinobacteria, or Firmicutes. In some embodiments, the array may be used to determine the relative abundance of Mollicutes, Mollicute-associated nucleic acids, or Mollicute-associated nucleic acid products in a subject's gut microbiome. Methods to collect, isolate, and/or purify biomolecules from the gut microbiome of a subject to be used in the above methods are known in the art, and are detailed in the examples.
The present invention also encompasses use of the microbiome as a biomarker to construct microbiome profiles. Generally speaking, a microbiome profile is comprised of a plurality of values with each value representing the abundance of a microbiome biomolecule. The abundance of a microbiome biomolecule may be determined, for instance, by sequencing the nucleic acids of the microbiome as detailed in the examples. This sequencing data may then be analyzed by known software, as detailed in the examples, to determine the abundance of a microbiome biomolecule in the analyzed sample. The abundance of a microbiome biomolecule may also be determined using an array described above. For instance, by detecting the association between the biomolecules comprising a microbiome sample and the biomolecules comprising the array, the abundance of a microbiome biomolecule in the sample may be determined.
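One way to picture a microbiome profile as "a plurality of values" is a mapping from biomolecule identifier to abundance. The sketch below assumes raw counts (for example, sequence reads assigned to each biomolecule) and normalizes them to fractions of the total; the normalization choice and the name `build_profile` are assumptions for illustration, not something specified in the text.

```python
def build_profile(counts):
    """Convert raw per-biomolecule counts into a relative-abundance profile.

    counts: dict mapping biomolecule id -> non-negative count.
    Returns a dict mapping biomolecule id -> fraction of the total count.
    """
    if any(c < 0 for c in counts.values()):
        raise ValueError("counts must be non-negative")
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("total count must be positive")
    return {name: c / total for name, c in counts.items()}
```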
A profile may be digitally-encoded on a computer-readable medium. The term “computer-readable medium” as used herein refers to any medium that participates in providing instructions to a processor for execution. Such a medium may take many forms, including but not limited to non-volatile media, volatile media, and transmission media. Non-volatile media may include, for example, optical or magnetic disks. Volatile media may include dynamic memory. Transmission media may include coaxial cables, copper wire and fiber optics. Transmission media may also take the form of acoustic, optical, or electromagnetic waves, such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, or other magnetic medium, a CD-ROM, CDRW, DVD, or other optical medium, punch cards, paper tape, optical mark sheets, or other physical medium with patterns of holes or other optically recognizable indicia, a RAM, a PROM, an EPROM, a FLASH-EPROM, or other memory chip or cartridge, a carrier wave, or other medium from which a computer can read.
A particular profile may be coupled with additional data about that profile on a computer readable medium. For instance, a profile may be coupled with data about what therapeutics, compounds, or drugs may be efficacious for that profile, or about other features of the subject's digestive health when consuming a given diet or set of diets. Conversely, a profile may be coupled with data about what therapeutics, compounds, or drugs may not be efficacious for that profile. Alternatively, a profile may be coupled with known risks associated with that profile. Non-limiting examples of the type of risks that might be coupled with a profile include disease or disorder risks associated with a profile. The computer readable medium may also comprise a database of at least two distinct profiles.
Such a profile may be used, for instance, in a method of selecting a compound for treating obesity or an obesity-related disorder in a host. Generally speaking, such a method would comprise providing a microbiome profile from the host and providing a plurality of reference microbiome profiles, each associated with a compound, and selecting the reference profile most similar to the host microbiome profile, to thereby select a compound for treating obesity or an obesity-related disorder in the host. The host profile and each reference profile may comprise a plurality of values, each value representing the abundance of a microbiome biomolecule.
The microbiome profiles may be utilized in a variety of applications. For example, the microbiome profiles may be used in a method for predicting risk for obesity or an obesity-related disorder in a host. The method comprises, in part, providing a microbiome profile from a host, and providing a plurality of reference microbiome profiles, then selecting the reference profile most similar to the host microbiome profile, such that if the host's microbiome is most similar to a reference obese microbiome, the host is at risk for obesity or an obesity-related disorder. The microbiome profile from the host may be determined using an array of the invention. The reference profiles may be stored on a computer-readable medium such that software known in the art and detailed in the examples may be used to compare the microbiome profile and the reference profiles.
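Selecting the reference profile most similar to a host profile can be sketched as a nearest-neighbor lookup. The text does not name a similarity measure, so Euclidean distance is used here purely as an assumption; the function names and the dictionary layout (label -> profile) are likewise hypothetical.

```python
import math


def euclidean_distance(p, q):
    """Euclidean distance between two profiles (dicts of id -> abundance).

    A biomolecule absent from one profile is treated as abundance 0.0.
    """
    keys = set(p) | set(q)
    return math.sqrt(sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in keys))


def most_similar_reference(host_profile, reference_profiles):
    """Return the label of the reference profile closest to the host profile.

    reference_profiles: dict mapping a label (e.g. "obese", "lean", or a
    compound name) -> profile dict.
    """
    if not reference_profiles:
        raise ValueError("at least one reference profile is required")
    return min(
        reference_profiles,
        key=lambda label: euclidean_distance(host_profile, reference_profiles[label]),
    )
```

The same lookup would serve either use described above: with references labeled "obese" and "lean" it yields a risk call, and with references labeled by compound it yields a candidate compound.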
The host microbiome may be derived from a subject that is a rodent, a human, a livestock animal, a companion animal, or a zoological animal. In one embodiment, the host microbiome is derived from a rodent, e.g., a mouse, a rat, a guinea pig, etc. In another embodiment, the host microbiome is derived from a human. In yet another embodiment, the host microbiome is derived from a livestock animal. Non-limiting examples of livestock animals include pigs, cows, horses, goats, sheep, llamas and alpacas. In still another embodiment, the host microbiome is derived from a companion animal. Non-limiting examples of companion animals include pets, such as dogs, cats, rabbits, and birds. In still yet another embodiment, the host microbiome is derived from a zoological animal. As used herein, a “zoological animal” refers to an animal that may be found in a zoo. Such animals may include non-human primates, large cats, wolves, and bears.
The present invention also encompasses a kit for evaluating a compound, therapeutic, or drug. Typically, the kit comprises an array and a computer-readable medium. The array may comprise a substrate, the substrate having disposed thereon at least one biomolecule that is modulated in an obese host microbiome compared to a lean host microbiome. The computer-readable medium may have a plurality of digitally-encoded profiles wherein each profile of the plurality has a plurality of values, each value representing the abundance of a biomolecule in a host microbiome detected by the array. The array may be used to determine a profile for a particular host under particular conditions, and then the computer-readable medium may be used to determine if the profile is similar to a known profile stored on the computer-readable medium. Non-limiting examples of possible known profiles include obese and lean profiles for several different hosts, for example, rodents, humans, livestock animals, companion animals, or zoological animals.
The term “abundance” refers to the representation of a given taxonomic group (e.g., phylum, order, family, genus, or species) of microorganism present in the gastrointestinal tract of a subject.
The term “activity of the microbiota population” refers to the microbiome's ability to harvest energy and nutrients.
The term “antagonist” refers to a molecule that inhibits or attenuates the biological activity of a Fiaf polypeptide and in particular, the ability of Fiaf to inhibit LPL, and/or the ability of the microbiota to regulate Fiaf. Antagonists may include proteins such as antibodies, nucleic acids, carbohydrates, small molecules, or other compounds or compositions that modulate the activity of a Fiaf polypeptide either by directly interacting with the polypeptide or by acting on components of the biological pathway in which Fiaf participates.
The term “agonist” refers to a molecule that enhances or increases the biological activity of a Fiaf polypeptide and in particular, the ability of Fiaf to inhibit LPL. Agonists may include proteins, peptides, nucleic acids, carbohydrates, small molecules (e.g., such as metabolites), or other compounds or compositions that modulate the activity of a Fiaf polypeptide either by directly interacting with the polypeptide or by acting on components of the biological pathway in which Fiaf participates.
The term “altering” as used in the phrase “altering the microbiota population” is to be construed in its broadest interpretation to mean a change in the representation of microbes or the functions/activities of microbial communities in the gastrointestinal tract of a subject. The change may be a decrease or an increase in the presence of a particular microbial species, genus, family, order, or class, or change in the expression of microbial community associated nucleic acids or a change in the protein and metabolic products produced by members of the community.
“BMI” as used herein is defined as a human subject's weight (in kilograms) divided by height (in meters) squared.
An “effective amount” is a therapeutically-effective amount that is intended to qualify the amount of agent that will achieve the goal of a decrease in body fat, or in promoting weight loss.
Fas stands for fatty acid synthase.
Fiaf stands for fasting-induced adipocyte factor, also known as angiopoietin-like protein 4 (Angptl4).
LPL stands for lipoprotein lipase.
The term “obesity-related disorder” includes disorders resulting from, at least in part, obesity. Representative disorders include metabolic syndrome, type II diabetes, hypertension, cardiovascular disease, and nonalcoholic fatty liver disease.
The term “metagenomics” refers to the application of modern genomic techniques to the study of the composition and operations of communities of microbial organisms sampled directly in their natural environments, bypassing the need for isolation and lab cultivation of individual species.
PPAR stands for peroxisome proliferator-activated receptor.
A “subject in need of treatment for obesity” generally will meet at least one of three criteria: (i) BMI over 30; (ii) 100 pounds overweight; or (iii) 100% above an “ideal” body weight as determined by generally recognized weight charts.
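The BMI definition and criterion (i) above reduce to a short calculation, sketched below. The function names are hypothetical. Criteria (ii) and (iii) depend on external weight charts and are deliberately not modeled here.

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """BMI as defined above: weight in kilograms divided by height in meters squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / (height_m ** 2)


def meets_bmi_criterion(weight_kg: float, height_m: float, threshold: float = 30.0) -> bool:
    """True if BMI is over the threshold (criterion (i): BMI over 30)."""
    return body_mass_index(weight_kg, height_m) > threshold
```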
As various changes could be made in the above compounds, products and methods without departing from the scope of the invention, it is intended that all matter contained in the above description and in the examples given below, shall be interpreted as illustrative and not in a limiting sense.
The following examples are included to demonstrate preferred embodiments of the invention. It should be appreciated by those of skill in the art that the techniques disclosed in the examples that follow represent techniques discovered by the inventors to function well in the practice of the invention. Those of skill in the art should, however, in light of the present disclosure, appreciate that many changes can be made in the specific embodiments that are disclosed and still obtain a like or similar result without departing from the spirit and scope of the invention. Therefore all matter set forth or shown in the accompanying drawings is to be interpreted as illustrative and not in a limiting sense.
The following examples illustrate various iterations of the invention.
The bacterial lineages of the human gut microbiota are largely unexplored. In this study, the lineages of gut microbiota of 31 monozygotic (MZ) twin pairs, 23 dizygotic (DZ) twin pairs, and where available their mothers (n=46), were characterized (Tables 1-5). MZ and DZ co-twins and parent-offspring pairs provide an attractive paradigm for assessing the impact of genotype and shared early environment exposures on the gut microbiome. Moreover, genetically ‘identical’ MZ twin pairs gain weight in response to overfeeding in a more reproducible way than do unrelated individuals and are more concordant for body mass index (BMI) than dizygotic twin pairs, suggesting shared features of their energy balance influenced by host genotype.
[Tables 1-5: table bodies were not recovered; the surviving footnotes and taxon entries follow.]
(a) ID nomenclature: Family number, twin number or mother, and BMI category (Le = lean; Ov = overweight; Ob = obese; e.g., F1T1Le stands for family 1, twin 1, lean).
(a) 1,000 sequences were randomly sampled from a single timepoint for each individual.
(b) Based on the consensus taxonomy of ≥90% sequences within each phylotype (best-BLAST-hit against the Greengenes database).
(a) Based on the consensus taxonomy of >90% sequences within each phylotype (best-BLAST-hit against the Greengenes database).
Taxon entries appearing in the tables: Faecalibacterium; Ruminococcus; Ruminococcus schinkii; Ruminococcus luti; Eubacterium rectale; Clostridium clostridioforme; Clostridium nexile; Clostridium fusiformis; Clostridium bifermentans; Clostridium bolteae; Coprococcus.
Twin pairs who had been enrolled in the Missouri Adolescent Female Twin Study (MOAFTS) were recruited for this study (mean period of enrollment, 11.7±1.2 years; range, 4.4-13.0 years). The MOAFTS twin cohort, comprised of female like-sex twin pairs, was identified from Missouri birth records over the period 1994-1999, when the twins were median age 15. A total of 350 twins from the larger MOAFTS cohort completed screening interviews for the present study. Pairs most likely to meet study criteria were identified at the wave five interview of the MOAFTS twin cohort (which has 90% retention of wave four participants). Eligibility was then confirmed at screening interview. All twins were 25-32 years old, of European or African ancestry (EA and AA, respectively), were generally concordant for obesity (BMI>30 kg/m2) or leanness (BMI=18.5-24.9 kg/m2) [1 twin pair was lean/overweight (overweight defined as BMI≥25 and <30) and 6 pairs were overweight/obese], and had not taken antibiotics for at least 5.49±0.09 months. Each participant completed a detailed medical, lifestyle, and dietary questionnaire. Participants were broadly representative of the overall Missouri population with respect to BMI, parity, education, and marital status. Although all were born in Missouri, they currently live throughout the USA: 29% live in the same house, but some live >800 km apart. Since fecal samples are readily attainable and representative of interpersonal differences in gut microbial ecology, they were collected from each individual and frozen immediately. The collection procedure was repeated, with an average interval between sample collections of 57±4 days.
Frozen de-identified fecal samples were stored at −80° C. before processing. In order to homogenize each sample, a 10-20 g aliquot of each sample was pulverized in liquid nitrogen with a mortar and pestle. An aliquot (~500 mg) of each sample was then suspended, while frozen, in a solution containing 500 μl of extraction buffer [200 mM Tris (pH 8.0), 200 mM NaCl, 20 mM EDTA], 210 μl of 20% SDS, 500 μl of a mixture of phenol:chloroform:isoamyl alcohol (25:24:1, pH 7.9), and 500 μl of a slurry of 0.1 mm-diameter zirconia/silica beads (BioSpec Products, Bartlesville, Okla.). Microbial cells were subsequently lysed by mechanical disruption with a bead beater (BioSpec Products) set on high for 2 min at room temperature, followed by extraction with phenol:chloroform:isoamyl alcohol, and precipitation with isopropanol. DNA obtained from three separate 10 mg frozen aliquots of each fecal sample was pooled (≧200 μg DNA) and used for pyrosequencing (see below).
Full-Length 16S rRNA Sequence-Based Surveys
Five replicate PCR reactions were performed for each fecal DNA sample. To generate full length or near full length bacterial 16S rRNA amplicons, each 25 μl reaction contained 100 ng of gel purified DNA (Qiaquick, Qiagen), 10 mM Tris (pH 8.3), 50 mM KCl, 2 mM MgSO4, 0.16 μM dNTPs, 0.4 μM of the bacteria-specific primer 8F (5′-AGAGTTTGATCCTGGCTCAG-3′), 0.4 μM of the universal primer 1391R (5′-GACGGGCGGTGWGTRCA-3′), 0.4 M betaine, and 3 units of Taq polymerase (Invitrogen). Cycling conditions were 94° C. for 2 min, followed by 25 cycles of 94° C. for 1 min, 55° C. for 45 sec, and 72° C. for 2 min. Replicate PCRs were pooled and concentrated (Millipore; Montage PCR filter columns). Full-length 16S rRNA gene amplicons (1.3 kb) were then gel-purified using the Qiaquick kit (Qiagen), subcloned into TOPO TA pCR4.0 (Invitrogen), and the ligated DNA transformed into E. coli TOP10 (Invitrogen). For each sample, 384 colonies containing cloned 16S rRNA nucleic acid amplicons were processed for sequencing. Plasmid inserts were sequenced bi-directionally using vector-specific primers plus the internal primer 907R (5′-CCGTCAATTCCTTTRAGTTT-3′).
16S rRNA gene sequences were edited and assembled into consensus sequences using the PHRED and PHRAP software packages within the Xplorseq program. Sequences that did not assemble were discarded and bases with PHRED quality scores <20 were trimmed. Sequences were checked for chimeras using Bellerophon program version 3 with the default parameters (final dataset n=8,941 near full-length 16S rRNA gene sequences; for sequence designations see Table 1). Alignments for reference genome 16S rRNA gene sequences were manually edited in ARB.
V2/3 16S rRNA Sequence-Based Surveys
Four replicate PCR reactions targeting the V2/3 region of bacterial 16S rRNA genes were performed on the same fecal DNA samples used above. Each 20 μl reaction contained 100 ng of gel purified DNA (Qiaquick, Qiagen), 8 μl 2.5× HotMaster PCR Mix (Eppendorf), 0.3 μM of the primer 8F [5′-GCCTTGCCAGCCCGCTCAG-TCAGAGTTTGATCCTGGCTCAG-3′; composite of 454 primer B (underlined), linker nucleotides (TC), and the universal bacterial primer 8F (italics)], and 0.3 μM of the primer 338R [5′-GCCTCCCTCGCGCCATCAGNNNNNNNNCA-TGCTGCCTCCCGTAGGAGT-3′; 454 Life Sciences primer A (underlined), a unique 8 base barcode (Ns), linker nucleotides (CA), and the broad-range bacterial primer 338R (italics)]. Cycling conditions were 95° C. for 2 min, followed by 30 cycles of 95° C. for 20 sec, 52° C. for 20 sec, and 65° C. for 1 min. Replicate PCRs were pooled and purified with Ampure magnetic purification beads (Agencourt).
PCR products were quantified with the bisbenzimide H assay. An aliquot of each PCR product was incubated for 5 min at room temperature in TNE reagent [10 mM Trizma HCl pH 8.1, 100 mM NaCl, 1 mM EDTA, and 50 ng/ml freshly prepared bisbenzimide H (Sigma)]. Samples were read on a fluorometer or plate reader (excitation at 365 nm, emission at 460 nm) relative to a standard curve constructed using E. coli DNA (Sigma). Multiple pools, each containing approximately equimolar amounts of PCR products, were assembled for 454 FLX amplicon pyrosequencing (n=33-100 barcoded samples/pool). Technical replicates were analyzed from selected representatives of each pool across four different sequencing centers; results were highly reproducible, discriminating between individuals and between samples from the same individual over time (
V6 16S rRNA Sequence-Based Surveys
PCR reactions targeting the V6 region of bacterial 16S rRNA genes were performed on the same fecal DNA samples used above. Each 32 μl reaction contained 100 ng of gel purified DNA (Qiaquick, Qiagen), PCR buffer (PurePeak DNA polymerization mix, Thermo-Fisher), 0.625 mM PurePeak dNTPs (Thermo-Scientific), 0.625 μM Fusion Primer A, 0.625 μM Fusion Primer B, and 5U Pfu polymerase (Stratagene). The primer set included 5 forward primers (Fusion A) and 4 reverse primers (Fusion B) fused to the 454 Life Sciences adaptors A and B respectively. Cycling conditions were 94° C. for 3 min, followed by 30 cycles of 94° C. for 30 sec, 57° C. for 45 sec, and 72° C. for 1 min, with a final extension period of 72° C. for 2 min. PCR products were purified with MinElute columns (Qiagen), and DNA was quantified using a Bioanalyzer (Agilent) and the PicoGreen assay (Invitrogen). Two pools of PCR products were constructed for 454 FLX amplicon pyrosequencing, composed of 18 and 20 samples, respectively (the second run contained 3 samples from the V2/3 region and 3 technical replicates, one additional sample (TS30) was sequenced in a third run, bringing the total number of V6 samples processed to 33). Since technical replicates were highly reproducible (see above and
Pyrosequencing data were pre-processed to remove sequences with low quality scores, sequences with ambiguous characters, or sequences outside of the length bounds (V6<50 nt, V2/3<200 nt) and binned according to sample based on the error-correcting barcodes. Similar sequences were identified using the Megablast software and the following parameters: E-value, 1e−10; minimum coverage, 99%; and minimum pairwise identity, 97%. Candidate OTUs were identified as sets of sequences connected to each other at this level using the top 4000 hits per sequence. Each candidate OTU was considered valid if the average density of connection was above threshold; otherwise it was broken up into smaller connected components.
A relaxed neighbor-joining tree was built from one representative sequence per OTU using Clearcut, employing the Kimura correction (the PH lanemask was applied to V2/3 data), but otherwise with default comparisons. Unweighted UniFrac was run using the resulting tree and the counts of each sequence in each sample. Principal component analysis (PCA) was performed on the resulting matrix of distances between each pair of samples. To determine if the UniFrac distances were on average significantly different for pairs of samples (i.e. between twin-pairs, between twins and their mother, or between unrelated individuals), a t-test was performed on the UniFrac distance matrix, and a p-value was generated for the t-statistic by permutation of the rows and columns as in the Mantel test, regenerating the t-statistic for 1000 random samples, and using the distribution to obtain an empirical p-value.
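The permutation procedure described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names, the use of Welch's form of the t-statistic, the one-sided comparison, and the choice to permute group labels (equivalent to permuting rows and columns of the distance matrix together) are all assumptions made for the example.

```python
import random
from statistics import mean, variance

def t_statistic(a, b):
    """Welch's t-statistic for two lists of distances (assumed form of the test)."""
    return (mean(a) - mean(b)) / ((variance(a) / len(a) + variance(b) / len(b)) ** 0.5)

def split_distances(dist, labels):
    """Split the upper triangle into within-group and between-group distances."""
    within, between = [], []
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            (within if labels[i] == labels[j] else between).append(dist[i][j])
    return within, between

def permutation_p_value(dist, labels, n_perm=1000, seed=0):
    """Empirical one-sided p-value: are within-group distances smaller than between-group?"""
    rng = random.Random(seed)
    observed = t_statistic(*split_distances(dist, labels))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        # Shuffling labels permutes rows and columns of the matrix simultaneously.
        rng.shuffle(shuffled)
        if t_statistic(*split_distances(dist, shuffled)) <= observed:
            hits += 1
    # Add-one correction so the empirical p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

Here `labels` would hold a hypothetical family identifier for each sample, so that "within-group" pairs are family members and "between-group" pairs are unrelated individuals.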
Taxonomy was assigned using the best-BLAST-hit against Greengenes (E-value cutoff of 1e−10, minimum 88% coverage, 88% identity) and the Hugenholtz taxonomy, downloaded May 12, 2008, excluding sequences annotated as chimeric (http://greengenes.lbl.gov/Download/Sequence_Data/Greengenes_format/).
To determine which individuals had the most diverse communities of gut bacteria, rarefaction plots and Phylogenetic Diversity (PD) measurements, as described by Faith (Biological Conservation 1992), were made for each sample. PD is the total amount of branch length in a phylogenetic tree constructed from the combined 16S rRNA dataset, leading to the sequences in a given sample. To account for differences in sampling effort between individuals, and to estimate the thoroughness of sampling of each individual, the accumulation of PD (branch length) with sampling effort was plotted in a manner analogous to rarefaction curves. The PD rarefaction curve for each individual was generated by applying custom python code that can be downloaded from http://bayes.colorado.edu/unifrac, to the Arb parsimony insertion tree.
To characterize the bacterial lineages present in the fecal microbiotas of these 44 individuals, 16S rRNA sequencing was performed, targeting the full-length gene with an ABI 3730xl capillary sequencer. Additionally, multiplex sequencing with a 454 FLX pyrosequencer was used to survey the V2/3 variable region and the V6 hypervariable region (Tables 1, 2 and 3). Complementary phylogenetic and taxon-based methods were used to compare 16S rRNA sequences among fecal communities. Phylogenetic clustering with UniFrac is based on the principle that communities can be compared in terms of their shared evolutionary history, as measured by the degree to which they share branch length on a phylogenetic tree. This approach was complemented with taxon-based methods; these methods disregard some of the information contained in the phylogenetic tree of the taxa in question, but have the advantage that specific taxa unique to, or shared among, groups of samples can be identified (e.g., those from lean or obese individuals). Prior to both types of analyses, 16S rRNA gene sequences were grouped into Operational Taxonomic Units (OTUs/phylotypes) using the furthest-neighbor-like algorithm and a sequence identity threshold of 97%, which is commonly used to define ‘species’-level phylotypes. Taxonomic assignments were made using BLAST and Hugenholtz taxonomy annotations in the Greengenes database.
No matter which region of the 16S rRNA gene was examined (V2/3 or V6 pyrosequencing reads, or the near-complete gene from Sanger reads), individuals from the same family (a twin and her co-twin, or twins and their mother) had a more similar bacterial community structure than unrelated individuals (
Multiplex pyrosequencing of V2/3 and V6 amplicons allowed higher levels of coverage of community diversity compared to what was feasible using Sanger sequencing, reaching on average 3,984±232 (V2/3) and 24,786±1,403 (V6) sequences per sample. To control for differences in coverage between samples, all analyses were performed on an equal number of randomly selected sequences [200 full-length, 1,000 V2/3, and 10,000 V6]. At this level of coverage, there was little overlap between the sampled fecal communities: only 2, 5, and 21 phylotypes were found in >90% of the individuals surveyed (full-length, V2/3, and V6 data respectively). Moreover, the number of 16S rRNA gene sequences belonging to these phylotypes varied greatly between fecal microbiotas (Tables 4, 5 and 6).
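The even-sampling step (an equal number of randomly selected sequences per sample) might look like the sketch below. The function name `rarefy`, the dictionary-of-counts input, and sampling without replacement are assumptions for illustration; the text does not state how the random selection was implemented.

```python
import random
from collections import Counter

def rarefy(counts, depth, seed=0):
    """Randomly draw `depth` sequences without replacement from per-phylotype counts."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError("sample has fewer sequences than the requested depth")
    # Expand counts into one entry per sequence, then subsample.
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return Counter(rng.sample(pool, depth))
```

For the depths given in the text, `depth` would be 200 (full-length), 1,000 (V2/3), or 10,000 (V6).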
Samples taken from the same individual at the initial collection point and 57±4 days later were remarkably consistent with respect to the specific phylotypes found (
After assigning V2/3, V6 and full-length 16S rRNA gene sequences to bacterial taxa (see Example 3 below), it was found that obese individuals generally had a lower relative abundance of the Bacteroidetes and a higher relative abundance of the Firmicutes and Actinobacteria: the statistical significance of these observations varied depending upon the sequencing methods used (Table 7), likely due to differences in PCR conditions (for example, the 8F primer has a known bias against Actinobacteria).
In summary, across all methods, obesity was associated with a significant decrease in the level of diversity (
aA subset of each dataset was included in the analysis: 10,000 sequences/sample (V6), 1,000 sequences/sample (V2/3) and 200 sequences/sample (full-length). Sequences from the same individual across both timepoints were pooled.
bValues are from a Student's t-test of the obese versus lean distribution
cThe AA lean individuals surveyed have significantly more Bacteroidetes and less Firmicutes than the lean EA individuals (p < 0.05)
dBLASTX comparisons between microbiomes and NCBI non-redundant database
All hosts were searched for bacterial phylotypes present at high abundance using a sampling model based on a combination of standard Poisson and binomial sampling statistics.
A sampling model was developed that allows placement of bounds on the maximum abundance of any phylotype found across all samples. The principle here is that if a given phylotype made up not less than some proportion p of the microbiome of all humans, it is then possible to calculate (i) the number of samples of a given size expected to lack that phylotype due to sampling error, and (ii) the probability that an actual proportion p-hat as low as the minimum abundance would be observed in any sample.
The probability P of failing to observe a given microbe at proportion p in a sample of size n is given by Poisson statistics as simply e^(−pn). For equal sample sizes, the probability of observing the phylotype in at least k samples using binomial sampling with Pr(success)=(1−P) can therefore be calculated. Then, the inverse binomial can be used to ask what value of P, and therefore of p, gives a specified probability (say, 5%) of observing a given phylotype in as few samples as actually observed for the most abundant phylotype. This calculation yields an upper bound for p (i.e. the value of p at which we can reject the idea that we would have seen the phylotype in as few samples as actually observed at the 95% confidence level).
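For the equal-sample-size case, the calculation can be sketched as below. This is an illustrative reading of the description, not the PyCogent implementation: the function names are hypothetical, and the inverse binomial is solved here by bisection on p rather than by a closed-form inverse.

```python
from math import comb, exp

def prob_at_most(k, m, p, n):
    """P(phylotype observed in <= k of m samples), each of n reads, true proportion p."""
    miss = exp(-p * n)   # Poisson probability of zero reads in a single sample
    hit = 1.0 - miss     # Pr(success) for the binomial
    return sum(comb(m, i) * hit**i * miss**(m - i) for i in range(k + 1))

def upper_bound_p(k, m, n, alpha=0.05, tol=1e-9):
    """Largest p for which seeing the phylotype in only k of m samples has probability >= alpha.

    Assumes k < m; if the phylotype is seen in every sample this bound is uninformative.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if prob_at_most(k, m, mid, n) > alpha:
            lo = mid   # still plausible, so p could be larger
        else:
            hi = mid
    return hi
```

As a rough check, `upper_bound_p(32, 33, 1000)` gives a bound of about 0.65%, close to the 0.66% reported for the V6 data at 1,000 sequences/sample.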
For unequal sizes, there is no analytical solution to the equivalent of the binomial in which Pr(success) differs for each trial. Therefore, numerical optimization must be used to solve for p. Because the function relating p and the probability of observing the phylotype in at least a given number of samples is monotonic, a bisection search (bounded by p=0 and p=1) can be used to find the appropriate value of p for a desired confidence level. In practice, P was calculated for each sample, a vector of random numbers between 0 and 1 was chosen, and the number of times the random number at a given position was less than P was counted. Repeating this procedure for a fixed number of iterations (100,000 for the reported values) gives sufficiently smooth values to approximate the monotonic function and to allow the bisection search to converge on the same value of p to three significant figures across repeated trials.
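The Monte Carlo variant for unequal sample sizes can be sketched along the same lines. The names, the default iteration count, and the reuse of a fixed random seed for every trial value of p (so that the estimated curve is monotonic for the bisection) are assumptions of this example; the text reports using 100,000 iterations.

```python
import random
from math import exp

def prob_missing_at_least(p, sizes, n_missing, iterations=20000, seed=0):
    """Estimate P(phylotype absent from >= n_missing samples) by simulation."""
    rng = random.Random(seed)   # same random numbers for every p keeps the curve monotonic
    miss_probs = [exp(-p * n) for n in sizes]   # P for each sample
    hits = 0
    for _ in range(iterations):
        # Count positions where the random number falls below that sample's P.
        missed = sum(rng.random() < P for P in miss_probs)
        if missed >= n_missing:
            hits += 1
    return hits / iterations

def upper_bound_p(sizes, n_missing, alpha=0.05, tol=1e-5, **kwargs):
    """Bisection search on p between 0 and 1; assumes n_missing >= 1."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if prob_missing_at_least(mid, sizes, n_missing, **kwargs) > alpha:
            lo = mid
        else:
            hi = mid
    return hi
```

With equal sizes the result should agree with the analytical binomial bound to within simulation error, which gives a simple way to sanity-check the sketch.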
In the case where a phylotype was found in all samples, a similar procedure could be used to identify the maximum value of p consistent with the observed minimum abundance of the phylotype whose minimum abundance across all samples is highest. In this case, instead of calculating the fraction of samples in which the phylotype was absent, (i) binomial sampling could be used to randomly sample the number of observed counts of a phylotype given the parametric value of p and the sample size of each sample, (ii) the minimum abundance across all samples could be measured, and (iii) this minimum abundance compared to the minimum abundance actually observed. Again, an analytical solution using extreme-value statistics is possible if sample sizes are equal; for unequal sample sizes, the solution must be obtained by numerical methods (in this case, the same type of bisection search used above). The sampling model was implemented in Python using PyCogent.
Using this model the full-length 16S rRNA dataset described in Example 1 was first analyzed. The most abundant ‘species’-level phylotype in each sample made up 11% of that sample on average (range: 4.2%-22.0%), and the most abundant phylotype found across the combined dataset was found in 25 of the 27 fecal microbiotas (taxonomy assignment=Bacteria; Firmicutes; Clostridia; Clostridiales; Ruminococcus). These data are consistent with no phylotype being present at more than 1.3% abundance in all samples.
The deeper pyrosequencing data confirmed this result. In the V6 dataset, using even sampling of 10,000 sequences/sample, the most abundant phylotype in each sample made up 12% of that sample on average (range: 5.0%-36.6%). The overall most abundant phylotype was found in all 33 samples (Bacteria; Firmicutes; Clostridia; Clostridiales; Eubacterium rectale). However, in some samples, this phylotype was present in frequencies as low as 0.01%.
The sampling model allows one to ask what level of abundance in every individual the most abundant phylotype could have before its absence from, or limited representation in, some samples becomes surprising. For example, with 1,000 sequences/sample, it would be very surprising to miss, in any of 30 samples, a species present at 50% abundance across all samples, but it would not be surprising to miss a species present at 0.00001% abundance.
The sampling model (using 1000 random sequences per sample) indicated that this minimum observed abundance was consistent with a ‘true abundance’ of no more than 0.66%. In the V2/3 dataset, the most abundant phylotype in each sample made up 14.6% of that sample on average (range: 3.8%-47.1%). The overall most abundant phylotype was present in 270 of 274 samples at this depth of coverage (Bacteria;Bacteroidetes;Bacteroidales; Bacteroidaceae). The sampling model indicated that this frequency was consistent with a true abundance of no more than 0.53%. These results were confirmed, with excellent agreement, by the V6 data: at 1,000 sequences/sample, the maximum abundance OTU is found in 32 of 33 samples, consistent with an abundance of no more than 0.66%. However, at a coverage depth of 10,000 sequences/sample, this OTU is found in all 33 samples but at a minimum observed abundance of 0.02%, consistent with a true abundance of no more than 0.1%. Using all the V6 data without controlling for sampling effort, the minimum observed abundance is consistent with a true abundance of no more than 0.07% (the estimate of the true abundance falls with increased sample size because it is less likely that the low frequency would be observed due to sampling error when more total sequences contribute to the result). Thus, we conclude, with 95% confidence, based on the even sampling used for the other analyses in this study (i.e., 1,000 sequences/sample from V2/3, 10,000 sequences/sample for V6) that the maximum abundance of any OTU across all samples cannot exceed the V2/3 result of 0.53%, although the true maximum abundance might be as much as an order of magnitude lower than this based on the greater depth of coverage in the V6 samples.
In summary, the analysis showed that no phylotype is present at more than ˜0.5% abundance in all of the samples in this study, and that although individual microbiotas are dominated by a few abundant phylotypes, these groups vary dramatically in their proportional representation in the sampled gut communities. Also, no phylotypes were detectable in all individuals sampled within this range of coverage (
The International Human Microbiome Project has emphasized the importance of sequencing the genomes of a panel of reference microbial strains. Therefore, shotgun pyrosequencing was used to sample the fecal microbiomes of 18 individuals representing 6 of the families described in Example 1.
Pyrosequencing of total community DNA
Shotgun sequencing runs were performed on the 454 FLX pyrosequencer from total community DNA of 3 lean European American MZ twin-pairs and their mothers plus 3 obese European American MZ twin pairs and their mothers, yielding 8,294,835 reads and 14,730 16S rRNA fragments. Two samples were also analyzed on a single run employing 454/Roche GS FLX Titanium extra long read sequencing technology (Tables 8 and 9). Sequencing reads with degenerate bases (“Ns”) were removed along with all duplicate sequences, as sequences of identical length and content are a common artifact of the pyrosequencing methodology. Finally, human sequences were removed by identifying sequences homologous to the H. sapiens reference genome (BLASTN e-value<10^−5, % identity>75, and score>50).
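The first two filtering steps (removing reads with degenerate bases and duplicate reads) could be sketched as follows. The function name is hypothetical, and retaining the first copy of each duplicated sequence is an assumption; the text does not say whether one copy was kept. The human-sequence filter is omitted because it depends on an external BLASTN search.

```python
def filter_reads(reads):
    """Drop reads containing N and keep only the first copy of each exact duplicate."""
    seen = set()
    kept = []
    for read in reads:
        seq = read.upper()
        if "N" in seq:
            continue          # degenerate base present
        if seq in seen:
            continue          # identical length and content to an earlier read
        seen.add(seq)
        kept.append(read)
    return kept
```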
aID nomenclature: Family Number, Twin number or mom, and BMI category (Le = lean, Ov = overweight, Ob = Obese; e.g. F1T1Le Stands for family 1, twin 1, lean)
bSequences used after removing low quality, duplicate, and human sequences
c16S rRNA gene fragments identified in microbiome sequencing reads
aKey: % sequences used = percentage of sequences remaining after removing low quality, duplicate, and human sequences; Hsa = reads matching the H. sapiens genome; % RDP = percentage of reads matching the RDP 16S rRNA database; % KEGG, % STRING, % NR = percentage of reads that were assignable to entries in these various databases; % Gut = percentage of reads assigned to the database of 42 reference genomes
The distributions of taxa, genes, orthologs, metabolic pathways, and high-level gene categories were tallied based on the corresponding annotation of the best-BLAST-hit sequence found in each reference database. For KEGG analysis, the closest matching gene with an annotation was used, since many genes in the database remain unannotated, including all KEGG orthologous groups (KOs) assigned to genes with an identical e-value (commands -e 0.00001 -m 9 -b 100 were used to run NCBI BLASTX). Custom Perl scripts were used for all KEGG, STRING, and NCBI NR analyses. Selected genes from recently sequenced reference genomes were manually annotated using NCBI-BLASTP searches against the KEGG, STRING, and NR databases. The 42 reference genome database includes predicted proteins from draft or complete assemblies of Alistipes putredinis, Bacteroides WH2, Bacteroides thetaiotaomicron 3731, Bacteroides thetaiotaomicron 7330, Bacteroides thetaiotaomicron 5482, Bacteroides fragilis, Bacteroides caccae, Bacteroides distasonis, Bacteroides ovatus, Bacteroides stercoris, Bacteroides uniformis, Bacteroides vulgatus, Parabacteroides merdae, Anaerostipes caccae, Anaerotruncus colihominis, Anaerofustis stercorihominis, Bacteroides capillosus, Clostridium bartlettii, Clostridium bolteae, Clostridium eutactus, Clostridium leptum, Clostridium ramosum, Clostridium scindens, Clostridium sp. L2-50, Clostridium spiroforme, Dorea longicatena, Eubacterium dolichum, Eubacterium eligens, Eubacterium rectale, Eubacterium siraeum, Eubacterium ventriosum, Faecalibacterium prausnitzii M212, Peptostreptococcus micros, Ruminococcus gnavus, Ruminococcus obeum, Ruminococcus torques, Collinsella aerofaciens, Bifidobacterium adolescentis, Bifidobacterium longum, Escherichia coli K12, Methanobrevibacter smithii, and Methanobrevibacter stadtmanae (see http://genome.wustl.edu/pub/ and NCBI GenBank). Draft assemblies of Clostridium sp. SS2-1 and Clostridium symbiosum were also used for functional clustering and diversity analyses (http://genome.wustl.edu/pub/). Coverage plots (percent identity plots) were generated using nucmer and mummerplot (part of the MUMmer v3.19 package), and default parameters.
Annotations were validated with simulated datasets (
ABI 3730xl capillary sequencing reads from 9 previously published adult human gut microbiomes were obtained from the NCBI TraceArchive. The full dataset from each sample was annotated by BLASTX comparisons against the KEGG and STRING databases (see above; BLASTX e-value<10^−5, % identity>50, and score>50). To allow quantitative comparisons between these datasets and pyrosequencing data, all forward sequencing reads were first extracted and then one ‘simulated pyrosequencer read’ was generated from each longer capillary read. Nucleotides spanning positions 100 to 322 were used from all capillary reads of suitable length, to avoid low quality regions that commonly occur at the beginning and end of the reads. These simulated reads were then annotated as described above.
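Extracting one simulated read per capillary read is a simple slice, sketched below. The function name is hypothetical, and the positions are assumed to be 1-based and inclusive, which yields a 223-nucleotide fragment; reads shorter than position 322 are treated as not being of suitable length.

```python
def simulate_pyro_read(capillary_read, start=100, end=322):
    """Return positions start..end (1-based, inclusive), or None if the read is too short."""
    if len(capillary_read) < end:
        return None
    return capillary_read[start - 1:end]
```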
16S rRNA gene fragments were identified in each microbiome through BLASTN searches of the RDP database (version 9.33; e-value<10^−5; Bit-score>50; % identity>50; alignment length≧00). Putative 16S rRNA gene fragments were then aligned using the NAST multi-aligner with a minimum template length of 100 bases and minimum % identity of 75%. Taxonomy was assessed after insertion into an ARB neighbor-joining tree.
Microbiomes were clustered based on their profiles after normalizing across all sampled communities (z-score), using the Pearson's correlation distance metric, followed by single-linkage hierarchical clustering in addition to Principal Components Analysis (Cluster3.0). Results were visualized using the Treeview Java applet. Functional diversity (Shannon index and evenness) was calculated using the number of assignments in each microbiome to each of the 254 pathways present in the KEGG database (EstimateS 8.0). The maximum possible index is the natural log of the total number of pathways: ln(254) or 5.54. Shannon evenness was calculated by dividing the Shannon index for a given microbiome by the maximum possible index (scale of 0 to 1, with 1 representing a microbiome with all pathways found at an equal abundance). Results were compared to simulated metagenomic reads generated from 36 recently sequenced reference human gut-derived Bacteroidetes and Firmicutes genomes (http://genome.wustl.edu/pub/organism/). Reads were produced by Readsim v0.10, using the following options: -n 10000 -modlr normal -meanlr 223 -stdlr 0.3. The mean and standard deviation for length of the simulated reads were based on the observed read-length distribution of the 18 fecal microbiome datasets (Table 9).
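The Shannon index and evenness calculations described above can be illustrated with a short sketch. The function names are hypothetical, and the study itself used EstimateS 8.0; this example only shows the arithmetic, using the natural logarithm as stated in the text.

```python
from math import log

def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln p_i) over categories with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)

def shannon_evenness(counts, n_categories):
    """Shannon index divided by its maximum possible value, ln(n_categories)."""
    return shannon_index(counts) / log(n_categories)
```

With 254 pathways at equal abundance the index reaches ln(254), about 5.54, and the evenness is 1; a reported index of 4.63 corresponds to an evenness of roughly 0.84.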
One fundamental parameter that governs the utility of reference genomes is the ability to accurately assign fragmentary reads from metagenomic datasets to these genomes. Therefore, the filtered pyrosequencing reads from the fecal microbiomes of 18 individuals from the 6 different families described in Example 1 (3 lean twin-pairs and their mothers; 3 obese twin pairs and their mothers; Tables 1 and 2) were compared to a custom database of 42 human gut associated bacterial and archaeal genomes (
The custom database of 42 reference genomes included 23 Firmicutes but only 13 Bacteroidetes. Since the Firmicutes dominate the gut microbiotas of subjects (
The effect of technical advances that produce longer reads on improving these assignments was also tested by sequencing fecal community samples from one twin pair using next-generation Titanium pyrosequencing methods [average read length of 341±134 nt (SD) versus 208±68 for the standard FLX platform].
The filtered sequences obtained in Example 3 from the 18 microbiomes were used to conduct a functional analysis of gut microbiomes.
Metagenomic sequence reads described in Example 3 were searched against a library of modules derived from all entries in the Carbohydrate-Active enZymes (CAZy) database (www.cazy.org) using FASTY (e-value<10^−6). This library consists of ~180,000 previously annotated modules (catalytic modules, carbohydrate binding modules (CBMs) and other non-catalytic modules or domains of unknown function) derived from 80,000 protein sequences. The number of sequencing reads matching each CAZy family was divided by the number of total sequences assigned to CAZymes and multiplied by 100 to calculate a relative abundance. An R^2 value was calculated for each pair of CAZy profiles. The distribution of glycoside hydrolase similarity scores was then compared to the distribution of glycosyltransferase similarity scores.
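The relative-abundance and pairwise similarity calculations could be sketched as below. The function names and dictionary inputs are hypothetical, and R^2 is assumed here to be the squared Pearson correlation between two profiles; the text does not specify how the value was computed.

```python
def relative_abundance(family_counts):
    """Percent of CAZyme-assigned reads that match each CAZy family."""
    total = sum(family_counts.values())
    return {fam: 100.0 * n / total for fam, n in family_counts.items()}

def r_squared(profile_a, profile_b):
    """Squared Pearson correlation between two profiles (assumed definition of R^2)."""
    families = sorted(set(profile_a) | set(profile_b))
    x = [profile_a.get(f, 0.0) for f in families]   # families missing from a profile count as 0
    y = [profile_b.get(f, 0.0) for f in families]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)
```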
Xipe (version 2.4) was employed for bootstrap analyses of pathway enrichment and depletion, using the parameters sample size=10,000 and confidence level=0.95. Linear regressions were performed in Excel (version 11.0, Microsoft). Mann-Whitney and Student's t-tests were utilized to identify statistically significant differences between two groups (Prism v4.0, GraphPad; Excel version 11.0, Microsoft). The Bonferroni correction was used to correct for multiple hypotheses. The Mantel test was used to compare distance matrices: the matrix of each pairwise comparison of the abundance of each reference genome, and the abundance of each metabolic pathway, were compared (Mantel program in Python using PyCogent; 10,000 replicates). Data are represented as mean±SEM unless otherwise indicated.
Odds ratios were used to identify ‘commonly-enriched’ genes in the gut microbiome. In short, all gut microbiome sequences were compared against the custom database of 42 gut genomes (BLASTX e-value<10^−5, bitscore>50, and % identity>50). A gene by sample matrix was then screened to identify genes ‘commonly-enriched’ in either the obese or lean gut microbiome (defined by an odds ratio greater than 2 or less than 0.5 when comparing the pooled obese twin microbiomes to the pooled lean twin microbiomes and when comparing each individual obese twin microbiome to the aggregate lean twin microbiome, or vice versa). The statistical significance of enriched or depleted genes was then calculated using a modified t-test (q-value<0.05; calculated with code kindly supplied by Mihai Pop and J. R. White, University of Maryland). To search for genes that were consistently enriched or depleted in all six MZ twin-pairs, a gene-by-sample matrix was generated based on BLASTX comparisons of each microbiome with our custom 42-genome database, and an odds ratio was calculated by directly comparing the frequency of each gene in each twin versus the respective co-twin. The analysis revealed only 49 genes (odds ratio>2 or <0.5): they represent a variety of taxonomic groups, including Firmicutes, Bacteroidetes, and Actinobacteria and did not show any clear functional trends.
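The odds-ratio screen can be illustrated with the sketch below. The exact formula is not given in the text, so the definition used here (odds of a read hitting the gene in one group divided by the same odds in the other group) is an assumption, as are the function names and the 0.5 pseudocount added to avoid division by zero. The significance test (q-value) is not reproduced.

```python
def odds_ratio(gene_obese, total_obese, gene_lean, total_lean, pseudocount=0.5):
    """Odds of a read hitting the gene in obese samples relative to lean samples."""
    a = gene_obese + pseudocount
    b = total_obese - gene_obese + pseudocount
    c = gene_lean + pseudocount
    d = total_lean - gene_lean + pseudocount
    return (a / b) / (c / d)

def classify(ratio, high=2.0, low=0.5):
    """Apply the thresholds from the text: >2 enriched, <0.5 depleted."""
    if ratio > high:
        return "enriched"
    if ratio < low:
        return "depleted"
    return "unchanged"
```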
Sequences matching 156 total CAZyme families were found within at least one human gut microbiome, including 77 glycoside hydrolase, 21 carbohydrate-binding module, 35 glycosyltransferase, 12 polysaccharide lyase, and 11 carbohydrate-esterase families (Table 10A and B). On average 2.62±0.13% of the gut microbiome could be assigned to CAZymes (a total of 217,615 sequences), a percentage that is greater than the most abundant KEGG pathway in the gut microbiome (‘Transporters’; 1.20±0.06%), and indicative of the abundant and diverse set of microbial genes in the distal gut microbiome directed towards accessing a wide range of polysaccharides.
Category-based clustering of the functions from each microbiome was performed using Principal Components Analysis (PCA) and hierarchical clustering. This analysis revealed two distinct clusters of gut microbiomes based on metabolic profile, corresponding to samples with an increased abundance of Firmicutes and Actinobacteria, and samples with a high abundance of Bacteroidetes (
aGroups found at an average relative abundance >1% are shown
bID nomenclature: Family number, Twin number or mother and BMI category (Le = lean, Ov = overweight, Ob = obese e.g. F1T1Le stands for family 1 twin 1 lean)
aGroups found at an average relative abundance >1% are shown
bID nomenclature: Family number, Twin number or mother and BMI category (Le = lean, Ov = overweight, Ob = obese e.g. F1T1Le stands for family 1 twin 1 lean)
Functional clustering of phylum-wide sequence bins representing reads from the Firmicutes or the Bacteroidetes showed discrete clustering by phylum (
One of the major goals of the International Human Microbiome Project is to determine whether there is an identifiable ‘core microbiome’ of shared organisms, genes, or functional capabilities found in a given body habitat of all or the vast majority of humans. Although all of the 18 gut microbiomes surveyed showed a high level of beta-diversity with respect to the relative abundance of bacterial phyla (
aPathways with an average relative abundance of >0.6% are shown
Overall functional diversity was compared using the Shannon index, a measurement that combines diversity (the number of different types of metabolic pathways) and evenness (the relative abundance of each pathway). The human gut microbiomes surveyed had a stable and high Shannon index value (4.63±0.01), close to the maximum possible level of functional diversity (5.54; See Example 4). Despite the presence of a small number of abundant metabolic pathways (listed in Table 11), the overall functional profile of each gut microbiome is quite even (Shannon evenness of 0.84±0.001 on a scale of 0 to 1), demonstrating that most metabolic pathways are found at a similar level of abundance. Interestingly, the level of functional diversity in each microbiome was significantly linked to the relative abundance of the Bacteroidetes (R^2=0.81, p<10^−6); microbiomes enriched for Firmicutes/Actinobacteria had a decreased level of functional diversity. This observation is consistent with an analysis of simulated metagenomic reads generated from each of 36 Bacteroidetes and Firmicutes genomes (
At a finer level, 26-53% of ‘enzyme’-level functional groups were shared across all 18 microbiomes, while 8-22% of the groups were unique to a single microbiome (
Metabolic reconstructions of the ‘core’ microbiome revealed significant enrichment for a number of expected functional categories, including those involved in transcription, translation, and amino acid metabolism (
CAZyme profiles of glycoside hydrolases and glycosyltransferases were compared by calculating the R2 value between each pair of microbiomes (see Table 10 for families with a relative abundance >1%). This analysis revealed that all individuals have a similar profile of glycosyltransferases (mean R2=0.96±0.003), while the profiles of glycoside hydrolases were significantly more variable, even between family members (mean R2=0.80±0.01; p<10⁻³⁰, paired Student's t-test). This suggests that the number and spectrum of glycoside hydrolases are probably affected by external factors, such as diet, more than those of the glycosyltransferases.
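The pairwise comparison described above can be sketched as follows. This sketch assumes that R2 is the squared Pearson correlation between two relative-abundance profiles, which the text does not specify; the sample identifiers follow the ID nomenclature used in the tables, but the abundance values and function names are hypothetical.

```python
from itertools import combinations

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return (sxy * sxy) / (sxx * syy)

def mean_pairwise_r2(profiles):
    """Average R2 over every pair of samples in a {sample_id: profile} dict."""
    values = [r_squared(profiles[a], profiles[b])
              for a, b in combinations(profiles, 2)]
    return sum(values) / len(values)

# Hypothetical relative abundances of four enzyme families per microbiome.
profiles = {
    "F1T1Le": [0.30, 0.20, 0.10, 0.40],
    "F1T2Le": [0.28, 0.22, 0.12, 0.38],
    "F2T1Ob": [0.10, 0.40, 0.30, 0.20],
}
print(mean_pairwise_r2(profiles))
```

Running the same calculation separately on glycosyltransferase and glycoside hydrolase profiles would give one R2 value per pair of microbiomes for each enzyme class, which is the form of data a paired test compares.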
To identify metabolic pathways associated with obesity, only non-core associated (variable) functional groups were included in a comparison of the gut microbiomes of lean and obese twin pairs. A bootstrap analysis was used to identify metabolic pathways that were enriched or depleted in the variable obese gut microbiome. For example, similar to a mouse model of diet-induced obesity, the obese human gut microbiome was enriched for phosphotransferase systems involved in microbial processing of carbohydrates (Table 12). To identify specific genes that were significantly associated with obesity, all gut microbiome sequences were compared against the custom database of 42 gut genomes described in Example 3. A gene-by-sample matrix was then screened to identify genes ‘commonly-enriched’ in either the obese or lean gut microbiome (defined by an odds ratio >2 or <0.5 when comparing all obese twin microbiomes to the aggregate lean twin microbiome or vice versa). The analysis yielded 383 genes that were significantly different between the obese and lean gut microbiomes (q-value <0.05; 273 enriched and 110 depleted in the obese microbiome; see Tables 13 and 14). By contrast, only 49 genes were consistently enriched or depleted between all twin-pairs.
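The odds-ratio thresholds above can be illustrated with a minimal sketch. It assumes a 2x2 table built from the number of reads assigned to one gene versus all other assigned reads in each group, plus an optional pseudocount to avoid division by zero; the text does not describe how the table was constructed, and the screening for significance (q-value) is not reproduced here. All names and counts are hypothetical.

```python
def odds_ratio(gene_a, total_a, gene_b, total_b, pseudocount=0.5):
    """Odds of a read hitting the gene in group A relative to group B.

    gene_* are reads assigned to the gene; total_* are all assigned reads.
    The pseudocount is an assumption added to keep the ratio finite.
    """
    a = gene_a + pseudocount
    b = (total_a - gene_a) + pseudocount
    c = gene_b + pseudocount
    d = (total_b - gene_b) + pseudocount
    return (a * d) / (b * c)

def classify(ratio, upper=2.0, lower=0.5):
    """Apply the >2 / <0.5 thresholds stated in the text."""
    if ratio > upper:
        return "enriched"
    if ratio < lower:
        return "depleted"
    return "unchanged"

# Hypothetical counts: 30 of 1000 obese reads vs. 10 of 1000 lean reads.
ratio = odds_ratio(30, 1000, 10, 1000)
print(ratio, classify(ratio))
```

Swapping the two groups inverts the ratio, which is why a single pair of thresholds (>2 and <0.5) covers enrichment in either the obese or the lean microbiome.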
These obesity-associated genes were representative of the taxonomic differences described above: 75% of the obesity-enriched genes were from Actinobacteria (vs. 0% of lean-enriched genes; the other 25% are from Firmicutes) while 42% of the lean-enriched genes were from Bacteroidetes (vs. 0% of the obesity-enriched genes). Their functional annotation indicated that many are involved in carbohydrate, lipid, and amino acid metabolism (Tables 13-14). Together, they comprise an initial set of microbial biomarkers of the obese gut microbiome.
BMI category by ethnicity for the entire MOAFTS wave 5 cohort, based on 3326 twins with complete data on height and weight, is summarized in Table 15. Dizygotic (DZ) twins had a significantly higher mean BMI than monozygotic (MZ) twins [25.8±6.5 vs. 24.8±5.9, p<0.001, mean±sd], and a higher prevalence of overweight (22.8 vs. 20.9%) and obese (20.7 vs. 16.1%; χ2=31.6, p<0.001). This may reflect a higher dizygotic twinning rate among obese women (MZ twinning occurs randomly; ref. 39). BMI was more highly correlated in MZ twins than in DZ twins, both in EA pairs (rMZ=0.80, rDZ=0.48) and in AA pairs (rMZ=0.73, rDZ=0.26), and this remained true when analysis was restricted to pairs concordant for obesity (EA: rMZ=0.61, rDZ=0.27; AA: rMZ=0.62, rDZ=−0.11) or concordant for leanness (EA: rMZ=0.43, rDZ=0.14; AA: rMZ=0.55, rDZ=0.39). After age-adjustment, quantitative genetic modeling yielded an estimated additive genetic variance for BMI of 68% (95% Confidence Interval [CI]: 57-79%), shared environmental variance of 14% (95% CI: 2-24%), and non-shared environmental variance of 14% (95% CI: 17-21%). Data from the Behavioral Risk Factor Surveillance System for Missouri women of comparable age in 2006 yield higher rates of overweight and obesity in EA women (23.8% overweight and 25% obese) compared to rates observed in MOAFTS (19.6% overweight EA, 14.8% obese EA).
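To show how twin correlations relate to variance components, the following sketch applies Falconer's classical approximation to the rMZ and rDZ values quoted above. This is not the quantitative genetic modeling used for the reported estimates (which were age-adjusted and model-based); it is a swapped-in, back-of-the-envelope technique, and the function name is hypothetical.

```python
def falconer_estimates(r_mz, r_dz):
    """Falconer's approximation from MZ and DZ twin correlations.

    additive genetic  h2 = 2 * (rMZ - rDZ)
    shared environ.   c2 = 2 * rDZ - rMZ
    non-shared env.   e2 = 1 - rMZ
    Values are not constrained to [0, 1]; negative estimates can occur.
    """
    return {
        "additive_genetic": 2.0 * (r_mz - r_dz),
        "shared_env": 2.0 * r_dz - r_mz,
        "nonshared_env": 1.0 - r_mz,
    }

# Correlations quoted in the text for EA and AA pairs.
ea = falconer_estimates(0.80, 0.48)
aa = falconer_estimates(0.73, 0.26)
for label, estimates in (("EA", ea), ("AA", aa)):
    print(label, {name: round(value, 2) for name, value in estimates.items()})
```

For the EA correlations this approximation gives an additive genetic component of about 64%, in the same range as the modeled 68%. For the AA correlations the shared-environment term comes out negative, which illustrates why constrained model fitting is generally preferred over the simple formula.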
aAll numbers are percentages. Underweight: <18.5 kg/m2; Lean: 18.5-24.9 kg/m2; Overweight: 25-29.9 kg/m2; Obese I: 30-34.9 kg/m2; Obese II: 35-39.9 kg/m2; Obese III: ≧40 kg/m2
Lean and obese women selected for inclusion in the biospecimen collection project were representative of the entire cohort of lean and obese MOAFTS twins in terms of parity (nulliparous/parous), educational attainment (more than high school education/high school education or less) and marital status (married or living with someone as married/not married; p>0.05 for all comparisons). Obese EA women providing biospecimens had a mean BMI at wave 5 of 36.9±4.7 compared with a mean among EA lean women of 21.4±1.5 (mean±sd). EA twins were selected as being stably lean across all waves of data collection (i.e., baseline at median age 15, one-year follow-up, 5-year follow-up and 7-year follow-up), with a self-reported BMI of 18.5-24.9 kg/m2.
A frequently reported result from any 16S rRNA gene sequence-based survey is the relative abundance of bacterial phyla. Given the broad nature of these phyla and the fact that relatively few phyla dominate the human distal gut microbiota, it might be expected that the relative abundance of each phylum would be consistent regardless of the amplification and sequencing methods used. However, differences were observed between methods in this study (
This application claims the priority of U.S. National application Ser. No. 13/002,137, filed Mar. 29, 2011; which claims the priority of PCT application number PCT/US2009/049253, filed Jun. 30, 2009; which claims the priority of U.S. provisional application No. 61/076,887, filed Jun. 30, 2008; and U.S. provisional application No. 61/101,011, filed Sep. 29, 2008, each of which is hereby incorporated by reference in its entirety.
This invention was made in part with government support under grant DK078669 awarded by the National Institutes of Health. The government has certain rights in the invention.
Number | Date | Country
---|---|---
61076887 | Jun 2008 | US
61101011 | Sep 2008 | US
Relation | Number | Date | Country
---|---|---|---
Parent | 13002137 | Mar 2011 | US
Child | 14147163 | | US