The present invention encompasses arrays and methods related to the genome of M. smithii.
According to the Centers for Disease Control and Prevention (CDC), over sixty percent of the United States population is overweight, and almost twenty percent is obese. This translates into 38.8 million adults in the United States with a Body Mass Index (BMI) of 30 or above. Obesity is also a worldwide health problem, with an estimated 500 million overweight adult humans [body mass index (BMI) of 25.0-29.9 kg/m²] and 250 million obese adults. This epidemic of obesity is leading to worldwide increases in the prevalence of obesity-related disorders, such as diabetes, hypertension, cardiac pathology, and non-alcoholic fatty liver disease (NAFLD).
According to the National Institute of Diabetes, Digestive and Kidney Diseases (NIDDK) approximately 280,000 deaths annually are directly related to obesity. The NIDDK further estimated that the direct cost of healthcare in the U.S. associated with obesity is $51 billion. In addition, Americans spend $33 billion per year on weight loss products. In spite of this economic cost and consumer commitment, the prevalence of obesity continues to rise at alarming rates. From 1991 to 2000, obesity in the U.S. grew by 61%.
Additionally, malnourishment or disease may lead to individuals being underweight. The World Health Organization estimates that one-third of the world is under-fed and one-third is starving. Over 4 million will die this year from malnourishment. One in twelve people worldwide is malnourished, including 160 million children under the age of 5.
Humans are host to a diverse and dynamic population of microbial symbionts, with the majority residing within the distal intestine. The gut microbiota contains representatives from ten known divisions of the domain Bacteria, with an estimated 500-1000 species-level phylogenetic types present in a given healthy adult human; the microbiota is dominated by members of two divisions of Bacteria, the Bacteroidetes and the Firmicutes. Members of the domain Archaea are also represented, most prominently by a methanogenic Euryarchaeote, Methanobrevibacter smithii and occasionally Methanosphaera stadtmanae. The density of colonization increases by eight orders of magnitude from the proximal small intestine (10³) to the colon (10¹¹). The distal intestine is an anoxic bioreactor whose microbial constituents help the subject by providing a number of key functions: e.g., breakdown of otherwise indigestible plant polysaccharides and regulating subject storage of the extracted energy; biotransformation of conjugated bile acids and xenobiotics; degradation of dietary oxalates; synthesis of essential vitamins; and education of the immune system.
Dietary fiber is a key source of nutrients for the microbiota. Monosaccharides are absorbed in the proximal intestine, leaving dietary fiber that has escaped digestion (e.g. resistant starches, fructans, cellulose, hemicelluloses, pectins) as the primary carbon sources for microbial members of the distal gut. Fermentation of these polysaccharides yields short-chain fatty acids (SCFAs; mainly acetate, butyrate and propionate) and gases (H2 and CO2). These end products benefit humans. For example, SCFAs are an important source of energy, as they are readily absorbed from the gut lumen and are subsequently metabolized in the colonic mucosa, liver, and a variety of peripheral tissues (e.g., muscle). SCFAs also stimulate colonic blood flow and the uptake of electrolytes and water.
Methanogens are members of the domain Archaea. Methanogens thrive in many anaerobic environments together with fermentative bacteria. These habitats include natural wetlands as well as man-made environments, such as sewage digesters, landfills, and bioreactors. Hydrogen-consuming, mesophilic methanogens are also present in the intestinal tracts of many invertebrate and vertebrate species, including termites, birds, cows, and humans. Using methane breath tests, clinical studies estimate that between 50 and 80 percent of humans harbor methanogens.
Culture- and non-culture-based enumeration studies have demonstrated that members of the Methanobrevibacter genus are prominent gut mesophilic methanogens. The most comprehensive enumeration of the adult human colonic microbiota reported to date found a single predominant archaeal species, Methanobrevibacter smithii. This gram-positive-staining Euryarchaeote can comprise up to 10¹⁰ cells/g feces in healthy humans, or ~10% of all anaerobes in the colons of healthy adults.
A focused set of nutrients is consumed for energy by methanogens: primarily H2/CO2, formate, acetate, but also methanol, ethanol, methylated sulfur compounds, methylated amines and pyruvate. These compounds are typically converted to CO2 and methane (e.g. acetate) or reduced with H2 to methane alone (e.g. methanol or CO2). Some methanogens are restricted to utilizing only H2/CO2 (e.g. Methanobrevibacter arboriphilicus), or methanol (e.g. Methanosphaera stadtmanae). Other more ubiquitous methanogens exhibit greater metabolic diversity, like Methanosarcina species. In vitro studies suggest that M. smithii is intermediate in this metabolic spectrum, consuming H2/CO2 and formate as energy sources.
Fermentation of dietary fiber is accomplished by syntrophic interactions between microbes linked in a metabolic food web, and is a major energy-producing pathway for members of the Bacteroidetes and the Firmicutes. Bacteroides thetaiotaomicron has previously been used as a model bacterial symbiont for a variety of reasons: (i) it effectively ferments a range of otherwise indigestible plant polysaccharides in the human colon; (ii) it is genetically manipulatable; and, (iii) it is a predominant member of the human distal intestinal microbiota. Its 6.26 Mb genome has been sequenced: the results reveal that B. thetaiotaomicron has the largest collection of known or predicted glycoside hydrolases of any prokaryote sequenced to date (226 in total; by comparison, our human genome only encodes 98 known or predicted glycoside hydrolases). B. thetaiotaomicron also has a significant expansion of outer membrane polysaccharide binding and importing proteins (over 200 paralogs of two starch binding proteins known as SusC and SusD), as well as a large repertoire of environmental sensing proteins [e.g. 50 extra-cytoplasmic function (ECF)-type sigma factors; 25 anti-sigma factors, and 32 novel hybrid two-component systems]. Functional genomics studies of B. thetaiotaomicron in vitro and in the ceca of gnotobiotic mice indicate that it is capable of very flexible foraging for dietary (and host-derived) polysaccharides, allowing this organism to have a broad niche and contributing to the functional stability of the microbiota in the face of changes in the diet.
In vitro biochemical studies of B. thetaiotaomicron and closely related Bacteroides species (B. fragilis and B. succinogenes) indicate that their major end products of fermentation are acetate, succinate, H2 and CO2. Small amounts of pyruvate, formate, lactate and propionate are also formed.
Anaerobic fermentation of sugars causes flux through glycolytic pathways, leading to accumulation of NADH (via glyceraldehyde-3P dehydrogenase) and the reduced form of ferredoxin (via pyruvate:ferredoxin oxidoreductase). B. thetaiotaomicron is able to couple NAD+ recovery to reduction of pyruvate to succinate (via malate dehydrogenase and fumarate reductase), or lactate (via lactate dehydrogenase). Oxidation of reduced ferredoxin is easily coupled to production of H2. However, H2 formation is, in principle, not energetically feasible at high partial pressures of the gas. In other words, lower partial pressures of H2 (1-10 Pa) allow for more complete oxidation of carbohydrate substrates. The subject removes some hydrogen from the colon by excretion of the gas in the breath and as flatus. However, the primary mechanism for eliminating hydrogen is interspecies transfer from bacteria to hydrogenotrophic methanogens. Formate and acetate can also be transferred between some species, but their transfer is complicated by their limited diffusion across the lipophilic membranes of the producer and consumer. In areas of high microbial density or aggregation, such as the gut, interspecies transfer of hydrogen, formate and acetate is likely to increase with decreasing physical distance between microbes.
Methanogen-mediated removal of hydrogen can have a profound impact on bacterial metabolism. Not only does re-oxidation of NADH occur, but end products of fermentation undergo a shift from a mixture of acetate, formate, H2, CO2, succinate and other organic acids to predominantly acetate and methane with small amounts of succinate. This facilitates disposal of reducing equivalents, and produces a potential gain in ATP production due to increased acetate levels. For example, a reduction in hydrogen allows Clostridium butyricum to acquire 0.7 more ATP equivalents from fermentation of hexose sugars. Co-culture of M. smithii with a prominent cellulolytic ruminal bacterial species, Fibrobacter succinogenes S85, results in augmented fermentation, as manifested by increases in the rate of ATP production and organic acid concentrations. Co-culture of M. smithii with Ruminococcus albus eliminates NADH-dependent ethanol production from acetyl-CoA, thereby skewing bacterial metabolism towards production of acetate, which is more energy yielding. H2-producing fibrolytic bacterial strains from the human colon exhibit distinct cellulose degradation phenotypes when co-cultured with M. smithii, indicating that some bacteria are more responsive to syntrophy with methanogens.
While there is suggestive evidence that methanogens cooperate metabolically with members of Bacteroides, studies have not elucidated the impact of this relationship on a subject's energy storage or on the specificity and efficiency of carbohydrate metabolism. Colonization of adult germ-free mice with M. smithii and/or B. thetaiotaomicron revealed that the methanogen increased the efficiency and changed the specificity of bacterial digestion of dietary glycans. Moreover, co-colonized mice exhibited a significantly greater increase in adiposity compared with mice colonized with either organism alone.
One aspect of the present invention encompasses an array. The array comprises a substrate having disposed thereon at least one nucleic acid, wherein the nucleic acid comprises a nucleic acid sequence selected from the nucleic acid sequences listed in Table A.
Another aspect of the present invention encompasses an array. The array comprises a substrate having disposed thereon at least one polypeptide, wherein the polypeptide is encoded by a nucleic acid sequence selected from the nucleic acid sequences listed in Table A.
Yet another aspect of the present invention encompasses an array. The array comprises a substrate having disposed thereon at least one nucleic acid encoding an adhesin-like protein, wherein the nucleic acid comprises a nucleic acid sequence selected from the group consisting of SEQ ID NOs: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, and 95.
Yet another aspect of the present invention encompasses an array. The array comprises a substrate having disposed thereon at least one nucleic acid encoding an adhesin-like protein, wherein the nucleic acid comprises a nucleic acid sequence selected from the group consisting of SEQ ID NOs: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, and 95. In addition, the array further comprises at least one nucleic acid sequence selected from the group consisting of SEQ ID NOs: 97-2140.
Yet another aspect of the present invention encompasses an array. The array comprises a substrate having disposed thereon at least one polypeptide, wherein the polypeptide is encoded by a nucleic acid sequence selected from the group consisting of SEQ ID NOs: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, and 95.
Yet another aspect of the present invention encompasses an array. The array comprises a substrate having disposed thereon at least one polypeptide, wherein the polypeptide comprises at least one amino acid sequence selected from the group consisting of SEQ ID NOs: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, and 96.
Yet another aspect of the present invention encompasses an array. The array comprises a substrate having disposed thereon at least one polypeptide, wherein the polypeptide is encoded by a nucleic acid sequence selected from the group consisting of SEQ ID NOs: 97-2140.
Yet another aspect of the present invention encompasses a method of selecting a compound that has efficacy for modulating a gene product of M. smithii present in the gastrointestinal tract of a subject, wherein the gene product correlates with a biomolecule selected from the group consisting of SEQ ID NOs: 1-96. The method comprises comparing a plurality of biomolecules from M. smithii before and after administration of a compound for modulating a gene product of M. smithii, such that if the abundance of a biomolecule that correlates with the gene product is modulated, the compound is efficacious in modulating a gene product of M. smithii, and selecting a compound that modulates a M. smithii gene product.
Yet another aspect of the present invention encompasses a method of selecting a compound that has efficacy for modulating a gene product of M. smithii present in the gastrointestinal tract of a subject. The method comprises comparing an M. smithii gene profile to a gene profile of the subject, identifying a gene product of the M. smithii gene profile that is divergent from a corresponding gene product of the subject gene profile, or absent in the gene profile of the subject, and selecting a compound that modulates the M. smithii gene product but does not substantially modulate the corresponding divergent gene product of the subject.
Still another aspect of the invention encompasses a method for modulating a gene product of M. smithii present in the gastrointestinal tract of a subject. The method comprises administering to the subject an HMG-CoA reductase inhibitor. The inhibitor may be formulated for release in the distal portion of the subject's gastrointestinal tract and thereby inhibit substantially more of the HMG-CoA reductase of M. smithii than of the subject's HMG-CoA reductase.
Other aspects and iterations of the invention are described more thoroughly below.
The present invention provides arrays and methods utilizing the genome and proteome of the methanogen M. smithii, which is the predominant methanogen present in the human gastrointestinal tract. Modulating the Archaea population of the gastrointestinal tract of a subject, of which M. smithii is a major component, modulates the efficiency and selectivity of carbohydrate metabolism. The genome and proteome of M. smithii may be used, according to the methods presented herein, to promote weight loss or weight gain in a subject. In particular, the methods of the present invention may be used to identify compounds that promote weight loss or weight gain in a subject. The method relies on applicants' discovery that certain M. smithii gene products are conserved between M. smithii strains, yet divergent (or absent) from the correlating gene products expressed by the subject's microbiome or genome. This allows the selection of compounds that specifically modulate the M. smithii gene product, while substantially not modulating the subject's gene product.
One aspect of the invention encompasses use of biomolecules in an array. As used herein, biomolecule refers to either nucleic acids derived from a M. smithii genome, or polypeptides derived from a M. smithii proteome. A M. smithii genome or proteome may be utilized to construct arrays that may be used for several applications, including discovery of compounds that modulate one or more M. smithii gene products, judging efficacy of existing weight gain or loss regimes, and for the identification of biomarkers involved in weight gain or loss, or a weight gain or loss related disorder.
The array may be comprised of a substrate having disposed thereon at least one biomolecule. Several substrates suitable for the construction of arrays are known in the art. The substrate may be a material that may be modified to contain discrete individual sites appropriate for the attachment or association of the biomolecule and is amenable to at least one detection method. Alternatively, the substrate may be a material that may be modified for the bulk attachment or association of the biomolecule and is amenable to at least one detection method. Non-limiting examples of substrate materials include glass, modified or functionalized glass, plastics (including acrylics, polystyrene and copolymers of styrene and other materials, polypropylene, polyethylene, polybutylene, polyurethanes, Teflon, etc.), nylon or nitrocellulose, polysaccharides, resins, silica or silica-based materials including silicon and modified silicon, carbon, metals, inorganic glasses and plastics. In an exemplary embodiment, the substrates may allow optical detection without appreciably fluorescing.
A substrate may be planar, a substrate may be a well, e.g. a 1536-, 384-, or 96-well plate, or alternatively, a substrate may be a bead. Additionally, the substrate may be the inner surface of a tube for flow-through sample analysis to minimize sample volume. Similarly, the substrate may be flexible, such as a flexible foam, including closed cell foams made of particular plastics. Other suitable substrates are known in the art.
The biomolecule or biomolecules may be attached to the substrate in a wide variety of ways, as will be appreciated by those in the art. The biomolecule may either be synthesized first, with subsequent attachment to the substrate, or may be directly synthesized on the substrate. The substrate and the biomolecule may both be derivatized with chemical functional groups for subsequent attachment of the two. For example, the substrate may be derivatized with a chemical functional group including, but not limited to, amino groups, carboxyl groups, oxo groups or thiol groups. Using these functional groups, the biomolecule may be attached using functional groups on the biomolecule either directly or indirectly using linkers.
The biomolecule may also be attached to the substrate non-covalently. For example, a biotinylated biomolecule can be prepared, which may bind to surfaces covalently coated with streptavidin, resulting in attachment. Alternatively, a biomolecule or biomolecules may be synthesized on the surface using techniques such as photopolymerization and photolithography. Additional methods of attaching biomolecules to arrays and methods of synthesizing biomolecules on substrates are well known in the art, e.g. VLSIPS technology from Affymetrix (see, e.g., U.S. Pat. No. 6,566,495, and Rockett and Dix, Xenobiotica 30(2):155-177, each of which is hereby incorporated by reference in its entirety).
In one embodiment, the biomolecule or biomolecules attached to the substrate are located at a spatially defined address of the array. Arrays may comprise from about 1 to about several hundred thousand addresses. In one embodiment, the array may be comprised of less than 10,000 addresses. In another alternative embodiment, the array may be comprised of at least 10,000 addresses. In yet another alternative embodiment, the array may be comprised of less than 5,000 addresses. In still another alternative embodiment, the array may be comprised of at least 5,000 addresses. In a further embodiment, the array may be comprised of less than 500 addresses. In yet a further embodiment, the array may be comprised of at least 500 addresses.
A biomolecule may be represented more than once on a given array. In other words, more than one address of an array may be comprised of the same biomolecule. In some embodiments, two, three, or more than three addresses of the array may be comprised of the same biomolecule. In certain embodiments, the array may comprise control biomolecules and/or control addresses. The controls may be internal controls, positive controls, negative controls, or background controls.
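By way of non-limiting illustration, the addressing scheme described above (spatially defined addresses, a biomolecule represented at more than one address, and control addresses) can be modeled in software as sketched below. The class name `Address`, the helper functions, the row-major grid, and the identifiers `probe_A`, `probe_B`, `blank`, and `no_target` are hypothetical and introduced only for this example; they do not describe any particular array format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    """One spatially defined address on the array (hypothetical model)."""
    row: int
    col: int
    biomolecule: str      # identifier of the biomolecule at this address
    role: str = "probe"   # "probe", or a control type such as "negative"


def build_layout(biomolecules, replicates, controls, n_cols):
    """Place each biomolecule at `replicates` addresses, then append controls.

    Addresses are filled row by row on a grid that is `n_cols` wide.
    """
    items = [(b, "probe") for b in biomolecules for _ in range(replicates)]
    items += [(name, role) for name, role in controls]
    return [Address(i // n_cols, i % n_cols, name, role)
            for i, (name, role) in enumerate(items)]


def addresses_by_biomolecule(layout):
    """Map each biomolecule to the list of (row, col) addresses that hold it."""
    index = defaultdict(list)
    for address in layout:
        index[address.biomolecule].append((address.row, address.col))
    return dict(index)


layout = build_layout(
    biomolecules=["probe_A", "probe_B"],   # placeholder identifiers
    replicates=3,                          # each biomolecule at three addresses
    controls=[("blank", "background"), ("no_target", "negative")],
    n_cols=4,
)
index = addresses_by_biomolecule(layout)
```

In this sketch the array has eight addresses in total, and `index` allows replicate signals for the same biomolecule to be pooled during read-out.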
The biomolecule may be a nucleic acid derived from any M. smithii genome. In some embodiments, a biomolecule may be a nucleic acid derived from the M. smithii genome with the GenBank Accession number CP000678, comprising, in part, nucleic acid sequences labeled MSM001 through MSM1795, inclusive. In other embodiments, a biomolecule may be a nucleic acid derived from a M. smithii genome selected from the group consisting of a M. smithii genome with the GenBank Accession number CP000678, AEKU00000000, AELL00000000, AELM00000000, AELN00000000, AELO00000000, AELP00000000, AELQ00000000, AELR00000000, AELS00000000, AELT00000000, AELU00000000, AELV00000000, AELW00000000, AELX00000000, AELY00000000, AELZ00000000, AEMA00000000, AEMB00000000, AEMC00000000, and AEMD00000000. Such nucleic acids may include RNA (including mRNA, tRNA, and rRNA), DNA, and naturally occurring or synthetically created derivatives. A nucleic acid derived from a M. smithii genome is a nucleic acid that comprises at least a portion of a nucleic acid sequence selected from the nucleic acid sequences listed in Table A, Table B, and/or Table D. The nucleic acid may comprise fewer than 10, at least 10, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 150, at least 200, or more than 200 bases of a nucleic acid sequence selected from the nucleic acid sequences listed in Table A, Table B, and/or Table D. One embodiment of the invention is an array comprising a substrate, the substrate having disposed thereon at least one nucleic acid, wherein the nucleic acid comprises a nucleic acid sequence selected from the nucleic acid sequences listed in Table A. In another embodiment, the nucleic acid consists of a nucleic acid sequence selected from the nucleic acid sequences listed in Table A. In another embodiment, the nucleic acid comprises a nucleic acid sequence selected from the nucleic acid sequences listed in Table D. 
In other exemplary embodiments, the nucleic acid consists of a nucleic acid sequence selected from the nucleic acid sequences listed in Table D. In some exemplary embodiments, the nucleic acid comprises a nucleic acid sequence selected from the nucleic acid sequences listed in Table B. In other exemplary embodiments, the nucleic acid consists of a nucleic acid sequence selected from the nucleic acid sequences listed in Table B. In still other exemplary embodiments, the nucleic acid comprises a nucleic acid sequence selected from the nucleic acid sequences listed in Table B, and further comprises a nucleic acid sequence selected from the nucleic acid sequences listed in Table D.
In one embodiment, the nucleic acid or nucleic acids may be selected from the group of nucleic acids listed in Table A, B and D that are conserved among M. smithii strains, but divergent from a corresponding nucleic acid of the subject. In this context, a “corresponding nucleic acid” refers to a nucleic acid sequence of the subject, or the subject's microbiome, that has greater than 75% identity to a nucleic acid sequence of Table A, B or D. The term, “divergent,” as used herein, refers to a sequence of Table A, B or D that has less than 99% identity, but greater than 75% identity, with a nucleic acid sequence of the subject, or the subject's microbiome. For instance, in some embodiments, divergent refers to less than or equal to about 98%, 97%, 96%, 95%, 94%, 93%, 92%, 91%, 90%, 89%, 88%, 87%, 86%, 85%, 84%, 83%, 82%, 81%, 80%, 79%, 78%, 77%, or 76%, identity between the nucleic acid sequence of Table A, B or D and the nucleic acid sequence of the subject. Conversely, the term “conserved,” as used herein, refers to a nucleic acid sequence of one M. smithii strain that has greater than about 90% identity to a nucleic acid sequence from another M. smithii strain.
If a subject, or the subject's microbiome, does not comprise a nucleic acid sequence that has greater than 75% identity to a nucleic acid sequence of Table A, B, or D, that nucleic acid sequence of Table A, B, or D is “absent” from the subject. In certain embodiments, the nucleic acid or nucleic acids of the array of the invention are selected from the group comprising nucleic acid sequences that are absent from the subject gut microbiome or genome. For instance, in one embodiment, the nucleic acid may be selected from the group of nucleic acids designated absent or divergent in Table 2. Percent identity may be determined as discussed below.
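By way of non-limiting illustration, the definitions of “absent,” “divergent,” and “conserved” given above can be expressed as simple threshold tests on percent identity, as sketched below. The function names are hypothetical. The sketch assumes that percent identities have already been computed by some alignment method, and that a sequence is classified by its best (highest-identity) match in the subject or the subject's microbiome; the text does not specify how multiple matches are to be handled, so that choice is an assumption of this example. The label “not divergent” for identities of 99% or more is likewise introduced here only to cover a case the definitions leave open.

```python
def classify_against_subject(identities_pct):
    """Classify one Table A/B/D sequence against the subject.

    identities_pct: percent identities (0-100) between the M. smithii
    sequence and each compared sequence of the subject or the subject's
    microbiome; may be empty if nothing was found to compare.
    """
    best = max(identities_pct, default=0.0)   # assumption: use the best match
    if best <= 75.0:
        return "absent"         # no corresponding (>75% identity) sequence
    if best < 99.0:
        return "divergent"      # greater than 75% but less than 99% identity
    return "not divergent"      # >=99%; not covered by the definitions above


def is_conserved(strain_identity_pct, threshold=90.0):
    """True if identity between two M. smithii strains exceeds about 90%."""
    return strain_identity_pct > threshold


def is_candidate(strain_identity_pct, subject_identities_pct):
    """Conserved among strains, and absent or divergent in the subject."""
    status = classify_against_subject(subject_identities_pct)
    return is_conserved(strain_identity_pct) and status in {"absent", "divergent"}
```

For example, a sequence with 95% identity between strains whose best match in the subject has 82% identity would be treated as a candidate under these assumptions, whereas one with a 99.5% match in the subject would not.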
Alternatively, the nucleic acid or nucleic acids may be selected from the group of nucleic acids listed in Table A, B and D that are not conserved among M. smithii strains. For example, while the genome of a M. smithii strain may comprise at least one nucleic acid that encodes an adhesin-like protein (ALP), the nucleic acid encoding a particular ALP may not be present in all strains. Stated another way, a nucleic acid encoding a particular type of protein (e.g. an ALP) may show strain-specific differences in representation among M. smithii strains.
Alternatively, the nucleic acid or nucleic acids derived from a M. smithii genome may be selected from the group of nucleic acids comprising nucleic acid sequences that are expressed in vivo by M. smithii while residing in the gastrointestinal tract of a subject. In another embodiment, the nucleic acid or nucleic acids may be selected from the group of nucleic acids comprising nucleic acid sequences that are expressed by M. smithii while residing in the gastrointestinal tract of a subject, and whose expression levels are not affected by the presence of actively fermenting bacteria. In another embodiment, the nucleic acid or nucleic acids may be selected from the group of nucleic acids comprising nucleic acid sequences that are expressed by M. smithii while residing in the gastrointestinal tract of a subject, and whose expression levels are affected by the presence of actively fermenting bacteria. The in vivo expression levels of a nucleic acid may be determined by methods known in the art, including RT-PCR. In yet another embodiment, the nucleic acid or nucleic acids may be selected from the group of nucleic acids that encode the M. smithii transcriptome or metabolome. In yet another embodiment, the nucleic acid or nucleic acids may be selected from the group of nucleic acids whose expression levels differ between strains of M. smithii when the strains are grown in vitro or in vivo under similar conditions.
The biomolecule may also be a polypeptide derived from a M. smithii proteome. A polypeptide derived from the M. smithii proteome is a polypeptide that is encoded by at least a portion of a nucleic acid sequence selected from the nucleic acid sequences listed in Table A, Table B or Table D. The polypeptide may comprise fewer than 10, at least 10, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 150, at least 200, or more than 200 amino acids encoded by a nucleic acid sequence selected from the nucleic acid sequences listed in Table A, Table B or Table D. One embodiment of the invention is an array comprising a substrate, the substrate having disposed thereon at least one polypeptide, wherein the polypeptide is encoded by a nucleic acid sequence selected from the nucleic acid sequences listed in Table A. Another embodiment of the invention is an array comprising a substrate, the substrate having disposed thereon at least one polypeptide, wherein the polypeptide is encoded by a nucleic acid sequence selected from the nucleic acid sequences listed in Table B. Still another embodiment of the invention is an array comprising a substrate, the substrate having disposed thereon at least one polypeptide, wherein the polypeptide comprises an amino acid sequence selected from the amino acid sequences listed in Table C. A different embodiment of the invention is an array comprising a substrate, the substrate having disposed thereon at least one polypeptide, wherein the polypeptide is encoded by a nucleic acid sequence selected from the nucleic acid sequences listed in Table D.
In one embodiment, the polypeptide or polypeptides may be selected from the group of polypeptides comprising polypeptide sequences that are conserved among M. smithii strains, but divergent from a corresponding polypeptide of the subject. The terms conserved and divergent are used as defined above. In certain embodiments, the polypeptide or polypeptides are selected from the group comprising polypeptides absent from the subject gut microbiome or genome. In another embodiment, the polypeptide or polypeptides may be selected from the group of polypeptides comprising polypeptide sequences with greater than about 75% but less than about 99% identity to a correlating polypeptide from the subject gut microbiome or genome. In yet another embodiment, the polypeptide or polypeptides may be selected from the group of polypeptides comprising polypeptide sequences with greater than about 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, or 98% identity to a correlating polypeptide from the subject gut microbiome or genome. In one embodiment, for instance, the polypeptide may be encoded by a nucleic acid designated absent or divergent in Table 2. Percent identity may be determined as discussed below.
Alternatively, the polypeptide or polypeptides derived from a M. smithii proteome may be encoded by a nucleic acid selected from the group of nucleic acids comprising nucleic acid sequences that are expressed in vivo by M. smithii while residing in the gastrointestinal tract of a subject. In another embodiment, the polypeptide or polypeptides may be encoded by a nucleic acid selected from the group of nucleic acids comprising nucleic acid sequences that are expressed by M. smithii while residing in the gastrointestinal tract of a subject, and whose expression levels are not affected by the presence of actively fermenting bacteria. In still another embodiment, the polypeptide or polypeptides may be encoded by a nucleic acid selected from the group of nucleic acids comprising nucleic acid sequences that are expressed by M. smithii while residing in the gastrointestinal tract of a subject, and whose expression levels are affected by the presence of actively fermenting bacteria. In yet another embodiment, the polypeptide or polypeptides may be encoded by a nucleic acid selected from the group of nucleic acids that encode the M. smithii transcriptome or metabolome.
The array may alternatively be comprised of biomolecules from the genome or proteome of M. smithii that are indicative of an obese subject microbiome. Alternatively, the array may be comprised of biomolecules from the genome or proteome of M. smithii that are indicative of a lean subject microbiome. A biomolecule is “indicative” of an obese or lean microbiome if it tends to appear more often in one type of microbiome compared to the other. Such differences may be quantified using commonly known statistical measures, such as binomial tests. An “indicative” biomolecule may be referred to as a “biomarker.”
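By way of non-limiting illustration, one way a binomial test might be applied to decide whether a biomolecule is “indicative” is sketched below. The function names, the significance level of 0.05, and the null hypothesis are assumptions of this example: it supposes that equal numbers of obese and lean microbiomes were surveyed, so that under the null hypothesis each detection of the biomolecule is equally likely to come from either type. Other study designs would call for a different null probability or a different test.

```python
from math import comb


def binomial_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def is_indicative(obese_hits, lean_hits, alpha=0.05):
    """Return "obese", "lean", or None for one biomolecule.

    obese_hits / lean_hits: number of obese / lean microbiomes in which
    the biomolecule was detected (assumes equal-sized groups).
    """
    n = obese_hits + lean_hits
    if n == 0:
        return None                      # never detected: nothing to test
    k = max(obese_hits, lean_hits)
    # Two-sided p-value; doubling the tail is valid by symmetry at p = 0.5.
    p_value = min(1.0, 2 * binomial_upper_tail(k, n))
    if p_value >= alpha:
        return None                      # no significant skew either way
    return "obese" if obese_hits > lean_hits else "lean"
```

Under these assumptions, a biomolecule detected in nine obese and one lean microbiome would be flagged as indicative of the obese microbiome, while a six-to-four split would not be flagged.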
Additionally, the array may be comprised of biomolecules from the genome or proteome of M. smithii that are modulated in the obese subject microbiome compared to the lean subject microbiome. As used herein, “modulated” may refer to a biomolecule whose representation or activity is different in an obese subject microbiome compared to a lean subject microbiome. For instance, modulated may refer to a biomolecule that is enriched, depleted, up-regulated, down-regulated, degraded, or stabilized in the obese subject microbiome compared to a lean subject microbiome. In one embodiment, the array may be comprised of a biomolecule enriched in the obese subject microbiome compared to the lean subject microbiome. In another embodiment, the array may be comprised of a biomolecule depleted in the obese subject microbiome compared to the lean subject microbiome. In yet another embodiment, the array may be comprised of a biomolecule up-regulated in the obese subject microbiome compared to the lean subject microbiome. In still another embodiment, the array may be comprised of a biomolecule down-regulated in the obese subject microbiome compared to the lean subject microbiome. In still yet another embodiment, the array may be comprised of a biomolecule degraded in the obese subject microbiome compared to the lean subject microbiome. In an alternative embodiment, the array may be comprised of a biomolecule stabilized in the obese subject microbiome compared to the lean subject microbiome.
Additionally, the biomolecule may be at least 80, 85, 90, or 95% homologous to a biomolecule derived from Tables A-D. In one embodiment, the biomolecule may be at least 80, 81, 82, 83, 84, 85, 86, 87, 88, or 89% homologous to a biomolecule derived from Table A. In another embodiment, the biomolecule may be at least 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100% homologous to a biomolecule derived from Table A. In another embodiment, the biomolecule may be at least 80, 81, 82, 83, 84, 85, 86, 87, 88, or 89% homologous to a biomolecule derived from Table B. In another embodiment, the biomolecule may be at least 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100% homologous to a biomolecule derived from Table B. In another embodiment, the biomolecule may be at least 80, 81, 82, 83, 84, 85, 86, 87, 88, or 89% homologous to a biomolecule derived from Table C. In another embodiment, the biomolecule may be at least 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100% homologous to a biomolecule derived from Table C. In another embodiment, the biomolecule may be at least 80, 81, 82, 83, 84, 85, 86, 87, 88, or 89% homologous to a biomolecule derived from Table D. In another embodiment, the biomolecule may be at least 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100% homologous to a biomolecule derived from Table D.
In determining whether a biomolecule is substantially homologous or shares a certain percentage of sequence identity with a sequence of the invention, sequence similarity may be determined by conventional algorithms, which typically allow introduction of a small number of gaps in order to achieve the best fit. In particular, “percent identity” of two polypeptides or two nucleic acid sequences is determined using the algorithm of Karlin and Altschul (Proc. Natl. Acad. Sci. USA 87:2264-2268, 1990). Such an algorithm is incorporated into the BLASTN and BLASTX programs of Altschul et al. (J. Mol. Biol. 215:403-410, 1990). BLAST nucleotide searches may be performed with the BLASTN program to obtain nucleotide sequences homologous to a nucleic acid molecule of the invention. Equally, BLAST protein searches may be performed with the BLASTX program to obtain amino acid sequences that are homologous to a polypeptide of the invention. To obtain gapped alignments for comparison purposes, Gapped BLAST is utilized as described in Altschul et al. (Nucleic Acids Res. 25:3389-3402, 1997). When utilizing BLAST and Gapped BLAST programs, the default parameters of the respective programs (e.g., BLASTX and BLASTN) are employed. See http://www.ncbi.nlm.nih.gov for more details.
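For illustration, once two sequences have been aligned (for instance by one of the programs named above), percent identity may be computed as the number of identical aligned positions divided by the alignment length. The sketch below is not BLAST; it assumes a pre-computed pairwise alignment in which gaps are written as "-", and it counts gapped columns in the denominator, which is one common convention. The example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over a pre-computed pairwise alignment.

    Both strings must have equal length; '-' marks a gap. Identical
    non-gap columns are counted and divided by the alignment length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b)
                  if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


# Hypothetical aligned peptide fragments (5 identical columns out of 7).
identity = percent_identity("MKT-LLV", "MKTALIV")
print(f"{identity:.2f}% identity")
```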
Furthermore, the biomolecules used for the array may be labeled. One skilled in the art understands that the type of label selected depends in part on how the array is being used. Suitable labels may include fluorescent labels, chromagraphic labels, chemi-luminescent labels, FRET labels, etc. Such labels are well known in the art.
The arrays may be utilized in several suitable applications. For example, the arrays may be used in methods for detecting association between a biomolecule of the array and a compound in a sample. In this context, compound refers to a nucleic acid, a protein, a lipid, or chemical compound. This method typically comprises incubating a sample with the array under conditions such that the compounds comprising the sample may associate with the biomolecules attached to the array. The association is then detected, using means commonly known in the art, such as fluorescence. “Association,” as used in this context, may refer to hybridization, covalent binding, ionic binding, hydrogen binding, van der Waals binding, and dative binding. A skilled artisan will appreciate that conditions under which association may occur will vary depending on the biomolecules, the compounds, the substrate, and the detection method utilized. As such, suitable conditions may have to be optimized for each individual array created.
In one embodiment, the array may be used as a tool in methods to determine whether a compound has efficacy for modulating a gene product of M. smithii. In certain embodiments, the array may be used as a tool in methods to determine whether a compound has efficacy for modulating a gene product of M. smithii while M. smithii is residing in the gastrointestinal tract of a subject. Typically, such a method comprises comparing a plurality of biomolecules from either the M. smithii genome or proteome before and after administration of a compound for modulating a gene product of M. smithii, such that if the abundance of a biomolecule that correlates with the gene product is modulated, the compound is efficacious in modulating a gene product of M. smithii. The array may also be used to quantitate the plurality of biomolecules of M. smithii's genome or proteome before and after administration of a compound. The abundance of each biomolecule in the plurality may then be compared to determine if there is a decrease in the abundance of biomolecules associated with the compound. In other embodiments, the array may be used to quantify the levels of M. smithii in an obese subject prior to, during, or after treatment for obesity. Alternatively, the array may be used to quantify the levels of M. smithii in an underfed individual prior to, during, or after implementation of dietary recommendations designed to increase nutrient and energy harvest.
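The before-and-after comparison described above may be sketched, by way of non-limiting example, as a log2 fold-change calculation over array-derived abundance values. The two-fold cutoff, the pseudocount, and the biomolecule names and values are hypothetical choices made for this sketch and are not prescribed by the method.

```python
from math import log2
from typing import Dict


def modulated_biomolecules(before: Dict[str, float],
                           after: Dict[str, float],
                           min_abs_log2_fc: float = 1.0,
                           pseudocount: float = 1.0) -> Dict[str, float]:
    """Return the log2 fold change (after vs. before) of every biomolecule
    measured at both time points whose change meets the cutoff."""
    result = {}
    for name in before.keys() & after.keys():
        lfc = log2((after[name] + pseudocount) / (before[name] + pseudocount))
        if abs(lfc) >= min_abs_log2_fc:
            result[name] = lfc
    return result


# Hypothetical abundances before and after administration of a compound.
before = {"geneA": 99.0, "geneB": 50.0, "geneC": 7.0}
after = {"geneA": 24.0, "geneB": 52.0, "geneC": 31.0}

hits = modulated_biomolecules(before, after)
for name in sorted(hits):
    print(name, round(hits[name], 2))
```

A negative value indicates a decrease in abundance after administration; a positive value indicates an increase.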
In a further embodiment, the array may be used as a tool in methods to determine the identity of an M. smithii strain present in a subject's microbiome. Typically, such a method comprises collecting a sample from a subject and using an array of the invention to determine the presence, absence or abundance of an ALP gene product in the sample, and determining whether a particular strain is present in the sample based on the presence, absence or abundance of an ALP gene product.
In still a further embodiment, the array may be used as a tool in methods to determine whether a compound has efficacy for treatment of weight gain or a weight gain related disorder in a subject. Typically, such a method comprises comparing a plurality of biomolecules of M. smithii's genome or proteome before and after administration of a compound for the treatment of weight gain or a weight gain related disorder, such that if the abundance of biomolecules associated with weight gain decreased after treatment, the compound is efficacious in treating weight gain in a subject.
In still a further embodiment, the array may be used as a tool in methods to determine whether a compound has efficacy for treatment of weight loss or a weight loss related disorder in a subject. Typically, such a method comprises comparing a plurality of biomolecules of M. smithii's genome or proteome before and after administration of a compound for the treatment of weight loss or a weight loss related disorder, such that if the abundance of biomolecules associated with weight loss decreased after treatment, the compound is efficacious in treating weight loss in a subject.
The present invention also encompasses M. smithii gene profiles. Generally speaking, a gene profile is comprised of a plurality of values with each value representing the abundance of a biomolecule derived from either the M. smithii genome or proteome. The abundance of a biomolecule may be determined, for instance, by sequencing the nucleic acids of the M. smithii genome as detailed in the examples. This sequencing data may then be analyzed by known software to determine the abundance of a biomolecule in the analyzed sample. An M. smithii gene profile may comprise biomolecules from more than one M. smithii strain. The abundance of a biomolecule may also be determined using an array described above. For instance, by detecting the association between compounds comprising an M. smithii derived sample and the biomolecules comprising the array, the abundance of M. smithii biomolecules in the sample may be determined.
A profile may be digitally-encoded on a computer-readable medium. The term “computer-readable medium” as used herein refers to any medium that participates in providing instructions to a processor for execution. Such a medium may take many forms, including but not limited to non-volatile media, volatile media, and transmission media. Non-volatile media may include, for example, optical or magnetic disks. Volatile media may include dynamic memory. Transmission media may include coaxial cables, copper wire and fiber optics. Transmission media may also take the form of acoustic, optical, or electromagnetic waves, such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, or other magnetic medium, a CD-ROM, CD-RW, DVD, or other optical medium, punch cards, paper tape, optical mark sheets, or other physical medium with patterns of holes or other optically recognizable indicia, a RAM, a PROM, an EPROM, a FLASH-EPROM, or other memory chip or cartridge, a carrier wave, or other medium from which a computer can read.
A particular profile may be coupled with additional data about that profile on a computer readable medium. For instance, a profile may be coupled with data about what therapeutics, compounds, or drugs may be efficacious for that profile. Conversely, a profile may be coupled with data about what therapeutics, compounds, or drugs may not be efficacious for that profile. Alternatively, a profile may be coupled with known risks associated with that profile. Non-limiting examples of the type of risks that might be coupled with a profile include disease or disorder risks associated with a profile. The computer readable medium may also comprise a database of at least two distinct profiles.
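One possible digital encoding of a profile coupled with additional data is sketched below using JSON. The record layout, field names, and example values are hypothetical and are shown only to illustrate how a database of at least two distinct profiles might be written to, and read back from, a computer-readable medium.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class GeneProfile:
    """A profile: abundance values plus optional coupled data."""
    label: str
    abundances: Dict[str, float]
    efficacious_compounds: List[str] = field(default_factory=list)
    non_efficacious_compounds: List[str] = field(default_factory=list)
    known_risks: List[str] = field(default_factory=list)


def encode(profiles: List[GeneProfile]) -> str:
    """Serialize a database of profiles to a JSON string."""
    return json.dumps([asdict(p) for p in profiles], sort_keys=True)


def decode(text: str) -> List[GeneProfile]:
    """Rebuild the profiles from their JSON encoding."""
    return [GeneProfile(**record) for record in json.loads(text)]


# Hypothetical database containing two distinct profiles.
database = [
    GeneProfile("profile-1", {"geneA": 12.5, "geneB": 0.0},
                efficacious_compounds=["compound X"]),
    GeneProfile("profile-2", {"geneA": 1.0, "geneB": 8.0},
                known_risks=["risk Y"]),
]

restored = decode(encode(database))
print(restored == database)
```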
Profiles may be stored on a computer-readable medium such that software known in the art and detailed in the examples may be used to compare more than one profile.
Another aspect of the invention is a method for selecting a compound that has efficacy for modulating a gene product of M. smithii present in the gastrointestinal tract of a subject. The method generally comprises comparing an M. smithii gene profile to a gene profile of the subject and identifying a gene product of the M. smithii gene profile that is divergent from a corresponding gene product of the subject gene profile, or absent in the gene profile of the subject. Next the method comprises selecting a compound that modulates the M. smithii gene product, but does not substantially modulate the corresponding gene product of the subject. In a further embodiment, the compound also does not substantially modulate the corresponding gene product of an archaeon other than M. smithii, or a non-archaeal microbe, in the gastrointestinal tract of the subject. The compound may for instance, inhibit or promote the growth of M. smithii. The compound may also decrease or increase the efficiency of carbohydrate metabolism in the subject. Accordingly, the compound may also promote weight loss or weight gain in the subject.
Another further aspect of the invention is a method for selecting a compound that has efficacy for modulating a gene product of M. smithii present in the gastrointestinal tract of a subject. The method comprises comparing an M. smithii gene profile to a gene profile of the subject and identifying a gene product of the M. smithii gene profile that is divergent from a corresponding gene product of the subject gene profile, or absent in the gene profile of the subject. Next the method comprises selecting a compound that can be administered so as to modulate the M. smithii gene product, but not substantially modulate the corresponding gene product of the subject. In a further embodiment, the administered compound also does not substantially modulate the corresponding gene product of an archaeon other than M. smithii, or a non-archaeal microbe, in the gastrointestinal tract of the subject. The compound may be administered, for instance, so as to inhibit or promote the growth of M. smithii. The compound may also be administered so as to decrease or increase the efficiency of carbohydrate metabolism in the subject. Accordingly, the compound may also be administered so as to promote weight loss or weight gain in the subject.
The present invention also encompasses a kit for evaluating a compound, therapeutic, or drug. Typically, the kit comprises an array and a computer-readable medium. The array may comprise a substrate having disposed thereon at least one biomolecule that is derived from the M. smithii genome or proteome. In some embodiments, the array may comprise at least one biomolecule that is derived from the M. smithii metabolome or transcriptome. The computer-readable medium may have a plurality of digitally-encoded profiles wherein each profile of the plurality has a plurality of values, each value representing the abundance of a biomolecule derived from M. smithii detected by the array. The array may be used to determine a profile for a particular subject under particular conditions, and then the computer-readable medium may be used to determine if the profile is similar to a known profile stored on the computer-readable medium. Non-limiting examples of possible known profiles include obese and lean profiles for several different subjects.
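As an illustrative sketch of determining whether a newly measured profile is similar to a known stored profile, the following compares profiles by cosine similarity and reports the closest known profile. The choice of cosine similarity is an assumption of this sketch (other similarity measures could equally be used), and the profile names and abundance values are hypothetical.

```python
from math import sqrt
from typing import Dict


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two profiles; a biomolecule missing from one
    profile is treated as having zero abundance in that profile."""
    keys = sorted(set(a) | set(b))
    va = [a.get(k, 0.0) for k in keys]
    vb = [b.get(k, 0.0) for k in keys]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = sqrt(sum(x * x for x in va)) * sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0


def most_similar(query: Dict[str, float],
                 known: Dict[str, Dict[str, float]]) -> str:
    """Return the name of the known profile closest to the query."""
    return max(known, key=lambda name: cosine_similarity(query, known[name]))


# Hypothetical known profiles and a newly measured subject profile.
known_profiles = {
    "obese": {"g1": 10.0, "g2": 1.0},
    "lean": {"g1": 1.0, "g2": 10.0},
}
subject_profile = {"g1": 8.0, "g2": 2.0}

closest = most_similar(subject_profile, known_profiles)
print(closest)
```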
A further aspect of the invention encompasses a method of promoting weight loss or gain. The method incorporates the discovery that modulating the Archaeon population of the gastrointestinal tract of a subject, of which M. smithii is a major component, modulates the efficiency and selectivity of carbohydrate metabolism. Furthermore, the method relies on applicants' discovery that certain M. smithii gene products are conserved among M. smithii strains, yet divergent (or absent) from the correlating gene products expressed by the subject's microbiome or genome. This divergence allows the selection of compounds to specifically modulate the M. smithii gene product, while substantially not modulating the subject's gene product, as described above.
By way of non-limiting example, weight loss may be promoted by administering an HMG-CoA reductase inhibitor to a subject. In an exemplary embodiment, the inhibitor will selectively inhibit the HMG-CoA reductase expressed by M. smithii and not the HMG-CoA reductase expressed by the subject. In another embodiment, a second HMG-CoA reductase inhibitor may be administered that selectively inhibits the HMG-CoA reductase expressed by the subject in lieu of the HMG-CoA reductase expressed by M. smithii. In yet another embodiment, an HMG-CoA reductase inhibitor that selectively inhibits the HMG-CoA reductase expressed by the subject may be administered in combination with an HMG-CoA reductase inhibitor that selectively inhibits the HMG-CoA reductase expressed by M. smithii. One means that may be utilized to achieve such selectivity is via the use of time-release formulations as discussed below. Compounds that inhibit HMG-CoA reductase are well known in the art. For instance, non-limiting examples include atorvastatin, pravastatin, rosuvastatin, and other statins.
These compounds, for example HMG-CoA reductase inhibitors, may be formulated into pharmaceutical compositions and administered to subjects to promote weight loss. According to the present invention, a pharmaceutical composition includes, but is not limited to, pharmaceutically acceptable salts, esters, salts of such esters, or any other adduct or derivative which upon administration to a subject in need is capable of providing, directly or indirectly, a composition as otherwise described herein, or a metabolite or residue thereof, e.g., a prodrug.
The pharmaceutical compositions may be administered by several different means that will deliver a therapeutically effective dose. Such compositions can be administered orally, parenterally, by inhalation spray, rectally, intradermally, intracisternally, intraperitoneally, transdermally, buccally, as an oral or nasal spray, or topically (e.g., powders, ointments or drops) in dosage unit formulations containing conventional nontoxic pharmaceutically acceptable carriers, adjuvants, and vehicles as desired. Topical administration may also involve the use of transdermal administration such as transdermal patches or iontophoresis devices. The term parenteral as used herein includes subcutaneous, intravenous, intramuscular, or intrasternal injection, or infusion techniques. In an exemplary embodiment, the pharmaceutical composition will be administered in an oral dosage form. Formulation of drugs is discussed in, for example, Hoover, John E., Remington's Pharmaceutical Sciences, Mack Publishing Co., Easton, Pa. (1975), and Lieberman, H. A. and Lachman, L., Eds., Pharmaceutical Dosage Forms, Marcel Dekker, New York, N.Y. (1980).
The amount of an HMG-CoA reductase inhibitor that constitutes an “effective amount” can and will vary. The amount will depend upon a variety of factors, including whether the administration is in single or multiple doses, and individual subject parameters including age, physical condition, size, and weight. Those skilled in the art will appreciate that dosages may also be determined with guidance from Goodman & Gilman's The Pharmacological Basis of Therapeutics, Ninth Edition (1996), Appendix II, pp. 1707-1711 and from Goodman & Gilman's The Pharmacological Basis of Therapeutics, Tenth Edition (2001), Appendix II, pp. 475-493.
As described above, an HMG-CoA reductase inhibitor may be specific for the M. smithii enzyme, or for the subject's enzyme, depending, in part, on the selectivity of the particular inhibitor and the area the inhibitor is targeted for release in the subject. For example, an inhibitor may be targeted for release in the upper portion of the gastrointestinal tract of a subject to substantially inhibit the subject's enzyme. In contrast, if the inhibitor is targeted for release in the lower portion of the gastrointestinal tract of a subject, i.e., where M. smithii resides, then the inhibitor may substantially inhibit M. smithii's enzyme.
In order to selectively control the release of an inhibitor to a particular region of the gastrointestinal tract, the pharmaceutical compositions of the invention may be manufactured into one or several dosage forms for the controlled, sustained or timed release of one or more of the ingredients. In this context, typically one or more of the ingredients forming the pharmaceutical composition is microencapsulated or dry coated prior to being formulated into one of the above forms. By varying the amount and type of coating and its thickness, the timing and location of release of a given ingredient or several ingredients (in either the same dosage form, such as a multi-layered capsule, or different dosage forms) may be varied.
The coating can and will vary depending upon a variety of factors, including, the particular ingredient, and the purpose to be achieved by its encapsulation (e.g., time release). The coating material may be a biopolymer, a semi-synthetic polymer, or a mixture thereof. The microcapsule may comprise one coating layer or many coating layers, of which the layers may be of the same material or different materials. In one embodiment, the coating material may comprise a polysaccharide or a mixture of saccharides and glycoproteins extracted from a plant, fungus, or microbe. Non-limiting examples include corn starch, wheat starch, potato starch, tapioca starch, cellulose, hemicellulose, dextrans, maltodextrin, cyclodextrins, inulins, pectin, mannans, gum arabic, locust bean gum, mesquite gum, guar gum, gum karaya, gum ghatti, tragacanth gum, funori, carrageenans, agar, alginates, chitosans, or gellan gum. In another embodiment, the coating material may comprise a protein. Suitable proteins include, but are not limited to, gelatin, casein, collagen, whey proteins, soy proteins, rice protein, and corn proteins. In an alternate embodiment, the coating material may comprise a fat or oil, and in particular, a high temperature melting fat or oil. The fat or oil may be hydrogenated or partially hydrogenated, and preferably is derived from a plant. The fat or oil may comprise glycerides, free fatty acids, fatty acid esters, or a mixture thereof. In still another embodiment, the coating material may comprise an edible wax. Edible waxes may be derived from animals, insects, or plants. Non-limiting examples include beeswax, lanolin, bayberry wax, carnauba wax, and rice bran wax. The coating material may also comprise a mixture of biopolymers. As an example, the coating material may comprise a mixture of a polysaccharide and a fat.
In an exemplary embodiment, the coating may be an enteric coating. The enteric coating generally will provide for controlled release of the ingredient, such that drug release can be accomplished at some generally predictable location in the lower intestinal tract below the point at which drug release would occur without the enteric coating. In certain embodiments, multiple enteric coatings may be utilized. Multiple enteric coatings, in certain embodiments, may be selected to release the ingredient or combination of ingredients at various regions in the lower gastrointestinal tract and at various times.
The enteric coating is typically, although not necessarily, a polymeric material that is pH sensitive. A variety of anionic polymers exhibiting a pH-dependent solubility profile may be suitably used as an enteric coating in the practice of the present invention to achieve delivery of the active to the lower gastrointestinal tract. Suitable enteric coating materials include, but are not limited to: cellulosic polymers such as hydroxypropyl cellulose, hydroxyethyl cellulose, hydroxypropyl methyl cellulose, methyl cellulose, ethyl cellulose, cellulose acetate, cellulose acetate phthalate, cellulose acetate trimellitate, hydroxypropylmethyl cellulose phthalate, hydroxypropylmethyl cellulose succinate and carboxymethylcellulose sodium; acrylic acid polymers and copolymers, preferably formed from acrylic acid, methacrylic acid, methyl acrylate, ammonio methylacrylate, ethyl acrylate, methyl methacrylate and/or ethyl methacrylate (e.g., those copolymers sold under the trade name “Eudragit”); vinyl polymers and copolymers such as polyvinyl pyrrolidone, polyvinyl acetate, polyvinylacetate phthalate, vinylacetate crotonic acid copolymer, and ethylene-vinyl acetate copolymers; and shellac (purified lac). In one embodiment, the coating may comprise plant polysaccharides that can only be digested in the distal gut by the microbiota. For instance, a coating may comprise pectic galactans, polygalacturonates, arabinogalactans, arabinans, or rhamnogalacturonans. Combinations of different coating materials may also be used to coat a single capsule.
The thickness of a microcapsule coating may be an important factor in some instances. For example, the “coating weight,” or relative amount of coating material per dosage form, generally dictates the time interval between oral ingestion and drug release. As such, a coating utilized for time release of the ingredient or combination of ingredients into the gastrointestinal tract is typically applied to a sufficient thickness such that the entire coating does not dissolve in the gastrointestinal fluids at pH below about 5, but does dissolve at pH about 5 and above. The thickness of the coating is generally optimized to achieve release of the ingredient at approximately the desired time and location.
As will be appreciated by a skilled artisan, the encapsulation or coating method can and will vary depending upon the ingredients used to form the pharmaceutical composition and coating, and the desired physical characteristics of the microcapsules themselves. Additionally, more than one encapsulation method may be employed so as to create a multi-layered microcapsule, or the same encapsulation method may be employed sequentially so as to create a multi-layered microcapsule. Suitable methods of microencapsulation may include spray drying, spinning disk encapsulation (also known as rotational suspension separation encapsulation), supercritical fluid encapsulation, air suspension microencapsulation, fluidized bed encapsulation, spray cooling/chilling (including matrix encapsulation), extrusion encapsulation, centrifugal extrusion, coacervation, alginate beads, liposome encapsulation, inclusion encapsulation, colloidosome encapsulation, sol-gel microencapsulation, and other methods of microencapsulation known in the art. Detailed information concerning materials, equipment and processes for preparing coated dosage forms may be found in Pharmaceutical Dosage Forms: Tablets, eds. Lieberman et al. (New York: Marcel Dekker, Inc., 1989), and in Ansel et al., Pharmaceutical Dosage Forms and Drug Delivery Systems, 6th Ed. (Media, Pa.: Williams & Wilkins, 1995).
The term “activity of the microbiota population” refers to the microbiome's ability to harvest energy.
An “effective amount” is a therapeutically-effective amount that is intended to qualify the amount of agent that will achieve the goal of modulating an M. smithii gene product, promoting weight loss, or promoting weight gain.
As used herein, “gene product” refers to a nucleic acid derived from a particular gene, or a polypeptide derived from a particular gene. For instance, a gene product may be a mRNA, tRNA, rRNA, cDNA, peptide, polypeptide, protein, or metabolite.
“Metabolome” as used herein is defined as the network of enzymes and their substrates and biochemical products, which operate within subject or microbial cells under various physiological conditions.
As used herein, the term “pharmaceutically acceptable salt” refers to those salts which are, within the scope of sound medical judgment, suitable for use in contact with the tissues of humans and other subjects without undue toxicity, irritation, allergic response and the like, and are commensurate with a reasonable benefit/risk ratio. Pharmaceutically acceptable salts are well known in the art. For example, S. M. Berge, et al. describe pharmaceutically acceptable salts in detail in J. Pharmaceutical Sciences, 66:1-19 (1977), incorporated herein by reference. The salts can be prepared in situ during the final isolation and purification of the composition of the invention, or separately by reacting the free base function with a suitable organic acid. Non-limiting examples of pharmaceutically acceptable, nontoxic acid addition salts are salts of an amino group formed with inorganic acids such as hydrochloric acid, hydrobromic acid, hydroiodic acid, nitric acid, carbonic acid, phosphoric acid, sulfuric acid and perchloric acid.
As used herein, the “subject” may be, generally speaking, an organism capable of supporting M. smithii in its gastrointestinal tract. For instance, the subject may be a rodent or a human. In one embodiment, the subject may be a rodent, e.g., a mouse, a rat, a guinea pig, etc. In an exemplary embodiment, the subject is human.
“Transcriptome” as used herein is defined as the network of genes that are being actively transcribed into mRNA in subject or microbial cells under various physiological conditions.
The phrase “weight gain related disorder” includes disorders resulting from, at least in part, obesity. Representative disorders include metabolic syndrome, type II diabetes, hypertension, cardiovascular disease, and nonalcoholic fatty liver disease. The phrase “weight loss related disorder” includes disorders resulting from, at least in part, weight loss. Representative disorders include malnutrition and cachexia.
As various changes could be made in the above compounds, products and methods without departing from the scope of the invention, it is intended that all matter contained in the above description and in the examples given below, shall be interpreted as illustrative and not in a limiting sense.
The following examples illustrate various iterations of the invention.
Methanobrevibacter smithii strain PS (ATCC 35061) was grown as described below for 6d at 37° C. DNA was recovered from harvested cell pellets using the QIAGEN Genomic DNA Isolation kit with mutanolysin (1 unit/mg wet weight cell pellet; Sigma) added to facilitate lysis of the microbe. An ABI 3730xl instrument was used for paired end-sequencing of inserts in a plasmid library (average insert size 5 Kb; 42,823 reads; 11.6-fold coverage), and a fosmid library (average insert size of 40 Kb; 7,913 reads; 0.6-fold coverage). Phrap and PCAP (Huang et al. (2003) Genome Res 13:2164-70) were used to assemble the reads. A primer-walking approach was used to fill in sequence gaps. Physical gaps and regions of poor quality (as defined by Consed; Gordon et al., (1998) Genome Res. 8, 195-202) were resolved by PCR-based re-sequencing. The assembly's integrity and accuracy were verified by clone constraints. Regions containing insufficient coverage or ambiguous assemblies were resolved by sequencing spanning fosmids. Sequence inversions were identified based on inconsistency of constraints for a fraction of read pairs in those regions. The final assembly consisted of 12.6× sequence coverage with a Phred base quality value 40. Open-reading frames (ORFs) were identified and annotated as described below.
All experiments using mice were performed using protocols approved by the animal studies committee of Washington University. Gnotobiotic male mice belonging to the NMRI inbred strain (n=5-6/group/experiment) were colonized with either M. smithii (14d) or B. thetaiotaomicron (28d) alone, or first with B. thetaiotaomicron for 14d followed by co-colonization with M. smithii. All mice were sacrificed at 12 weeks of age. Cecal contents from each mouse were flash frozen, and stored at −80° C. RNA was extracted from an aliquot of the harvested cecal contents (100-300 mg) and used to generate cDNA for qRT-PCR assays. qRT-PCR data were normalized to 16S rRNA (ΔΔCT method) prior to comparing treatment groups. PCR primers are listed in Table 14. All amplicons were 100-150 bp.
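For illustration, the ΔΔCT normalization referred to above may be sketched as follows. The calculation takes ΔCT as the target gene CT minus the reference (16S rRNA) CT, and converts ΔΔCT to a relative expression level as 2 raised to the power of −ΔΔCT, which assumes approximately 100% amplification efficiency. The CT values shown are hypothetical and are not data from these experiments.

```python
def delta_delta_ct_fold_change(ct_target_treated: float,
                               ct_reference_treated: float,
                               ct_target_control: float,
                               ct_reference_control: float) -> float:
    """Relative expression (treated vs. control) by the delta-delta-CT method.

    Assumes the amount of product doubles every cycle, i.e. approximately
    100% amplification efficiency for both target and reference.
    """
    delta_ct_treated = ct_target_treated - ct_reference_treated
    delta_ct_control = ct_target_control - ct_reference_control
    delta_delta_ct = delta_ct_treated - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical CT values: target gene and 16S rRNA reference in a
# co-colonized group (treated) and a mono-associated group (control).
fold_change = delta_delta_ct_fold_change(
    ct_target_treated=22.0, ct_reference_treated=12.0,
    ct_target_control=24.0, ct_reference_control=12.0,
)
print(fold_change)
```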
Perchloric acid-, hydrochloric acid-, and alkali extracts of freeze dried cecal contents were prepared, and established pyridine nucleotide-linked microanalytic assays (Passonneau et al., (1993) Enzymatic Analysis: A practical guide) were used to measure metabolites.
All M. smithii strains [PS (ATCC 35061), ALI (DSMZ 2375), B181 (DSMZ 11975), and F1 (DSMZ 2374)] were cultivated in 125 ml serum bottles containing 15 ml MBC medium supplemented with 3 g/L formate, 3 g/L acetate, and 0.3 mL of a freshly prepared anaerobic solution of filter-sterilized 2.5% Na2S (Samuel et al., (2006) PNAS 103:10011-6). The remaining volume in the bottle (headspace) contained a 4:1 mixture of H2 and CO2: the headspace was replenished every 1-2d for a 6d growth at 37° C.
M. smithii PS was also cultured in a BioFlor-110 batch fermentor with dual 1.5 L fermentation vessels (New Brunswick Scientific). Each vessel contained 750 ml of supplemented MBC medium. One hour prior to inoculation, 7.5 ml of sterile 2.5% Na2S solution was added to the vessel, followed by one half of the contents of a serum bottle culture that had been harvested on day 5 of growth. Microbes were then incubated at 37° C. under a constant flow of H2/CO2 (4:1) (agitation setting, 250 rpm). One milliliter of a sterile solution of 2.5% Na2S was added daily.
Colonization of Germ-Free Mice with M. smithii PS with and without B. thetaiotaomicron VPI-5482
Mice belonging to the NMRI/KI inbred strain (Bry et al., (1996) Science 273:1380-3) were housed in gnotobiotic isolators (Hooper et al., (2002) Mol Cell Micro 31:559-589) where they were maintained under a strict 12 h light cycle (lights on at 0600 h) and fed a standard, autoclaved, polysaccharide-rich chow diet (B&K Universal, East Yorkshire, UK) ad libitum. Each mouse was inoculated at age 8 weeks with a single gavage of 10^8 microbes/strain [B. thetaiotaomicron was harvested from an overnight culture in TYG medium (Sonnenburg et al., (2005) Science 307:1955-9); M. smithii from serum bottles containing MBC medium after a 5d incubation at 37° C. (Samuel et al., (2006) PNAS 103:10011-6)]. For a given experiment, the same preparation of cultured microbes was used for mono-association (single species added) and co-colonization (both species added).
Immediately after animals were sacrificed, cecal contents were recovered for preparation of DNA, RNA and biochemical studies (n=5 mice/treatment group/experiment; n=3 independent experiments). Colonization density was assessed using a qPCR-based assay employing species-specific primers, as described in Samuel et al., (2006) PNAS 103:10011-6.
M. smithii genes were identified by comparing outputs from GLIMMER v.3.01 (Delcher et al., (1999) Nucleic Acids Res 27:4636-41), CRITICA v.1.05b (Badger et al., (1999) Mol Biol Evol 16:512-24), and GeneMarkS v.2.1 (Besemer et al. (2001) Nucleic Acids Res 29:2607-18). WUBLAST (http://blast.wustl.edu/) was then used to identify all ORFs with significant hits to the NR database (as of Dec. 1, 2006). ORFs containing <30 codons and without significant homology (e-value threshold of 10^-5) to other proteins were eliminated. rRNA and tRNA genes were identified using BLASTN and tRNA-Scan (Lowe et al., (1997) Nucleic Acids Res 25:955-64). Annotation of the predicted proteome of M. smithii was completed by using BLAST homology searches against public databases, and domain analysis with Pfam (http://pfam.janelia.org/) and InterProScan [release 12.1; (Apweiler et al., Nucleic Acids Res 29:37-40)]. Functional classifications were made based on GO terms assigned by InterProScan and homology searches against COGs (Tatusov et al., (2001) Nucleic Acids Res 29:22-8), followed by manual curation. Metabolic pathways were constructed based on KEGG (Kanehisa et al., (2004) Nucleic Acids Res 32:D277-80) and MetaCyc [(Caspi et al., (2006) Nucleic Acids Res 34:D511-6); http://metacyc.org/]. Glycosyltransferases (GT) were categorized according to CAZy [http://www.cazy.org; (Coutinho et al., (1999) Recent Advances in Carbohydrate Bioengineering p. 3-12)]. Putative prophage genes were identified using two independent approaches: (i) BLASTN of predicted M. smithii ORFs against a database of all known phage sequences (http://phage.sdsu.edu/phage); and (ii) Hidden Markov Model (HMM)-based analysis using Phage_Finder (Fouts (2006) Nucleic Acids Res 34:5839-51).
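The ORF elimination rule stated above may be sketched, purely as an example, as a small Python filter. The sketch assumes that an ORF is discarded only when it is both shorter than 30 codons and lacks a hit at or below the e-value threshold; the Orf record, its field names, and the example ORFs are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

MIN_CODONS = 30          # length cutoff stated in the text
EVALUE_THRESHOLD = 1e-5  # homology cutoff stated in the text


@dataclass
class Orf:
    """Hypothetical record for a predicted open-reading frame."""
    orf_id: str
    n_codons: int
    best_evalue: Optional[float]  # None when no database hit was found


def keep_orf(orf: Orf) -> bool:
    """Return False only for short ORFs that also lack significant homology."""
    has_hit = orf.best_evalue is not None and orf.best_evalue <= EVALUE_THRESHOLD
    is_short = orf.n_codons < MIN_CODONS
    return not (is_short and not has_hit)


# Hypothetical example ORFs.
candidates = [
    Orf("orf_a", 25, None),    # short, no hit: eliminated
    Orf("orf_b", 25, 1e-10),   # short, significant hit: kept
    Orf("orf_c", 400, None),   # long, no hit: kept
]
retained = [o.orf_id for o in candidates if keep_orf(o)]
```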
GO term assignments—The number of genes in each archaeal genome that were assigned to each GO term, or to its parents in the GO hierarchy [version available on Jun. 6, 2006; (Ashburner et al., (2000) Nat Genet. 25:25-9)] were totaled. All terms assigned to at least five genes in a given genome were then subjected to statistical tests for overrepresentation, and all terms with a total of five genes across all tested genomes for under-representation, using a binomial comparison reference set (see Table 6). Genes that could not be assigned to a GO category were excluded from the reference sets. A false discovery rate of <0.05 was set for each comparison (Benjamini et al., (1995) J of the Royal Statistical Society B 57:289-300). All tests were implemented using the Math::CDF Perl module (E. Callahan, Environmental Statistics, Fountain City, Wis.; available at http://www.cpan.org/), and scripts written in Perl.
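The tests described above were implemented in Perl with the Math::CDF module. Solely to illustrate the underlying calculation, a Python sketch of a one-sided binomial test for over-representation, followed by the Benjamini-Hochberg procedure, is given below. The function names and the example counts are hypothetical, and the way the reference proportion is formed here is an assumption rather than a description of the actual scripts.

```python
from math import exp, lgamma, log


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if p <= 0.0:
        return 1.0 if k <= 0 else 0.0
    if p >= 1.0:
        return 1.0 if k <= n else 0.0
    total = 0.0
    for i in range(max(k, 0), n + 1):
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log(1.0 - p))
        total += exp(log_pmf)
    return min(total, 1.0)


def benjamini_hochberg(pvalues, fdr=0.05):
    """Return booleans marking which hypotheses are rejected at the given FDR."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank whose p-value is under its step-up threshold.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * fdr / m:
            max_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical example: 40 of 1,200 annotated genes carry a GO term that
# occurs at a proportion of 0.015 in the reference set.
p_example = binom_upper_tail(40, 1200, 0.015)
calls = benjamini_hochberg([0.001, 0.04, 0.03, 0.5])
```

A test for under-representation would use the lower tail, P(X <= k), in the same way.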
Percent identity comparisons—The M. smithii PS genome sequence was compared to the M. stadtmanae genome (Fricke et al., (2006) J Bacteriol 188:642-58) and a 78 Mb metagenomic dataset of the human fecal microbiome (Gill et al., (2006) Science 312:1355-9) using NUCmer [part of the MUMmer v.3.19 package; (Kurtz et al., Genome Biol 5:R12)], and a percent identity plot was generated using Mummerplot.
Genomic synteny—Comparisons of synteny between M. smithii and M. stadtmanae were completed using the Artemis Comparison Tool (Carver et al., (2005) Bioinformatics 21:3422-3) set to tBLASTX and the most stringent confidence level.
M. smithii interaction network analyses—All M. smithii COGs were submitted to the STRING database (http://string.embl.de/; von Mering et al., (2003) Nucleic Acids Res 31:258-61) to create predicted interaction networks (0.95 confidence interval). The program Medusa (Hooper et al., (2005) Bioinformatics 21:4432-3) was then used to organize the networks and color the nodes based on their conservation in M. smithii's proteome (mutual best BLASTP hits with e-values <10^-20 to the other Methanobacteriales genomes).
Clustering of adhesin-like proteins—M. smithii and M. stadtmanae ALPs were first aligned using CLUSTALW (v.1.83; (Chenna et al., (2003) Nucleic Acids Res 31:3497-500)). To retain the highest level of discrimination between the proteins, the alignment was subsequently converted into a nucleotide alignment using PAL2NAL (Suyama et al., (2006) Nucleic Acids Res 34:W609-12). The resulting alignment was used to create a maximum likelihood tree with RAxML [Randomized accelerated maximum likelihood for high performance computing; RAxML-VI-HPC, v2.2.1; (Stamatakis (2006) Bioinformatics 22:2688-90)], first using the GTR+CAT approximation method for rapid generation of tree topology, followed by the GTR+gamma evolutionary model for determination of likelihood values. ModelTest (v3.7; http://darwin.uvigo.es/software/modeltest.html) also identified GTR+gamma as the most appropriate evolutionary model for the dataset. Bootstrap values were determined from 100 neighbor-joining trees in Paup (v. 4.0b10, http://paup.csit.fsu.edu/). Tree visualization was completed with TreeView (Page (1996) Comput Appl Biosci 12:357-8).
Functional Genomic Analysis of M. smithii Gene Expression in Gnotobiotic Mice
RNA isolation—100-300 mg aliquots of frozen cecal contents from each gnotobiotic mouse were added to 2 ml tubes containing 250 μl of 212-300 μm-diameter acid-washed glass beads (Sigma), 500 μl of buffer A (200 mM NaCl, 20 mM EDTA), 210 μl of 20% SDS, and 500 μl of a mixture of phenol:chloroform:isoamyl alcohol (125:24:1; pH 4.5; Ambion). Samples were lysed using a bead beater (BioSpec; ‘high’ setting for 5 min at room temperature) and cellular debris was pelleted by centrifugation (10,000×g at 4° C. for 3 min). The extraction was repeated by adding another 500 μl of phenol:chloroform:isoamyl alcohol to the aqueous supernatant. RNA was precipitated from the pooled aqueous phases and resuspended in 100 μl nuclease-free water (Ambion); 350 μl Buffer RLT (QIAGEN) was then added, and the RNA was further purified using the RNeasy mini kit (QIAGEN).
Analysis of the Sialic Acid Production by M. smithii
Reverse-phase HPLC analysis of cellular extracts—M. smithii was cultured in MBC medium, in a batch fermenter, to stationary phase (6d incubation). Cells were collected by centrifugation, washed three times in PBS, snap frozen in liquid nitrogen, and stored at −80° C. Sialic acid content was assayed using established protocols (Manzi et al., (1995) Current Protocols in Molecular Biology). Briefly, sialic acids were liberated by homogenization of the cell pellet (˜30-50 mg wet weight) in 0.5 ml of 2M acetic acid with subsequent incubation of the homogenate for 3 h at 80° C. Samples were filtered through Microcon 10 filters (Millipore) and the filtrate, containing free sialic acid, was dried (speed-vacuum). The released sialic acid was derivatized with DMB (1,2-diamino-4,5-methylene-dioxybenzene) to yield a fluorescent adduct, which was analyzed by C18 reverse phase high-pressure liquid chromatography (RP-HPLC; Dionex DX-600 workstation). Sialic acid was quantified by comparison to known amounts of derivatized standards [N-acetylneuraminic acid (Neu5Ac) and N-glycolylneuraminic acid (Neu5Gc)], and blanks (buffer alone).
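As a non-limiting illustration of quantification by comparison to known amounts of standards, the following Python sketch fits a straight line to hypothetical standard peak areas and reads an unknown amount from that line. A linear detector response, the units, the function names, and all numerical values are assumptions made only for this example.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amount_from_area(peak_area, slope, intercept):
    """Invert the standard curve to estimate the amount in a sample."""
    return (peak_area - intercept) / slope


# Hypothetical derivatized standards: amount (arbitrary units) vs. peak area.
standard_amounts = [0.0, 1.0, 2.0, 4.0]
standard_areas = [0.0, 10.0, 20.0, 40.0]

slope, intercept = fit_line(standard_amounts, standard_areas)
sample_amount = amount_from_area(25.0, slope, intercept)
```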
Histochemical studies—M. smithii strains PS and F1 were grown in MBC as above. Bacteroides thetaiotaomicron VPI-5482, and Bifidobacterium longum NCC2705 were grown under anaerobic conditions in TYG medium to stationary phase and used as negative controls. Escherichia coli strain K92 (ATCC 35860), which is known to produce sialic acid (Egan et al., (1977) Biochemistry 16:3687-92), was incubated in 1419 medium (ATCC) to stationary phase and used as a positive control. All strains were fixed in 1.5 ml conical plastic tubes in either 4% paraformaldehyde or 100% ethanol for at least 8 h at 4° C. Samples were then washed with PBS and stored at −20° C. in 50% ethanol, 20 mM Tris and 0.1% IGEPAL CA-630 (Sigma; prepared in deionized water) until assayed. Samples were diluted in deionized water, placed on coated glass slides (Cel-Line/Erie Scientific Co.), air-dried, dehydrated in graded ethanols (50%, 80%, 100%), treated with blocking buffer (0.3% Triton X-100, 1% BSA in PBS; 30 min at room temperature), and then incubated with 10 μg/ml fluorescein-labeled Sambucus nigra lectin (SNA; Vector Laboratories; specificity, Neu5Acα-2,6Gal/GalNAc epitopes) for 1 h at room temperature. Slides were subsequently washed with PBS, stained with 4′,6-diamidino-2-phenylindole (DAPI, 2 μg/ml; 5 min at room temperature), washed with de-ionized water, and mounted in PBS/glycerol. Slides were visualized with an Olympus BX41 microscope and photographed using a Q Imaging QICAM camera and OpenLab software (Improvision, Inc., v.3.1.5).
Transmission Electron Microscopy (TEM) of M. smithii.
Cells were harvested at day 6 of growth in the batch fermentor, and cellular morphology was defined by TEM using methods identical to those described previously for B. thetaiotaomicron (Sonnenburg et al., (2005) Science 307:1955-9). TEM studies of M. smithii present in the ceca of gnotobiotic mice that had been colonized for 14d with the archaeon were conducted using the same protocol.
Microanalytic Biochemical Analyses of Cecal Samples Recovered from Gnotobiotic Mice
Extraction of metabolites from cecal contents—For measurement of ammonia and urea levels, perchloric acid extracts were prepared from 2 mg of freeze-dried cecal contents. [Contents were collected with a 10 μl inoculation loop, quick frozen in liquid nitrogen, and lyophilized at −35° C.] The lyophilized sample was homogenized in 0.2 ml of 0.3M perchloric acid at 1° C.
For the remaining metabolites, alkali and acid extracts were prepared from 4 mg of dried cecal samples that were homogenized in 0.4 ml 0.2M NaOH at 1° C. For the alkali extract, an 80 μl aliquot was removed, heated for 20 min at 80° C. and then neutralized with 80 μl of 0.25M HCl and 100 mM Tris base. For the acid extract, a 60 μl aliquot was removed and added to 20 μl 0.7M HCl, heated for 20 min at 80° C., and then neutralized with 40 μl 100 mM Tris base. Protein content was determined in the alkali extracts using the Bradford method (Bio Rad).
Metabolite assays—The sample concentrations for ammonium and urea were high enough so that direct fluorometric measurements could be used for detection. However, to measure the low sample concentrations for asparagine, glutamate, glutamine, α-ketoglutarate and ethanol, protocols were adapted from previously established pyridine nucleotide-linked assays, an “oil well” technique, and enzymatic cycling amplification (Passonneau et al., (1993) Enzymatic Analysis: A Practical Guide). All chemicals and enzymes were from Sigma unless otherwise noted.
Ammonium and Urea: For measurement of ammonium, a 20 μl aliquot of a perchloric acid extract of a given sample of cecal contents was added to 1 ml of a solution containing 50 mM imidazole HCl (pH 7.0), 0.2 mM α-ketoglutarate, 0.5 mM EDTA, 0.02% BSA, 10 μM NADH, and 10 μg/ml beef liver glutamate dehydrogenase (in glycerol; specific activity, 40 units/mg protein). Following a 40 min incubation at 24° C., fluorescence was measured using a Ratio-3 system filter fluorometer (Farrand Optical Components and Instruments, Valhalla, N.Y.; excitation at 360 nm; emission at 460 nm). Sample blanks were run that lacked added glutamate dehydrogenase. Ammonium acetate standards were carried throughout all steps.
To measure urea concentrations, 2 μl of a 50 mg/ml solution of Jack bean urease (50 units/mg) was added to the same sample used to determine ammonium levels. Following a 40 min incubation at 24° C., urea levels were defined based on a further reduction in fluorescence. Control sample blanks lacked added urease. Reference urea standards were carried throughout all steps.
Asparagine: A 0.5 μl aliquot of the alkali extract of a given sample of cecal contents was added to 0.5 μl of a solution containing 50 mM Trizma HCl (pH 8.7), 0.04% BSA, and 4 μg/ml E. coli asparaginase (160 units/mg protein). Sample blanks lacked added asparaginase. After a 30 min incubation at 24° C., 2 μl of a solution containing 50 mM Trizma HCl (pH 8.1), 10 μM α-ketoglutarate, 10 μM NADH, 4 mM freshly prepared ascorbic acid, 10 μg/ml of pig heart glutamic-oxalacetic transaminase (220 units/mg protein), plus 5 μg/ml beef heart malic dehydrogenase (2800 units/mg protein) was added, and the resulting mixture was incubated for 30 min at 24° C. One microliter of 0.25M HCl was then introduced. After a 10 min incubation at 24° C., a 2 μl aliquot of the reaction mixture was transferred to 0.1 ml of NAD cycling reagent for 20,000 cycles of amplification and the amplified product measured according to methods described by Passonneau and Lowry ((1993) Enzymatic Analysis: A Practical Guide). Reference asparagine standards were carried throughout all steps.
Glutamate and Glutamine: A 0.1 μl aliquot from an acid extract of a given sample of cecal contents was added to 0.1 μl of reagent containing 100 mM Na acetate (pH 4.9), 20 mM HCl, 0.4 mM EDTA and 50 μg/ml E. coli glutaminase (780 units/mg protein). Another 0.1 μl aliquot of the cecal contents was added to the same reagent in a parallel reaction that lacked added glutaminase (to measure glutamate alone). Following a 60 min incubation at 24° C., 2 μl of a solution containing 50 mM Tris acetate (pH 8.5), 0.1 mM NAD+, 0.1 mM ADP and 50 μg/ml beef liver glutamate dehydrogenase (120 units/mg protein; Roche) was added to both reaction mixtures, which were subsequently incubated for 30 min at 24° C. The reactions were terminated by addition of 1 μl of 0.2M NaOH and then heated for 20 min at 80° C. A 2 μl aliquot was subsequently transferred to 0.1 ml NAD cycling reagent and subjected to 20,000 cycles of amplification. Reference glutamine and glutamate standards were carried throughout all steps.
α-Ketoglutarate—A 0.5 μl aliquot from a given alkali extract was added to 0.5 μl of reagent containing 100 mM imidazole acetate (pH 6.5), 0.04% BSA, 50 mM ammonium acetate, 0.2 mM ADP, 4 mM ascorbic acid (freshly prepared), 40 μM NADH and 20 μg/ml beef liver glutamate dehydrogenase (120 units/mg protein; Roche). Following a 30 min incubation at 24° C., the reaction was terminated by adding 0.5 μl of 0.2M HCl. A 1 μl aliquot was transferred to 0.1 ml NAD cycling reagent and subjected to 30,000 cycles of amplification. α-Ketoglutarate standards were carried throughout all steps.
Ethanol: A 0.5 μl aliquot of an acid extract from cecal contents was added to 0.5 μl of a solution consisting of 5 mM Tris HCl (pH 8.1), 0.04% BSA, 0.1 mM NAD+, and 20 μg/ml yeast alcohol dehydrogenase (350 units/mg protein). Following a 60 min incubation at 24° C., 1 μl of 0.15M NaOH was added and the mixture heated for 20 min at 80° C. A 0.5 μl aliquot of this reaction mixture was transferred to 0.1 ml of NAD cycling reagent and amplified 5000-fold. Ethanol standards were carried throughout all steps.
Whole Genome Genotyping with Custom M. smithii Gene Chips
GeneChips were manufactured by Affymetrix (http://www.affymetrix.com), based on the sequence of the PS strain genome (see Table 13 for details of the GeneChip design). Duplicate cultures of M. smithii strains PS (ATCC 35061), F1 (DSMZ 2374), ALI (DSMZ 2375) and B181 (DSMZ 11975) were grown in 125 ml serum bottles as described above. Genomic DNA was prepared from each strain using the QIAGEN Genomic DNA Isolation kit: mutanolysin (Sigma; 2.5 U/mg wet wt. cell pellet) was added to facilitate lysis of the microbes. DNA (5-7 μg) was further purified by phenol-chloroform extraction and then sheared by sonication to <200 bp, labeled with biotin (Enzo BioArray Terminal Labeling Kit), denatured at 95° C. for 5 min, and hybridized to replicate GeneChips using standard Affymetrix protocols (http://www.affymetrix.com). M. smithii genes represented on the GeneChip were called “Present” or “Absent” by DNA-Chip Analyzer v1.3 (dChip; www.biostat.harvard.edu/complab/dchip/) using modeled (PM/MM ratio) data.
Pairwise comparisons were made using unpaired Student's t-test. One-way ANOVA, followed by Tukey's post hoc multiple comparison test, was used to determine the statistical significance of differences observed between three groups.
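For illustration, the test statistic of an unpaired Student's t-test (equal variances assumed) can be computed as in the Python sketch below. The function name and the example measurements are hypothetical, and the sketch stops at the t statistic and degrees of freedom; a p-value would be obtained from the t distribution using a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def student_t(group_a, group_b):
    """Return (t statistic, degrees of freedom) for an unpaired Student's t-test.

    Uses the pooled (equal-variance) estimate of the standard error.
    """
    n1, n2 = len(group_a), len(group_b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(group_a)
                  + (n2 - 1) * variance(group_b)) / df
    std_err = sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean(group_a) - mean(group_b)) / std_err, df


# Hypothetical measurements from two treatment groups.
t_stat, deg_freedom = student_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```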
A system for culturing M. smithii in 96-well plate format was designed and constructed in the following manner (See
The canisters are heated using Electro-Flex Heat brand Pail Heaters controlled by a custom designed controller consisting of a 16A2120 temperature/process control (Love Controls), an RTD (resistance temperature detector) probe to measure internal tank temperature, and several safety features to prevent overheating or burns.
The system is pressurized with oxygen-free gas that has flowed through a custom-built oxygen scrub. Commercially available gas mixes used for culturing M. smithii contain trace levels of oxygen that would kill the organism: thus, the gas mixture must be passed through an oxygen scrub. This scrub consists of a glass tube filled with copper mesh that is heated to 350° C. with heating tape (HTS/Amptek Duo-Tape), controlled by a benchtop power controller (HTS/Amptek BT-Z). The oxygen scrub is covered with insulating tape and secured behind a heat resistant polyetherimide case. Pressure in each tank is measured and recorded with a digital manometer (LEO record, Omni Instruments).
The system is housed inside an anaerobic chamber (COY laboratories) to allow inspection and manipulation of cultures and plates without exposing M. smithii to oxygen. Each tank can house 30 standard volume 96-well plates, which can be analyzed inside the COY anaerobic chamber with a microplate reader (BioRad) that monitors growth by measuring optical density.
Stock solutions (100×) of atorvastatin were prepared in methanol, pravastatin in ethanol, and rosuvastatin in DMSO (dimethyl sulfoxide) to concentrations of 100 mM, 10 mM and 1 mM. 1.5 μl of the stock solutions were added to wells in 96-well plates and transferred to the COY anaerobic chamber where they were kept for at least 24 hours to become anaerobic. 150 microliters of actively growing Methanobrevibacter smithii cultures were then added to each well (excluding medium+drug blanks) to bring the drug concentrations to 1 mM, 100 μM and 10 μM, respectively. The plates were incubated in the newly developed pressurized heated anaerobic tank system in a 4:1 mixture of oxygen-scrubbed H2 and CO2 at a pressure of 30 psi. Cultures grown in 1% ethanol, methanol and DMSO were used as controls. Growth was measured by determining optical density at 600 nm using the BioRad microplate reader (model 680).
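The dilution arithmetic in the preceding paragraph can be checked with a short Python sketch, offered only as an example. It assumes that the 1.5 μl of stock is still present when 150 μl of culture is added (i.e., no evaporation during the anaerobic equilibration); the function name is hypothetical.

```python
def final_concentration(stock_conc, stock_volume_ul, culture_volume_ul):
    """Concentration after mixing a stock aliquot with culture (same units as stock)."""
    total_volume_ul = stock_volume_ul + culture_volume_ul
    return stock_conc * stock_volume_ul / total_volume_ul


STOCK_VOLUME_UL = 1.5
CULTURE_VOLUME_UL = 150.0

# Stock concentrations stated in the text (mM).
stocks_mM = [100.0, 10.0, 1.0]
finals_mM = [final_concentration(c, STOCK_VOLUME_UL, CULTURE_VOLUME_UL)
             for c in stocks_mM]

# Fraction of the well volume contributed by the solvent (v/v).
solvent_fraction = STOCK_VOLUME_UL / (STOCK_VOLUME_UL + CULTURE_VOLUME_UL)
```

Under this assumption the exact dilution is 1:101, so the final concentrations are about 1% below the nominal 1 mM, 100 μM and 10 μM, and the solvent makes up about 1% of the well volume, in line with the 1% solvent controls.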
Starting cultures of M. smithii strains [DSMZ 861 (PS), 2374 (F1), 2375 (ALI) and 11975 (B181)] were grown in 96 well plates in 150 μl volume/well of Methanobrevibacter complex medium (MBC) supplemented with 3 g/liter formate, 3 g/liter acetate, and 33 ml/liter of 2.5% Na2S (added just before use). Each condition was tested in triplicate with the average measurement plotted.
The 1,853,160 base pair (bp) genome of the M. smithii type strain PS contains 1,795 predicted protein coding genes (Tables 1-4), 34 tRNAs, and two rRNA clusters. Some observations on the genome itself are as follows:
Elements that Affect Genome Evolution
The M. smithii PS genome contains multiple elements that can influence genome evolution, including 30 transposases, an integrated prophage (˜38 kb; MSM1640-92), eight insertion sequence (IS) elements, 16 genes involved in DNA repair, 9 restriction-modification (R-M) system subunits, and four predicted integrases (Table 4).
Several lytic phages have been reported to infect M. smithii, including a 69 kb linear phage known as PG that belongs to the ψM1-like viruses (Prangishvili et al. (2006) Virus Res 117:52-67), and another 35 kb phage (PMS11; Calendar (2005) The Bacteriophages). The PG phage is AT-rich, heavily nicked, and lytic (burst size, 30-90), with a latent period of 3-4 h (Bertani et al. (1985) EMBO Workshop on Molecular Genetics of Archaebacteria and the International Workshop on Biology and Biochemistry of Archaebacteria, pg. 398). BLAST comparisons of the 52 predicted genes in the integrated prophage of M. smithii PS against known phage genes revealed only a few homologs (Table 15). One of the prophage genes (MSM1691) encodes a pseudomurein endoisopeptidase (PeiW): this enzyme may function to cleave M. smithii's cell wall and contribute to autolysis, as related enzymes in a defective Methanothermobacter wolfeii prophage have been shown to do (Luo et al., FEMS Microbiology Letters 208:47-51). The specific ends of the prophage genome could not be identified, and further studies are needed to determine whether the prophage is active and lytic.
The eight insertion sequence (IS) elements in M. smithii's genome (Table 4) range in length from 137 bp (MSM1519) to 1013 bp (MSM0527) and all are ISM1 (family ISNCY) according to ISfinder (Siguier et al., (2006) Nucleic Acids Res 34:D32-6; http://www-is.biotoul.fr/). ISM1 is a mobile IS element (Hamilton and Reeve (1985) Molecular Genetics and Genomics 200:47-59). IS elements promote genome evolution and plasticity through recombination, gene loss and, potentially, lateral gene transfer (Brugger et al., (2002) FEMS Microbiol Lett 206:131-41).
M. smithii PS contains 60 predicted transcriptional regulators, including homologs of known nutrient sensors [e.g., a HypF family member (maturation of hydrogenases), a PhoU family member (phosphate metabolism), and a NikR family member (nickel)], plus five regulators of amino acid metabolism (Table 3). However, several GO categories related to environmental sensing and regulation (e.g., two-component systems; GO:0000160) are significantly depleted in its proteome compared to the proteomes of methanogens that live in terrestrial or aquatic environments (Table 6). In contrast, B. thetaiotaomicron, which uses complex, structurally diversified glycans as its principal nutrient source, possesses a large and diverse arsenal of nutrient sensors including 32 hybrid two-component systems plus 50 ECF-type sigma factors and 25 anti-sigma factors (Sonnenburg et al, (2006) PNAS 103:8834-9; Xu et al., (2003) Science 299:2074-6). This relative paucity of nutrient sensors may reflect the fact that M. smithii's niche is restricted, and its nutrient substrates are relatively small, readily diffusible molecules that may not require extensive machinery for their recognition.
In humans, cholic and chenodeoxycholic acids are synthesized in the liver and during their enterohepatic circulation undergo transformation by the intestinal microbiota to an array of metabolites (Hylemon and Harder (1998) FEMS Microbiol Rev 22:475-88). Bile acids and their metabolites have microbicidal activity, and a genetically engineered deficiency of the bile acid-activated nuclear receptor FXR leads to reduced bile acid pools and bacterial overgrowth (Inagaki et al., (2006) PNAS 103:3920-5). Both M. smithii and M. stadtmanae encode a sodium:bile acid symporter (MSM1078), a conjugated bile acid hydrolase (CBAH; MSM0986), and a short chain dehydrogenase with homology to a 7α-hydroxysteroid dehydrogenase (MSM0021). This is consistent with in vitro studies of M. smithii that demonstrate it is not inhibited by 0.1% deoxycholic acid (Miller et al, (1982) Appl Environ Microbiol 43:227-32).
We compared the proteome of M. smithii with the proteomes of (i) Methanosphaera stadtmanae, a methanogenic Euryarchaeote that is a minor and inconsistent member of the human gut microbiota (Eckburg et al., (2005) Science 308:1635-38), (ii) nine ‘non-gut methanogens’ recovered from microbial communities in the environment, and (iii) these non-gut methanogens plus an additional 17 sequenced Archaea (‘all archaea’) (Table 5).
Compared to non-gut methanogens and/or all archaea, M. smithii and M. stadtmanae are significantly enriched (binomial test, p<0.01) for genes assigned to GO (gene ontology) categories involved in surface variation (e.g., cell wall organization and biogenesis, see below), defense (e.g., multi-drug efflux/transport), and processing of bacteria-derived metabolites (Tables 6 and 7).
The M. smithii and M. stadtmanae genomes exhibit limited global synteny (
The ability to vary capsular polysaccharide surface structures in vivo by altering expression of glycosyltransferases (GTs) is a feature shared among sequenced bacterial species that are prominent in the distal human gut microbiota (Sonnenburg et al., (2005) Science 307:1955-59; Sonnenburg et al., (2006) PNAS 103:8834-39; Mazmanian et al., (2005) Cell 122:107-118; Coyne et al., (2005) Science 307:1778-81). Transmission EM studies of M. smithii harvested from gnotobiotic mice after a 14 day colonization revealed that it too has a prominent capsule (
Sialic acids are a family of nine-carbon sugars that are abundantly represented in human mucus- and epithelial cell surface-associated glycans (Vimr et al., (2004) Microbiol Mol Biol Rev 68:132-53). N-acetylneuraminic acid (Neu5Ac) is the predominant type of sialic acid found in our species. Unique among sequenced archaea, M. smithii has a cluster of genes (MSM1535-1540) that encode all enzymes necessary for de novo synthesis of sialic acid from UDP-N-acetylglucosamine (i.e., UDP-GlcNAc epimerase, Neu5Ac synthase, CMP-Neu5Ac synthetase, and a putative polysialyltransferase) (
The genomes of both human gut methanogens also encode a novel class of predicted surface proteins that have features similar to bacterial adhesins (48 members in M. smithii and 37 in M. stadtmanae). A phylogenetic analysis indicated that each methanogen has a specific clade of these Adhesin-Like Proteins (ALPs;
We conducted qRT-PCR assays of cecal RNAs from the mono- and co-colonized gnotobiotic mice described above. The results revealed one ‘sugar-binding’ ALP (MSM1305) that was significantly upregulated in the presence of B. thetaiotaomicron, four that were suppressed (including one with a GAG binding domain), and two that exhibited no statistically significant alterations (
Compared to other sequenced non-gut associated methanogens, M. smithii has significant enrichment of genes involved in utilization of CO2, H2 and formate for methanogenesis (GO:0015948; Table 6). They include genes that encode proteins involved in synthesis of vitamin cofactors used by enzymes in the methanogenesis pathway [methyl group carriers (F430 and corrinoids); riboflavin (precursor for F430 biosynthesis); and coenzyme M synthase (involved in the terminal step of methanogenesis)] (see Table 7 for a list of these genes, and
Our previous qRT-PCR and mass spectrometry studies revealed that co-colonization increased B. thetaiotaomicron acetate production [acetate kinase (BT3963) 9-fold upregulated vs. B. thetaiotaomicron-mono-associated controls; P<0.0005; n=4-5 animals/group (Samuel and Gordon (2006) PNAS 103:10011-10016)]. Although acetate is not converted to methane by M. smithii (Miller et al., (1982) Appl. Environ. Microbiol. 43:227-32), we found that its proteome contains an ‘incomplete reductive TCA cycle’ that would allow it to assimilate acetate [Acs (acetyl-CoA synthase, MSM0330), Por (pyruvate:ferredoxin oxidoreductase, MSM0560), Pyc (pyruvate carboxylase, MSM0765), Mdh (malate dehydrogenase, MSM1040), Fum (fumarate hydratase, MSM0477, MSM0563, MSM0769, MSM0929), Sdh (succinate dehydrogenase, MSM1258), Suc (succinyl-CoA synthetase, MSM0228, MSM0924), and Kor (2-oxoglutarate synthase, MSM0925-8) in
M. smithii also possesses enzymes that in other methanogens facilitate utilization of two other products of bacterial fermentation, methanol and ethanol (Fricke et al, J Bacteriol 188:642-58; Berk et al., (1997) Arch Microbiol 168:396-402). qRT-PCR assays showed that co-colonization significantly increased expression of a methanol:cobalamin methyltransferase (MtaB, MSM0515), an NADP-dependent alcohol dehydrogenase (Adh, MSM1381), and an F420-dependent NADP reductase (Fno, MSM0049) [2.4±0.3, 2.3±0.4 and 3.7±0.4 fold vs. mono-associated controls, respectively; p<0.01; see
Collectively, these findings indicate that M. smithii supports methanogenic and non-methanogenic removal of diverse bacterial end-products of fermentation: this capacity may endow it with a great flexibility to form syntrophic relationships with a broad range of bacterial members of the distal human gut microbiota.
Subject metabolism of amino acids by glutaminases associated with the intestinal mucosa (Wallace (1996) J Nutr 126:1326S), or deamination of amino acids during bacterial degradation of dietary proteins, yields ammonia (Cabello et al., (2004) Microbiology 150:3527-46). The M. smithii proteome contains a transporter for ammonium (AmtB; MSM0234) plus two routes for its assimilation: (i) the ATP-utilizing glutamine synthetase-glutamate synthase pathway which has a high affinity for ammonium and thus is advantageous under nitrogen-limited conditions; and (ii) the ATP-independent glutamate dehydrogenase pathway which has a lower affinity for ammonium (Dumitru et al., (2003) Appl. Environ. Microbiol. 69:7236-41).
Microanalytic biochemical assays revealed a ratio of glutamine to 2-oxoglutarate concentration that was 32-fold lower in the ceca of co-colonized gnotobiotic mice compared to animals colonized with M. smithii alone, and 5-fold lower compared to B. thetaiotaomicron mono-associated subjects (p<0.0001;
Manipulation of the representation of M. smithii in our gut microbiota could provide a novel means for treating obesity. Functional genomics studies in gnotobiotic mice illustrate one way to approach the issue. For example, inhibitors exist for several M. smithii enzymes. A class of N-substituted derivatives of para-aminobenzoic acid (pABA) interfere with methanogenesis by competitively inhibiting ribofuranosylaminobenzene 5′-phosphate synthase [RfaS; MSM0848; (Dumitru et al., (2003) Appl. Environ. Microbiol. 69:7236-41)]. As noted above, this enzyme, which participates in the first committed step in synthesis of methanopterin, is upregulated with co-colonization (4.6±0.9 fold versus mono-associated controls; p<0.01;
Archaeal membrane lipids, unlike bacterial lipids, contain ether-linkages. A key enzyme in the biosynthesis of archaeal lipids is hydroxymethylglutaryl (HMG)-CoA reductase (MSM0227), which catalyzes the formation of mevalonate, a precursor for membrane (isoprenoid) biosynthesis (23). HMG-CoA reductase inhibitors (statins) inhibit growth of Methanobrevibacter species in vitro (23). qRT-PCR revealed that MSM0227 is expressed at high levels in vivo in the presence or absence of B. thetaiotaomicron (P>0.05; Table 11).
We designed a custom GeneChip containing probesets directed against 99.1% of M. smithii's 1795 known and predicted protein-coding genes (see Table 12 for details). This GeneChip was used to perform whole genome genotyping of M. smithii PS (control) plus three other strains recovered from the feces of healthy humans: F1 (DSMZ 2374), ALI (DSMZ 2375) and B181 (DSMZ 11975). Replicate hybridizations indicated that 100% of the open reading frames (ORFs) represented on the GeneChip were detected in M. smithii PS, while 90-94% were detected in the other strains, including the potential drug targets mentioned above (Table 2 and
To further assess the degree of nucleotide sequence divergence among M. smithii strains, we compared the sequenced PS type strain to a 78 Mb metagenomic dataset generated from the aggregate fecal microbial community genome (microbiome) of two healthy humans (Gill et al., (2006) Science 312:1355-59). Their sequenced microbiomes contained 92% of the ORFs in the type strain (Table 2), including the potential drug targets described above. Several R-M system gene clusters (MSM0157-8, MSM1743, MSM1746-7), a number of transposases, a DNA repair gene cluster (MSM0689-95), and all ORFs in the prophage were not evident in the two microbiomes. Sequence divergence was also observed in 33 of the 48 ALP genes plus two ‘surface variation’ gene clusters (MSM1289-1398 and MSM1590-1616) that encode 11 glycosyltransferases and 9 proteins involved in pseudomurein cell wall biosynthesis (
The PHAT system was used to culture 4 strains of M. smithii (DSMZ 861 (PS), 2374 (F1), 2375 (ALI) and 11975 (B181)) in 96-well plate format, and to test their sensitivities to various HMG-CoA reductase inhibitors. Preliminary results indicate that atorvastatin (Lipitor®), pravastatin (Pravachol®) and rosuvastatin (Crestor®) inhibit all strains tested at concentrations of 1 millimolar. Atorvastatin and rosuvastatin also inhibit all strains at 100 micromolar concentrations (
Methanobrevibacter smithii; Methanosphaera stadtmanae; Methanothermobacter thermautotrophicus
¹GeneChip-based genotyping of M. smithii strains was done in duplicate; ‘present’ or ‘absent’ calls were determined using a perfect match/mismatch (PM/MM) model in dChip (see Methods). Note that the term ‘absent’ is based on different criteria than those used for the human microbiome dataset (see footnote 2).
²Metagenomic datasets from the microbiomes of two healthy lean adults (Gill et al., 2006) were tested for identity to M. smithii PS ORFs; ORFs with reads that matched with >95% identity are called ‘present’, those with 80-95% identity are called ‘divergent’, and those with <80% identity are called ‘absent’.
iiProbeset for M. smithii gene not represented on GeneChip.
Methanobrevibacter smithii PS (ATCC 35061)
Methanosphaera stadtmanae DSM 3091
Methanothermobacter thermautotrophicus
Methanocaldococcus jannaschii DSM 2661
Methanococcoides burtonii DSM 6242
Methanococcus maripaludis S2
Methanopyrus kandleri AV19
Methanosarcina acetivorans C2A
Methanosarcina barkeri str. Fusaro
Methanosarcina mazei Go1
Methanospirillum hungatei JF-1
Aeropyrum pernix K1
Archaeoglobus fulgidus DSM 4304
Haloarcula marismortui ATCC 43049
Halobacterium sp. NRC-1
Nanoarchaeum equitans Kin4-M
Natronomonas pharaonis DSM 2160
Picrophilus torridus DSM 9790
Pyrobaculum aerophilum str. IM2
Pyrococcus abyssi GE5
Pyrococcus furiosus DSM 3638
Pyrococcus horikoshii OT3
Sulfolobus acidocaldarius DSM 639
Sulfolobus solfataricus P2
Sulfolobus tokodaii str. 7
Thermococcus kodakarensis KOD1
Thermoplasma acidophilum DSM 1728
Thermoplasma volcanium GSS1
M. smithii and M. stadtmanae proteomes compared to the
Abbreviations: ‘non-gut-associated methanogens’ (Meth) or ‘all Archaea’ (Arch) [see SI Table 5]; No., number of genes associated with gene ontology (GO)
M. smithii genes in the significantly enriched GO categories listed in Table 6
M. smithii proteins with homologs in other sequenced Methanobacteriales
M. smithii; Methanosphaera stadtmanae; Methanothermobacter thermautotrophicus
M. smithii gene(s)
M. smithii
M. stadtmanae
1
M. smithii gene expression in vivo in the presence of B. thetaiotaomicron vs. alone
¹Predictions completed using NetNGlyc and NetOglyc (http://www.cbs.dtu.dk/services/).
²InterPro domains: Invasin/intimin cell-adhesion (IPR008964); Bacterial Ig-like (IPR003344); Pectin lyase fold (IPR011050); GAG lyase, Chondroitinase B-type (IPR012333); Polymorphic membrane protein, Chlamydia (IPR003368); Parallel beta-helix repeat (IPR006626); Peptidase S8 and S53 (IPR000209); Penicillin-binding protein, transpeptidase fold (IPR012338); Carboxypeptidase regulatory region (IPR008969).
M. smithii GeneChip
¹Note that the M. smithii genome contains three 5S rRNA genes, one 7S rRNA gene, two 16S rRNA genes, and two 23S rRNA genes. Due to the high nucleotide sequence identity among rRNA genes of a given type, each is represented by a single probeset (the 16S rRNA probeset is replicated four times on the GeneChip).
M. smithii
M. smithii strain PS treated with varying concentrations of statins
M. smithii strain F1 treated with varying concentrations of statins
M. smithii strain ALI treated with varying concentrations of statins
M. smithii strain B181 treated with
Isolation and Culturing of M. smithii from Human Fecal Samples
Two-gallon stainless steel paint canisters (Binks; catalog number 83S-210) were modified for incubation of plates at 37° C. in an oxygen-free mixture of 20% CO2/80% H2 at a pressure of 15 psi. Canisters contained a heating element (Electro-Flex Pail Heaters) regulated by a custom-designed controller consisting of a 16A2120 temperature/process control (Love Controls; Dwyer Instruments), a resistance temperature detector probe to measure the internal tank temperature, and several safety features to prevent overheating or burns. Pressure in each tank was measured and recorded with a digital manometer (LEO record; Omni Instruments). The apparatus was housed inside an anaerobic chamber (COY Labs).
All human fecal samples used in this study were obtained by using protocols approved by the Washington University Human Research Protection Office and its constituent review committees. All samples were deidentified and assigned codes as described in a previous publication (65); information about the age and BMI of the donors can also be found in that publication. All samples were frozen at −20° C. within 30 min after they had been produced by donors; they were then placed in a standard −80° C. freezer no more than 24 h later and stored at this temperature for at least 1 yr prior to their use in the present study.
An ≈2-g aliquot of a given frozen fecal sample was thawed inside the Coy anaerobic chamber and serially diluted there in modified MBC medium (66). Aliquots of serial dilutions (10⁻² to 10⁻⁸) were transferred to 14 mL of MBC supplemented with 5% rumen fluid, 10 μg/mL erythromycin, 1 μg/mL ampicillin, 10 μg/mL vancomycin and 10 mg/mL amphotericin B. The mixture was introduced into 125-mL serum bottles (Bellco Glass). These enrichment cultures were incubated under a fully deoxygenated atmosphere of 20% CO2/80% H2 (30 psi of pressure) at 37° C.
After at least 7 d, aliquots were plated onto MBC noble agar and the plates were incubated in the custom pressurized tanks described above for colony isolation. In parallel, the same serial dilutions were spread directly onto MBC noble agar plates with antibiotics. All plates were incubated under an atmosphere of 20% CO2/80% H2 (15 psi of pressure) in our custom PHAT (Pressurized Heated Anaerobic Tank) system at 37° C. Colonies were picked and screened by PCR of their 16S rRNA genes by using bacterial primers 8F (5′-AGAGTTTGATCCTGGCTCAG-3′) and 1391R (5′-GACGGGCGGTGWGTRCA-3′) and archaeal primers 571aF (5′-GCYTAAAGSRICCGTAGC-3′) and 958aR (5′-YCCGGCGTTGAMTCCAATT-3′). Amplicons generated from archaeal-directed primers were sequenced using the method of Sanger (Retrogen).
Pure isolates were then cultured anaerobically in MBC medium in a fully deoxygenated atmosphere of 20% CO2/80% H2 (30 psi of pressure) at 37° C. Cells were harvested by centrifugation, and DNA was isolated by phenol-chloroform and ethanol precipitation, as described (50). The purity of each DNA preparation was verified by gel electrophoresis.
qPCR Assay of mcrA in Human Fecal Samples
Frozen fecal samples were pulverized by manual grinding under liquid nitrogen, and crude DNA was isolated by bead beating and phenol/chloroform extraction. The Qiagen Blood and Tissue kit was used to clean up the crude DNA and remove RNA and protein. Twenty nanograms of purified community DNA was amplified by using an Mx3000 real-time PCR system (Stratagene) in 25-μL reaction mixtures containing SYBR-green and 0.8 μM McrA_MLf/r primers (5′-GGTGGTGTMGGATTCACACARTAYGCWACAGC-3′ and 5′-TTCATTGCRTAGTTWGGRTAGTT-3′; ref. 14), which amplified a ≈450-bp region of mcrA. Cycling conditions were as follows: 40 cycles of denaturing at 94° C. for 45 s, annealing at 56° C. for 45 s, and extension at 72° C. for 30 s, with fluorescence collection at 79-81° C. A subsequent dissociation curve was used to examine the homogeneity of amplicons, to detect the presence of primer dimers, and to determine the appropriate collection temperature.
A standard curve was constructed with purified M. smithii gDNA at concentrations ranging from 0.01 ng to 10 ng and used to define the concentration of mcrA DNA in each of the fecal DNA samples. Based on the known genome size of M. smithii PS, we expressed the data as number of genome equivalents (GE) per ng of total fecal DNA. Samples that only produced detectable amplification after 37 cycles of PCR were scored as “negative,” as were samples having <40 GE per ng of DNA. Data were not normally distributed; therefore, a log-base 10 transformation was performed.
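The conversion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the genome size constant (about 1.85 Mb), the 650 g/mol average mass per base pair, and all function names are assumptions introduced here and are not taken from the study; the 20-ng template, the 37-cycle cutoff and the 40 GE/ng cutoff follow the text.

```python
import math

AVOGADRO = 6.022e23            # molecules per mole
BP_MASS_G_PER_MOL = 650.0      # assumed average mass of one double-stranded base pair
GENOME_SIZE_BP = 1_850_000     # assumed approximate M. smithii PS genome size

def genome_equivalents(mass_ng, genome_bp=GENOME_SIZE_BP):
    """Convert a mass of pure genomic DNA (ng) into a number of genome copies."""
    grams = mass_ng * 1e-9
    return grams * AVOGADRO / (genome_bp * BP_MASS_G_PER_MOL)

def fit_standard_curve(masses_ng, ct_values):
    """Least-squares fit of Ct = slope * log10(mass_ng) + intercept."""
    xs = [math.log10(m) for m in masses_ng]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def ge_per_ng_fecal_dna(ct, slope, intercept, template_ng=20.0):
    """Interpolate a sample Ct on the curve and express it as GE per ng of template."""
    mass_ng = 10 ** ((ct - intercept) / slope)
    return genome_equivalents(mass_ng) / template_ng

def is_positive(ct, ge_per_ng, max_ct=37.0, min_ge_per_ng=40.0):
    """Apply the two 'negative' criteria from the text (late amplification, low GE)."""
    return ct <= max_ct and ge_per_ng >= min_ge_per_ng
```

With a hypothetical four-point curve spanning 0.01 ng to 10 ng, `fit_standard_curve` returns the slope and intercept that `ge_per_ng_fecal_dna` then uses for each fecal sample; the log-base 10 transformation mentioned above would be applied to the resulting GE values before statistical analysis.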
A subset of samples was selected for amplicon sequencing to determine the identity and diversity of mcrA sequences amplified by these primers, and whether archaeal DNA was present in these samples that was not detected by our mcrA-based primers. The latter was determined by using PCR primers directed at archaeal 16S rRNA genes [571aF (5′-GCYTAAAGSRICCGTAGC-3′; ref. 63) and 958aR (5′-YCCGGCGTTGAMTCCAATT-3′; ref. 64)] and the following cycling conditions: 30 cycles of denaturing at 94° C. for 2 min, annealing at 65° C. for 45 s, and extension at 72° C. for 2 min. Amplicons were sequenced using the method of Sanger (Retrogen).
Methanobrevibacter smithii strain PS (ATCC 35061) was grown as described above for 6 d at 37° C. DNA was recovered from harvested cell pellets using the QIAGEN Genomic DNA Isolation kit with mutanolysin (1 unit/mg wet weight cell pellet; Sigma) added to facilitate lysis of the microbe. An ABI 3730xl instrument was used for paired-end sequencing of inserts in a plasmid library (average insert size 5 Kb; 42,823 reads; 11.6-fold coverage), and a fosmid library (average insert size of 40 Kb; 7,913 reads; 0.6-fold coverage). Phrap and PCAP (Huang et al. (2003) Genome Res 13:2164-70) were used to assemble the reads. A primer-walking approach was used to fill in sequence gaps. Physical gaps and regions of poor quality (as defined by Consed; Gordon et al., (1998) Genome Res. 8, 195-202) were resolved by PCR-based re-sequencing. The assembly's integrity and accuracy were verified by clone constraints. Regions containing insufficient coverage or ambiguous assemblies were resolved by sequencing spanning fosmids. Sequence inversions were identified based on inconsistency of constraints for a fraction of read pairs in those regions. The final assembly consisted of 12.6× sequence coverage with a Phred base quality value ≥40. Open reading frames (ORFs) were identified and annotated as described below.
For each gene call, compositional statistics were calculated by using the PyCogent code base (67). The statistics included the GC content at each position, three versions of the dinucleotide use (overlapping, nonoverlapping, or “3-1”), all K-words ranging from length 1 through 6, and codon use (Tables 20 and 21). For each M. smithii strain, the composition of each gene was compared against (i) the composition of the genome as a whole and (ii) the composition of highly expressed genes. Genes that mapped to the KEGG orthology (KO) groups for ribosomal proteins were used to calculate the highly expressed test set. The gene and control vectors were compared using either the G-test statistic or Pearson correlation.
The significance of the results was calculated in two ways. First, the Bonferroni-corrected P value was calculated for the G-test. Second, because the distribution of compositional counts may violate normality, the method of Tsirigos et al. (57), which picks significance thresholds based on the rank order of gene scores, was employed.
Because highly expressed genes frequently possess unusual gene compositions, gene transfer was predicted only in cases where the gene did not match the whole-genome model, and the gene also did not match the highly expressed model. Annotated tRNAs and rRNAs were also excluded from the analysis.
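The compositional comparison described above can be illustrated with a small plain-Python sketch. It does not use or reproduce the PyCogent API; the function names, the handling of words absent from the reference, and the single shared threshold are assumptions made only for this illustration.

```python
import math
from collections import Counter

def gc_by_codon_position(seq):
    """GC fraction at codon positions 1, 2 and 3 of an in-frame coding sequence."""
    seq = seq.upper()
    fractions = []
    for pos in range(3):
        bases = seq[pos::3]
        gc = sum(base in "GC" for base in bases)
        fractions.append(gc / len(bases) if bases else 0.0)
    return fractions

def kmer_counts(seq, k):
    """Counts of overlapping K-words of length k."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def g_statistic(observed, reference):
    """G = 2 * sum(O * ln(O / E)), with E taken from reference frequencies."""
    total_obs = sum(observed.values())
    total_ref = sum(reference.values())
    g = 0.0
    for word, obs in observed.items():
        ref = reference.get(word, 0)
        if obs == 0 or ref == 0:
            continue  # simplification: words missing from the reference are skipped
        expected = total_obs * ref / total_ref
        g += obs * math.log(obs / expected)
    return 2.0 * g

def is_transfer_candidate(g_vs_genome, g_vs_highly_expressed, threshold):
    """Flag a gene only if it matches neither the genome nor the highly expressed model."""
    return g_vs_genome > threshold and g_vs_highly_expressed > threshold
```

In use, `kmer_counts` would be computed once for the whole genome and once for the ribosomal-protein gene set, and each gene's counts would be scored against both references; the threshold itself would come from the Bonferroni or rank-based procedures described above.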
Phylogenetic confirmation of gene transfers predicted by compositional means was performed using the RIATA-HGT program of PhyloNet version 1.7 (68). We obtained all available gene sequences for all KO groups that contained one or more M. smithii genes. Annotations for gene family level KEGG assignments were obtained by blasting each protein sequence against version 54 of the KEGG database. The best hit with a KEGG assignment was taken. Multiple assignments were given if the best hit had more than one annotation.
Python scripts were used to generate separate FASTA files for each orthology group containing the amino acid sequences for M. smithii and KEGG proteins. All sequences for each orthology group were then separately aligned in MUSCLE (69) by using maxiters=4, and gene trees for each group were constructed in FASTTREE (70).
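A script of the kind mentioned above might look like the following minimal sketch. The record layout (KO identifier, sequence identifier, amino acid sequence), the file naming, and the function names are hypothetical and chosen only for illustration; the original scripts are not reproduced here.

```python
from collections import defaultdict
from pathlib import Path

def group_by_orthology(records):
    """Group (ko_id, seq_id, sequence) records by their orthology group."""
    groups = defaultdict(list)
    for ko_id, seq_id, seq in records:
        groups[ko_id].append((seq_id, seq))
    return groups

def format_fasta(entries, width=60):
    """Render (seq_id, sequence) pairs as FASTA text with wrapped sequence lines."""
    lines = []
    for seq_id, seq in entries:
        lines.append(f">{seq_id}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"

def write_group_fastas(records, out_dir):
    """Write one <ko_id>.fasta file per orthology group and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for ko_id, entries in sorted(group_by_orthology(records).items()):
        path = out_dir / f"{ko_id}.fasta"
        path.write_text(format_fasta(entries))
        paths.append(path)
    return paths
```

Each resulting file would then be passed separately to the aligner and tree builder, so that one alignment and one gene tree are produced per orthology group.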
PhyloNet requires that no paralogs be present on protein trees. Therefore, multiple members of a KO present in a single KEGG genome were reduced to a single copy by removing sequences that produced the longest branches on the resulting phylogenetic tree. However, for M. smithii genes, we wanted to ensure that the process of paralog resolution did not prevent detection of possible xenologs (extra gene copies introduced by gene transfer). Therefore, all M. smithii genes were retained in each gene tree in the analysis. The species tree used consisted of the KEGG 16S rRNA sequences for each lineage in the tree, gathered by BLAST against the E. coli rrsG gene, and alignment in PyNAST. The location of “msi,” the M. smithii strain present in KEGG, was taken as the tree position for all M. smithii.
Because all multiple copies of gene family members were retained in M. smithii genomes, it was necessary to introduce an artificial polytomy into the species tree at the location of msi, with one tip for each paralog/strain combination. This approach is identical to separately running each gene copy, but is computationally more tractable because it avoids reinferring all transfers not involving M. smithii across the rest of the tree many times.
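The paralog-reduction rule can be sketched as follows. This is a simplified illustration, not the original procedure: it assumes each tip is described by a hypothetical (genome, sequence identifier, branch length) tuple, and it treats "removing the sequences with the longest branches" as equivalent to keeping the shortest-branch copy per genome, which ignores any change in branch lengths after a tree is rebuilt.

```python
def reduce_paralogs(tips, keep_all_genome="msi"):
    """Keep one sequence per genome (shortest branch), but every copy from keep_all_genome.

    tips: iterable of (genome, seq_id, branch_length) tuples for one gene tree.
    Returns the sorted list of sequence identifiers retained in the tree.
    """
    kept = []
    best = {}  # genome -> (seq_id, branch_length) of the shortest-branch copy seen so far
    for genome, seq_id, branch_length in tips:
        if genome == keep_all_genome:
            kept.append(seq_id)  # all M. smithii copies are retained as possible xenologs
        elif genome not in best or branch_length < best[genome][1]:
            best[genome] = (seq_id, branch_length)
    kept.extend(seq_id for seq_id, _ in best.values())
    return sorted(kept)
```

The retained M. smithii copies are the ones that would each receive their own tip in the artificial polytomy placed at the msi position of the species tree.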
M. smithii strains were grown in standard MBC medium containing 2.8 or 44.1 mM formate. Medium was prepared anaerobically and aliquoted into 125-mL serum bottles, which were sealed and autoclaved. Triplicate cultures of each strain and condition were grown at 37° C. with agitation (100 rpm), in serum bottles containing 21 mL of medium plus 0.5 mL of 2.5% Na2S, under an atmosphere of 80% H2 and 20% CO2 that was replenished every 6 h to a pressure of 30 psi. Seven milliliters of the culture were harvested at 36 h (
M. smithii 16S and 23S rRNAs.
Comparison of RNA-Seq and Custom Affymetrix M. smithii GeneChips
RNA from four samples of M. smithii PS (two replicates at each formate concentration) was split into aliquots for subsequent GeneChip target preparation, or for rRNA depletion and RNA-Seq. Nearly 106 million 36-nt Illumina GA-IIx reads were generated from the 4 samples (each sample run on a single lane of the eight-lane flow cell): 7.2 million of these reads mapped to coding regions (6.9%), whereas the remaining reads mapped to rRNA genes or other noncoding regions of the genome. Tables 20-31 were also generated for each replicate sample by using custom M. smithii GeneChips that have been described in an earlier report (50). GeneChip data were processed (see ref. 50 for details), and the resulting datasets were compared with RNA-Seq data (counts per million reads, normalized for gene length). The results obtained with each type of data were highly similar: Pearson's correlation r² values ranged from 0.86 to 0.89 for each replicate (P<2e−16;
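The normalization and comparison referred to above (counts per million reads, normalized for gene length, compared with GeneChip signal by Pearson correlation) can be sketched as below. The per-kilobase scaling, the dictionary inputs, and the function names are assumptions made for illustration; the source does not state whether values were log-transformed before correlation.

```python
import math

def length_normalized_cpm(counts, lengths_bp):
    """Counts per million mapped reads, divided by gene length in kilobases."""
    total = sum(counts.values())
    return {
        gene: (count / total * 1e6) / (lengths_bp[gene] / 1000.0)
        for gene, count in counts.items()
    }

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)
```

Squaring the value returned by `pearson_r` for matched genes in the two datasets gives an r² of the kind reported above.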
Analyses of familial concordance or correlation for methanogen carriage or levels, and of their associations with overweight/obesity, were conducted by using logistic or linear regression, with a robust variance estimator to adjust for the nonindependence of observations on family members.
A putative rearrangement was discovered in the M. smithii PS type strain by aligning draft assemblies of other strains using Mauve (49). This putative rearrangement is further evidenced by flanking transposases (Msm1419, Msm0730). When the type strain was first sequenced (50), a large number of genes predicted to be involved in genome evolution was noted: restriction modification systems, transposases, recombinases, and insertion sequence (IS) elements. IS finder (www-is.biotoul.fr) was able to detect matches to a known M. smithii IS element, ISM1, which is a member of the ISNCY family, and no other significant matches. However, the number of matches varied between strains quite considerably (Table 23).
A recent metagenomic study of the fecal viromes of adult female MZ twins showed that viromes are unique to individuals regardless of their degree of genetic relatedness. Intrapersonal diversity is very low with >95% of virotypes retained over a 1-yr period. Moreover, an individual's virome is dominated by a few temperate phage that exhibit remarkable genetic stability. These results indicated that a predatory Lotka-Volterra (LV)/Kill-the-Winner dynamic manifest in a number of other characterized environmental ecosystems is notably absent in the distal intestine where a more temperate phage lifestyle is evident (51). Therefore, it was of interest to characterize phage diversity in M. smithii as a function of host and family.
Prophages were detected by PhageFinder (52) in 7 of the 20 strains, including 4 of the 5 strains isolated from one of the dizygotic twins (TS146), one strain from her co-twin (TS145), and two strains from their mother (TS147). When prophage sequences were blasted against the other strains, prophages were identified in two more strains, one from the mother of the MZ twins (METSMITS96C), and another from TS145 (METSMITS145A) (Table 23).
To identify regions of variation within these prophages, raw 454 Titanium reads for each strain were aligned (nucmer; ref. 53) to the prophage sequence of the PS type strain (coordinates 1705364:1736208). The results were plotted with Mummer (53) and overlaid to create a single plot with the PS type strain prophage gene calls displayed (
A quantitative PCR (qPCR) assay of the mcrA gene was used to measure methanogens present in single fecal samples collected from 40 adult female MZ and 28 adult female DZ twin pairs (age 21-31 y). All were born in Missouri, although at the time they provided samples, only 29% were living in the same home and some lived >800 km apart (2). Based on a health questionnaire, all were healthy and none had a history of gastrointestinal disease, including irritable bowel syndrome. Sixty-one percent were obese (BMI ≥30) and 7% overweight (BMI 25-30) at the time of sampling (2).
Thirty-two of the 136 individuals (23%) had levels of methanogens above our threshold for confidently calling the fecal sample “positive” (i.e., ≥4×10⁷ genome equivalents per mg of total fecal DNA), and this proportion did not vary significantly by zygosity group (P=0.59). The MZ twin pair concordance rate for carriage of methanogens was 74%, a value significantly higher than the DZ pair concordance rate (15%; P=0.009 by Breslow-Day test). In addition, there was a significantly higher degree of correlation of methanogen levels between MZ pairs by linear regression (r²=0.43, P<0.0001) than DZ pairs (r²=0.04, P=0.32), (
Thirteen samples from the initial timepoint representing 4 MZ twin pairs, 1 DZ twin pair, plus 3 other unrelated individuals that were positive for mcrA were chosen for sequencing of amplicons generated by using the mcrA primers and previously described archaeal 16S rRNA primers (n=5-10 amplicon subclones/primer set/fecal DNA sample). In 12 of the 13 samples, M. smithii was the only sequence detected by mcrA or 16S rRNA-directed PCR. In one MZ co-twin (TS17 in Tables 24 and 25), 2 of 6 16S rRNA amplicons and 2 of 8 mcrA amplicons matched to Methanosphaera stadtmanae, a mesophilic euryarchaeote known to be present in the gut microbiota of some humans (19); the remaining amplicons generated from her fecal DNA matched to M. smithii. Her co-twin (TS16) had no detectable methanogens.
Fecal samples from 51 mothers in this study were also examined for the presence of methanogens; the overall degree of methanogen carriage in this population was similar to that found in their daughters (31% and 25%, respectively). Concordance for carriage of methanogens between mother and daughter (i.e., the probability that the daughter of a methanogen carrier was also a carrier, 32%) was nonsignificant (P=0.33).
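Concordance rates of the kind reported above can be computed from per-pair carrier calls. The sketch below shows two common definitions; which one was used in the study is not stated here, so both the choice of definitions and the function names are assumptions, and any input data would be hypothetical.

```python
def pairwise_concordance(pairs):
    """Fraction of pairs with at least one carrier in which both members are carriers."""
    concordant = sum(1 for a, b in pairs if a and b)
    discordant = sum(1 for a, b in pairs if a != b)
    affected = concordant + discordant
    return concordant / affected if affected else float("nan")

def probandwise_concordance(pairs):
    """Probability that the relative of a carrier is also a carrier: 2C / (2C + D)."""
    concordant = sum(1 for a, b in pairs if a and b)
    discordant = sum(1 for a, b in pairs if a != b)
    denom = 2 * concordant + discordant
    return 2 * concordant / denom if denom else float("nan")
```

The probandwise form corresponds most closely to the conditional-probability wording used above for mothers and daughters; for mother-daughter pairs, where only one direction is of interest, the count would instead be restricted to pairs in which the mother is a carrier.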
Table values (flattened, in source order): 3.163; 2.542; 3.053; M. smithii; 3.293; 3.408; 3.901; M. smithii; 3.243; M. smithii and M. stadtmanae; 3.053; 2.751; 2.781; 3.790; 3.344; 3.012; M. smithii; 3.132; M. smithii; 3.065; 2.830; 3.002; 2.815; 3.270; 3.083; 2.119; 3.109; 3.076; 2.120; 1.894; 2.995; 3.212; 3.159; M. smithii; 2.793; 2.442; M. smithii; 1.930; 1.622; 3.215; 0.685; 2.381; 2.860; 2.893; 3.150; 5.718; M. smithii; 4.761; 1.890; 2.044; M. smithii; M. smithii; 2.655; M. smithii; 3.004; 3.388; M. smithii; 3.107; M. smithii; 2.955; 2.282; 2.430; 1.996; 2.727
The qPCR results suggest that host genetic factors, including factors that influence the representation of potential syntrophic partners, may play a role in carriage of methanogens. In contrast, the study of Florin et al. (17), which used methane breath tests, showed no significant differences in concordance between young adolescent Australian MZ and DZ twin pairs. The difference could be explained if environmental factors play a dominant role in determining whether methanogens are acquired early in life, whereas persistent carriage in later life is determined by a variety of host factors. Such factors range from human genotype to the presence or absence of bacterial taxa that can collaborate or compete with the methanogens.
A role for host factors in determining carriage of methanogens is supported by previous studies of nonhuman primates. Methanogens were present in the gut microbiota of some primate phylogenetic lineages but not others; however, these patterns did not follow any identifiable features of gut physiology or morphology, nor behavior or diet (20). Another study that examined the distribution of methanogens within the guts of 253 vertebrate species found “methanogenic” branches of the host phylogenetic tree [e.g., branches containing ruminants (Bovidae, Cervidae, Giraffidae)] and “nonmethanogenic” branches (Felidae, Canidae, and Ursidae). As with the primate study, the methane-producing groups could not be distinguished from the methane-negative groups based on their diets or features of their gut structure/physiology (21).
To understand whether methanogen carriage might be determined, in part, by the presence or absence of bacterial taxa that can collaborate or compete with the methanogens, the co-occurrence patterns between methanogens and sulfate-reducing bacteria (SRB) were investigated. SRB, which can use H2 as an electron donor to generate hydrogen sulfide (H2S) through anaerobic sulfate respiration, may show positive associations with methanogens if a hydrogen economy is more important in some individuals than others, or negative associations due to competition for H2. Positive associations between SRB and methanogens might also occur because of syntrophy, because some methanogens and SRB can grow syntrophically on lactate, with the methanogen removing H2 generated by the SRB (22, 23). Therefore, it was determined whether SRB and methanogens had nonrandom codistribution patterns by SRB-directed qPCR assays of 87 fecal samples from the MZ and DZ twin pairs. The aps gene encodes adenosine-5′-phosphosulfate reductase, a key enzyme that catalyzes activation and then reduction of sulfate to sulfite (24). We chose aps as a target for a qPCR assay that used previously described and validated primers (25). Forty-five percent of the samples were positive for SRB (threshold of detection defined as ≈4×10⁷ genome equivalents per mg of fecal DNA). The concordance rate for sulfate reducers was not significant for either MZ or DZ co-twins (31% and 27%, Tables 24 and 25). A logistic regression was performed to determine whether a higher level of mcrA is predictive of the presence of aps or vice versa. No statistically significant relationship was identified in either comparison (P=0.10 and 0.07).
A general search for bacterial Operational Taxonomic Units (OTUs) that had positive or negative associations with M. smithii was also performed, using sequences generated from multiplex pyrosequencing of the V2 variable region of bacterial 16S rRNA genes from these same fecal samples (2). The raw sequences from this prior study were processed by using the PyroNoise algorithm to remove sequencing noise (26), as implemented in QIIME (27). Using UCLUST (28), the denoised sequences were further divided into OTUs that each shared ≥96% nucleotide sequence identity (a value slightly more permissive than the 97% ID threshold typically used to denote a microbial species). The most abundant sequence within each of the resulting 12,833 OTUs was then selected as a representative of that OTU. Because some of the individuals in the study were sampled multiple times, one sample per individual was randomly selected. For each of the 607 OTUs that were found in at least 10 of the samples for which there was mcrA qPCR data, an ANOVA was performed to determine whether the OTU relative abundance was significantly different in methanogen-positive and -negative individuals. Associated presence/absence patterns were also checked for by using the G-test of independence (an OTU was scored as present if it was observed one or more times). The resulting P values were corrected for multiple comparisons by using the Bonferroni correction (multiplied by 607, the number of comparisons) and the false discovery rate (FDR) method (multiplied by the number of comparisons divided by the P value rank).
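The two corrections and the presence/absence test described above can be sketched as follows. The FDR function implements the rule exactly as worded in the text (P multiplied by the number of comparisons and divided by its rank, capped at 1); the standard Benjamini-Hochberg procedure additionally enforces monotonicity of the adjusted values, which is omitted here. Function names and the 2×2 table layout are assumptions for illustration.

```python
import math

def bonferroni(p_values):
    """Multiply each P value by the number of comparisons, capped at 1."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]

def fdr_adjust(p_values):
    """Multiply each P value by (number of comparisons / rank of that P value)."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        adjusted[idx] = min(1.0, p_values[idx] * n / rank)
    return adjusted

def g_test_2x2(table):
    """G statistic of independence for a 2x2 table of counts.

    Rows: OTU present / absent; columns: methanogen-positive / -negative.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    g = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            if obs > 0:  # zero cells contribute nothing to the sum
                expected = rows[i] * cols[j] / n
                g += obs * math.log(obs / expected)
    return 2.0 * g
```

For the analysis above, each of the 607 OTUs would contribute one P value to the list passed to `bonferroni` or `fdr_adjust`; converting the G statistic to a P value requires a chi-square distribution with one degree of freedom, which is left out of this sketch.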
Twenty-two OTUs had significantly different relative abundances in mcrA-positive versus negative individuals (P<0.05 using ANOVA with the FDR correction). Of these 22 OTUs, 21 were more abundant in samples where methanogens were present, whereas one OTU was less abundant. The G-test identified five significant OTUs (P<0.05 with FDR correction), and 4 of these 5 were also significant as judged by ANOVA. All G-test-identified associations were positive. Thus, the two statistical tests together identified 22 positively associated OTUs (Table 26) and one negatively associated OTU.
To investigate the phylogenetic relationships of these OTUs to each other, and to bacterial isolates and lineages with known biological properties, parsimony insertion was used to add a representative sequence for each significant OTU into the Greengenes coreset tree (29) in the Arb software package (30). Because the closest relatives of the OTUs were mostly from other culture-independent metagenomic studies, 16S rRNA sequences were also inserted into the tree that were from well-characterized bacteria, including 16S rRNAs from fully sequenced genomes deposited in KEGG or sequenced through the Human Gut Microbiome Initiative (HGMI; http://genome.wustl.edu/genomes/list/human_gut_microbiome/), and 16S rRNA sequences from related organisms with known properties that were identified by using BLAST searches against the National Center for Biotechnology Information nonredundant database. To look for evidence of whether relatives of the OTUs were capable of growing in pure culture, the 16S rRNA sequences were also BLASTed against sequences in the RDP (31) that were marked as being from cultured bacterial isolates.
Remarkably, 20 of the 22 positively associated OTUs were members of the order Clostridiales (phylum Firmicutes). These 20 OTUs binned into five broad groups that were scattered throughout the order, including members of the three main clusters found in the human gut (clusters I, IV, and XIVa).
The group most positively associated with M. smithii was a lineage within Clostridia cluster IV that contains members of the genera Oscillospira and Sporobacter (Table 26; note that this group had the four most significant OTUs according to the ANOVA test). Two of these OTUs are highly related to Oscillospira guilliermondii, an as yet uncultured, large, and morphologically conspicuous organism found in ruminants (32, 33). The most closely related cultured isolate that we could find for any of these OTUs is Sporobacter termitidis, a hydrogen-consuming acetogen from the termite gut (34).
Two of the positively associated OTUs are members of Clostridia cluster XIVa. The closest isolate with a sequenced genome was Blautia hydrogenotrophica, a hydrogen-consuming homoacetogen from the human gut, although the percent identity across the lane-masked V2 region was low (89-93%), and organisms more closely related to B. hydrogenotrophica are known not to be acetogens. Whether the Sporobacter and B. hydrogenotrophica-related OTUs are acetogens cannot be determined by using 16S rRNA sequences alone, because acetogenesis is only inconsistently associated with 16S rRNA-defined phylotypes (35). However, the relationship suggests that some OTUs may co-occur with methanogens because they are homoacetogens and have a shared preference for hydrogen. Nonetheless, the OTU most related to B. hydrogenotrophica in this analysis (99% ID) did not show significant co-occurrence with M. smithii (uncorrected P value=0.38), indicating that not all homoacetogens in the human gut co-occur with M. smithii because of this preference for hydrogen.
Because members of the SRB can produce and consume H2, OTUs in the dataset that were in this group were of specific interest. Eighty-two of 281 fecal samples (29%) from the 16S rRNA analysis of these twin pairs (including additional fecal samples for which we did not obtain mcrA data) (2) had OTUs that were within the SRB clade (
The concentration of H2 in the gut lumen can vary over a wide range in healthy individuals (from 0.17% to 49% in a study of 11 subjects; ref. 36). Levels of H2 in the distal gut reflect the dynamic interplay between microbial production and consumption. One of the co-occurring groups within the Clostridiales may produce abundant amounts of hydrogen. Specifically, two of the positively associated OTUs in the order Clostridiales mapped to a clade that included isolate Rennanqilyf3, which was recovered from activated sludge by using a procedure designed to retrieve bacteria with particularly high yields of hydrogen (37). This isolate performs ethanol-type fermentation with glucose as an optimal carbon source for hydrogen production; however, its hydrogen production capacity varies with hydrogen concentration and pH. Thus, methanogen (M. smithii) abundance may be in part regulated by the presence of bacterial lineages that are efficient hydrogen producers. To our knowledge, no cultured isolates are available for members of this lineage from the gut.
Some of the OTUs that are positively associated with methanogens are quite distant from any cultured relatives (ribotypes). This observation is intriguing, because it suggests that syntrophic relationships may prevent them from growing in monoculture. For example, four OTUs grouped in a clade of the Clostridiales family that is dominated by relatives identified in culture-independent studies of cellulose-degrading gut environments where methanogens also reside (e.g., termite gut and cow rumen) (Gut Clone Group; Table 26 and
Unfortunately, the lack of cultured relatives for these OTUs limits the ability to more fully interpret the co-occurrence results, because knowledge is lacking about their biological properties. Targeted attempts to culture gut bacteria in the presence of M. smithii as well as targeted attempts to obtain and sequence their genomes from mixed populations should help to elucidate their functional relationships with human gut methanogens.
D. piger (87.4)
D. desulfuricans (90)
Alistipes putredinis (91.6)
Oscillospira guilliermondii
Sporobacter termitidis (89)
Oscillospira guilliermondii
Sporobacter termitidis
Oscillospira guilliermondii
Sporobacter termitidis
Oscillospira guilliermondii
Sporobacter termitidis (93)
Oscillospira guilliermondii
Sporobacter termitidis
Oscillospira guilliermondii
Sporobacter termitidis (88)
Clostridium methylpentosum
Anaerotruncus colihominis
Clostridium methylpentosum
Anaerotruncus colihominis
Catabacter sp. YIT12065
Catabacter sp. YIT12065
Catabacter sp. YIT12065
Clostridium cellulovorans
Clostridium cellulovorans
Clostridium cellulovorans
Clostridium cellulovorans
Blautia hydrogenotrophica
Blautia hydrogenotrophica
Coprococcus eutactus
It was reasoned that one approach for further characterizing factors that affect M. smithii colonization of the human gut would be to develop a method for isolating strains from frozen fecal samples obtained from twins and their mothers, sequencing their genomes, and performing RNA-Seq to evaluate strain-level variations in patterns of gene expression during growth under varying levels of hydrogen and formate.
The method that was developed for recovering M. smithii from frozen fecal samples is described above. A total of 20 strains were isolated from two families: one consisting of a MZ twin pair and their mother and the other a DZ twin pair and their mother (n=2-5 strains isolated and sequenced per individual). Deep draft genome assemblies were generated by using reads produced by Illumina GA-IIx and 454 sequencers. Table 23 describes the details of genome coverage and of the assembly statistics. Assembled genomes were aligned by using Mauve (41), which iteratively reordered contigs based on the finished genome sequence of the M. smithii type strain PS (42). Table 23 also provides information about previously generated, deep draft assemblies of the genomes of two other M. smithii type strains obtained from culture collections (42).
On average, any two strains shared 92.96±6.5% of their single nucleotide polymorphisms (SNPs) [129,112±6,322 (mean±SD)]. A binary table of the presence or absence of a SNP was subsequently generated, a distance matrix was calculated, and a principal components analysis (PCA) was performed (
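The first two steps described here (a binary SNP presence/absence table followed by a pairwise distance matrix) can be illustrated with a minimal sketch. The Jaccard distance used below is an assumption, since the text does not name the distance measure, and the strain and SNP names are hypothetical toy data. The PCA step is omitted.

```python
from itertools import combinations

def snp_presence_table(strain_snps):
    """Build a binary table: one row per strain, one column per SNP (1 = present)."""
    all_snps = sorted(set().union(*strain_snps.values()))
    table = {strain: [1 if snp in snps else 0 for snp in all_snps]
             for strain, snps in strain_snps.items()}
    return all_snps, table

def jaccard_distance(row_a, row_b):
    """1 - (SNPs shared by both strains) / (SNPs present in either strain).

    Assumed metric; the source does not state which distance was used.
    """
    shared = sum(1 for a, b in zip(row_a, row_b) if a and b)
    either = sum(1 for a, b in zip(row_a, row_b) if a or b)
    return 0.0 if either == 0 else 1.0 - shared / either

def distance_matrix(table):
    """Symmetric pairwise distance matrix keyed by (strain, strain)."""
    strains = sorted(table)
    dist = {(s, s): 0.0 for s in strains}
    for a, b in combinations(strains, 2):
        dist[(a, b)] = dist[(b, a)] = jaccard_distance(table[a], table[b])
    return dist

# Hypothetical toy data, not taken from the sequenced isolates.
toy = {
    "strain_A": {"snp1", "snp2", "snp3"},
    "strain_B": {"snp1", "snp2", "snp4"},
    "strain_C": {"snp5"},
}
snps, table = snp_presence_table(toy)
dist = distance_matrix(table)
```

A matrix of this kind is the usual input to an ordination such as PCA or principal coordinates analysis.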
Genes were identified by using Glimmer (v3.02) trained on contigs >500 bp in each of the 20 sequenced M. smithii isolate genomes, plus the PS type strain and the two other M. smithii isolates we had sequenced. Genes in all 23 genomes were binned by using the program CD-HIT and its default parameters (>90% nucleotide sequence identity over the length of the shorter gene in each pairwise comparison;
Rarefaction analysis to determine the rate at which sequencing the genes of new strains revealed new OGUs showed that the number of new or unique OGUs identified begins to plateau by the time ≈6 strains were sequenced (≈10,000 genes) (
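A rarefaction analysis of this kind can be sketched as follows. This is a minimal illustration, not the procedure actually used: it assumes each strain is represented as a set of OGU identifiers, and it averages the cumulative number of distinct OGUs over random orderings of the strains. The function name, the number of permutations, and the toy data are all hypothetical.

```python
import random

def rarefaction_curve(strain_ogus, n_permutations=100, seed=0):
    """Mean number of distinct OGUs seen after adding 1, 2, ..., N strains.

    strain_ogus maps a strain name to the set of OGU identifiers it carries.
    Strains are added in random order; the curve is averaged over permutations.
    """
    rng = random.Random(seed)
    strains = sorted(strain_ogus)
    totals = [0.0] * len(strains)
    for _ in range(n_permutations):
        order = strains[:]
        rng.shuffle(order)
        seen = set()
        for i, strain in enumerate(order):
            seen |= strain_ogus[strain]
            totals[i] += len(seen)
    return [t / n_permutations for t in totals]

# Hypothetical toy data: three strains sharing most of their OGUs.
toy = {
    "s1": {1, 2, 3},
    "s2": {1, 2, 3, 4},
    "s3": {1, 2, 3},
}
curve = rarefaction_curve(toy)
```

A curve that flattens as strains are added indicates that additional genomes contribute few new OGUs, which is the plateau described in the text.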
PCA of OGU assignments showed clustering of strains based on family of origin: Strains from MZ family members (TS94-96) generally clustered together, whereas strains from the DZ family (TS145-147) split into two groups (
KEGG was used to assign enzyme commission (EC) numbers to genes in all of the isolates' genomes. A total of 412 ECs were identified: 349 were shared by all strains, 63 were variably represented, and 18 had significant differences in their representation between strains as judged by binomial test (
Genes assigned to COG M (cell envelope biogenesis/outer membrane) were prominently represented in the variable component of the pan-genome (
To better understand genomic differences among M. smithii strains, the M. smithii pan-genome was searched for evidence of horizontal gene transfer (HGT). The results, described below in Example 11 and summarized in Table 27, show that HGT has contributed to both the core and variable elements of the M. smithii pan-genome. They include core genes involved in methanogenesis and folate biosynthesis; e.g., both compositional- and phylogenetic-based methods revealed transfer of genes encoding THMP methyltransferase C subunit (EC 2.1.1.86), formate dehydrogenase (EC 1.2.1.2), and formylmethanofuran dehydrogenase subunit F (EC 1.2.99.5) (Table 28). Note that the early steps in synthesis of methanopterin, a C1 carrier coenzyme involved in the methanogenesis pathway (
RNA-Seq was used to profile the transcriptomes of five of the M. smithii isolates: One from each member of the MZ family, one from each of the DZ co-twins, plus the PS type strain. The five strains from the two families were chosen because SNP, OGU, and EC analyses indicated that these isolates were representative of the strains from their human hosts, and because they exhibited consistent patterns of growth on MBC medium containing 2.8 or 44.1 mM formate, a substrate for the first enzyme involved in the methanogenesis pathway, formate dehydrogenase (EC 1.2.1.2 in
Next, the phenotypes of strains were compared based on normalized expression of each gene encoding each EC. Examining the gene expression data across functional groups revealed that no gene family was consistently regulated by formate across all strains. To identify genes significantly regulated by formate in each strain, normalized reads were first analyzed with CyberT. Two criteria were used for determining significance in regulation: a posterior probability of differential expression (PPDE) threshold ≥0.97, and a ≥2-fold difference in expression (either direction) when a given strain was incubated in low versus high levels of formate (Table 31).
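The two significance criteria can be expressed as a simple filter. The sketch below assumes that PPDE values have already been computed (for example by CyberT) and that normalized expression values are available for both conditions. The record layout, gene identifiers, and numbers are hypothetical, and skipping genes with zero expression in either condition is an assumption not stated in the text.

```python
from dataclasses import dataclass

PPDE_THRESHOLD = 0.97   # posterior probability of differential expression
FOLD_THRESHOLD = 2.0    # minimum fold difference, in either direction

@dataclass
class GeneResult:
    gene_id: str
    low_formate: float    # normalized expression in low formate
    high_formate: float   # normalized expression in high formate
    ppde: float           # assumed to be precomputed upstream

def is_formate_responsive(gene):
    """Apply both criteria: PPDE >= 0.97 and >= 2-fold change in either direction."""
    if gene.low_formate <= 0 or gene.high_formate <= 0:
        return False  # assumption: genes with no measurable expression are skipped
    ratio = gene.high_formate / gene.low_formate
    fold = max(ratio, 1.0 / ratio)
    return gene.ppde >= PPDE_THRESHOLD and fold >= FOLD_THRESHOLD

# Hypothetical example values.
results = [
    GeneResult("geneA", 10.0, 25.0, 0.99),  # 2.5-fold up, high PPDE
    GeneResult("geneB", 10.0, 15.0, 0.99),  # 1.5-fold only
    GeneResult("geneC", 40.0, 10.0, 0.98),  # 4-fold down, high PPDE
    GeneResult("geneD", 10.0, 40.0, 0.90),  # 4-fold up, PPDE too low
]
responsive = [g.gene_id for g in results if is_formate_responsive(g)]
```

Requiring both criteria guards against genes with a large but poorly supported fold change, and against genes with a well-supported but small change.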
All of the genes in the methanogenesis pathway illustrated in
Looking beyond the methanogenesis pathway, none of the genes encoding ECs in the M. smithii pan-genome satisfied our criteria for being responsive to differences in formate levels in the medium at midlog phase in all strains. However, as with components of the methanogenesis pathway, some exhibited strain-specific differences in formate sensitivity; e.g., in strain METSMITS145B (from DZ co-twin 1) genes encoding the subunits of MtrH (EC 2.1.1.86; tetrahydromethanopterin S-methyltransferase) were up-regulated in high formate, whereas in strain METSMITS146E (from the sister of DZ co-twin 1) they were down-regulated (see Table 31 for additional examples).
M. smithii uses ammonia as a nitrogen source via an energy-dependent glutamine synthetase-glutamate synthase pathway, which has high affinity for ammonia, and an ATP-independent pathway with lower affinity (
Using the threshold criteria for formate-responsive expression, four of the six strains were defined as having genes that were sensitive to levels of this compound. Table 31 lists the 9 genes present in type strain PS, the 340 genes in the strain recovered from the mother of the DZ co-twins (TS145), the 23 genes in the strain isolated from one of her daughters (TS146), and the 81 genes in the strain from the mother of the MZ twins (TS96). Intriguingly, no genes were identified in strains from MZ twins of this mother (TS94, TS95) that exhibited significant formate responsiveness. The core component of M. smithii's pan-genome contained no genes that met our criteria for formate-responsive behavior in every isolate.
The utility of using formate to identify strain-specific phenotypes is best illustrated by ALPs. As noted above, each sequenced strain contained a distinctive repertoire of genes encoding ALPs, with only 6 ALP OGUs shared by all isolates. ALP OGUs 112, 208, 412, and 827 are encoded by genes present in 4-6 of the strains: None of the genes are formate-responsive but members of each OGU exhibit strain-specific differences in their levels of expression (levels of expression are also notably different between ALP OGUs). OGUs 18, 37, 133, and 226 show strain-specific differences in their representation, strain-specific differences in their levels of expression, plus within-OGU differences in their formate sensitivity (
Chlamydia polymorphic membrane protein
Chlamydia polymorphic membrane protein
Chlamydia polymorphic membrane protein
Chlamydia polymorphic membrane protein
Chlamydia polymorphic membrane protein
To better understand genomic differences among M. smithii strains, HGT was detected by using both compositional and phylogenetic methods. Compositional HGT detection was performed by examining the typicality of dinucleotides, codons, and k-words of lengths 4 and 6. Because highly expressed genes are known to contain unusual compositions, genes were scored for typicality against both a whole-genome compositional model and a model built using ribosomal proteins (55, 56). Only genes found to be below the significance threshold when compared against both models were annotated as transferred. To select significance thresholds for transfer, genes in each genome were ordered from most to least atypical. As reported (57), gene typicality was observed to increase rapidly for the most extreme genes, and then to rise only gradually for the rest of the genome (
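The compositional approach can be illustrated with a simplified sketch. It scores each gene's dinucleotide usage against two background models (whole genome and ribosomal protein genes) with a G statistic and flags a gene only when it is atypical against both. Several simplifications are assumptions: plain overlapping dinucleotides are used rather than the codon-position-specific measures cited, a fixed hypothetical cutoff replaces the rank-order threshold described, and a higher score here means more atypical. All sequences and names are toy data.

```python
import math
from collections import Counter

DINUCLEOTIDES = [a + b for a in "ACGT" for b in "ACGT"]

def dinucleotide_counts(seq):
    """Count overlapping dinucleotides in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))

def model_frequencies(seqs, pseudocount=1.0):
    """Background dinucleotide frequencies, smoothed with a pseudocount."""
    counts = Counter()
    for s in seqs:
        counts.update(dinucleotide_counts(s))
    total = sum(counts[k] for k in DINUCLEOTIDES) + pseudocount * len(DINUCLEOTIDES)
    return {k: (counts[k] + pseudocount) / total for k in DINUCLEOTIDES}

def g_score(seq, model):
    """G statistic, 2 * sum(O * ln(O / E)): larger means more atypical."""
    obs = dinucleotide_counts(seq)
    n = sum(obs[k] for k in model)
    g = 0.0
    for k, p in model.items():
        o = obs[k]
        if o > 0:
            g += 2.0 * o * math.log(o / (n * p))
    return g

def flag_transferred(genes, genome_model, ribosomal_model, cutoff):
    """Flag genes that are atypical against BOTH background models."""
    return [name for name, seq in genes.items()
            if g_score(seq, genome_model) > cutoff
            and g_score(seq, ribosomal_model) > cutoff]

# Hypothetical toy sequences: an AT-rich background and one GC-rich gene.
genome_model = model_frequencies(["ATATATTAATATTTAAATAT" * 5])
ribosomal_model = model_frequencies(["AATTAATATAAATTTATAAT" * 5])
genes = {
    "native": "ATATTAATATAT" * 3,
    "foreign": "GCGCGGCCGCGC" * 3,
}
flagged = flag_transferred(genes, genome_model, ribosomal_model, cutoff=50.0)
```

Requiring atypicality against the ribosomal model as well is what keeps highly expressed native genes, which also have unusual composition, from being counted as transferred.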
Among the compositional measures analyzed, the proportion of genes defined as horizontally transferred ranged from 3.3 to 10.1% in the dataset as a whole. However, because the absolute number of horizontally transferred genes predicted can depend on the compositional measure chosen, the stringency of the thresholds selected, the amount of time that has passed since the transfer occurred, and the compositional distinctiveness of gene transfer donors (ref. 58; reviewed in ref. 56), this analysis did not focus on the absolute magnitude of gene transfer in these lineages. Instead, differences in the frequency of HGT events for different classes of genes were of primary interest, in addition to how this process has contributed to the evolution and specialization of the characterized M. smithii strains.
When using compositional methods, it was observed that gene transfer is more frequent in the variable genome than the core. For example, when examining 3-1 dinucleotide use (55) and using the rank order of G scores as the significance threshold, 5.7% of the core genes in the pan-genome show compositional evidence of transfer, compared with fully 16.4% of the variably represented genes, suggesting an approximately threefold enrichment of gene transfer in the variable relative to the core components of the pan-genome.
However, others have observed that phylogenetic methods tend to detect more ancient transfer events than compositional methods (59). Consistent with these observations, 73% of the genes for which PhyloNet found evidence of HGT were part of M. smithii's core genome, indicating transfer before the divergence of strains. By contrast, most putative HGT events predicted by compositional methods were part of the variable genome (59.3-68.0% of transfers, depending on the method) (Tables 20 and 21). This difference may be due in part to the requirement of phylogenetic methods for orthologs of the gene under investigation: Compositional HGT predictions for the subset of genes that could be mapped to KEGG orthology groups were also biased toward the core genome. Genes with both compositional and phylogenetic evidence of transfer tend to be more evenly split between the core and variable genomes than transfers supported by either type of evidence alone (Tables 20 and 21).
Taken together, these findings suggest that gene transfer has shaped both the core genome of M. smithii and differences between strains. External evidence further supports a role for HGT in shaping the core genome of M. smithii: 89.1% of genes within prophage (as detected by PhageFinder) are part of the core genome (Tables 20 and 21).
To test for differences in the functions contributed to the M. smithii pan-genome by the core genome, variable genome, or horizontally transferred genes, each of these three gene sets was annotated to KEGG pathways (level 2). The M. smithii core genome is enriched in genes involved in “translation” while being depleted in “membrane transporters” and “unclassified metabolic” genes (Bonferroni-corrected G-test for significance; P<0.001). The variable genome is enriched in genes for membrane transporters, “glycan biosynthesis and metabolism,” and genes whose functions are poorly characterized, while being depleted for genes involved in translation (Bonferroni-corrected G-test; P<0.001). Horizontally transferred genes, regardless of the detection method used, are more divergent from the pan-genome in their functional profile than either the core or variable components of the M. smithii pan-genome. This finding suggests that gene transfer has contributed significant functional diversity to M. smithii.
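A Bonferroni-corrected G-test for enrichment of a single functional category can be sketched as below. This is a minimal illustration under stated assumptions: each category is tested on a 2x2 table (gene set versus a comparison set, in category versus not), the P value uses the chi-square distribution with one degree of freedom, and the correction multiplies each P value by the number of categories tested. The counts are hypothetical and the exact contingency design used in the analysis is not stated in the text.

```python
import math

def g_test_2x2(a, b, c, d):
    """G-test of independence on a 2x2 table.

    a: gene set, in category        b: gene set, not in category
    c: comparison set, in category  d: comparison set, not in category
    Returns (G, P), with P from the chi-square distribution with 1 df.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:
                expected = rows[i] * cols[j] / n
                g += 2.0 * o * math.log(o / expected)
    g = max(g, 0.0)  # guard against tiny negative rounding error
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    return g, math.erfc(math.sqrt(g / 2.0))

def bonferroni(pvalues):
    """Multiply each P value by the number of tests, capped at 1."""
    m = len(pvalues)
    return {name: min(1.0, p * m) for name, p in pvalues.items()}

# Hypothetical counts: (in set & category, in set & not, other & category, other & not).
tables = {
    "translation": (80, 920, 20, 980),
    "other": (100, 900, 100, 900),
}
raw = {name: g_test_2x2(*counts)[1] for name, counts in tables.items()}
adjusted = bonferroni(raw)
```

With these toy counts the first category is strongly enriched in the gene set, while the second shows no difference and its corrected P value is 1.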
To understand in more detail the specific categories of genes that have been most frequently transferred, significant HGT results for 3-1 dinucleotide use were pooled across genomes and categorized according to KEGG pathway and KEGG orthology group, weighting genes with multiple pathway annotations on a per gene (rather than per annotation) basis (Table 32). As previously observed for genomic islands (60), genes of unknown or poorly characterized function dominated the HGT pool. Among genes with known KEGG level 2 pathway annotations, those in the KEGG category for folate biosynthesis were the most frequently transferred (101.7 normalized annotations). Tetrahydromethanopterin (THMP) methyltransferase genes were the most frequently transferred KEGG orthology (KO) within this group (23 putative HGT events for the D subunit). THMP methyltransferase (61) participates in both the methanogenesis and folate biosynthesis pathways by transferring a methyl group from 5-Methyl-THMP to coenzyme-M (
Phylogenetic analysis of HGT revealed similar trends. Genes involved in the KEGG folate biosynthesis pathway are the second most frequently transferred functional class (after unclassified metabolic genes). Methanogenesis genes are also among the most abundant transferred functional classes (rank order 22/173 classes). As in the analysis of genes with atypical dinucleotide compositions, phylogenetic HGT detection found transfer in KO groups involved in methyl-coenzyme M recycling, including those for THMP methyltransferase A, B, and C subunits (EC 2.1.1.86), methyl-coenzyme M reductase system component A2, and heterodisulfide reductase (B and D subunits) (EC 1.8.98.1).
In addition to characterizing KEGG functional categories, the transfer of ALP genes was analyzed, given their proposed importance in M. smithii niche specialization. Because the vast majority of ALP genes could not be assigned to KEGG orthology groups, only a small subset could be tested for gene transfer by using phylogenetic methods. Of the ALPs that could be assigned to KO groups, 6/49 (12.2%) were classified as being horizontally transferred using phylogenetic techniques. When analyzed compositionally, 5 or 6 of 6 of these ALPs were compositionally atypical in dinucleotide use, codon use, and k-words of length 4 or 6.
Remarkably, it was found that in the full pool of 854 ALP OGUs, between 52% and 65% show evidence of transfer across a variety of compositional measures, an enrichment of 6.4- to 9.3-fold when normalized to the overall levels of gene transfer predicted by the same methods. ALPs that could be mapped to KO groups were less compositionally atypical than ALPs as a whole (only 30.6-36.7% were compositionally annotated as transferred for this subgroup). Despite the observation that these genes are highly expressed in M. smithii strains, the ALPs annotated as possessing compositional evidence of transfer do not match the model for ribosomal proteins in their genome, meaning that their expression level alone does not account for their compositional atypicality. Large-scale HGT of ALPs would be consistent with their variability among strains.
These results lead us to hypothesize that M. smithii strains use their different repertoires of ALPs and the different sensitivities of ALP genes to formate to create diversity in their physical locations and/or their metabolic niches within the gut. Stated another way, these variations in expressed ALP repertoires could have important effects on the ability of different strains to establish syntrophic relationships with bacterial partners that have different abilities to generate formate or other substrates, or that have differing patterns of co-occurrence within an individual over time and between individuals. To further explore this notion, it will be important to define the structures of representative members of different ALP clusters through an M. smithii-directed structural genomics effort: Selection of ALPs could be guided by a number of criteria, including their strain distribution and their patterns of expression, both in vitro in monoculture in the presence of a variety of potential substrates for their metabolic networks, and in vivo in gnotobiotic mice containing various collections of sequenced M. smithii isolates and available cultured co-occurring bacterial taxa. The interactions between isolates and co-occurring bacterial species can also be explored in vitro if cocolonization of gnotobiotic mice proves to be problematic either because of difficulty in identifying suitable host diets or strains that are fit in the mouse gut (e.g., we have not yet been able to achieve persistent colonization of gnotobiotic mice with any of the five strains characterized in vitro by RNA-Seq after inoculating all of them together with a consortium of human gut-derived members of the Firmicutes, Bacteroidetes, and Proteobacteria that include saccharolytic bacteria and hydrogen producers and consumers). 
A complementary approach will be to select taxa for these in vitro and in vivo studies by predicting potential syntrophic relationships through in silico metabolic reconstructions of the metabolic networks of sequenced co-occurring species and M. smithii isolates, using methods described by Borenstein et al. (47).
This application is a continuation-in-part of U.S. application Ser. No. 12/627,961, filed on Nov. 30, 2009, which is a continuation-in-part of application No. PCT/US2008/065344, filed on May 30, 2008, which claims the priority of U.S. provisional application No. 60/932,457, filed on May 31, 2007, each of which is hereby incorporated by reference in its entirety.
This invention was made with government support under Grant numbers DK30292 and DK70077 awarded by the National Institutes of Health. The government has certain rights in the invention.
Number | Date | Country
---|---|---
60932457 | May 2007 | US

Relation | Number | Date | Country
---|---|---|---
Parent | 12627961 | Nov 2009 | US
Child | 13764427 | | US
Parent | PCT/US2008/065344 | May 2008 | US
Child | 12627961 | | US