Most human diseases and syndromes have long been thought to be solely due to an alteration of the human genome. However, while mutations in human genes may correlate with some pathology, they only explain a small fraction of the clinical cases. One explanation is that a genetic predisposition requires epigenetic stimuli in order to result in an abnormal phenotype. Recent research however points toward another explanation, which is that the human microbiota plays a crucial role in the predispositions of different diseases (Clemente et al., Cell.,148(6):1258-70, 2012).
The human microbiota comprises thousands of bacterial species, including commensal, beneficial and pathogenic bacteria. Humans host microbiota in multiple locations such as the skin, lungs, vagina, mouth, and gut. These microbiota differ in their location and in their bacterial composition. The gut microbiota is the largest of them. It is generally considered that it comprises thousands of bacterial species, weighs about 1.5 kg and constitutes a rich gene repertoire of its own, also called the gut microbiome, which is 100 times larger than the human nuclear genome.
All of these microbiota are known to play a role in the maturation of the immune system and to impact the immune response. The skin microbiota has for example been shown to play a role in the development of immune diseases such as asthma or atopic dermatitis (Hanski et al., Proc Natl Acad Sci USA.,109(21):8334-9, 2012).
The gut microbiota has been shown to play a role in the development of allergies, inflammatory bowel diseases, irritable bowel syndrome and possibly metabolic and degenerative disorders such as obesity, metabolic syndrome, diabetes and cancer. While normobiosis, which qualifies the normal state of the microbiota, seems to guarantee homeostasis, dysbiosis, which is the deviation from normobiosis, correlates with a long list of diseases.
It has been shown for example that reduced bacterial diversity in the gut microbiota is associated with metabolic diseases such as obesity (Ley et al., Nature., 444(7122):1022-3, 2006), type I diabetes (Wen et al., Nature., 455(7216): 1109-1113, 2008; Giongo et al., ISME J., 5(1):82-91, 2011), metabolic syndrome, hepatic steatosis, necroinflammation and fibrosis nonalcoholic steatohepatitis (Machado et al., Ann Hepatol., 11(4):440-9, 2012). Altered gut microbiota is also observed in inflammatory-related pathologies such as ulcerative colitis (Sasaki et al., J Signal Transduct., 2012:704953, 2012) and paediatric inflammatory bowel diseases (Comito et al., Int J Inflam., 2012:687143, 2012). Moreover, gut microbiota is at least locally affected in colorectal cancer (Sobhani et al., PLoS One., 6(1):e16393, 2011).
As growing evidence points toward the role of reduced bacterial diversity of the gut microbiota in a broad range of diseases, it is becoming critical to be technically able to assess it precisely.
However, assessing gut bacterial diversity proves complex. Indeed, only a small proportion of the bacterial species of the gut microbiota have been identified and sequenced, mostly because most gut bacteria cannot be cultured. In addition, most bacterial species are only present at a low copy number in the gut microbiota, which makes them difficult to detect (Hamady and Knight, Genome Res., 19: 1141-1152, 2009). Therefore, most of the gut bacteria have not been taxonomically assigned yet, which restricts the biomarkers that can be used to taxonomically known species and genes.
The determination of reduced bacterial diversity has thus so far generally been limited to measuring the relative abundance of known bacterial species or phyla, rather than determining the abundance of all of the species of the gut microbiota. For instance, with respect to type I diabetes, it is known that the proportion of Bacteroidetes bacteria increases over time in unhealthy (i.e., autoantibody positive, “autoimmune”) subjects, while the proportion of the Firmicutes bacteria increases in healthy non-type I diabetes prone subjects (Wen et al., Nature., 455(7216): 1109-1113, 2008). However, the evolution of the proportion of the non-taxonomically assigned species in connection with this disease, as well as many others, remains difficult to assess.
The current methods can therefore only report differences in the proportion of known bacteria. For this reason, they are not sensitive enough and most probably underestimate the actual population of people presenting reduced gut bacterial diversity.
There is therefore still a need for a comprehensive method to accurately determine reduced gut bacterial diversity in a subject that would rely on specific markers. Such a method would then not be limited to specific techniques or equipment and could therefore be implemented broadly.
The present invention is directed to a method for determining whether a subject has reduced gut bacterial diversity. Such a determination is useful, in particular for assessing whether the said subject is at risk of developing a pathology, such as e.g. type II diabetes, hyperglycemic syndrome, heart diseases, insulin resistance and hepatic stasis. The inventors have shown that it is possible to discriminate between individuals having reduced gut bacterial diversity and those having normal gut bacterial diversity by simply assessing the presence of a small number of those bacterial species in the gut.
The inventors have found a set of specific bacterial species whose presence or absence in the bacterial DNA of the faeces of a subject significantly correlates with reduced gut bacterial diversity.
By “reduced gut bacterial diversity”, it is herein referred to a gut microbiota in which the number of bacterial species is reduced compared to the average normal gut microbiota.
For example, the comparison between a test microbiota and a normal gut microbiota can be achieved by the genotyping of sequences obtained from the biological samples for example with massively parallel DNA sequencing. In that case, a subject with reduced bacterial diversity can have a microbiome comprising less than 480 000 bacterial gene counts, wherein said counts were obtained by sequencing gut microbial DNA obtained from a sample of 200 mg of faeces with Illumina-based high throughput sequencing, mapping the sequences obtained onto a reference set of bacterial genome (as described in Arumugam et al., Nature., 473(7346):174-80, 2011), removing human contamination, discarding reads mapping at multiple positions, and based on the total amount of remaining matched reads.
According to the invention, a subject has either a reduced gut bacterial diversity, or a normal bacterial diversity. The skilled person would then understand easily that when the method of the invention does not determine that the subject has a reduced gut bacterial diversity, said subject has a normal gut bacterial diversity. By “normal gut bacterial diversity”, it is herein referred to a gut microbiota in which the number of bacterial species is around the number found in the average normal gut microbiota, that is to say between 10% inferior and 10% superior to the number of bacterial species found in the average normal gut microbiota.
By “microbiota”, it is herein referred to microflora and microfauna in an ecosystem such as intestines, mouth, vagina, or lungs. In microbiology, flora (plural: floras or floræ) refers to the collective bacteria and other microorganisms in an ecosystem (e.g., some part of the body of an animal host). The “gut microbiota” consists of all the bacterial species constituting the microbiota present in the gut of an individual.
A bacterial species according to the invention encompasses not only known bacterial species but also species which have not yet been taxonomically described. Indeed, whether they already have been taxonomically described or not, bacterial species can be characterized by their genome. For example, methods for characterizing bacteria using genetic information have been described in Vandamme et al. (Microbiol. Rev. 1996, 60(2):407).
It will be obvious to the person skilled in the art that the genes of a bacterial species are physically linked as a unit rather than being independently distributed between individuals, i.e. the genome of said bacterial species comprises gene sequences which are always present or absent together among individuals. Bacterial species can therefore be defined by parts of their genome, and sequencing the entire genome of bacterial species is not necessary for proper bacterial species identification.
For instance, a method for the identification of bacterial species in a microbial composition, based on bacterial DNA sequencing and using marker genes as taxonomic references has been described in Liu et al. (BMC genomics, 12(S2):S4, 2011). The person skilled in the art may further refer to Arumugam et al. (Nature, 473(7346):174-80, 2011) or Qin et al. (Nature, 490(7418):55-60, 2012) for detailed methods for the identification of bacterial species based on bacterial DNA sequencing.
According to the present invention, a “bacterial species” is a group of bacterial genes from the gut microbiome whose abundance levels vary in the same proportion among different individual samples. In other words, a bacterial species according to the invention is a cluster of bacterial gene sequences whose abundance levels in samples from distinct subjects are statistically linked rather than being randomly distributed. It will be immediately apparent to the skilled person that such a cluster thus corresponds to a bacterial species.
Genes of the microbiome can be ascribed to a bacterial species by several statistical methods known to the person skilled in the art. Preferably, a statistical method for testing covariance is used for testing whether two genes belong to the same cluster. To this end, the skilled person may use non-parametrical measures of statistical dependence, such as the Spearman's rank correlation coefficient for example. Most preferably, a bacterial species according to the invention is a cluster that comprises gut bacterial genes and that is determined by the method used in Qin et al. (Nature, 490(7418): 55-60, 2012) for identifying metagenomic linkage groups.
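By way of illustration only, the covariance test mentioned above can be sketched as follows. This is a minimal sketch and not the method of Qin et al.: the function names, the example abundance values and the cut-off min_rho=0.9 are assumptions introduced here. It computes the Spearman's rank correlation coefficient between the abundance profiles of two genes across the same samples and treats the two genes as candidates for the same cluster when the coefficient is high.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def same_cluster(abund_a, abund_b, min_rho=0.9):
    """Hypothetical rule: two genes co-vary if rho reaches min_rho (assumed cut-off)."""
    return spearman(abund_a, abund_b) >= min_rho


# Illustrative abundance profiles of three genes over five samples (made-up values).
gene_a = [0, 5, 12, 40, 3]
gene_b = [1, 9, 30, 85, 6]
gene_c = [50, 2, 0, 1, 30]
print(same_cluster(gene_a, gene_b), same_cluster(gene_a, gene_c))
```

In this made-up example gene_a and gene_b rise and fall together across the samples and would be grouped, whereas gene_c would not. A rank-based measure is used because it does not assume that the abundances are normally distributed.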
By “subject”, it is herein referred to a vertebrate, preferably a mammal, and most preferably a human. There are several ways to obtain samples of the said subject's gut microbial DNA (Sokol et al., Inflamm Bowel Dis., 14(6): 858-867, 2008). For example, it is possible to prepare mucosal specimens, or biopsies, obtained by coloscopy. However, coloscopy is an invasive procedure which is ill-defined in terms of collection procedure from study to study. Likewise, it is possible to obtain biopsies through surgery. However, even more than coloscopy, surgery is an invasive procedure, whose effects on the microbial population are not known. Preferred is the fecal analysis, a procedure which has been reliably used in the art (Bullock et al., Curr Issues Intest Microbiol.; 5(2): 59-64, 2004; Manichanh et al., Gut, 55: 205-211, 2006; Bakir et al., Int J Syst Evol Microbiol, 56(5): 931-935, 2006; Manichanh et al., Nucl. Acids Res., 36(16): 5180-5188, 2008; Sokol et al., Inflamm. Bowel Dis., 14(6): 858-867, 2008). An example of this procedure is described in the Methods section of the Experimental Examples. Feces contain about 10^11 bacterial cells per gram (wet weight) and bacterial cells comprise about 50% of fecal mass. The microbiota of the feces represents primarily the microbiology of the distal large bowel. It is thus possible to isolate and analyze large quantities of microbial DNA from the feces of an individual. By “gut microbial DNA”, it is herein understood the DNA from any of the resident bacterial communities of the human gut. The term “gut microbial DNA” encompasses both coding and non-coding sequences; it is in particular not restricted to complete genes, but also comprises fragments of coding sequences. Fecal analysis is thus a non-invasive procedure, which yields consistent and directly comparable results from patient to patient.
As explained above, “gut microbiome”, as used herein, refers to the set of bacterial genes from the species constituting the microbiota present in the gut of said subject. The sequences of the microbiome of the invention comprise at least gene sequences from the bacterial gene catalogue published by Qin et al. (Nature, 464: 59-65, 2010). The gene sequences from the catalogue are available from the EMBL (http://www.bork.embl.de/~arumugam/Qin_et_al_2010/) and BGI (http://gutmeta.genomics.org.cn) websites.
The bacterial species listed in Table 1 are absent from the gut microbiome of a significant proportion of subjects with a reduced bacterial diversity, while the bacterial species listed in Table 2 are present in the gut microbiome of a significant proportion of subjects with a reduced bacterial diversity.
These species are not limited to the ones which have already been known from prior art. Importantly, these specific bacterial species show a high correlation coefficient with reduced gut bacterial diversity. It is thus possible to determine whether a subject has reduced gut bacterial diversity with a high sensitivity. The sensitivity of a method is the proportion of actual positives which are correctly identified as such, and can be estimated by the area under the ROC (Receiver Operating Characteristic) curve, also called AUC. A receiver operating characteristic (ROC), or simply ROC curve, is a graphical plot which illustrates the performance of a binary classifier system as its discrimination threshold is varied. It is created by plotting the fraction of true positives out of the positives (TPR=true positive rate) vs. the fraction of false positives out of the negatives (FPR=false positive rate), at various threshold settings. TPR is also known as sensitivity, and FPR is one minus the specificity or true negative rate. Area Under the Curve (AUC) is a measure of a classifier/test performance across all possible values of the thresholds. The higher the AUC, the better the performance of the test.
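The ROC quantities defined above can be made concrete with a short sketch. The code below is illustrative only and rests on assumptions: the marker scores and diversity labels are made-up values, and the function names are introduced here. It computes the TPR and FPR at one threshold, and the AUC as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, ties counting for one half, which is equivalent to the area under the ROC curve.

```python
def tpr_fpr(scores, labels, threshold):
    """Return (TPR, FPR) when samples with score >= threshold are called positive."""
    tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= threshold)
    fp = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= threshold)
    n_pos = sum(1 for l in labels if l == 1)
    n_neg = sum(1 for l in labels if l == 0)
    return tp / n_pos, fp / n_neg


def roc_auc(scores, labels):
    """AUC via pairwise comparison of positive and negative samples."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts for half a win
    return wins / (len(pos) * len(neg))


# Made-up example: score 1 = marker indicates reduced diversity, 0 = it does not.
# Label 1 = subject actually has reduced diversity, 0 = normal diversity.
scores = [1, 1, 1, 0, 0, 0, 1, 0]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(tpr_fpr(scores, labels, threshold=1))  # (0.75, 0.25)
print(roc_auc(scores, labels))               # 0.75
```

With a binary marker such as the presence or absence of a single species, the ROC curve has a single interior point, so the AUC reduces to the mean of the sensitivity and the specificity at that point.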
The inventors have found that it is not necessary to determine the presence or the absence of every single species in order to assess the diversity of the gut bacterial population. Rather, said diversity can be evaluated with a high degree of confidence and accuracy by examining a very small subset of bacterial species. As shown in the experimental part, a very small number of species is a good marker of the said diversity. Indeed, even when the presence or absence of only one bacterial species is assessed, the method of the invention enables the detection of reduced bacterial diversity in a subject with an AUC of at least 0.69, and up to 0.936, depending on the bacterial species chosen for the test.
In comparison, a random method has an expected AUC of 0.5. Moreover, when inflammatory bowel disease, one of the pathologies associated with reduced bacterial diversity, is assessed by 16S rRNA sequencing of fecal samples, the AUC is only 0.83 (Papa et al., PLoS One., 7(6):e39242, 2012).
In a first embodiment, the method of the invention is based on the determination of the presence or the absence of at least one bacterial species. Thus, according to this embodiment, the invention is directed to a method for determining whether a subject has reduced gut bacterial diversity, the said method comprising the step of detecting the presence or the absence of at least one bacterial species, preferably among the 58 bacterial species from table 1 and table 2, in the gut of the said subject. By “at least one bacterial species”, it is herein meant that the presence or absence of one unique species or of more than one species is assessed. In a preferred embodiment, the method of the invention includes the detection of the presence or absence of 1, 2, 3, 4, or 5 species. Even more preferably, the said method includes the detection of the presence or absence of more than 5 species. Most preferably, the said method includes detection of the presence or absence of 58 species.
The bacterial species of the invention are chosen from the list consisting in the bacterial species of table 1 and table 2. More precisely, the bacterial species of the invention are chosen from the list consisting in HL-1, HL-2, HL-3, HL-4, HL-5, HL-6, HL-7, HL-8, HL-9, HL-10, HL-11, HL-12, HL-13, HL-14, HL-15, HL-16, HL-17, HL-18, HL-19, HL-20, HL-21, HL-22, HL-23, HL-24, HL-25, HL-26, HL-27, HL-28, HL-29, HL-30, HL-31, HL-32, HL-33, HL-34, HL-35, HL-36, HL-37, HL-38, HL-39, HL-40, HL-41, HL-42, HL-43, HL-44, HL-45, HL-46, HL-47, HL-48, HL-49, HL-50, HL-51, HL-52, HL-53, HL-54, HL-55, HL-56, HL-57, HL-58.
Most intestinal commensals cannot be cultured. Genomic strategies have been developed to overcome this limitation (Hamady and Knight, Genome Res, 19: 1141-1152, 2009). These strategies have allowed the definition of the microbiome as the collection of the genes comprised in the genomes of the microbiota (Turnbaugh et al., Nature, 449: 804-810, 2007; Hamady and Knight, Genome Res., 19: 1141-1152, 2009). The existence of a small number of species shared by all individuals constituting the human intestinal microbiota phylogenetic core has been demonstrated (Tap et al., Environ Microbiol., 11(10): 2574-2584, 2009). Recently, a metagenomic analysis has led to the identification of an extensive catalogue of 3.3 million non-redundant microbial genes of the human gut, corresponding to 576.7 gigabases of sequence (Qin et al., Nature, 464(7285): 59-65, 2010).
It will be immediately apparent to the person skilled in the art that the presence of a bacterial species can be easily determined by detecting a nucleic acid sequence specific to the said species. The presence of gut bacterial species is usually determined by detecting 16S rRNA gene sequences. However, this method is limited to known bacterial species.
By contrast, in the method of the invention, no prior identification of the bacterial species the said gene belongs to is required. The inventors have determined a minimum set of 50 bacterial gene sequences that are non-redundant sequences for each bacterial species of table 1 and table 2, and that can be used as tracer genes.
It will be obvious to the person skilled in the art that the number of bacteria from a given bacterial species in a sample directly correlates with the number of copies of at least one gene sequence detected in said sample. It is thereby possible to determine the absence of at least one of the bacterial species from table 1, or the presence of at least one of the bacterial species from table 2, simply by detecting the absence or presence of at least one bacterial gene from said species.
The invention therefore enables assessing reduced gut bacterial diversity in a subject, without the need for complex and tedious statistical analysis. Moreover, because the method of the invention can rely on as little as one bacterial gene as a marker, it may be implemented by any known technique of DNA amplification or sequencing, and is not limited to a specific method or apparatus.
According to a preferred embodiment of the invention, the method for determining whether a subject has a reduced gut bacterial diversity comprises a step of detecting from a gut microbial DNA sample obtained from said subject whether at least one gene from at least one bacterial species from Table 1 is absent in said sample. Alternatively, the said method comprises a step of detecting from a gut microbial DNA sample obtained from said subject whether at least one gene from at least one bacterial species from Table 2 is present in said sample. Preferably, the method of the invention comprises a step of detecting from a gut microbial DNA sample obtained from said subject if at least one gene from at least one bacterial species from Table 1 is absent in said sample and at least one gene from at least one bacterial species from Table 2 is present in said sample.
Another preferred embodiment of the invention is a method for determining whether a subject has a reduced gut bacterial diversity, said method comprising:
Yet another preferred embodiment of the invention is a method for determining whether a subject has a reduced gut bacterial diversity, said method comprising:
In a preferred embodiment, the bacterial gene sequences of the bacterial cluster according to the invention are chosen from the list consisting of sequence SEQ ID NO.1 to sequence SEQ ID NO. 2900.
Depending on the size of the sample and of the occurrence of the bacterial genes of interest, certain bacterial genes may be difficult to detect in a sample. The skilled person would thus easily conceive that, to increase the confidence of the results, it is advantageous to determine the absence of a bacterial species by detecting the average abundance of several bacterial genes from a bacterial species.
In an embodiment, detecting whether at least one bacterial gene from at least one bacterial species from table 1 is absent in said sample comprises determining the number of copies of at least 1, 2, 3, 4 or 5 bacterial genes from said bacterial species in the sample. In a preferred embodiment, detecting whether at least one bacterial gene from at least one bacterial species from table 1 is absent in said sample comprises determining the number of copies of at least 10, 20, 30, 40 or at least 50 bacterial genes from said bacterial species in the sample.
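As an illustration of why several genes increase confidence, the following minimal sketch calls a species absent from the average copy number of its tracer genes rather than from a single gene. The function name, the copy numbers and the threshold of 1.0 are assumptions chosen for the example and are not values taken from the experimental part.

```python
from statistics import mean


def species_is_absent(tracer_gene_copies, threshold):
    """Call a species absent when the mean copy number of its tracer genes
    is at or below the threshold (hypothetical decision rule)."""
    if not tracer_gene_copies:
        raise ValueError("at least one tracer gene copy number is required")
    return mean(tracer_gene_copies) <= threshold


# Made-up copy numbers for five tracer genes of one species in two samples.
sample_low = [0, 0, 3, 0, 1]     # mean 0.8
sample_high = [12, 0, 9, 15, 7]  # mean 8.6, despite one undetected gene
print(species_is_absent(sample_low, threshold=1.0))   # True
print(species_is_absent(sample_high, threshold=1.0))  # False
```

In the second sample one tracer gene is not detected at all, yet the species is still called present because the call rests on the average over all five genes.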
Moreover, among all of the bacterial species, some are more significantly correlated with reduced gut bacterial diversity than others. The detection of the presence or absence of the more correlated bacterial species can advantageously enable determining reduced gut bacterial diversity with a much better sensitivity than the methods of the prior art. For example, as shown in the experimental part, the detection of the presence or absence of one of the bacterial species HL-1, HL-57, HL-53, HL-4, HL-54, HL-2, HL-3, HL-8, HL-10, HL-45, HL-22, HL-26, HL-9, HL-5, HL-11, HL-14, HL-13, HL-18, HL-12 or HL-21 enables the detection of reduced bacterial diversity in a subject with an AUC superior to 0.83. It is thereby possible to increase the sensitivity of the method of the invention, simply by assessing the presence or absence of those specific bacterial species, or of at least one gene from each of those specific bacterial species.
In an advantageous embodiment, the method of the invention comprises a step of detecting from a gut microbial DNA sample obtained from said subject whether at least one gene from a bacterial species chosen from the list consisting in HL-1, HL-57, HL-53, HL-4, HL-54, HL-2, HL-3, HL-8, HL-10, HL-45, HL-22, HL-26, HL-9, HL-5, HL-11, HL-14, HL-13, HL-18, HL-12 and HL-21 from table 1 is absent in said sample.
In a particularly advantageous embodiment, the method of the invention comprises a step of detecting from a gut microbial DNA sample obtained from said subject whether at least one gene from the bacterial species HL-1 from table 1 is absent in said sample. The person skilled in the art knows that the more distinct bacterial species from Table 1 are absent from the bacterial DNA from the feces of the subject, and the more distinct bacterial species from Table 2 are present in the bacterial DNA from the feces of the subject, the higher the probability that the subject has a reduced gut bacterial diversity. It would then be obvious to the skilled person that the sensitivity of the method of the invention can be increased by assessing the presence or absence of bacterial genes from several different bacterial species from Table 1 and/or table 2. It is then possible to increase the sensitivity of the method by using bacterial genes from a linear combination of 2, 3, 4, 5 or more different bacterial species. For example, the combinations of 2 bacterial species from Table 1 and/or 2 enable AUC between around 0.736 and 0.955, the combinations of 3 bacterial species from Table 1 and/or 2 enable AUC between around 0.734 and 0.966, and the combinations of 4 bacterial species from Table 1 and/or 2 enable AUC between around 0.734 and 0.975. In particular, the inventors have surprisingly discovered that the detection of specific combinations of 2, 3 or 4 bacterial species enables very high AUC. The more advantageous combinations of 2, 3 and 4 bacterial species are indicated in table 7, 8 and 9 respectively.
In a preferred embodiment, the method of the invention comprises a step of detecting from a gut microbial DNA sample obtained from said subject whether at least one gene from each of the bacterial species of any of the bacterial species combinations indicated in table 7, 8 and/or 9 is absent and/or present in said sample.
The person skilled in the art will notice that the bacterial species combinations indicated in table 7, 8 and/or 9 are combinations of bacterial species indicated in tables 1 and 2. Therefore, the person skilled in the art will understand that detecting whether at least one gene from each of the bacterial species of a bacterial species combination indicated in table 7, 8 and/or 9 is absent and/or present in said sample corresponds to detecting whether
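One possible way to combine several species into a single score is sketched below. It is only an illustration under stated assumptions: the equal default weights, the function name and the identifier "T2-example" standing for a species of table 2 are hypothetical, and the actual combinations and their performance are those reported in the tables. Each species of table 1 that is absent and each species of table 2 that is present adds its weight to the score, so a higher score points toward reduced gut bacterial diversity.

```python
def reduced_diversity_score(presence, table1_species, table2_species, weights=None):
    """Weighted sum of markers pointing toward reduced diversity.

    presence: dict mapping species identifier -> True if detected in the sample.
    weights:  optional dict of per-species weights (assumed 1.0 when missing).
    """
    weights = weights or {}
    score = 0.0
    for species in table1_species:
        if not presence[species]:  # a table 1 species that is absent
            score += weights.get(species, 1.0)
    for species in table2_species:
        if presence[species]:      # a table 2 species that is present
            score += weights.get(species, 1.0)
    return score


# Made-up sample; "T2-example" is a placeholder, not a real identifier.
presence = {"HL-1": False, "HL-2": True, "T2-example": True}
score = reduced_diversity_score(presence, ["HL-1", "HL-2"], ["T2-example"])
print(score)  # 2.0
```

Such a score can then be compared with a cut-off, and the cut-off varied to trace a ROC curve for the combination.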
in said sample.
The inventors have additionally selected bacterial species combinations of 2 to 20 bacterial species that enable particularly high AUC, indicated in table 10, ranging from 0.955 to 0.982. Therefore, it is possible to achieve a great sensitivity by simply assessing the presence or absence of at least one bacterial gene from each of the bacterial species from a specific combination.
In an advantageous embodiment, the method of the invention comprises a step of detecting from a gut microbial DNA sample obtained from said subject whether:
A bacterial gene is absent from the sample when its number of copies in the sample is inferior or equal to a certain threshold value. Accordingly, a bacterial gene is present in the sample when its number of copies in the sample is superior to said threshold value.
According to the present invention, a “threshold value” is intended to mean a value that permits to discriminate samples in which the number of copies of the bacterial gene of interest is low or high.
In particular, if a number of copies of a bacterial gene of interest is inferior or equal to the threshold value, then the number of copies of this bacterial gene in the sample is considered low, whereas if the number of copies is superior to the threshold value, then the number of copies of this bacterial gene in the sample is considered high. A low copy number means that the bacterial gene is absent from the sample, whereas a high number of copies means that the bacterial gene is present in the sample.
For each gene, and depending on the method used for measuring the number of copies of the bacterial gene, the optimal threshold value may vary. However, it may be easily determined by a skilled person based on the analysis of the microbiome of several individuals in which the number of copies (low or high) is known for this particular bacterial gene, and on the comparison thereof with the number of copies of a control gene. Such a comparison may be facilitated by using the same amount of bacterial DNA for each of the analyzed samples, or by dividing the number of copies of the bacterial gene obtained by the initial amount of bacterial DNA used in the test. Indeed, it is well known to the skilled person that the total amount of bacteria in the gut of a subject, and consequently in its feces, remains the same even in the case of reduced bacterial diversity. It is also possible to use a reference such as a gut bacterial species whose abundance is known not to vary between individuals with reduced and normal bacterial diversity.
According to the invention, determining the number of copies of at least one bacterial gene in a sample obtained from the subject can be achieved by any technique capable of detecting and quantifying nucleic acid sequences, including inter alia hybridization with a labelled probe, PCR amplification, sequencing, and all other methods known to the person skilled in the art.
In a first embodiment, determining the number of copies of at least one bacterial gene in a sample obtained from the subject is performed using sequencing. Optionally, DNA is fragmented, for example by restriction nuclease, prior to sequencing. Sequencing is done using any technique known in the state of the art, including sequencing by ligation, pyrosequencing, sequencing-by-synthesis or single-molecule sequencing. Sequencing also includes PCR-based techniques, such as for example quantitative PCR or emulsion PCR.
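The threshold rule and the normalization described above can be sketched as follows. This is a minimal illustration in which the function names, the unit (copies per nanogram of bacterial DNA) and the numerical values are assumptions; the appropriate threshold depends on the gene and on the measurement method, as stated above.

```python
def normalized_copies(raw_copies, dna_amount_ng):
    """Divide the measured copy number by the amount of bacterial DNA used."""
    if dna_amount_ng <= 0:
        raise ValueError("the amount of DNA must be positive")
    return raw_copies / dna_amount_ng


def gene_is_present(raw_copies, dna_amount_ng, threshold):
    """A gene is present when its normalized copy number is superior to the
    threshold, and absent when it is inferior or equal to it."""
    return normalized_copies(raw_copies, dna_amount_ng) > threshold


# Made-up values: 20 ng of DNA and a threshold of 2.0 copies per ng.
print(gene_is_present(40, 20.0, threshold=2.0))  # False: exactly at the threshold
print(gene_is_present(41, 20.0, threshold=2.0))  # True
```

Normalizing before comparing makes the same threshold usable for samples prepared from different amounts of starting DNA.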
Sequencing is performed on the entire DNA contained in the biological sample, or on portions of the DNA contained in the biological sample. It will be immediately clear to the skilled person that the said sample contains at least a mixture of bacterial DNA and of human DNA from the host subject. However, though the overall bacterial DNA is likely to represent the major fraction of the total DNA present in the sample, each bacterial species may only represent a small fraction of the total DNA present in the sample.
To overcome this difficulty, the skilled person can use a method that allows the quantitative genotyping of sequences obtained from the biological sample with high precision. In one embodiment of this approach, the precision is achieved by analysis of a large number (for example, millions or billions) of polynucleotides. Furthermore, the precision can be enhanced by the use of massively parallel DNA sequencing, such as, but not limited to that performed by the Illumina Genome Analyzer platform (Bentley et al. Nature; 456: 53-59, 2008), the Roche 454 platform (Margulies et al. Nature; 437: 376-380, 2005), the ABI SOLiD platform (McKernan et al., Genome Res; 19: 1527-1541, 2009), the Helicos single molecule sequencing platform (Harris et al. Science; 320: 106-109, 2008), real-time sequencing using single polymerase molecules (Science; 323: 133-138, 2009), Ion Torrent sequencing (WO 2010/008480; Rothberg et al., Nature, 475: 348-352, 2011) and nanopore sequencing (Clarke J et al. Nat Nanotechnol.; 4: 265-270, 2009).
When the skilled person relies on sequencing methods to detect the presence or absence of certain bacterial genes, the information collected from sequencing is used to determine the number of copies of nucleic acid sequences of interest via bioinformatics procedures. For example, in an embodiment, the nucleic acid sequences of said bacterial species in the gut bacterial DNA sample are identified in the global sequencing data by comparison with the nucleic acid sequences SEQ ID NO.1 to SEQ ID NO. 2900. This comparison is advantageously based on the level of sequence identity with the sequences SEQ ID NO.1 to SEQ ID NO. 2900.
Thus, a nucleic acid sequence displaying at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity with at least one of the nucleic acid sequences SEQ ID NO. 1 to SEQ ID NO. 2900 is identified as a sequence comprised in one of the bacterial species of the invention.
Thus, in a preferred embodiment, detecting whether at least one bacterial species from table 1 is absent and/or at least one species from table 2 is present in said sample comprises determining the number of nucleic acid sequences in the gut bacterial DNA sample having at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity with at least one of the nucleic acid sequences SEQ ID NO. 1 to SEQ ID NO. 2900.
The term “sequence identity” herein refers to the identity between two nucleic acid sequences. Identity between sequences can be determined by comparing a position in each of the sequences which may be aligned for the purposes of comparison. When a position in the compared sequences is occupied by the same base, then the sequences are identical at that position. A degree of sequence identity between nucleic acid sequences is a function of the number of identical nucleotides at positions shared by these sequences.
To determine the percent identity of two nucleic acid sequences, the sequences are aligned for optimal comparison. For example, gaps can be introduced in the sequence of a first nucleic acid sequence for optimal alignment with the second nucleic acid sequence. The nucleotides at corresponding nucleotide positions are then compared. When a position in the first sequence is occupied by the same nucleotide as the corresponding position in the second sequence, the molecules are identical at that position. The percent identity between the two sequences is a function of the number of identical positions shared by the sequences. Hence % identity = (number of identical positions / total number of overlapping positions) × 100.
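By way of illustration only, the percent identity formula can be sketched in Python as follows. The function name percent_identity, the use of "-" as the gap symbol, and the reading of "overlapping positions" as the aligned positions at which neither sequence carries a gap are assumptions made for this sketch and are not part of the description above. The two input sequences are assumed to have been aligned beforehand.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumption: a position counts as "overlapping" only when neither
    sequence has a gap at that position.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    overlapping = 0
    identical = 0
    for base_a, base_b in zip(aligned_a.upper(), aligned_b.upper()):
        if base_a == gap or base_b == gap:
            continue  # gap positions are not overlapping positions
        overlapping += 1
        if base_a == base_b:
            identical += 1

    if overlapping == 0:
        return 0.0
    return 100.0 * identical / overlapping


# Hypothetical example: five overlapping positions, four of them identical.
example_identity = percent_identity("ACGT-A", "ACCTTA")
```

A read would then be attributed to a reference sequence when this value reaches the chosen threshold, for example 90 or 95.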
In this comparison the sequences can be the same length or can be different in length. Optimal alignment of sequences for determining a comparison window may be conducted by the local homology algorithm of Smith and Waterman (J. Theor. Biol., 91(2): 370-380, 1981), by the homology alignment algorithm of Needleman and Wunsch (J. Mol. Biol, 48(3): 443-453, 1972), by the search for similarity via the method of Pearson and Lipman (Proc. Natl. Acad. Sci. U.S.A., 85(5): 2444-2448, 1988), by computerized implementations of these algorithms (GAP, BESTFIT, FASTA and TFASTA in the Wisconsin Genetics Software Package Release 7.0, Genetic Computer Group, 575, Science Drive, Madison, Wis.) or by inspection. The best alignment (i.e. resulting in the highest percentage of identity over the comparison window) generated by the various methods is selected.
The term “sequence identity” thus means that two polynucleotide sequences are identical (i.e. on a nucleotide by nucleotide basis) over the window of comparison. The term “percentage of sequence identity” is calculated by comparing two optimally aligned sequences over the window of comparison, determining the number of positions at which the identical nucleic acid base (e.g. A, T, C, G, U, or I) occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison (i.e. the window size) and multiplying the result by 100 to yield the percentage of sequence identity. The same process can be applied to polypeptide sequences. The percentage of sequence identity of a nucleic acid sequence or an amino acid sequence can also be calculated using BLAST software (Version 2.06 of September 1998) with the default or user defined parameter.
In another preferred embodiment, PCR-based techniques are used to determine the number of copies of at least one bacterial gene. Preferably, the PCR technique used quantitatively measures starting amounts of DNA, cDNA, or RNA. Examples of PCR-based techniques according to the invention include techniques such as, but not limited to, quantitative PCR (Q-PCR), reverse-transcriptase polymerase chain reaction (RT-PCR), quantitative reverse-transcriptase PCR (QRT-PCR), rolling circle amplification (RCA) or digital PCR. These techniques are well known and easily available technologies for those skilled in the art and do not need a precise description. In a preferred embodiment, the determination of the copy number of the bacterial genes of the invention is performed by quantitative PCR.
Amplification primers specific for the genes to be tested are thus also very useful for performing the methods according to the invention. The present invention thus also encompasses primers for amplifying at least one gene selected from the genes of sequence SEQ ID NO. 1-2900.
In another preferred embodiment, the presence or absence of the bacterial genes according to the invention is detected by the use of a nucleic microarray.
According to the invention, a “nucleic microarray” consists of different nucleic acid probes that are attached to a substrate, which can be a microchip, a glass slide or a microsphere-sized bead. A microchip may be constituted of polymers, plastics, resins, polysaccharides, silica or silica-based materials, carbon, metals, inorganic glasses, or nitrocellulose. Probes can be nucleic acids such as cDNAs (“cDNA microarray”) or oligonucleotides (“oligonucleotide microarray”), and the oligonucleotides may be about 25 to about 60 base pairs or less in length.
To determine the copy number of a target nucleic sample, said sample is labelled, contacted with the microarray in hybridization conditions, leading to the formation of complexes between target nucleic acids that are complementary to probe sequences attached to the microarray surface. The presence of labelled hybridized complexes is then detected. Many variants of the microarray hybridization technology are available to the man skilled in the art.
In a specific embodiment, the nucleic microarray is an oligonucleotide microarray comprising at least one oligonucleotide specific for at least one gene having a sequence selected from SEQ ID NOs 1-2900. Preferably, the said microarray comprises at least 58 oligonucleotides, each oligonucleotide being specific for one gene of a distinct cluster of the invention. More preferably, the microarray of the invention consists of 2900 oligonucleotides specific for each of the genes of sequences SEQ ID NOs. 1-2900.
In another embodiment, the nucleic microarray is an oligonucleotide microarray comprising at least one oligonucleotide specific for at least one gene of each of the bacterial species HL-10, HL-1 and HL-5. Preferably, the nucleic microarray is an oligonucleotide microarray comprising or consisting in oligonucleotides specific for at least 2, 3, 4, 5, 10, 20, 30 or 40 genes of each of the bacterial species HL-10, HL-1 and HL-5.
In another embodiment, the nucleic microarray is an oligonucleotide microarray comprising at least one oligonucleotide specific for at least one gene of each of the bacterial species HL-8, HL-3, HL-53 and HL-26. Preferably, the nucleic microarray is an oligonucleotide microarray comprising or consisting in oligonucleotides specific for at least 2, 3, 4, 5, 10, 20, 30 or 40 genes of each of the bacterial species HL-8, HL-3, HL-53 and HL-26.
In another embodiment, the nucleic microarray is an oligonucleotide microarray comprising at least one oligonucleotide specific for at least one gene of each of the bacterial species HL-10, HL-26, HL-8, HL-53 and HL-3. Preferably, the nucleic microarray is an oligonucleotide microarray comprising or consisting in oligonucleotides specific for at least 2, 3, 4, 5, 10, 20, 30 or 40 genes of each of the bacterial species HL-10, HL-26, HL-8, HL-53 and HL-3.
In another embodiment, the nucleic microarray is an oligonucleotide microarray comprising at least one oligonucleotide specific for at least one gene of each of the bacterial species HL-53, HL-8, HL-13, HL-3, HL-26 and HL-37. Preferably, the nucleic microarray is an oligonucleotide microarray comprising or consisting in oligonucleotides specific for at least 2, 3, 4, 5, 10, 20, 30 or 40 genes of each of the bacterial species HL-53, HL-8, HL-13, HL-3, HL-26 and HL-37.
In another embodiment, the nucleic microarray is an oligonucleotide microarray comprising at least one oligonucleotide specific for at least one gene of each of the bacterial species HL-37, HL-26, HL-10, HL-8, HL-21, HL-53 and HL-11. Preferably, the nucleic microarray is an oligonucleotide microarray comprising or consisting in oligonucleotides specific for at least 2, 3, 4, 5, 10, 20, 30 or 40 genes of each of the bacterial species HL-37, HL-26, HL-10, HL-8, HL-21, HL-53 and HL-11.
In another embodiment, the nucleic microarray is an oligonucleotide microarray comprising at least one oligonucleotide specific for at least one gene of each of the bacterial species HL-10, HL-5, HL-26, HL-25, HL-53, HL-22, HL-8 and HL-17. Preferably, the nucleic microarray is an oligonucleotide microarray comprising or consisting in oligonucleotides specific for at least 2, 3, 4, 5, 10, 20, 30 or 40 genes of each of the bacterial species HL-10, HL-5, HL-26, HL-25, HL-53, HL-22, HL-8 and HL-17.
In another embodiment, the nucleic microarray is an oligonucleotide microarray comprising at least one oligonucleotide specific for at least one gene of each of the bacterial species HL-26, HL-37, HL-21, HL-10, HL-5, HL-17, HL-16, HL-8 and HL-3. Preferably, the nucleic microarray is an oligonucleotide microarray comprising or consisting in oligonucleotides specific for at least 2, 3, 4, 5, 10, 20, 30 or 40 genes of each of the bacterial species HL-26, HL-37, HL-21, HL-10, HL-5, HL-17, HL-16, HL-8 and HL-3.
In another embodiment, the nucleic microarray is an oligonucleotide microarray comprising at least one oligonucleotide specific for at least one gene of each of the bacterial species HL-11, HL-15, HL-27, HL-35, HL-8, HL-22, HL-47, HL-26, HL-10 and HL-37. Preferably, the nucleic microarray is an oligonucleotide microarray comprising or consisting in oligonucleotides specific for at least 2, 3, 4, 5, 10, 20, 30 or 40 genes of each of the bacterial species HL-11, HL-15, HL-27, HL-35, HL-8, HL-22, HL-47, HL-26, HL-10 and HL-37.
In another embodiment, the nucleic microarray is an oligonucleotide microarray comprising at least one oligonucleotide specific for at least one gene of each of the bacterial species HL-28, HL-21, HL-5, HL-27, HL-26, HL-17, HL-3, HL-40, HL-37, HL-25 and HL-38. Preferably, the nucleic microarray is an oligonucleotide microarray comprising or consisting in oligonucleotides specific for at least 2, 3, 4, 5, 10, 20, 30 or 40 genes of each of the bacterial species HL-28, HL-21, HL-5, HL-27, HL-26, HL-17, HL-3, HL-40, HL-37, HL-25 and HL-38.
In another embodiment, the nucleic microarray is an oligonucleotide microarray comprising at least one oligonucleotide specific for at least one gene of each of the bacterial species HL-8, HL-45, HL-35, HL-53, HL-17, HL-26, HL-3, HL-18, HL-10, HL-37, HL-40 and HL-15. Preferably, the nucleic microarray is an oligonucleotide microarray comprising or consisting in oligonucleotides specific for at least 2, 3, 4, 5, 10, 20, 30 or 40 genes of each of the bacterial species HL-8, HL-45, HL-35, HL-53, HL-17, HL-26, HL-3, HL-18, HL-10, HL-37, HL-40 and HL-15.
In another embodiment, the nucleic microarray is an oligonucleotide microarray comprising at least one oligonucleotide specific for at least one gene of each of the bacterial species HL-33, HL-13, HL-10, HL-28, HL-36, HL-17, HL-8, HL-3, HL-22, HL-53, HL-35, HL-5 and HL-27. Preferably, the nucleic microarray is an oligonucleotide microarray comprising or consisting in oligonucleotides specific for at least 2, 3, 4, 5, 10, 20, 30 or 40 genes of each of the bacterial species HL-33, HL-13, HL-10, HL-28, HL-36, HL-17, HL-8, HL-3, HL-22, HL-53, HL-35, HL-5 and HL-27.
In another embodiment, the nucleic microarray is an oligonucleotide microarray comprising at least one oligonucleotide specific for at least one gene of each of the bacterial species HL-56, HL-17, HL-21, HL-35, HL-40, HL-26, HL-12, HL-13, HL-45, HL-3, HL-5, HL-10, HL-8 and HL-27. Preferably, the nucleic microarray is an oligonucleotide microarray comprising or consisting in oligonucleotides specific for at least 2, 3, 4, 5, 10, 20, 30 or 40 genes of each of the bacterial species HL-56, HL-17, HL-21, HL-35, HL-40, HL-26, HL-12, HL-13, HL-45, HL-3, HL-5, HL-10, HL-8 and HL-27.
In another embodiment, the nucleic microarray is an oligonucleotide microarray comprising at least one oligonucleotide specific for at least one gene of each of the bacterial species HL-31, HL-11, HL-25, HL-10, HL-35, HL-12, HL-28, HL-37, HL-5, HL-33, HL-17, HL-51, HL-27, HL-40 and HL-15. Preferably, the nucleic microarray is an oligonucleotide microarray comprising or consisting in oligonucleotides specific for at least 2, 3, 4, 5, 10, 20, 30 or 40 genes of each of the bacterial species HL-31, HL-11, HL-25, HL-10, HL-35, HL-12, HL-28, HL-37, HL-5, HL-33, HL-17, HL-51, HL-27, HL-40 and HL-15.
In another embodiment, the nucleic microarray is an oligonucleotide microarray comprising at least one oligonucleotide specific for at least one gene of each of the bacterial species HL-33, HL-51, HL-39, HL-27, HL-56, HL-31, HL-23, HL-10, HL-18, HL-4, HL-11, HL-8, HL-21, HL-45, HL-5 and HL-17. Preferably, the nucleic microarray is an oligonucleotide microarray comprising or consisting in oligonucleotides specific for at least 2, 3, 4, 5, 10, 20, 30 or 40 genes of each of the bacterial species HL-33, HL-51, HL-39, HL-27, HL-56, HL-31, HL-23, HL-10, HL-18, HL-4, HL-11, HL-8, HL-21, HL-45, HL-5 and HL-17.
In another embodiment, the nucleic microarray is an oligonucleotide microarray comprising at least one oligonucleotide specific for at least one gene of each of the bacterial species HL-45, HL-27, HL-47, HL-5, HL-51, HL-8, HL-26, HL-3, HL-53, HL-37, HL-13, HL-11, HL-17, HL-23, HL-1 and HL-28. Preferably, the nucleic microarray is an oligonucleotide microarray comprising or consisting in oligonucleotides specific for at least 2, 3, 4, 5, 10, 20, 30 or 40 genes of each of the bacterial species HL-45, HL-27, HL-47, HL-5, HL-51, HL-8, HL-26, HL-3, HL-53, HL-37, HL-13, HL-11, HL-17, HL-23, HL-1 and HL-28.
In another embodiment, the nucleic microarray is an oligonucleotide microarray comprising at least one oligonucleotide specific for at least one gene of each of the bacterial species HL-31, HL-11, HL-33, HL-28, HL-36, HL-21, HL-22, HL-4, HL-37, HL-45, HL-27, HL-15, HL-51, HL-8 and HL-17. Preferably, the nucleic microarray is an oligonucleotide microarray comprising or consisting in oligonucleotides specific for at least 2, 3, 4, 5, 10, 20, 30 or 40 genes of each of the bacterial species HL-31, HL-11, HL-33, HL-28, HL-36, HL-21, HL-22, HL-4, HL-37, HL-45, HL-27, HL-15, HL-51, HL-8 and HL-17.
In another embodiment, the nucleic microarray is an oligonucleotide microarray comprising at least one oligonucleotide specific for at least one gene of each of the bacterial species HL-18, HL-56, HL-28, HL-36, HL-45, HL-17, HL-35, HL-33, HL-11, HL-5, HL-8, HL-10, HL-12, HL-25, HL-22, HL-39, HL-49, HL-7 and HL-15. Preferably, the nucleic microarray is an oligonucleotide microarray comprising or consisting in oligonucleotides specific for at least 2, 3, 4, 5, 10, 20, 30 or 40 genes of each of the bacterial species HL-18, HL-56, HL-28, HL-36, HL-45, HL-17, HL-35, HL-33, HL-11, HL-5, HL-8, HL-10, HL-12, HL-25, HL-22, HL-39, HL-49, HL-7 and HL-15.
In another embodiment, the nucleic microarray is an oligonucleotide microarray comprising at least one oligonucleotide specific for at least one gene of each of the bacterial species HL-47, HL-5, HL-36, HL-37, HL-35, HL-44, HL-11, HL-8, HL-17, HL-31, HL-18, HL-13, HL-21, HL-51, HL-4, HL-28, HL-45, HL-33, HL-3 and HL-15. Preferably, the nucleic microarray is an oligonucleotide microarray comprising or consisting in oligonucleotides specific for at least 2, 3, 4, 5, 10, 20, 30 or 40 genes of each of the bacterial species HL-47, HL-5, HL-36, HL-37, HL-35, HL-44, HL-11, HL-8, HL-17, HL-31, HL-18, HL-13, HL-21, HL-51, HL-4, HL-28, HL-45, HL-33, HL-3 and HL-15.
Said microarray may further comprise at least one oligonucleotide for detecting at least one gene of at least one control bacterial species. A convenient bacterial species may be e.g. a bacterial species whose abundance does not vary between individuals with a reduced bacterial diversity and individuals with normal bacterial diversity. Preferably, the oligonucleotides are about 50 bases in length.
Suitable microarray oligonucleotides specific for any gene of SEQ ID NOs. 1-2900 may be designed, based on the genomic sequence of each gene, using any method of microarray oligonucleotide design known in the art. In particular, any available software developed for the design of microarray oligonucleotides may be used, such as, for instance, the OligoArray software (available at http://berry.engin.umich.edu/oligoarray/), the GoArrays software (available at http://www.isima.fr/bioinfo/goarrays/), the Array Designer software (available at http://www.premierbiosoft.com/dnamicroarray/index.html), the Primer3 software (available at http://frodo.wi.mit.edu/primer3/primer3_code.html), or the Promide software (available at http://oligos.molgen.mpg.de/).
The invention further concerns a kit for the in vitro determination of the reduced gut bacterial diversity phenotype, comprising at least one reagent for the determination of the copy number of at least one gene having a sequence selected from SEQ ID NOs. 1-2900. By “a reagent for the determination of the copy number of at least one gene”, it is meant a reagent which specifically allows for the determination of the copy number of the said gene, i.e. a reagent specifically intended for the specific determination of the copy number of at least one gene having a sequence selected from SEQ ID NOs. 1-2900. This definition excludes generic reagents useful for the determination of the expression level of any gene, such as Taq polymerase or an amplification buffer, although such reagents may also be included in a kit according to the invention. Such a reagent for the determination of the copy number of at least one gene can be for example a dedicated microarray as described above or amplification primers specific for at least one gene having a sequence selected from SEQ ID NOs. 1-2900. The present invention thus also relates to a kit for the in vitro determination of the reduced gut bacterial diversity phenotype, said kit comprising a dedicated microarray as described above or amplification primers specific for at least one gene having a sequence selected from SEQ ID NOs. 1-2900. Here also, when the kit comprises amplification primers, while said kit may comprise amplification primers specific for other genes, said kit preferably comprises at most 100, at most 75, 50, at most 40, at most 30, preferably at most 25, at most 20, at most 15, more preferably at most 10, at most 8, at most 6, even more preferably at most 5, at most 4, at most 3 or even 2 or one or even zero couples of amplification primers specific for other genes than the genes of sequences SEQ ID NOs 1-2900. 
For example, said kit may comprise at least a couple of amplification primers for at least one gene in addition to the primers for at least one gene having a sequence selected from SEQ ID NOs. 1-2900.
Such a kit for the in vitro determination of the reduced gut bacterial diversity phenotype may further comprise instructions for detection of the presence or absence of a responsive phenotype.
The inventors have also discovered that a low gut microbiome profile is associated with traits underlying metabolic disorders.
The inventors have compared the features of normal subjects, having a high gene count, and subjects with a reduced gut bacterial diversity, having a low gene count, and identified that the low gene count individuals, who accounted for 23% of the total study population, included a significantly higher proportion of the obese and were characterized by a more marked adiposity, as reflected by an increase in body mass index (BMI) and fat percentage. The inventors have discovered that the adiposity phenotype of low gene count individuals was associated with elevated serum leptin, decreased serum adiponectin, insulin resistance, hyperinsulinaemia, elevated levels of triglycerides and free fatty acids and a more marked inflammatory phenotype (increased C reactive protein (CRP) and elevated white blood cell count) than seen in high gene count individuals. Circulating fasting induced adipose factor (FIAF) was significantly elevated in the low gene count group—also when adjusting for BMI. These analyses indicate that the phenotypic differences of low gene count and high gene count individuals are consistent with a series of risk markers for metabolic disorders such as type II diabetes, hyperglycemic syndrome, heart diseases, insulin resistance or hepatic stasis.
The invention therefore also relates to a method to assess the risk of a subject of developing metabolic disorders, preferentially type II diabetes, hyperglycemic syndrome, heart diseases, insulin resistance or hepatic stasis, and comprising the steps of:
Moreover, it is known from the art that a reduced gut bacterial diversity is correlated to immune disorders. In particular, it has been shown that a reduced gut bacterial diversity is associated with sensitivity to nosocomial pathogens in elderly, allergic asthma in neonatal subjects, atopic dermatitis and type I diabetes.
The invention therefore also relates to a method to assess the risk of a subject of developing immune disorders, preferentially sensitivity to nosocomial pathogens in elderly, allergic asthma in neonatal subjects, atopic dermatitis or type I diabetes, and comprising the steps of:
The practice of the invention employs, unless otherwise indicated, conventional techniques of protein chemistry, molecular virology, microbiology, recombinant DNA technology, and pharmacology, which are within the skill of the art. Such techniques are explained fully in the literature. (See Ausubel et al., Current Protocols in Molecular Biology, Eds., John Wiley & Sons, Inc. New York, 1995; Remington's Pharmaceutical Sciences, 17th ed., Mack Publishing Co., Easton, Pa., 1985; and Sambrook et al., Molecular cloning: A laboratory manual 2nd edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., USA, 1989). The nomenclatures used in connection with, and the laboratory procedures and techniques of, molecular and cellular biology, protein biochemistry, enzymology and medicinal and pharmaceutical chemistry described herein are those well known and commonly used in the art.
Having generally described this invention, a further understanding of characteristics and advantages of the invention can be obtained by reference to certain specific examples and figures which are provided herein for purposes of illustration only and are not intended to be limiting unless otherwise specified.
The abundance of known intestinal bacteria was assessed by mapping of a large number of sequencing reads from total fecal DNA onto a reference set of their genomes. The abundance of genes from the reference catalog of 292 non-obese and obese individuals was assessed.
Study participants were recruited from the Inter99 study population. The Inter99 study is a randomized, non-pharmacological intervention study for the prevention of ischemic heart disease, and was conducted at the Research Centre for Prevention and Health in Glostrup, Denmark between 1999 and 2006 (ClinicalTrials.gov: NCT00289237) (ref. 1). The participants in the Inter99 study were examined at baseline and after 1, 3 and 5 years, depending on the type of intervention.
For the study, individuals with a body mass index (BMI) below 25 kg/m2 or above 30 kg/m2 at year 5 of the Inter99 study were randomly selected from track records. They had no known gastro-intestinal disease, no previous bariatric surgery, no medications known to affect the immune system and no antibiotics in the two months prior to fecal sample collection. Individuals with type 2 diabetes at the day of examination were excluded. Altogether, 292 non-diabetic individuals were included in the protocol. All had North European ethnicity. At the time of the current physical examination, 96 (33%) of the study volunteers were lean with BMI <25 kg/m2, 27 (9%) were overweight with BMI between 25 and 30 kg/m2, and 169 (58%) were obese with BMI >30 kg/m2 according to the World Health Organisation (WHO) definition (ref. 2). The study was approved by the local Ethical Committees of the Capital Region of Denmark (HC-2008-017), and was in accordance with the principles of the Declaration of Helsinki. All individuals gave written informed consent before participation in the study.
The participants were examined on two different days approximately 14 days apart. On the first day, participants were examined in the morning after an over-night fast. Height was measured without shoes to the nearest 0.5 cm, and weight was measured without shoes and wearing light clothes to the nearest 0.1 kg. Hip and waist circumference were recorded using a non-expandable measuring tape to the nearest 0.5 cm. Waist circumference was measured midway between the lower rib margin and the iliac crest. Hip circumference was measured as the largest circumference between the waist and the thighs. On the second day of examination, all participants delivered a stool sample collected at home, and Dual-energy X-ray Absorptiometry (DXA) was performed. Analyses of data from the DXA scan were conducted with the integrated software (Hologic Discovery A, Santax, USA). Sagittal height was measured at the time of the DXA scan with the Holtain-Kahn abdominal caliper at the highest point of the abdomen, with the participant supine and breathing out. Participants receiving statins, fibrates and/or ezetimibe were reported as receiving lipid lowering medication.
Intra-abdominal adipose tissue (IAAT, cm2) was calculated from DXA scan data and anthropometry using the equation (ref. 3): y = −208.2 + 4.62 × (sagittal diameter, cm) + 0.75 × (age, years) + 1.73 × (waist, cm) + 0.78 × (trunk fat, %). Homeostatic model assessment of insulin resistance (HOMA-IR) was calculated as: (fasting plasma glucose (mmol/l) × fasting serum insulin (mU/l)) / 22.5 (ref. 4).
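For illustration only, the two derived variables can be computed as in the following Python sketch. The function and parameter names are hypothetical, the HOMA-IR denominator is taken as 22.5, and the inputs are assumed to be supplied already in the units stated above (in particular, fasting insulin in mU/l).

```python
def iaat_cm2(sagittal_diameter_cm: float, age_years: float,
             waist_cm: float, trunk_fat_pct: float) -> float:
    """Intra-abdominal adipose tissue (cm2) from the regression equation above."""
    return (-208.2
            + 4.62 * sagittal_diameter_cm
            + 0.75 * age_years
            + 1.73 * waist_cm
            + 0.78 * trunk_fat_pct)


def homa_ir(fasting_glucose_mmol_l: float, fasting_insulin_mu_l: float) -> float:
    """HOMA-IR = glucose (mmol/l) x insulin (mU/l) / 22.5."""
    return fasting_glucose_mmol_l * fasting_insulin_mu_l / 22.5


# Hypothetical participant, values chosen only to exercise the formulas.
example_iaat = iaat_cm2(sagittal_diameter_cm=20.0, age_years=50.0,
                        waist_cm=90.0, trunk_fat_pct=30.0)
example_homa = homa_ir(fasting_glucose_mmol_l=5.0, fasting_insulin_mu_l=9.0)
```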
All analyses were performed on blood samples drawn in the morning after an over-night fast from at least 10.00 p.m. the previous evening.
Plasma glucose was analyzed by a glucose oxidase method (Granutest, Merck, Darmstadt, Germany) with a detection limit of 0.11 mmol/l and intra- and interassay coefficients of variation (CV) of <0.8% and <1.4%, respectively. HbA1c was measured on TOSOH G7 by ion-exchange high performance liquid chromatography.
Serum insulin (excluding intact proinsulin) was measured using the AutoDELFIA insulin kit (Perkin-Elmer, Wallac, Turku, Finland) with a detection limit of 3 pmol/l and with intra- and interassay CV of <3.2% and <4.5%, respectively. Plasma total cholesterol, plasma HDL-cholesterol and plasma triglycerides were all measured on Vitros 5600 using reflectance spectrophotometry. Blood leucocytes and white blood cell differential count were measured on Sysmex XS 1000i using flow cytometry. Plasma alanine aminotransferase (ALT) and plasma total free fatty acids were analyzed using standard biochemical methods (Modular Evo). Plasma high sensitive C-reactive protein (hs-CRP) was analyzed by a particle-enhanced immunoturbidimetric assay on MODULAR Evo using the CRPL3 kit (Roche, Mannheim, Germany) with a detection limit of 0.3 mg/l and intra- and interassay CV of <4.0% and 6.2%, respectively.
Plasma adiponectin was analyzed using a two-site sandwich ELISA kit for measuring total human adiponectin (TECO, Sissach, Switzerland). The detection limit was 0.6 ng/ml and the interassay and intraassay CV were <6.72% and <4.66%, respectively. Fasting induced adipose factor (FIAF), also termed human angiopoietin-like 4 (ANGPTL4), was measured using a quantitative sandwich ELISA (Adipo Bioscience, Santa Clara, USA). The detection limit was 0.6 μg/l and the interassay and intraassay CV were 8% and 4%, respectively. Lipopolysaccharide binding protein was analyzed by a solid phase sandwich ELISA kit (Abnova) with an interassay CV of <17.8% and an intraassay CV of <6.1%. Serum IL-6 and serum TNF-alpha were analysed by Luminex using the Bio-Plex Pro cytokine assay (Bio-Rad), whereas serum leptin was measured using the Bio-Plex Pro diabetes assay.
Stool samples were obtained at the homes of each participant and samples were immediately frozen by storing them in their home freezer. Frozen samples were delivered to Steno Diabetes Center using insulating polystyrene foam containers, and stored at −80° C. until analysis. The time span from sampling to delivery at the Steno Diabetes Center was aimed to be as short as possible and no more than 48 hours.
A frozen aliquot (200 mg) of each fecal sample was suspended in 250 μl of guanidine thiocyanate, 0.1 M Tris (pH 7.5) and 40 μl of 10% N-lauroyl sarcosine. Then, DNA extraction was conducted as previously described (refs. 4, 5). The DNA concentration and its molecular size were estimated by nanodrop (Thermo Scientific) and on agarose gel electrophoresis.
DNA library preparation followed the manufacturer's instruction (Illumina). The workflow indicated by the provider was used to perform cluster generation, template hybridization, isothermal amplification, linearization, blocking and denaturing and hybridization of the sequencing primers. The base-calling pipeline (version IlluminaPipeline-0.3) was used to process the raw fluorescent images and call sequences.
One library (clone insert size 200 bp) was constructed for each of the first batch of 15 samples; two libraries with different clone insert sizes (135 bp and 400 bp) for each of the second batch of 70 samples, and one library (350 bp) for each of the third batch of 207 samples.
After sequencing, quality control was performed and human genome contaminant was screened. Finally, 26.0-186.1 million high-quality reads were generated for the 292 samples, with an average of 68.2 million high-quality reads. Sequencing read length of the first batch of 15 samples was 44 bp, the second batch was 75 bp, and the third batch was 75 bp and 90 bp.
The high-quality short reads were aligned against the gene catalog using SOAP2.21 (ref. 6), allowing at most two mismatches in the first 35-bp region and requiring 90% identity over the read sequence. The alignment result was filtered and the uniquely mapped pairs (paired-end reads) were counted for each gene in each sample. To make full use of the alignment result, paired-end reads for which one end mapped to the end of a gene while the other end was unmapped but expected to lie in an unassembled gene region or a non-coding region were treated as correct paired-end alignments.
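The two acceptance criteria can be illustrated with the following Python sketch. It is not the aligner's own implementation: it assumes an ungapped comparison between a read and the reference segment it was placed on, and the function name and parameters are hypothetical.

```python
def passes_alignment_filter(read: str, reference_segment: str,
                            seed_length: int = 35,
                            max_seed_mismatches: int = 2,
                            min_identity: float = 0.90) -> bool:
    """Check the two criteria on an ungapped read/reference comparison.

    Assumption: both strings have the same length and are compared
    position by position, without gaps.
    """
    if len(read) != len(reference_segment):
        raise ValueError("read and reference segment must have the same length")
    if not read:
        return False

    mismatches = [a != b for a, b in zip(read, reference_segment)]
    seed_mismatches = sum(mismatches[:seed_length])
    identity = 1.0 - sum(mismatches) / len(read)

    return seed_mismatches <= max_seed_mismatches and identity >= min_identity


# Hypothetical 75-bp read with two mismatches, both inside the first 35 bp.
reference = "A" * 75
accepted = passes_alignment_filter("CC" + "A" * 73, reference)
```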
Based on the pair-oriented counting result of each sample, a threshold of 1 read was selected for gene identification, so as to include rare genes in the analysis. Between 91,032 and 1,005,488 genes were identified for the 292 samples, with an average of 670,528 genes.
To eliminate the influence of sequencing fluctuation, the alignment results were sampled and the number of mapped pairs was downsized to 11 million for each sample. After that, 59,147-878,816 genes were found for the 292 samples, with an average of 578,512 genes.
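The downsizing and gene counting steps can be sketched in Python as follows, purely as an illustration. The sketch assumes that downsizing is a random draw without replacement of a fixed number of mapped pairs, and that a gene is counted when at least one pair remains assigned to it. The function names are hypothetical, and the list-based pool is kept for readability rather than efficiency.

```python
import random
from collections import Counter


def downsize_pair_counts(pair_counts: dict, depth: int, seed: int = 0) -> Counter:
    """Draw `depth` mapped pairs without replacement and recount per gene."""
    pool = [gene for gene, n in pair_counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("sample has fewer mapped pairs than the requested depth")
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return Counter(rng.sample(pool, depth))


def count_detected_genes(pair_counts: dict, min_pairs: int = 1) -> int:
    """Number of genes supported by at least `min_pairs` mapped pairs."""
    return sum(1 for n in pair_counts.values() if n >= min_pairs)


# Hypothetical toy sample; the study used a depth of 11 million pairs.
toy_counts = {"gene_1": 5, "gene_2": 3, "gene_3": 0, "gene_4": 2}
toy_downsized = downsize_pair_counts(toy_counts, depth=6)
toy_gene_count = count_detected_genes(toy_downsized)
```

Applying the same depth to every sample makes the resulting gene counts comparable between samples sequenced to different depths.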
Genes belonging to the orthologous groups COG0085, COG0525, and COG0090 from 3,515 prokaryotic genomes were clustered into operational taxonomic units (OTUs) at 95% identity using UCLUST (Edgar, 2010) and used as a reference database. Paired-end Illumina reads from 292 metagenomic samples were mapped at a 95% identity cut-off using SOAP2.21 (ref. 6). The numbers of fragments assigned to the reference sequences were counted so that each fragment's total weight equals 1, i.e. a fragment assigned to N different reference sequences contributes 1/N to each reference sequence. Fragment counts of reference sequences were grouped to yield OTU counts. Samples with low sampling effort, i.e. with fewer than 3,000 fragments mapped to reference genes, were removed, leaving 229 samples for comparative analyses. OTU counts were normalized by gene length, scaled by the maximum count across all marker genes, and down-sampled using the vegan package (ref. 7) to the minimum sum of OTU counts across all samples in order to compare species richness between high gene and low gene content groups.
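The fractional counting rule can be illustrated by the following Python sketch, in which a fragment assigned to N reference sequences adds 1/N to each of them before the counts are grouped by OTU. The data layout (one list of reference identifiers per fragment, and a mapping from reference to OTU) and the function names are assumptions made for the sketch.

```python
from collections import defaultdict


def fractional_reference_counts(fragment_hits):
    """Each fragment has total weight 1, split equally over its N references."""
    counts = defaultdict(float)
    for references in fragment_hits:
        unique_refs = set(references)
        if not unique_refs:
            continue  # unmapped fragment contributes nothing
        weight = 1.0 / len(unique_refs)
        for ref in unique_refs:
            counts[ref] += weight
    return dict(counts)


def group_counts_by_otu(reference_counts, reference_to_otu):
    """Sum the reference-level counts within each OTU."""
    otu_counts = defaultdict(float)
    for ref, count in reference_counts.items():
        otu_counts[reference_to_otu[ref]] += count
    return dict(otu_counts)


# Hypothetical toy input: three mapped fragments and one unmapped fragment.
toy_hits = [["ref_a"], ["ref_a", "ref_b"], ["ref_c", "ref_d"], []]
toy_ref_counts = fractional_reference_counts(toy_hits)
toy_otu_counts = group_counts_by_otu(
    toy_ref_counts,
    {"ref_a": "otu_1", "ref_b": "otu_1", "ref_c": "otu_2", "ref_d": "otu_2"},
)
```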
HITChip microarray analyses were performed as described previously (ref. 8). In short, 16S rRNA genes were amplified with the T7prom-Bact-27-for and Uni-1492-rev primers from 10 ng of fecal DNA extract. On these amplicons, an in vitro transcription and subsequent labeling with Cy3 and Cy5 dyes were performed. Labeled RNA was fragmented and hybridized on the arrays at 62.5° C. for 16 h in a rotation oven (Agilent Technologies, Amstelveen, The Netherlands). The arrays were washed, dried and scanned, and the signal intensity data were extracted as described (http://www.agilent.com). Microarray data normalization and analysis were carried out with a set of R-based scripts (http://r-project.org), making use of a custom designed database which operates under the MySQL database management system (http://www.mysql.com).
From the 3,699 unique HITChip probes, the probes that accounted for the top 99.9% of the total signal were selected. These probes were counted for each sample to measure richness, which was between 713 and 1,597 probes per sample. The probes that accounted for the lowest 0.1% of the total signal were regarded as background noise and were not taken into account for further analysis. Probe signal values were used to calculate the inverse Simpson's Diversity index for each sample.
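A minimal sketch of the probe selection and of the two measures named above is given below. It assumes that probes are ranked by decreasing signal and retained until 99.9% of the total signal is reached, and that the inverse Simpson index is computed as 1/Σp², where p is the share of each probe in the total signal; the function names and the toy signal values are hypothetical.

```python
def select_top_signal(signals, fraction=0.999):
    """Keep the strongest probes that together account for `fraction` of the total signal."""
    total = sum(signals)
    kept, running = [], 0.0
    for value in sorted(signals, reverse=True):
        if running >= fraction * total:
            break  # the remaining probes are treated as background noise
        kept.append(value)
        running += value
    return kept


def inverse_simpson(signals):
    """Inverse Simpson diversity index, 1 / sum(p_i^2), with p_i the signal proportions."""
    total = sum(signals)
    return 1.0 / sum((value / total) ** 2 for value in signals)


# Hypothetical probe signals for one sample.
probe_signals = [500.0, 300.0, 199.5, 0.3, 0.2]
retained = select_top_signal(probe_signals)
richness = len(retained)
print(richness, inverse_simpson(retained))
```

The index equals the number of probes when all signals are equal and decreases as the signal becomes dominated by a few probes.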
HITChip probe specificity can be assigned to three phylogenetic levels based on 16S rRNA gene sequence similarity: order-like groups, genus-like groups (sequence similarity >90%), and phylotype-like groups (sequence similarity >98%)8. Relative abundances were calculated for each specificity level by summing all signal values of the probes targeting a group and dividing by the total of all probe signals for the corresponding sample. All comparisons between the HGC and LGC individuals were assessed with dependent 2-group Wilcoxon signed rank tests. When statistical tests were performed on a large number of variables, the obtained p-values were adjusted by a Bonferroni correction. To place the gene count and BMI marker species (HL and oble, respectively) in the HITChip phylogeny, Spearman correlation coefficients were calculated between the metagenomic profiling frequencies and the relative abundances of the phylotype-like groups across 251 samples. A threshold of 0.7 was used to associate a 16S phylotype with a species.
A 2.1 million-feature custom Roche NimbleGen microarray targeting a 700,000-gene subset of the MetaHit human gut gene catalog9 was designed and manufactured. The subset was prioritized for genes that were observed in more than 20 of the 124 gene catalog samples. DNA extracted from fecal samples was labeled and hybridized according to standard NimbleGen protocols. Data were preprocessed using the RMA implementation in the “oligo” package, and the Shannon diversity index was calculated using the “vegan”7 package, both available in the statistical programming environment R.
In order to validate the observed biomarkers for low/high gene counts found by sequencing, the data was compared to DNA microarray signals for the same samples and individuals. Thus, the tracer genes for known and unknown species indicated in
For each gut microbial sample, Illumina reads were mapped to a set of 1,506 reference genomes to record genus abundances based on Bergey's taxonomy. A principal coordinate analysis was performed using JSD distance and enterotypes were assigned to each sample as described in10.
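For illustration, a distance between two genus abundance profiles based on the Jensen-Shannon divergence (JSD) may be computed as sketched below. Taking the square root of the divergence to obtain a distance is an assumption of this sketch, as are the function names; the profiles are assumed to be normalized to sum to 1.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence, skipping terms where p is zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd_distance(p, q):
    """Square root of the Jensen-Shannon divergence between two abundance profiles."""
    mid = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * kl_divergence(p, mid) + 0.5 * kl_divergence(q, mid)
    return math.sqrt(max(jsd, 0.0))  # guard against tiny negative rounding errors


# Hypothetical genus-level relative abundances of two samples.
sample_a = [0.6, 0.3, 0.1]
sample_b = [0.2, 0.3, 0.5]
print(jsd_distance(sample_a, sample_b))
```

A matrix of such pairwise distances is the kind of input on which a principal coordinate analysis operates.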
Taxonomic assignment of predicted genes for global analysis was carried out using BLASTN to assign reads to a reference genome database at a cut-off of 95% sequence identity and >100 bp overlap, unless indicated otherwise. This assignment was used as a high-confidence assignment at species level. As reference database we used 1,869 available reference genomes from NCBI and the set of draft gastrointestinal genomes from the DACC (http://hmpdacc.org/), both as of 15.7.2011. The reads assigned to each taxonomic group per sample were rarefied to 5.5 million genes (the size of the smallest sample). On this rarefied matrix, taxonomic groups were tested for significant differences in abundance using a Wilcoxon rank-sum test. Multiple testing correction was done by controlling the False Discovery Rate (q<0.05) using the Benjamini-Hochberg method11.
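The Benjamini-Hochberg adjustment referred to above may be sketched as follows. The function name and the example p-values are illustrative only; in practice an established statistical package would typically be used.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values


# Hypothetical p-values for four taxonomic groups.
p_values = [0.01, 0.04, 0.03, 0.20]
q_values = benjamini_hochberg(p_values)
significant = [i for i, q in enumerate(q_values) if q < 0.05]
print(q_values, significant)
```

In this toy example only the first group remains significant at q<0.05, although three of the raw p-values are below 0.05.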
BLASTP was used to search the protein sequences of the predicted genes in the eggNOG database12 and KEGG database13 with an e-value ≤1×10⁻⁵ as described in9, and the NOG/KEGG OG of the best hit was assigned to each gene. The genes annotated by COG were classified into the 25 COG categories, and genes that were annotated by KEGG were assigned to a set of manually determined gut metabolic modules [Falony et al, in prep]. The relative pathway/module abundance of higher order functional categories was calculated from rarefied KO abundances. Modules were deemed present when >=30% of the enzymes were recovered, after manual removal of overly “promiscuous” enzymes (i.e. present in multiple modules) prior to abundance calculation. For higher-level functional assignments, KO abundances were summed and distributed evenly when KOs appeared in multiple categories. Functional differences were calculated with a Wilcoxon rank-sum test and multiple testing correction was done by controlling the False Discovery Rate (q<0.05) using the Benjamini-Hochberg method11.
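The module presence rule (at least 30% of the enzymes recovered) may be sketched as follows. The function name and the enzyme identifiers are hypothetical, and excluding the promiscuous enzymes before applying the 30% rule is an assumption of this sketch.

```python
def module_present(module_kos, observed_kos, promiscuous=frozenset(), min_fraction=0.30):
    """Decide whether a module is present from the fraction of its enzymes recovered."""
    usable = set(module_kos) - set(promiscuous)
    if not usable:
        return False  # nothing left to base a decision on
    recovered = usable & set(observed_kos)
    return len(recovered) / len(usable) >= min_fraction


# Hypothetical module of ten enzymes, two of them shared with many other modules.
module = [f"KO_{i}" for i in range(10)]
shared = {"KO_8", "KO_9"}
print(module_present(module, {"KO_0", "KO_1", "KO_2", "KO_8"}, shared))  # 3 of 8 usable
print(module_present(module, {"KO_0", "KO_9"}, shared))                  # 1 of 8 usable
```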
Genes significantly different in groups of individuals were identified by the Wilcoxon rank sum test coupled to a bootstrapping approach.
A random 70% of the whole cohort (204 individuals) was chosen, and genes differentially abundant between LGC and HGC individuals were identified using p≤0.0001 as the threshold. This test was repeated 30 times. In addition, 30 groups of randomly chosen “extreme” individuals, having <400,000 genes or >600,000 genes, were composed and the same test was applied to them. Genes common to all 60 tests were analyzed further.
For lean and obese individuals of the whole population or stratified by enterotypes, a similar approach was used by randomly choosing 70% of individuals 30 times and using the Wilcoxon rank sum test at p≤0.05.
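The repeated subsampling scheme described above may be sketched as follows. The rank sum test is implemented here with a normal approximation and without tie correction, which is a simplification assumed for this sketch; the function names, the toy profiles and the group labels are hypothetical.

```python
import math
import random


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank sum p-value (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return math.erfc(abs(z) / math.sqrt(2))


def stable_genes(profiles, groups, n_rounds=30, fraction=0.7, alpha=0.05, seed=0):
    """Keep the genes that pass the test in every random subsample of the cohort."""
    rng = random.Random(seed)
    n = len(groups)
    surviving = set(profiles)
    for _ in range(n_rounds):
        chosen = rng.sample(range(n), round(fraction * n))
        low = [i for i in chosen if groups[i] == "LGC"]
        high = [i for i in chosen if groups[i] == "HGC"]
        if not low or not high:
            continue  # a subsample without both groups cannot be tested
        surviving = {
            g for g in surviving
            if rank_sum_p([profiles[g][i] for i in low],
                          [profiles[g][i] for i in high]) <= alpha
        }
    return surviving


# Hypothetical cohort of 20 individuals and two genes.
groups = ["LGC"] * 10 + ["HGC"] * 10
profiles = {
    "diff": [float(i) for i in range(10)] + [100.0 + i for i in range(10)],
    "flat": [5.0] * 20,
}
print(stable_genes(profiles, groups))
```

Requiring a gene to pass in every round makes the selection robust to the particular individuals drawn, at the cost of discarding genes with weaker but real differences.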
As only a small part (<10%) of the genes recovered as significantly different in two groups of individuals could be assigned taxonomically by sequence similarity to known reference genomes, an alternative strategy was used to cluster genes of the same species. Such genes are expected to be present at a similar abundance in an individual but at very different abundances in different individuals. The genes that vary in abundance in a coordinated way are thus likely to be from the same species. The genes were clustered according to a profile-based binning strategy, using the covariance of their count profiles among the 292 individuals of the cohort. Spearman correlation coefficients were determined pairwise and all the genes that correlated above a given threshold were assigned to the same cluster.
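A minimal sketch of such correlation-based clustering is given below. It assumes that any two genes correlated above the threshold are joined transitively into one cluster (single linkage), which is one possible reading of the rule stated above; the all-pairs loop is practical only for small gene sets, and the function names and toy profiles are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / math.sqrt(sxx * syy)


def cluster_genes(profiles, threshold):
    """Group genes whose Spearman correlation exceeds `threshold` (union-find)."""
    genes = list(profiles)
    ranked = {g: average_ranks(profiles[g]) for g in genes}  # Spearman = Pearson on ranks
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for index, a in enumerate(genes):
        for b in genes[index + 1:]:
            if pearson(ranked[a], ranked[b]) > threshold:
                parent[find(a)] = find(b)
    clusters = {}
    for g in genes:
        clusters.setdefault(find(g), []).append(g)
    return [sorted(members) for members in clusters.values() if len(members) >= 2]


# Hypothetical abundance profiles of three genes across five individuals.
profiles = {
    "g1": [1.0, 5.0, 2.0, 9.0, 4.0],
    "g2": [2.0, 11.0, 3.0, 20.0, 8.0],  # rises and falls together with g1
    "g3": [9.0, 1.0, 7.0, 0.0, 3.0],    # varies in the opposite direction
}
print(cluster_genes(profiles, threshold=0.85))
```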
The abundance of a given species in each individual was estimated as the mean abundance of 50 ‘tracer’ genes of the corresponding cluster. The values were very close to the mean frequency of all the genes of a cluster.
The analyses were carried out to distinguish between HGC and LGC individuals or lean and obese individuals by a combination of bacterial species. For each combination, only a single decision model was considered. In this very specific regression model, weights are only allowed to take values in {−1, 0, 1}. More precisely, the weight of each species in a given combination that belongs to the set of species more frequent in one group is equal to 1, while that of each species that belongs to the set of species more frequent in the other group is equal to −1. The weight of each species that is outside of the combination is 0. For each individual, this model yields a score that is called the decisive-bacterial-abundance score. As opposed to the infinite number of regression models, such ternary models are finite in number and can be exhaustively explored. To select the best models, the cross-validated area under the ROC curve (CV-AUC) criterion14 was used, for it is well adapted to classification models for binary outcome data.
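The ternary model and its exhaustive exploration may be sketched as follows. For brevity the sketch ranks combinations by the plain area under the ROC curve rather than by the cross-validated criterion named above; this simplification, the function names and the toy abundances are assumptions of the example.

```python
from itertools import combinations


def dba_score(abundances, plus_species, minus_species):
    """Sum of abundances weighted +1 minus sum of abundances weighted -1."""
    return (sum(abundances[s] for s in plus_species)
            - sum(abundances[s] for s in minus_species))


def auc(positive_scores, negative_scores):
    """Probability that a positive outscores a negative, ties counting one half."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(positive_scores) * len(negative_scores))


def best_combination(samples, labels, hgc_enriched, lgc_enriched, max_size):
    """Try every species combination up to `max_size` and keep the highest AUC."""
    candidates = sorted(hgc_enriched | lgc_enriched)
    best = (0.0, ())
    for size in range(1, max_size + 1):
        for combo in combinations(candidates, size):
            plus = [s for s in combo if s in hgc_enriched]
            minus = [s for s in combo if s in lgc_enriched]
            scores = [dba_score(x, plus, minus) for x in samples]
            hgc = [sc for sc, lab in zip(scores, labels) if lab == "HGC"]
            lgc = [sc for sc, lab in zip(scores, labels) if lab == "LGC"]
            value = auc(hgc, lgc)
            if value > best[0]:
                best = (value, combo)
    return best


# Hypothetical abundances of one HGC-enriched and one LGC-enriched species.
samples = [
    {"sp_h": 5.0, "sp_l": 1.0},
    {"sp_h": 2.0, "sp_l": 0.5},
    {"sp_h": 3.0, "sp_l": 4.0},
    {"sp_h": 1.0, "sp_l": 0.2},
]
labels = ["HGC", "HGC", "LGC", "LGC"]
best = best_combination(samples, labels, {"sp_h"}, {"sp_l"}, max_size=2)
print(best)
```

In this toy example neither species separates the two groups on its own, whereas the two-species score does, which illustrates why combinations are explored.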
Species Correlated with the BMI Change
For the entire cohort of 292 individuals, the 40 individuals (14%) having the highest abundance of a species were compared with at least 125 individuals (42%) having the lowest abundance (all individuals lacking a species were included, when more numerous than 125); these numbers were chosen to allow contrasting the extremes of the distribution while keeping the sample size high enough to reduce the probability of a fortuitous difference in BMI change. For the 169 obese individuals, the 30 (18%) having the highest abundance of a species were compared with at least 60 individuals (36%) having the lowest abundance (all individuals lacking a species were included, when more numerous than 60). The differences were calculated with a Student's t-test, the BMI changes being normally distributed, and multiple testing correction was done by controlling the False Discovery Rate (q<0.05) using the Benjamini-Hochberg method11.
We analyzed the association of 1) the high gene and low gene group and 2) gene count as a continuous trait to quantitative traits applying a linear model adjusting for age and sex.
Correlations between the quantitative traits are shown in
The data was corrected for multiple testing by the Benjamini-Hochberg method11 setting the false discovery rate (FDR) at 10%. The results are displayed in Table 14.
For pair-wise analyses of the enterotype with phenotypes, a linear model adjusting for age and sex was applied. The Benjamini-Hochberg method11 was used to correct for multiple testing applied to the three pair-wise comparisons, again setting the false discovery rate (FDR) at 10%. The results are displayed in Table 15.
The intestinal bacterial gene content of the enrolled individuals was determined by high throughput Illumina-based sequencing of total fecal DNA. An average of 34.1 million paired-end reads were produced for each sample and, after removing human contamination (~0.1%, on average), 19.9±6.7 (s.d.) million reads were mapped at a unique position of the reference catalog of 3.3 million genes, requiring >90% identity22; reads mapping at multiple positions (13.4%, on average) were discarded. The abundance of a gene in a sample was estimated by dividing the number of reads that uniquely mapped to that gene by the gene length and by the total number of reads from the sample that uniquely mapped to any gene in the catalog. The resulting set of gene abundances, termed a microbial gene profile of an individual, was used for further analyses.
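The abundance estimate described above may be written out as a short sketch. The function name, gene identifiers, read counts and gene lengths are hypothetical.

```python
def gene_abundances(read_counts, gene_lengths):
    """Uniquely mapped reads per gene, divided by gene length and by total mapped reads."""
    total_reads = sum(read_counts.values())
    if total_reads == 0:
        raise ValueError("no uniquely mapped reads in this sample")
    return {
        gene: count / gene_lengths[gene] / total_reads
        for gene, count in read_counts.items()
    }


# Hypothetical sample: uniquely mapped reads and gene lengths in base pairs.
reads = {"gene1": 100, "gene2": 300}
lengths = {"gene1": 1000, "gene2": 1500}
profile = gene_abundances(reads, lengths)
print(profile)
```

Dividing by gene length corrects for longer genes attracting more reads, and dividing by the sample total makes profiles comparable between samples sequenced at different depths.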
Comparison of gene profiles across the total study sample of 292 individuals showed a bimodal distribution of bacterial genes (
Low richness of gut microbiota has been reported in patients with inflammatory bowel disorder (IBD) 22,25,26 and in obese individuals17, but differences in richness within these groups or among non-obese individuals were not previously detected. As the composition of gut microbiota appears to be rather stable over long periods of adulthood27, its richness may well be a characteristic feature of an individual. In mice, the richness appears to be affected by repeated antibiotic treatments (M. J. Blaser, personal communication); host genetics could also play a role, as exemplified by the knockout of the toll-like receptor 5 resulting in altered gut microbiota and the metabolic syndrome, a phenotype transmissible by fecal transplantation of the altered microbiota28. Further studies, focusing specifically on the richness of the gut microbiota across broad cohorts as a function of behavior, including food intake, exercise, smoking habits, other pollutants and medication, over sufficiently long periods of time, might help to elucidate the causes of its variation.
We determined the enterotype of the individuals in our cohort and found that enterotype distribution greatly varies with the gene count (
Both the difference in gene number and the stratification by enterotypes indicate that the LGC and HGC individuals harbor different microbial communities. In order to assess the difference in phylogenetic composition between the two, we combined reference genome mapping with gene abundance data at phylum, genus and species level.
We first examined the general phylogenetic composition at higher taxonomic levels based upon genome size-normalized read abundances that were mapped on publicly available reference genomes and binned at genus and phylum level. A total of 39 genera differed significantly in abundance between the HGC and LGC individuals. While Bacteroides, Parabacteroides, Ruminococcus (specifically R. torques and R. gnavus, of the Clostridium cluster XIV), Campylobacter, and Anaerostipes were more dominant in LGC, 31 genera, including Butyrivibrio, Alistipes, Akkermansia, Coprococcus, and Methanobrevibacter, were significantly linked to HGC. At the phylum level, this phylogenetic shift resulted in a higher abundance of Proteobacteria and Bacteroidetes in LGC individuals versus increased populations of Verrucomicrobia and Euryarchaeota in HGC individuals. An increased abundance of Bacteroides in the LGC individuals is congruent with the dominance of the Bacteroides-driven enterotype in this group. For clarity, it should be mentioned that the Ruminococcus prevalent in the HGC individuals and in the Ruminococcus/Methanobrevibacter smithii enterotype appears to be of the R. bromii-like group of the Clostridium cluster IV (HITChip results, data not shown).
Next, we studied the specific species that were differentially abundant between LGC and HGC individuals. To this aim, we used a novel, gene-centric approach that enables the visualization of individual-based patterns and avoids artifacts from incomplete genome coverage. In this approach, we identified the genes that were significantly different between the LGC and HGC individuals by the Wilcoxon rank sum test, comparing 204 (70% of total) randomly chosen individuals 30 times. We similarly compared 126 “extreme” individuals, harboring <400 K genes or >600 K genes. 120,723 genes were found in all 60 tests at p<0.0001 and were analyzed further.
We searched for genes that could belong to the same species, by comparing them to all sequenced genomes. At a threshold of 95% identity over at least 90% of the gene length, 10,225 genes (8.5%) were assigned to a total of 97 genomes representing some 73 species (Table 5). However, a vast majority (93.4%) belonged to only 9 species, which were all Firmicutes with a single exception of the main human methanogen, M. smithii. The corresponding species varied significantly in abundance between the LGC and HGC individuals, as illustrated in
Taken together, the analyses highlight the contrast between anti-inflammatory species, such as Faecalibacterium prausnitzii, which are more prevalent in HGC individuals, and potentially pro-inflammatory species, such as Bacteroides and R. gnavus, which are associated with IBD and found to be more frequent in LGC individuals.
However, a vast majority (>90%) of the 120,723 genes with significantly differing abundances in the LGC and HGC individuals could not be assigned to a sequenced bacterial genome, as the reference gut genome database is not yet complete. These genes must also belong to bacterial species that are present at different abundances in the two types of individuals. We thus attempted to cluster the genes from the same species by a gene abundance-based approach.
We hypothesized that the genes of a given bacterial species should be present at a similar abundance in an individual but should display large variations across a cohort, as species abundance is known to vary immensely among individuals (10- to 10,000-fold)22. The genes that vary in abundance in a coordinated way are thus likely to be from the same species. We tested this hypothesis for the 10,225 taxonomically assigned genes that differ significantly between LGC and HGC individuals, by computing the Spearman correlation coefficients for each gene with all the other genes and grouping those that were correlated above a given threshold. Ninety-two clusters containing at least 2 genes and including collectively 8,594 genes (84% of the total) were found at a Spearman threshold of 0.75. A vast majority of these (8,125; 94.5%) clustered into only 8 groups that included the 9 most highly represented species shown in
76,564 genes (63% of 120,723) were grouped into 1,440 clusters of 2 genes or more at a threshold of 0.85, used to favor the specificity of clustering, but a vast majority (68,952, 90%) was found in only 58 clusters that contained >75 genes. They included 6 of the 9 taxonomically characterized species shown in
Distribution of unknown species across LGC and HGC individuals of the cohort was clearly biased, as illustrated for 7 of them with 50 tracer genes (
To test whether LGC and HGC individuals could be distinguished by the bacterial species they harbour, we performed a receiver operating characteristic (ROC) analysis. First, we estimated the abundance of 58 species that were significantly different between LGC and HGC individuals (Table 4a and Table 4b). For each individual, we used these values to compute a score, named the Decisive-Bacterial-Abundance (DBA) score, equal to the sum of the abundances of the species more frequent in HGC individuals minus the sum of the abundances of the species more frequent in LGC individuals. The DBA scores were calculated exhaustively for all combinations of up to 23 species and were used in the ROC analysis; the area under curve (AUC) values for the best combinations are shown in
Characteristics of study materials are given in Table 10. We performed an anthropometric and biochemical phenotyping of multiple interrelated features of LGC and HGC individuals, and identified significant differences between them at a false discovery rate34 of up to 10% (Table 3). This value was used to avoid missing significant associations; a less stringent level, up to 25%, was chosen in a recent and comparable study design. The LGC individuals, who represented 23% of the total study population, included a significantly higher proportion of obese participants and were as a group characterized by a more marked adiposity, as reflected by an increase in fat mass percentage and body weight (Table 3). The adiposity phenotype of LGC people was associated with elevated serum leptin, decreased serum adiponectin, insulin resistance, hyperinsulinaemia, elevated levels of triglycerides and free fatty acids (FFA), decreased HDL-cholesterol and a more marked inflammatory phenotype (increased hsCRP and higher white blood cell counts) than seen in HGC individuals (Table 3). We further tested the significance of our observations by treating the gene counts as a continuous variable and examining its correlation with the anthropometric and biochemical variables. All but two (BMI and weight) of the observed differences between LGC and HGC individuals were found to be significantly associated with the gene counts (Table 3). Together, these analyses suggest that the LGC individuals are characterized by metabolic disturbances known to place them at increased risk of prediabetes, type 2 diabetes and ischaemic cardiovascular disorders. Similar abnormalities were found in the accompanying paper (Cotillard et al.).
Based upon these results we hypothesize that an imbalance of potentially pro- and anti-inflammatory bacterial species triggers low-grade inflammation and insulin resistance. In parallel, we suggest that an altered gut microbiota of LGC individuals induces the noted increase in serum FIAF levels, eliciting an elevated release of triglycerides and FFA (Table 3), as evidenced by studies in rodent models.
An almost perfect stratification of LGC and HGC individuals can be achieved with very few bacterial species, suggesting that simple molecular diagnostic tests, based on our other genome, can be developed to identify individuals at risk of common morbidities. A focus on our other genome, which in some respects appears to be more informative than our own, may therefore spearhead the development of stratified approaches for treatment and prevention of widespread chronic disorders.
Beyond metabolic dysfunctions, low-grade inflammation as seen in LGC individuals with and without obesity is associated with a plethora of other chronic diseases, which are steadily rising (Bach, 2002). Whether a low gut bacterial richness is common to many or even all of those, as already reported for IBD, could be revealed by exploring gut microbiota at a deep metagenomic level in a broad variety of these afflictions.
Bacteroides
Ruminococcus
Ruminococcus gnavus
Methanobrevibacter smithii
Coprococcus eutactus
Clostridium symbiosum
Clostridium clostridioforme
Clostridium ramosum
Sporobacter termitidis et rel., Clostridium cluster IV
Oscillospira guillermondii et rel., Clostridium cluster IV
Anaerovorax odorimutans et rel., Clostridium cluster XI
Butyrivibrio crossotus et rel., Clostridium cluster XIVa
Anaerotruncus colihominis et rel., Clostridium cluster IV
Sporobacter termitidis et rel., Clostridium cluster IV
Bacteroides splachnicus et rel., Bacteroidetes, Bacteroidetes
Oscillospira guillermondii et rel., Clostridium cluster IV
Ruminococcus gnavus et rel., Clostridium cluster XIVa
Butyrivibrio crossotus et rel., Clostridium cluster XIVa
Coprococcus eutactus et rel., Clostridium cluster XIVa
Coprococcus eutactus et rel., Clostridium cluster XIVa
Sporobacter termitidis et rel., Clostridium cluster IV
Clostridium cellulosi et rel., Clostridium cluster IV
Ruminococcus obeum et rel., Clostridium cluster XIVa
Butyrivibrio crossotus et rel., Clostridium cluster XIVa
Clostridium symbiosum et rel., Clostridium cluster XIVa
Sporobacter termitidis et rel., Clostridium cluster IV
Sporobacter termitidis et rel., Clostridium cluster IV
1 Jorgensen, T. et al. A randomized non-pharmacological intervention study for prevention of ischaemic heart disease: baseline results Inter99. Eur J Cardiovasc Prev Rehabil 10, 377-386, doi:10.1097/01.hjr.0000096541.30533.82 (2003).
2 WHO. Obesity: preventing and managing the global epidemic. Report of a WHO consultation. Tech. Rep. Ser. 894 (World Health Organisation, Geneva, 2000).
3 Treuth, M. S., Hunter, G. R. & Kekes-Szabo, T. Estimating intraabdominal adipose tissue in women by dual-energy X-ray absorptiometry. Am J Clin Nutr 62, 527-532 (1995).
4 Matthews, D. R. et al. Homeostasis model assessment: insulin resistance and beta-cell function from fasting plasma glucose and insulin concentrations in man. Diabetologia 28, 412-419 (1985).
5 Manichanh, C. et al. Reduced diversity of faecal microbiota in Crohn's disease revealed by a metagenomic approach. Gut 55, 205-211, doi:10.1136/gut.2005.073817 (2006).
6 Li, R. et al. SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics 25, 1966-1967, doi:10.1093/bioinformatics/btp336 (2009).
7 Oksanen, J. et al. vegan: Community Ecology Package. (2012).
8 Rajilic-Stojanovic, M. et al. Development and application of the human intestinal tract chip, a phylogenetic microarray: analysis of universally conserved phylotypes in the abundant microbiota of young and elderly adults. Environ Microbiol 11, 1736-1751, doi:10.1111/j.1462-2920.2009.01900.x (2009).
9 Qin, J. et al. A human gut microbial gene catalogue established by metagenomic sequencing. Nature 464, 59-65, doi:10.1038/nature08821 (2010).
10 Arumugam, M. et al. Enterotypes of the human gut microbiome. Nature 473, 174-180, doi:10.1038/nature09944 (2011).
11 Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society 57, 289-300 (1995).
12 Jensen, L. J. et al. eggNOG: automated construction and annotation of orthologous groups of genes. Nucleic Acids Res 36, D250-254, doi:10.1093/nar/gkm796 (2008).
13 Kanehisa, M., Goto, S., Sato, Y., Furumichi, M. & Tanabe, M. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res 40, D109-114, doi:10.1093/nar/gkr988 (2012).
14 Jiang, D., Huang, J. & Zhang, Y. The cross-validated AUC for MCP-logistic regression with high-dimensional data. Stat Methods Med Res, doi:10.1177/0962280211428385 (2011).
Number | Date | Country | Kind |
---|---|---|---|
12306282.0 | Oct 2012 | EP | regional |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2013/071765 | 10/17/2013 | WO | 00 |