This invention relates to a method for determining the genotype of an individual at the 5p13.1 Crohn's disease risk locus by determining DNA sequence polymorphisms located between coordinates 40,300,000 and 40,600,000 of human chromosome 5. This allows the individual's genetic risk of developing Crohn's disease to be estimated and drug treatment to be tailored to the patient's genotype.
Crohn's disease (CD) is a chronic relapsing inflammatory disorder of the intestinal tract, first described in the 1920s. Its lifetime prevalence has increased to current estimates of ~0.15% in Caucasians. The environmental causes underlying this rise remain essentially unknown, but familial clustering and twin studies clearly identify an inherited component to predisposition. More than ten susceptibility loci have been identified by linkage and/or association studies, and convincing causative mutations have been reported, particularly in CARD15 (Schreiber S. et al., Nat Rev Genet. 6:376-388 (2005); Hugot J P et al., Nature 411:599-603 (2001)). Because the known loci do not fully account for the genetic risk of CD, a genome-wide association scan (WGA) was performed in the present studies to help identify additional susceptibility loci.
Many common human diseases, including cancer, hypertension, diabetes, asthma and CD, are multifactorial. This means that whether an individual is afflicted by the disease is determined by a series of environmental factors acting in concert with a series of genetic risk factors. Risk variants at susceptibility loci (i.e. genetic risk factors) cause the misregulation of specific genes, which ultimately increases the propensity to suffer from the disease.
Identifying the genetic risk variants for common diseases is presently one of the most important objectives of medical genetics. These findings pave the way towards individualized, predictive medicine and towards the identification of novel drug targets. Individuals who are genetically predisposed to the disease may alter their behaviour or undergo preventive treatment to decrease the risk of becoming sick. Knowing the genetic risk variants of a specific individual may orient the choice of treatment on the basis of his or her genetically altered molecular biology. Moreover, the products of genetically misregulated genes are prime targets for drug development.
The object of the present invention was therefore to provide a method allowing an improved estimation of the genetic risk of a human individual to develop CD.
This invention describes the identification of a novel susceptibility locus for Crohn's disease (CD) located on human chromosome 5p13.1. The 5p13.1 CD risk locus corresponds to a region located between positions ~40,300,000 and ~40,600,000 (defined according to the March 2006 assembly of the human genome) on human chromosome 5. The region is a “gene desert”, i.e. it does not contain any protein-encoding gene known at the time of writing. However, the invention demonstrates that genetic variants of the 5p13.1 CD risk locus modulate the expression level of the closest gene, which codes for the prostaglandin receptor EP4 (PTGER4). PTGER4 is a very strong candidate gene for CD: its inactivation by genetic means (PTGER4 knock-out mouse) or by pharmacological means increases susceptibility to colitis in the mouse, whereas its activation protects mice from developing colitis (Kabashima et al., J Clin Invest. 109:883-893 (2002)).
The object of the present invention was solved by a method for determining the genotype of a human individual at the 5p13.1 Crohn's disease (CD) risk locus, the method comprising:
a) providing a sample from the individual;
b) determining whether a DNA sequence corresponding to a DNA sequence polymorphism located between coordinates 40,300,000 and 40,600,000 of human chromosome 5 (coordinates corresponding to the March 2006 assembly of the human genome) is present in the sample;
c) determining the genotype at the DNA sequence polymorphism located between coordinates 40,300,000 and 40,600,000 of human chromosome 5 as it relates to the genetic risk to develop Crohn's disease.
The present invention provides a method for determining the genotype of an individual at the 5p13.1 CD risk locus, the method comprising:
a) obtaining a sample of material containing genomic DNA from the individual, wherein the sample can be any material containing nucleated cells from said individual, including blood, buccal swabs, urine as well as any other tissues, and
b) ascertaining:
Further, the present invention provides a method for determining the genotype of an individual at the 5p13.1 CD risk locus, the method comprising:
a) obtaining a sample of material containing RNA from the individual, wherein the sample can be any material containing nucleated cells from said individual, including blood, buccal swabs, urine as well as any other tissues, and
b) converting the RNA into cDNA by means of a reverse transcriptase, and
c) ascertaining:
In addition, the present invention provides a method for determining the genotype of an individual at the 5p13.1 CD risk locus, the method comprising:
a) obtaining a sample of material containing genomic DNA from the individual, wherein the sample can be any material containing nucleated cells from said individual, including blood, buccal swabs, urine as well as any other tissues, and
b) ascertaining:
Still further, the present invention provides a method for determining the genotype of an individual at the 5p13.1 CD risk locus, the method comprising:
a) obtaining a sample of material containing RNA from the individual, wherein the sample can be any material containing nucleated cells from said individual, including blood, buccal swabs, urine as well as any other tissues, and
b) converting the RNA into cDNA by means of a reverse transcriptase, and
c) ascertaining:
In a preferred method, the DNA sequence polymorphism is any of the SNPs (single nucleotide polymorphisms) listed in Table 2. Table 2 gives the identification number of each marker, the position of the SNP on the chromosome according to the March 2006 assembly of the human genome, and the frequency of the indicated nucleotide in patients with Crohn's disease and in normal individuals (control group [Ctl]), respectively.
It is further preferred that the method includes
i) determining whether or not an allele associated with increased risk for Crohn's disease as indicated in Table 2 is present;
ii) judging whether or not said individual has a genetic risk to develop Crohn's disease, based on the information of step i).
In another embodiment of the present invention the method includes
i) determining whether an allele associated with increased risk for Crohn's disease as indicated in Table 2 is present;
ii) judging that said individual has a genetic risk to develop Crohn's disease if an allele associated with increased risk for Crohn's disease was determined.
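The judgment in steps i) and ii) amounts to a lookup of each genotyped marker against its risk allele. The sketch below assumes genotypes are given as two-letter diploid calls (e.g. "CT"); because Table 2 is not reproduced here, the marker IDs `SNP_A`/`SNP_B` and their risk alleles are hypothetical placeholders, not values from the table.

```python
# Hypothetical stand-in for Table 2: marker ID -> allele associated with
# increased CD risk. Real marker IDs and risk alleles must come from Table 2.
RISK_ALLELES = {
    "SNP_A": "T",
    "SNP_B": "G",
}


def risk_alleles_carried(genotypes):
    """Return the markers at which the individual carries at least one risk allele.

    genotypes maps marker ID -> two-character diploid call, e.g. "CT".
    Markers with no genotype call are skipped.
    """
    carried = []
    for marker, risk in RISK_ALLELES.items():
        call = genotypes.get(marker)
        if call is not None and risk in call:
            carried.append(marker)
    return carried


def has_genetic_risk(genotypes):
    """Step ii): judge the individual at risk if any risk allele was determined."""
    return len(risk_alleles_carried(genotypes)) > 0


print(has_genetic_risk({"SNP_A": "CT", "SNP_B": "AA"}))  # True (T at SNP_A)
print(has_genetic_risk({"SNP_A": "CC", "SNP_B": "AA"}))  # False
```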
In another preferred embodiment, the sample is any material containing nucleated cells from said individual, including blood, buccal swabs, urine as well as any other tissue.
Further preferably, RNA is obtained from said sample and converted into cDNA by means of a reverse transcriptase.
According to one embodiment of the present invention the allele associated with increased risk for Crohn's disease is selected from haplotypes consisting of IIIA, IIIC, IIA, IIB, IIC, IVB as indicated in
A further preferred method includes
iii) determining whether a further allele associated with increased risk for Crohn's disease, at a locus selected from the group consisting of CARD15, IL23R, OCTN, DLG5, TNFSF15 and ATG16L1, is present in said individual; and
iv) judging that said individual has a further increased genetic risk to develop Crohn's disease if, in addition to risk alleles at the 5p13.1 Crohn's disease risk locus, one or more of the risk alleles indicated in iii) were determined.
The 5p13.1 CD risk locus encompasses a large number of DNA sequence polymorphisms (DSP) of different types, including single nucleotide polymorphisms (SNPs), insertion-deletions (indels) and microsatellites. Many of these are known and compiled in public databases including dbSNP. These DSP are in linkage disequilibrium with each other and define five so-called haplotype blocks. Each block contains a limited number of common haplotypes. Some of these haplotypes increase the risk to develop CD, while others are protective. The present inventors have defined which haplotypes are associated with increased risk (e.g. haplotypes IIIA, IIIC, IIA, IIB, IIC, IVB; see
The genetic composition of an individual at the 5p13.1 CD risk locus can be determined by genotyping the individual for one or, preferably, several DSP. This can be accomplished using a variety of genotyping methods known to those skilled in the art. Ideally, the DSP are chosen to allow unambiguous discrimination of the haplotypes present in the DNA of the tested individual.
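Choosing DSP that discriminate haplotypes unambiguously can be illustrated as a search for the smallest marker subset under which every haplotype of a block shows a distinct allele pattern. This is a minimal brute-force sketch of that tag-SNP idea, not the inventors' selection procedure; the four-SNP haplotypes `A` to `D` are hypothetical and do not correspond to the blocks or haplotypes (IIIA, IIB, ...) defined in the document.

```python
from itertools import combinations

# Hypothetical haplotype block: haplotype name -> alleles at four SNPs
# (SNP index 0..3). Illustrative strings only, not the real 5p13.1 haplotypes.
HAPLOTYPES = {
    "A": "CTGA",
    "B": "CTAA",
    "C": "TCGA",
    "D": "TCGG",
}


def discriminates(snp_indices, haplotypes):
    """True if the alleles at snp_indices give every haplotype a distinct pattern."""
    patterns = {"".join(seq[i] for i in snp_indices) for seq in haplotypes.values()}
    return len(patterns) == len(haplotypes)


def minimal_tag_snps(haplotypes):
    """Smallest SNP subset (exhaustive search) that distinguishes all haplotypes."""
    n_snps = len(next(iter(haplotypes.values())))
    for k in range(1, n_snps + 1):
        for subset in combinations(range(n_snps), k):
            if discriminates(subset, haplotypes):
                return subset
    return None  # not distinguishable with the available SNPs


print(minimal_tag_snps(HAPLOTYPES))  # (0, 2, 3)
```

The exhaustive search is only practical for small blocks; it is shown because it makes the "unambiguous discrimination" criterion explicit.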
Knowing the haplotype composition of a given individual at the 5p13.1 CD risk locus allows an estimation of his or her risk to develop CD. The risk haplotypes at the 5p13.1 risk locus increase the relative risk by a factor of approximately 1.5. The best prediction will be based on the genotype at the 5p13.1 locus in combination with other known CD genetic risk loci, including CARD15, IL23R, OCTN, DLG5, TNFSF15 and ATG16L1. This is useful as it allows the physician to prescribe preventive behaviour and treatment. Moreover, as the 5p13.1 locus modulates the expression level of the prostaglandin EP4 receptor, knowledge of the genotype may help the physician to choose for or against medication that acts on this receptor or the corresponding pathway, or to adjust the dose.
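One simple way to combine the ~1.5-fold relative risk at 5p13.1 with other loci is a multiplicative model that treats loci as independent. The sketch below assumes exactly that model, treats the ~0.15% lifetime prevalence quoted earlier as the baseline risk of non-carriers (a crude simplification), and uses placeholder relative risks for CARD15 and IL23R that are hypothetical, not published estimates.

```python
from math import prod

# Carrier vs. non-carrier relative risk per locus.
# 5p13.1: ~1.5 as stated in the text. The other values are hypothetical
# placeholders, NOT published estimates.
LOCUS_RR = {
    "5p13.1": 1.5,
    "CARD15": 3.0,  # hypothetical placeholder
    "IL23R": 1.5,   # hypothetical placeholder
}

# ~0.15 % lifetime prevalence, used here as a crude baseline (assumption).
BASELINE_RISK = 0.0015


def combined_relative_risk(risk_loci):
    """Multiply per-locus relative risks, assuming independent multiplicative effects."""
    return prod(LOCUS_RR[locus] for locus in risk_loci)


def approx_lifetime_risk(risk_loci):
    """Crude absolute-risk estimate: baseline risk times combined relative risk."""
    return BASELINE_RISK * combined_relative_risk(risk_loci)


print(combined_relative_risk(["5p13.1"]))           # 1.5
print(combined_relative_risk(["5p13.1", "IL23R"]))  # 2.25
```

Real risk models would account for allele dosage, interactions between loci and the fact that population prevalence already includes carriers.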
The present invention also provides a method for judging the possibility of the onset of Crohn's disease, wherein a sample from a human individual is tested, and wherein a human individual in whom the DNA sequence located between coordinates 40,300,000 and 40,600,000 of human chromosome 5 (coordinates corresponding to the March 2006 assembly of the human genome) contains an allele associated with increased risk for Crohn's disease as indicated in Table 2 is judged to have a risk of the onset of Crohn's disease. In a preferred embodiment of the method the allele associated with increased risk for Crohn's disease is selected from the CD risk haplotypes consisting of IIIA, IIIC, IIA, IIB, IIC, IVB as indicated in
The present invention also provides the use of a genetic marker located on the human 5p13.1 locus for the judgement whether a human individual has increased risk of the onset of Crohn's disease, wherein said marker is represented by DNA sequence polymorphisms.
In a preferred use the DNA sequence polymorphism is any of the single nucleotide polymorphisms listed in Table 2. Further preferred, said marker is represented by single nucleotide polymorphisms associated with increased risk for Crohn's disease as indicated in Table 2. Still further preferred, said marker is represented by alleles associated with increased risk for Crohn's disease selected from the Crohn's disease risk haplotypes consisting of IIIA, IIIC, IIA, IIB, IIC, IVB as indicated in
The present invention also provides an oligonucleotide for determining the genotype of a human individual at the 5p13.1 Crohn's disease risk locus, selected from the group consisting of:
a) an oligonucleotide comprising from 12 to 30 contiguous nucleotides of the sequence located between coordinates 40,300,000 and 40,600,000 of human chromosome 5 (coordinates corresponding to the March 2006 assembly of the human genome), wherein said oligonucleotide includes one position of the SNPs listed in Table 2, and wherein said position is occupied by the nucleotide of the respective SNP that is correlated with the risk of Crohn's disease as listed in Table 2;
b) an oligonucleotide which is entirely complementary to the oligonucleotide of (a).
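Oligonucleotides (a) and (b) can be derived from a reference window around a Table 2 SNP. This sketch assumes the reference window is given on the forward strand, places the risk allele at the SNP position (centred where the window allows) and takes "entirely complementary" to mean the reverse complement written 5'→3'. The reference string and the risk allele in the usage example are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Fully complementary strand of seq, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def allele_specific_oligo(reference, snp_index, risk_allele, length=21):
    """Oligo of `length` nt (12-30) taken from `reference` so that it spans the SNP,
    with the risk allele placed at the SNP position (centred where possible)."""
    if not 12 <= length <= 30:
        raise ValueError("length must be between 12 and 30 nucleotides")
    if not 0 <= snp_index < len(reference):
        raise ValueError("snp_index lies outside the reference window")
    if len(reference) < length:
        raise ValueError("reference window is shorter than the oligo")
    start = max(0, snp_index - length // 2)
    start = min(start, len(reference) - length)  # keep the oligo inside the window
    oligo = list(reference[start:start + length])
    oligo[snp_index - start] = risk_allele
    return "".join(oligo)


# Hypothetical 30-nt reference window with the SNP at index 15.
ref = "ACGTACGTACGTACGTACGTACGTACGTAC"
probe = allele_specific_oligo(ref, 15, "C")
print(probe)                      # CGTACGTACGCACGTACGTAC  (oligonucleotide a)
print(reverse_complement(probe))  # oligonucleotide b
```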
Throughout the description of the present invention, several terms are used that are specific to the science of this field. For the sake of clarity and to avoid any misunderstanding, these definitions are provided to aid in the understanding of the specification and claims:
Allele: One of a pair, or series, of forms of a gene or non-genic region that occur at a given locus in a chromosome. Alleles are symbolized with the same basic symbol (e.g., B for dominant and b for recessive; B1, B2, Bn for n additive alleles at a locus). In a normal diploid cell there are two alleles of any one gene (one from each parent), which occupy the same relative position (locus) on homologous chromosomes. Within a population there may be more than two alleles of a gene. See multiple alleles. SNPs also have alleles, i.e., the two (or more) nucleotides that characterize the SNP.
Amplification of nucleic acids: refers to methods such as polymerase chain reaction (PCR), ligation amplification (or ligase chain reaction, LCR) and amplification methods based on the use of Q-beta replicase. These methods are well known in the art. Reagents and hardware for conducting PCR are commercially available. Primers useful for amplifying sequences from the disorder region are preferably complementary to, and preferably hybridize specifically to, sequences in the disorder region or in regions that flank a target region therein.
cDNA: refers to complementary or copy DNA produced from an RNA template by the action of RNA-dependent DNA polymerase (reverse transcriptase). Thus, a cDNA clone means a duplex DNA sequence complementary to an RNA molecule of interest, included in a cloning vector or PCR amplified. This term includes genes from which the intervening sequences have been removed.
cDNA library: refers to a collection of recombinant DNA molecules containing cDNA inserts that together comprise essentially all of the expressed genes of an organism or tissue. A cDNA library can be prepared by methods known to one skilled in the art. Generally, RNA is first isolated from the cells of the desired organism, and the RNA is used to prepare cDNA molecules.
Complement of a nucleic acid sequence (complementary sequence): refers to the antisense sequence that participates in Watson-Crick base-pairing with the original sequence.
Gene: Refers to a DNA sequence that encodes through its template or messenger RNA a sequence of amino acids characteristic of a specific peptide, polypeptide, or protein. The term “gene” also refers to a DNA sequence that encodes an RNA product. The term gene as used herein with reference to genomic DNA includes intervening, non-coding regions, as well as regulatory regions, and can include 5′ and 3′ ends. A gene sequence is wild-type if such sequence is usually found in individuals unaffected by the disorder or condition of interest. However, environmental factors and other genes can also play an important role in the ultimate determination of the disorder. In the context of complex disorders involving multiple genes (oligogenic disorder), the wild type, or normal sequence can also be associated with a measurable risk or susceptibility, receiving its reference status based on its frequency in the general population.
GeneMaps: are defined as groups of gene(s) that are directly or indirectly involved in at least one phenotype of a disorder (some non-limiting examples of GeneMaps comprise various combinations of genes from tables 8-10). As such, GeneMaps enable the development of synergistic diagnostic products, creating “theranostics”.
Genotype: Set of alleles at a specified locus or loci.
Haplotype: The allelic pattern of a group of (usually contiguous) DNA markers or other polymorphic loci along an individual chromosome or double helical DNA segment. Haplotypes identify individual chromosomes or chromosome segments.
The presence of shared haplotype patterns among a group of individuals implies that the locus defined by the haplotype has been inherited, identical by descent (IBD), from a common ancestor. Detection of identical by descent haplotypes is the basis of linkage disequilibrium (LD) mapping. Haplotypes are broken down through the generations by recombination and mutation. In some instances, a specific allele or haplotype may be associated with susceptibility to a disorder or condition of interest, e.g., Crohn's disease. In other instances, an allele or haplotype may be associated with a decrease in susceptibility to a disorder or condition of interest, i.e., a protective sequence.
Host: includes prokaryotes and eukaryotes. The term includes an organism or cell that is the recipient of an expression vector (e.g., autonomously replicating or integrating vector).
Hybridizable: nucleic acids are hybridizable to each other when at least one strand of the nucleic acid can anneal to another nucleic acid strand under defined stringency conditions. In some embodiments, hybridization requires that the two nucleic acids contain at least 10 substantially complementary nucleotides; depending on the stringency of hybridization, however, mismatches may be tolerated. The appropriate stringency for hybridizing nucleic acids depends on the length of the nucleic acids and the degree of complementarity, and can be determined in accordance with the methods described herein.
Identity by descent (IBD): Identity among DNA sequences for different individuals that is due to the fact that they have all been inherited from a common ancestor. LD mapping identifies IBD haplotypes as the likely location of disorder genes shared by a group of patients.
Identity: as known in the art, is a relationship between two or more polypeptide sequences or two or more polynucleotide sequences, as determined by comparing the sequences. In the art, identity also means the degree of sequence relatedness between polypeptide or polynucleotide sequences, as the case may be, as determined by the match between strings of such sequences. Identity and similarity can be readily calculated by known methods.
Isolated nucleic acids: are nucleic acids separated away from other components (e.g., DNA, RNA, and protein) with which they are associated (e.g., as obtained from cells, chemical synthesis systems, or phage or nucleic acid libraries). Isolated nucleic acids are at least 60% free, preferably 75% free, and most preferably 90% free from other associated components. In accordance with the present invention, isolated nucleic acids can be obtained by methods described herein, or other established methods, including isolation from natural sources (e.g., cells, tissues, or organs), chemical synthesis, recombinant methods, combinations of recombinant and chemical methods, and library screening methods.
Linkage disequilibrium (LD): the situation in which the alleles for two or more loci do not occur together in individuals sampled from a population at frequencies predicted by the product of their individual allele frequencies. In other words, markers that are in LD do not follow Mendel's second law of independent random segregation. LD can be caused by any of several demographic or population artifacts as well as by the presence of genetic linkage between markers. However, when these artifacts are controlled and eliminated as sources of LD, then LD results directly from the fact that the loci involved are located close to each other on the same chromosome so that specific combinations of alleles for different markers (haplotypes) are inherited together. Markers that are in high LD can be assumed to be located near each other and a marker or haplotype that is in high LD with a genetic trait can be assumed to be located near the gene that affects that trait. The physical proximity of markers can be measured in family studies where it is called linkage or in population studies where it is called linkage disequilibrium.
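The definition above corresponds to the standard pairwise LD measures: D is the deviation of the haplotype frequency p_AB from the product p_A·p_B, D' scales D by its maximum attainable value (Lewontin), and r² normalizes D² by the product of allele frequencies. A minimal sketch for two biallelic loci, assuming the haplotype and allele frequencies are already known; this function is illustrative and not part of the described method.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and B (locus 2)
    p_a, p_b: frequencies of alleles A and B; the other alleles are a and b.
    D' is returned signed (D / D_max); |D'| is what is usually reported.
    """
    p_a_other, p_b_other = 1 - p_a, 1 - p_b
    d = p_ab - p_a * p_b  # deviation from the product of allele frequencies
    if d >= 0:
        d_max = min(p_a * p_b_other, p_a_other * p_b)
    else:
        d_max = min(p_a * p_b, p_a_other * p_b_other)
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_a * p_a_other * p_b * p_b_other
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


print(ld_measures(0.25, 0.5, 0.5))  # (0.0, 0.0, 0.0): independent loci
print(ld_measures(0.5, 0.5, 0.5))   # (0.25, 1.0, 1.0): complete LD
```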
LD mapping: population based gene mapping, which locates disorder genes by identifying regions of the genome where haplotypes or marker variation patterns are shared statistically more frequently among disorder patients compared to healthy controls. This method is based upon the assumption that many of the patients will have inherited an allele associated with the disorder from a common ancestor (IBD), and that this allele will be in LD with the disorder gene.
Locus: a specific position along a chromosome or DNA sequence. Depending upon context, a locus could be a gene, a marker, a chromosomal band or a specific sequence of one or more nucleotides.
Markers: an identifiable DNA sequence that is variable (polymorphic) for different individuals within a population. These sequences facilitate the study of inheritance of a trait or a gene. Such markers are used in mapping the order of genes along chromosomes and in following the inheritance of particular genes; genes closely linked to the marker or in LD with the marker will generally be inherited with it. Two types of markers are commonly used in genetic analysis, microsatellites and SNPs.
Microsatellite: DNA of eukaryotic cells comprising a repetitive, short sequence of DNA that is present as tandem repeats and in highly variable copy number, flanked by sequences unique to that locus.
Mutant sequence: a sequence that differs from one or more wild-type sequences. In some cases, the individual carrying this allele has increased susceptibility toward the disorder or condition of interest. In other cases, the mutant sequence might also refer to an allele that decreases the susceptibility toward a disorder or condition of interest and thus acts in a protective manner. The term mutation may also be used to describe a specific allele of a polymorphic locus.
Non-conservative variants: are those in which a change in one or more nucleotides in a given codon position results in a polypeptide sequence in which a given amino acid residue in a polypeptide has been replaced by a non-conservative amino acid substitution. Non-conservative variants also include polypeptides comprising non-conservative amino acid substitutions.
Nucleic acid or polynucleotide: purine- and pyrimidine-containing polymers of any length, either polyribonucleotides, polydeoxyribonucleotides or mixed polyribo-polydeoxyribonucleotides. This includes single- and double-stranded molecules, i.e., DNA-DNA, DNA-RNA and RNA-RNA hybrids, as well as peptide nucleic acids (PNA) formed by conjugating bases to an amino acid backbone. This also includes nucleic acids containing modified bases.
Nucleotide: a nucleotide, the unit of a DNA molecule, is composed of a base, a 2′-deoxyribose and phosphate ester(s) attached at the 5′ carbon of the deoxyribose. For its incorporation in DNA, the nucleotide needs to possess three phosphate esters, but it is converted into a monoester in the process.
Operably linked: means that the promoter controls the initiation of expression of the gene. A promoter is operably linked to a sequence of proximal DNA if upon introduction into a host cell the promoter determines the transcription of the proximal DNA sequence(s) into one or more species of RNA. A promoter is operably linked to a DNA sequence if the promoter is capable of initiating transcription of that DNA sequence.
Phenotype: any visible, detectable or otherwise measurable property of an organism such as symptoms of, or susceptibility to, a disorder.
Polymorphism: occurrence of two or more alternative genomic sequences or alleles between or among different genomes or individuals at a single locus. A polymorphic site thus refers specifically to the locus at which the variation occurs. In some cases, an individual carrying a particular allele of a polymorphism has an increased or decreased susceptibility toward a disorder or condition of interest.
Probe or primer: refers to a nucleic acid or oligonucleotide that forms a hybrid structure with a sequence in a target region of a nucleic acid due to complementarity of the probe or primer sequence to at least one portion of the target region sequence.
Protein and polypeptide: are synonymous. Peptides are defined as fragments or portions of polypeptides, preferably fragments or portions having at least one functional activity (e.g., proteolysis, adhesion, fusion, antigenic, or intracellular activity) in common with the complete polypeptide sequence.
Recombinant nucleic acids: nucleic acids which have been produced by recombinant DNA methodology, including those nucleic acids that are generated by procedures which rely upon a method of artificial replication, such as the polymerase chain reaction (PCR) and/or cloning into a vector using restriction enzymes.
Sample: as used herein refers to a biological sample, such as, for example, tissue or fluid isolated from an individual or animal (including, without limitation, plasma, serum, cerebrospinal fluid, lymph, tears, nails, hair, saliva, milk, pus, and tissue exudates and secretions) or from in vitro cell culture-constituents, as well as samples obtained from, for example, a laboratory procedure.
Single nucleotide polymorphism (SNP): variation of a single nucleotide. This includes the replacement of one nucleotide by another and deletion or insertion of a single nucleotide. Typically, SNPs are biallelic markers. For example, SNP A\C may comprise allele C or allele A. Thus, a nucleic acid molecule comprising SNP A\C may include a C or A at the polymorphic position. For a combination of SNPs, the term “haplotype” is used, e.g. the genotype of the SNPs in a single DNA strand that are linked to one another. In certain embodiments, the term “haplotype” is used to describe a combination of SNP alleles, e.g., the alleles of the SNPs found together on a single DNA molecule. In specific embodiments, the SNPs in a haplotype are in linkage disequilibrium with one another.
Sequence-conservative: variants are those in which a change of one or more nucleotides in a given codon position results in no alteration in the amino acid encoded at that position (i.e., silent mutation).
Substantially homologous: a nucleic acid or fragment thereof is substantially homologous to another if, when optimally aligned (with appropriate nucleotide insertions and/or deletions) with the other nucleic acid (or its complementary strand), there is nucleotide sequence identity in at least 60% of the nucleotide bases, usually at least 70%, more usually at least 80%, preferably at least 90%, and more preferably at least 95-98% of the nucleotide bases. Alternatively, substantial homology exists when a nucleic acid or fragment thereof will hybridize, under selective hybridization conditions, to another nucleic acid (or a complementary strand thereof). Selectivity of hybridization exists when hybridization which is substantially more selective than total lack of specificity occurs. Typically, selective hybridization will occur when there is at least about 55% sequence identity over a stretch of at least about nine or more nucleotides, preferably at least about 65%, more preferably at least about 75%, and most preferably at least about 90%. The length of homology comparison, as described, may be over longer stretches, and in certain embodiments will often be over a stretch of at least 14 nucleotides, usually at least 20 nucleotides, more usually at least 24 nucleotides, typically at least 28 nucleotides, more typically at least 32 nucleotides, and preferably at least 36 or more nucleotides.
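The percentage thresholds above presuppose a way of scoring identity over an optimal alignment. This minimal sketch assumes the alignment has already been computed (for example by a Needleman-Wunsch aligner, not shown) and adopts one common convention, chosen here as an assumption: a gap opposite a nucleotide counts as a mismatch and columns that are gaps in both sequences are ignored.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned nucleotide sequences of equal length.

    Gap-vs-nucleotide columns count as mismatches; gap-vs-gap columns are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if not (x == gap and y == gap)]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * matches / len(columns)


identity = percent_identity("ACGT-ACGTA", "ACGTTACGAA")
print(identity)          # 80.0
print(identity >= 60.0)  # True: meets the minimal 60 % threshold in the definition
```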
In one aspect, the present invention provides a method to determine the genotype of an individual at the 5p13.1 CD risk locus by analyzing his or her genomic DNA. The method includes obtaining a sample of material containing genomic DNA from the individual and genotyping it for DSP/markers mapping between coordinates 40,300,000 and 40,600,000 of human chromosome 5 (coordinates corresponding to the March 2006 assembly of the human genome). The markers can be any single microsatellite marker, single nucleotide polymorphism (SNP) or insertion-deletion (indel), or any combination thereof. Many of these are listed in public databases including dbSNP, but additional ones can easily be generated by the person skilled in the art by re-sequencing the corresponding region from one or more individuals. Based on the genotype of these markers and the information presented in later sections of the present patent, the person skilled in the art can determine whether the individual has a genotype that increases or decreases the risk of CD, or whether a CD patient should be administered drugs that affect the function of the PTGER4 receptor. The sample can be any material containing nucleated cells from said individual.
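Selecting candidate markers for the locus is a coordinate filter on chromosome 5. This sketch assumes marker positions are already expressed in the March 2006 assembly (hg18 / NCBI Build 36); positions from a newer assembly would first need to be lifted over. The `Marker` record, the inclusive treatment of the boundaries and the example marker IDs are hypothetical.

```python
from dataclasses import dataclass

# 5p13.1 CD risk locus boundaries (March 2006 assembly coordinates, as stated above).
REGION_CHROM = "5"
REGION_START = 40_300_000
REGION_END = 40_600_000


@dataclass
class Marker:
    """Hypothetical marker record (e.g. parsed from a dbSNP export)."""
    marker_id: str
    chrom: str
    position: int  # coordinate in the March 2006 assembly


def markers_in_risk_locus(markers):
    """Keep markers on chromosome 5 between the locus boundaries (inclusive, by assumption)."""
    return [m for m in markers
            if m.chrom == REGION_CHROM and REGION_START <= m.position <= REGION_END]


candidates = [
    Marker("m1", "5", 40_450_000),  # inside the locus
    Marker("m2", "5", 41_000_000),  # outside: too far downstream
    Marker("m3", "6", 40_450_000),  # outside: wrong chromosome
]
print([m.marker_id for m in markers_in_risk_locus(candidates)])  # ['m1']
```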
There are several methods known to those skilled in the art for determining the genotype of an individual at a DSP. These include amplifying a DNA segment encompassing the polymorphism by means of the polymerase chain reaction and interrogating the variant nucleotide position by means of allele-specific hybridization, the 5′ nuclease assay (TaqMan assay), allele-specific restriction enzymes, direct sequencing, the oligonucleotide ligation assay, pyrosequencing, the invader assay, minisequencing, DHPLC, SSCP, or combinations of these methods. Alternatively, the allele present can be ascertained by means of allele-specific PCR using primers that are specific for either allele. This list of methods is not meant to be exhaustive, but only to illustrate the diversity of available methods. Some of these methods can be performed in microarray format.
In another aspect, the present invention provides a method for determining the genotype of an individual at the 5p13.1 CD susceptibility locus by analyzing his or her RNA. The method includes obtaining a sample of material containing RNA from the individual and genotyping it for polymorphic markers mapping between coordinates 40,300,000 and 40,600,000 of human chromosome 5 (coordinates corresponding to the March 2006 assembly of the human genome). The sample can be any material containing nucleated cells from said individual. There are several methods known to those skilled in the art for determining whether a particular nucleotide sequence is present in an RNA sample. These include the conversion of the RNA into cDNA by means of a reverse transcriptase, followed by the application of the methods mentioned above, or variants thereof known to those skilled in the art, to genotype a given polymorphism.
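Allele-specific PCR relies on a primer whose 3'-terminal base sits on the polymorphic position, so that the polymerase extends efficiently only on templates carrying the matching allele. This minimal sketch derives one forward primer per allele from the forward-strand sequence immediately 5' of the SNP; the upstream sequence is hypothetical, and practical designs also check melting temperature and often add a deliberate mismatch near the 3' end, which is not shown.

```python
def allele_specific_forward_primers(upstream, alleles, length=20):
    """Forward primers whose 3'-terminal base is the SNP allele.

    upstream: forward-strand sequence immediately 5' of the SNP, written 5'->3'.
    Each primer is the last (length - 1) upstream bases followed by one allele.
    """
    if len(upstream) < length - 1:
        raise ValueError("upstream sequence too short for the requested primer length")
    stem = upstream[len(upstream) - (length - 1):]
    return {allele: stem + allele for allele in alleles}


# Hypothetical upstream sequence for a C/T SNP.
upstream_seq = "GATTACAGATTACAGATTACAGAT"
primers = allele_specific_forward_primers(upstream_seq, ("C", "T"))
print(primers["C"])  # CAGATTACAGATTACAGATC
print(primers["T"])  # CAGATTACAGATTACAGATT
```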
Genotyping for the whole-genome scan was performed on an Illumina HumanHap300 Genotyping Beadchip (Gunderson K. L. et al., Nat Genet. 37:549-554 (2005)). Individual SNPs were genotyped on an ABI7900HT Sequence Detection System using TaqMan MGB probes from “Pre-designed SNP Genotyping” or “Custom TaqMan SNP Genotyping” Assays (Applied Biosystems, Foster City, Calif.).
Association analyses were conducted using Fisher's exact test (whole-genome scan) or chi-squared tests of independence (confirmation analysis). The logistic regression method of Setakis et al. (Genome Research 16:290-296 (2006)) was applied to test for a possible effect of population structure on the most significant association results. The 110 control markers included in the logistic regression had a 100% genotyping success rate and a minor allele frequency >30%, and no two markers were within 20 Mb of each other. To test for an effect of block I conditional on the effect of the adjacent block II, the proportions of block I haplotype clades nested within a given block II clade (e.g. the proportions of IA, IB and IC within IIA) were compared between cases and controls by a chi-squared test. Chi-squared values (and degrees of freedom) were summed across II clades to yield an overall (I|II) test statistic.
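For a single SNP, the case-control chi-squared test of independence can be run on a 2×2 table of allele counts (each individual contributes two alleles). This sketch computes the uncorrected Pearson statistic and its 1-d.f. p-value using the identity P(χ²₁ > x) = erfc(√(x/2)); it is illustrative, and the counts in the example are hypothetical. For the Fisher's exact test used in the genome scan, an existing implementation such as `scipy.stats.fisher_exact` would be the natural choice.

```python
from math import erfc, sqrt


def allelic_chi_squared(case_counts, control_counts):
    """Pearson chi-squared test (1 d.f.) on a 2x2 allele-count table.

    case_counts / control_counts: (count of allele 1, count of allele 2).
    Returns (statistic, p-value). No continuity correction is applied.
    """
    table = [list(case_counts), list(control_counts)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # For 1 d.f., P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, erfc(sqrt(stat / 2))


# Hypothetical allele counts: cases (60 risk, 40 other), controls (40 risk, 60 other).
stat, p = allelic_chi_squared((60, 40), (40, 60))
print(stat)          # 8.0
print(round(p, 5))   # 0.00468
```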
Genome-wide expression data were provided by W. Cookson (Imperial College, London). Briefly, expression data were generated from RNA extracted from EBV-transformed cells from 378 genotyped offspring in nuclear families. Annotations for individual transcripts on the Affymetrix arrays were extracted from the Affymetrix NetAffx database (www.affymetrix.com). Data from the gene expression experiment were normalized together using the RMA (Robust Multi-Array Average) package (Irizarry R. A. et al., Biostatistics 4:249-264 (2003); Bolstad B. M. et al., Bioinformatics 19:185-193 (2003)) to remove technical or spurious background variation. An inverse normal transformation was also applied to each trait to limit the influence of outliers. A variance components method was used to estimate the heritability of each trait using Merlin-regress (RandomSample option) (Abecasis G. R. et al., Nat Genet 30:97-101 (2002); Sham P. C. et al., Am J Hum Genet 71:238-253 (2002)). For PTGER4, a mean quantitative expression value of −0.017 and a variance of 0.722 were obtained, and the heritability of PTGER4 expression estimated from the sibship data was 0.844. Association analysis was performed with Merlin (FASTASSOC option). An additive effect for SNPs was estimated and its significance tested using a score test that adjusts for familiality and takes into account uncertainty in the inference of missing genotypes.
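The inverse normal transformation step can be sketched as a rank-based transform: values are ranked and each rank is mapped to a standard normal quantile. The exact formula used in the study is not stated; this sketch assumes Blom's offset c = 3/8 (quantile (r − c)/(n − 2c + 1)) and average ranks for ties, and uses `statistics.NormalDist.inv_cdf` from the Python standard library.

```python
from statistics import NormalDist


def inverse_normal_transform(values, c=0.375):
    """Rank-based inverse normal transform; c = 3/8 gives Blom's formula (assumption).

    Tied values receive the average of their ranks.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


# Hypothetical expression values; the outlier 9.9 is pulled in to a normal quantile.
print(inverse_normal_transform([0.2, -0.1, 9.9]))
```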
Genotype data from the Illumina HumanHap300 Genotyping Beadchip were obtained for 547 Caucasian CD patients from Belgium and compared with the genotypes of 928 healthy controls from Belgium and France. Genotype call rates were >93% for all individuals included in the study. Of the 317,497 SNPs available, 5,615 with a genotyping success rate below 91% or deviating from Hardy-Weinberg proportions in controls (Fisher's exact test, p ≤ 10^-3) were excluded from further analysis, because less reliable markers are known to generate spurious associations. For the remaining 311,882 SNPs, allele frequencies were compared between cases and controls as outlined below.
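The SNP filter described above can be sketched as below. The exact Hardy-Weinberg test enumerates every heterozygote count compatible with the observed allele counts and sums the probabilities no larger than that of the observed count, a standard formulation of the exact test; whether it matches the authors' implementation in every detail is an assumption. The names `hwe_exact_p` and `passes_qc` are hypothetical.

```python
import math
from typing import Tuple


def hwe_exact_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Exact test of Hardy-Weinberg proportions for one biallelic SNP."""
    n = n_aa + n_ab + n_bb
    n_a = 2 * n_aa + n_ab  # copies of allele A
    n_b = 2 * n - n_a      # copies of allele B

    def log_prob(het: int) -> float:
        # P(het | n, n_a) = n! / (AA! AB! BB!) * 2^AB * n_a! n_b! / (2n)!
        hom_a, hom_b = (n_a - het) // 2, (n_b - het) // 2
        return (math.lgamma(n + 1) - math.lgamma(hom_a + 1) - math.lgamma(het + 1)
                - math.lgamma(hom_b + 1) + het * math.log(2)
                + math.lgamma(n_a + 1) + math.lgamma(n_b + 1) - math.lgamma(2 * n + 1))

    observed = log_prob(n_ab)
    p = 0.0
    # heterozygote counts must have the same parity as n_a
    for het in range(n_a % 2, min(n_a, n_b) + 1, 2):
        lp = log_prob(het)
        if lp <= observed + 1e-9:  # no more probable than the observed count
            p += math.exp(lp)
    return min(1.0, p)


def passes_qc(call_rate: float, control_genotypes: Tuple[int, int, int],
              min_call_rate: float = 0.91, hwe_threshold: float = 1e-3) -> bool:
    """Keep a SNP unless its call rate is below 91% or its control genotype
    counts (AA, AB, BB) deviate from Hardy-Weinberg proportions with p <= 1e-3."""
    if call_rate < min_call_rate:
        return False
    return hwe_exact_p(*control_genotypes) > hwe_threshold


print(passes_qc(0.95, (25, 50, 25)))  # True
print(passes_qc(0.99, (50, 0, 50)))   # False: no heterozygotes at all
```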
On chromosome 5p13.1, a region of approximately 250 Kb was identified that contained six markers with p < 10^-6 in the association test. This region has not previously been reported as a CD susceptibility locus. Ten markers from the IL23R and 5p13.1 regions were selected for confirmation genotyping in up to 1,266 additional Caucasian CD patients and 559 additional controls. The associations at both loci were clearly replicated, with p-values as low as 4.2×10^-7 at IL23R and 3.7×10^-4 at 5p13.1 (Table 1). In the combined data from the WGA and replication studies, p-values as low as 2.2×10^-18 at IL23R and 2.1×10^-12 at the 5p13.1 locus were obtained. In addition, the same SNPs were genotyped in trios consisting of unaffected parents and an affected offspring to perform a transmission disequilibrium test (TDT). The 10 SNPs were typed in 137 trios whose affected offspring were included in the case-control study, while two of the 5p13.1 SNPs were typed in an additional 291 independent trios, also originating from Belgium. Significant over-transmission of the associated alleles was found at both loci, providing additional confirmatory evidence in support of the IL23R and 5p13.1 susceptibility loci (Table 1).
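The trio analysis can be illustrated with the standard TDT statistic, (b − c)² / (b + c), computed from heterozygous parents only, where b parents transmitted the candidate risk allele and c transmitted the other allele. The one-sided p-value halves the 1-d.f. tail only when the deviation is in the predicted direction (b > c), mirroring the one-sided test named in the table footnotes. This is a minimal sketch under those assumptions; the function name `tdt` and the counts in the example are hypothetical.

```python
import math
from typing import Tuple


def tdt(b: int, c: int) -> Tuple[float, float]:
    """Transmission disequilibrium test from heterozygous parents.

    b: heterozygous parents that transmitted the candidate risk allele
    c: heterozygous parents that transmitted the other allele
    Returns the 1-d.f. chi-squared (b - c)^2 / (b + c) and a one-sided
    p-value for over-transmission of the risk allele.
    """
    if b + c == 0:
        return 0.0, 1.0
    stat = (b - c) ** 2 / (b + c)
    two_sided = math.erfc(math.sqrt(stat / 2))
    # halve the two-sided p-value only when the deviation is in the predicted direction
    one_sided = two_sided / 2 if b > c else 1 - two_sided / 2
    return stat, one_sided


print(tdt(60, 40))  # chi-squared 4.0, one-sided p of about 0.0228
```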
To further characterize the novel 5p13.1 locus, a subset of 1,092 CD patients and 374 Belgian controls were genotyped for 111 markers (Table 2; average interval: 2.3 Kb) spanning the 250 Kb segment. The most likely linkage phase for each individual was determined using PHASE, and the corresponding haplotype frequencies were used to quantify the level of linkage disequilibrium (LD) between all marker pairs. The 250 Kb segment encompasses five clearly delineated LD blocks, the central one (block III) being the largest and spanning 122 Kb.
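The text does not state which LD measure was computed from the PHASE haplotypes. The sketch below derives the two most common ones, |D′| and r², for every marker pair from phased haplotypes coded 0/1; the choice of measures, the 0/1 coding and the names `ld_pair` and `pairwise_ld` are assumptions for illustration.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def ld_pair(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float]:
    """|D'| and r^2 for two biallelic markers.

    p_ab: frequency of haplotypes carrying allele 1 at both markers
    p_a, p_b: frequency of allele 1 at the first and second marker
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d_prime, r2


def pairwise_ld(haplotypes: List[Sequence[int]]) -> Dict[Tuple[int, int], Tuple[float, float]]:
    """|D'| and r^2 for every marker pair, from phased haplotypes coded 0/1."""
    n_hap, n_markers = len(haplotypes), len(haplotypes[0])
    freq = [sum(h[k] for h in haplotypes) / n_hap for k in range(n_markers)]
    result = {}
    for i, j in combinations(range(n_markers), 2):
        p_ab = sum(1 for h in haplotypes if h[i] == 1 and h[j] == 1) / n_hap
        result[(i, j)] = ld_pair(p_ab, freq[i], freq[j])
    return result


# markers 0 and 1 are in complete LD; marker 2 is independent of both
print(pairwise_ld([(1, 1, 1), (1, 1, 0), (0, 0, 1), (0, 0, 0)]))
```

LD blocks such as block III are then typically delineated as runs of adjacent markers whose pairwise values stay high; the block-definition rule actually used is not given in the text.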
No known genes or CpG islands were found within the region of association on 5p13.1 after examination with the Ensembl and UCSC genome browsers. The region has an average G+C content of 38% and an excess of interspersed repeats relative to what is expected for this GC content (58.36% vs. 42.3%), mainly due to an excess of LINE-1 elements (33.05% vs. 19.6%) and LTR elements (15.36% vs. 7.70%). It contains 98 PhastCons conserved elements. It is part of a 1.25 Mb gene desert between DAB2 (850 Kb distal to the block) and PTGER4 (270 Kb proximal to the block). Interestingly, several of the genes flanking the region have been implicated in the pathogenesis of CD, or are related to genes that have been implicated in the disease. These include a member of the caspase recruitment domain family (CARD6), three complement factors (C6, C7 and C9) and, most notably, the prostaglandin receptor EP4 (PTGER4), which lies closest to the group of disease-associated markers.
One hypothesis is that the disease-associated region contains cis-acting regulatory elements that control the expression levels of the causal gene(s) located in its vicinity, and that the causal variants modulate the activity of these elements. As a first step to test this hypothesis, the effect of SNPs in the disease-associated region on the expression levels of neighbouring genes was studied. To that end, a database of genome-wide gene expression (Affymetrix HG-U133 Plus 2.0 chips) measured in EBV-transformed lymphoblastoid cell lines from 378 individuals genotyped with the Illumina HumanHap300 Genotyping Beadchip was exploited. Remarkably, seven of the 26 Illumina markers in a 264 Kb interval coinciding precisely with the CD-associated region yielded p-values between 6.7×10^-5 and 1×10^-3 for PTGER4.
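To make the idea of an additive SNP effect on expression concrete, the sketch below regresses normalized expression on allele dosage (0, 1 or 2 copies) by ordinary least squares and uses a normal approximation for the p-value. This is deliberately simpler than the family-based Merlin score test used in the study: it assumes unrelated individuals and no missing genotypes, and the function name `additive_eqtl` and the example data are hypothetical.

```python
import math
from statistics import NormalDist, fmean
from typing import List, Tuple


def additive_eqtl(dosages: List[int], expression: List[float]) -> Tuple[float, float]:
    """Per-allele effect of a SNP on expression and an approximate two-sided p-value.

    dosages: copies of the tested allele (0, 1 or 2) per individual
    expression: normalized expression values for the same individuals
    Unlike the study's family-based score test, this treats individuals as
    unrelated; the normal approximation is reasonable only for large samples.
    """
    n = len(dosages)
    mx, my = fmean(dosages), fmean(expression)
    sxx = sum((x - mx) ** 2 for x in dosages)
    if n < 3 or sxx == 0:
        raise ValueError("need at least 3 individuals and more than one genotype class")
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, expression))
    beta = sxy / sxx  # least-squares slope: change in expression per allele copy
    rss = sum((y - my - beta * (x - mx)) ** 2 for x, y in zip(dosages, expression))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        return beta, 0.0  # perfect fit
    z = beta / se
    return beta, 2 * NormalDist().cdf(-abs(z))


beta, p = additive_eqtl([0, 0, 1, 1, 2, 2], [0.1, -0.1, 1.1, 0.9, 2.1, 1.9])
print(beta, p)  # slope of about 1.0 per allele copy, very small p
```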
CD is the most common form of inflammatory bowel disease (IBD), the other major form being ulcerative colitis (UC). In the present studies, a cohort of 246 Belgian (Caucasian) UC patients was genotyped for IL23R (rs11209026), ATG16L1 (rs2241880) and the novel 5p13.1 locus (rs4613763). A significant association was found for IL23R (p = 1.2×10^-3; OR: 2.51) but not for ATG16L1 (p = 0.78). There was no effect of the novel 5p13.1 locus on UC (p = 0.54). Although additional studies will be needed to exclude a role in UC completely, these results suggest that the principal susceptibility effects of the 5p13.1 locus are on CD. The restriction to CD risk observed for ATG16L1 and the 5p13.1 locus is similar to that found for CARD15.
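The text reports an odds ratio for IL23R in UC without stating how it was computed. Assuming an allelic odds ratio on risk-allele counts, it and a Woolf (log-scale) 95% confidence interval can be obtained as below; the function name `allelic_odds_ratio` and the counts in the example are hypothetical and do not correspond to the study data.

```python
import math
from typing import Tuple


def allelic_odds_ratio(case_risk: int, case_other: int, ctrl_risk: int, ctrl_other: int,
                       z: float = 1.959964) -> Tuple[float, float, float]:
    """Allelic odds ratio with a Woolf (log-scale) 95% confidence interval.

    Counts are allele counts (two per genotyped individual). All four cells
    must be non-zero for the Woolf interval to be defined.
    """
    if min(case_risk, case_other, ctrl_risk, ctrl_other) == 0:
        raise ValueError("zero cell: apply a continuity correction first")
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    se_log = math.sqrt(1 / case_risk + 1 / case_other + 1 / ctrl_risk + 1 / ctrl_other)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


print(allelic_odds_ratio(40, 60, 20, 80))  # odds ratio 8/3, with its 95% interval
```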
The present invention describes the localisation of a novel major susceptibility locus for CD on 5p13.1 by WGA. The region of strongest association coincides with a gene desert devoid of known protein-coding genes. The observed effect may be mediated by as yet unknown transcripts mapping within the region. Indeed, a limited number of spliced and unspliced ESTs originating from the HT1080 fibrosarcoma cell line or from medulla (e.g. BG182136, BG184600) map to the region. An alternative explanation, however, is that the disease-associated region contains cis-acting elements controlling the expression of more distant genes. The present invention provides evidence in support of this hypothesis by demonstrating that genetic variants in the CD-associated region differentially regulate the expression levels of PTGER4, the closest known gene, located 270 Kb proximally. PTGER4 is a strong candidate gene for CD, as PTGER4 knock-out (KO) mice are known to develop severe colitis upon dextran sodium sulphate treatment, contrary to mice deficient in any of the seven other types of prostanoid receptors. Increased susceptibility to colitis is also observed in wild-type mice administered an EP4-selective antagonist, while EP4-selective agonists are protective. In particular, it was observed that the CD susceptibility allele at marker rs4495224 is associated with increased PTGER4 transcript levels in lymphoblastoid cell lines. This finding establishes a direct link between disease susceptibility and PTGER4 expression, although the direction of the effect apparently contradicts the results in KO mice. Detailed studies of the effect of genetic variants in the disease-associated region on PTGER4 expression in different tissues, and of a possible connection between PTGER4 levels and CD susceptibility, are needed, and work towards that goal is in progress.
The hypothesis that the 5p13.1 CD-susceptibility locus operates by modulating PTGER4 expression levels could, at least in theory, be tested by replacing the corresponding murine sequences with the orthologous human variants and testing whether they quantitatively complement the murine KO allele. The present results suggest that the 5p13.1 effect on CD could result from the combined action of multiple susceptibility variants. Extensive sequencing of the most common haplotypes in the region of association is being conducted towards their identification.
§ Chromosomal position on the March 2006 assembly.
# Allelic frequency of the risk allele.
& Number of individuals with genotype.
$ p-value of Hardy-Weinberg proportions (Fisher's exact test).
ε Allelic frequency of the risk allele.
£ Number of individuals with genotype.
% p-value of allelic association (chi-squared test).
@ Number of genotyped trios.
φ p-value of segregation distortion (one-sided chi-squared test).
Number | Date | Country | Kind |
---|---|---|---|
07103460.7 | Mar 2007 | EP | regional |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/EP08/52497 | 2/29/2008 | WO | 00 | 12/30/2009 |