Inflammatory Bowel Disease or “IBD” is a collective term used to describe diseases including Crohn's disease (CD), ulcerative colitis (UC), microscopic colitis, and indeterminate colitis. Most IBD can be categorized as either CD or UC. With current diagnostic approaches, approximately 60% of IBD patients are classified as CD, 30% as UC, and 10% as indeterminate colitis (IC). IBD is estimated to affect up to approximately 2,000,000 Americans, at a cost of greater than $2 billion annually.
CD is characterized by discontinuous transmural inflammation that can involve any part of the gastrointestinal (GI) tract, although the terminal ileum and proximal colon are most commonly involved. This inflammation can result in strictures, microperforations, and fistulae. The inflammation is noncontiguous and thus can produce skip lesions throughout the bowel. Histologically, CD can have either transmural lymphoid aggregates or non-necrotizing granulomas. Although granulomas are pathognomonic, they are seen in only 40% of patients with CD. In contrast, UC is characterized by continuous superficial inflammation limited to the colon, beginning in the rectum and extending proximally.
Both CD and UC are chronic and most frequently have their onset in early adolescence or early adult life. The cause of IBD is unclear, though it is speculated that both environmental and genetic factors play a role. See Collins, P. et al, “Ulcerative colitis: Diagnosis and Management,” BMJ Vol. 333, 12 Aug. 2006 and Hanauer, S., “Inflammatory Bowel Disease: Epidemiology, Pathogenesis, and Therapeutic Opportunities,” Inflamm. Bowel Dis. 2006 January; 12 Suppl. 1:53-9. Review. The most common symptom of both UC and CD is diarrhea, sometimes accompanied by abdominal cramps, tenesmus (straining at stool), blood, fever, fatigue, and loss of appetite. Some patients have alternating periods of remission with relapse or flare. Other patients have continuous symptoms without remission due to continued inflammation. The severity of IBD and its responsiveness to treatment vary widely from individual to individual.
Diagnosis
The diagnosis of UC or CD is established by finding characteristic intestinal ulcerations and excluding alternative diagnoses, such as enteric infections or ischemia. Active disease in UC is characterized by the endoscopic appearance of superficial ulcerations, friability, a distorted mucosal vascular pattern, and exudate. Patients with severely active disease can have deep ulcers and friability that result in spontaneous bleeding. The typical distribution of disease is continuous from the rectum proximally. However, patients with partially treated UC may have discontinuous or patchy involvement.
The ulcerations of CD may appear aphthoid, but could also be deep and serpiginous. Skip areas, a “cobblestone” appearance, pseudopolyps, and rectal sparing are characteristic findings. Air contrast barium enema, small-bowel series, or colonoscopy may demonstrate these typical lesions. On a small-bowel series, CD often is manifested by separation of bowel loops and a narrowed terminal ileal lumen, the so-called “string sign.”
Histologic features of UC include disease limited to the mucosa and submucosa, mucin depletion, ulcerations, exudate, and crypt abscesses. In CD, non-necrotizing granulomas, transmural lymphoid aggregates, and microscopic skip lesions can be seen. Typical lesions of CD also may be seen in the upper gastrointestinal tract. The inflammation is localized in the ileocecal region in 50% of cases, the small bowel in 25% of cases, the colon in 20% of cases, and the upper gastrointestinal tract or perirectum in 5%.
Assessment of Disease Activity
Disease activity including response to treatment or remission of disease in patients having UC may be assessed using the Clinical Activity Disease Index developed in 1955 by Truelove and Witts (See “Cortisone in ulcerative colitis: final report on a therapeutic trial,” BMJ 1955;2:1041-1048; See also Table 1). Patients with fulminant or toxic colitis usually have more than 10 bowel movements per day, continuous bleeding, abdominal distention and tenderness, and radiologic evidence of edema and possibly bowel dilation.
Patients with fewer than all 6 of the above criteria for severe activity have moderately active disease.
The severity of disease in CD patients may be determined using several clinical disease activity indices. For example, the Crohn's Disease Activity Index (CDAI) developed by Best et al. is often used in clinical trials to measure disease activity. (See Best W R, Becktel J M, Singleton J W. “Rederived values of the eight coefficients of the Crohn's Disease Activity Index (CDAI),” Gastroenterology. 1979;77:843-846; Hyams J S, et al., “Development and Validation of a Pediatric Crohn's Disease Activity Index” J. Pediatric Gastroenterol. Nutr. 1991; 12:439-47; Hanauer S P et al, “Maintenance infliximab for Crohn's disease, the ACCENT I Randomized Trial” Lancet 2002; 359:1541-9, each incorporated herein by reference.) The index consists of eight factors, each summed after adjustment with a weighting factor. The components of the CDAI and weighting factors are listed in Table 2:
Remission of CD is defined as an absolute value of the CDAI of less than 150, while severe disease is defined as a value of greater than 450 in adults. Most major research studies on medications in CD define response as a fall of the CDAI of greater than 70 points. In pediatric patients, disease activity is measured in clinical trials using the PCDAI, and remission is defined as an absolute value of 10 or less, with moderate disease defined as greater than or equal to 30. Response in pediatric patients is defined as a fall of the PCDAI of 12.5 points.
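By way of non-limiting illustration, the weighted summation and the cut-off values described above may be sketched as follows. The function names, the "active" and "intermediate" labels, and the treatment of a PCDAI fall of exactly 12.5 points as a response are assumptions made for illustration only; the weighting factors are supplied by the caller and are not reproduced here.

```python
from typing import Sequence


def weighted_index(values: Sequence[float], weights: Sequence[float]) -> float:
    """Sum each factor after adjustment with its weighting factor."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights))


def classify_cdai(score: float) -> str:
    """Adult CDAI: remission below 150, severe disease above 450."""
    if score < 150:
        return "remission"
    if score > 450:
        return "severe"
    return "active"  # hypothetical label for scores between the two cut-offs


def is_cdai_response(baseline: float, follow_up: float) -> bool:
    """Response: a fall of the CDAI of greater than 70 points."""
    return (baseline - follow_up) > 70


def classify_pcdai(score: float) -> str:
    """Pediatric PCDAI: remission at 10 or less, moderate disease at 30 or more."""
    if score <= 10:
        return "remission"
    if score >= 30:
        return "moderate"
    return "intermediate"  # hypothetical label; this range is not named in the text


def is_pcdai_response(baseline: float, follow_up: float) -> bool:
    """Response: a fall of the PCDAI of 12.5 points (assumed inclusive)."""
    return (baseline - follow_up) >= 12.5
```

For example, a patient whose CDAI falls from 300 to 140 would be scored both as a responder and as being in remission under these definitions.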
Alternatively, the Harvey-Bradshaw index may be used to assess disease activity. The Harvey-Bradshaw index was devised in 1980 as a simpler version of the CDAI for data collection purposes. The index is described in Harvey R, Bradshaw J (1980). “A simple index of Crohn's-disease activity.” Lancet 1 (8167): 514, incorporated herein by reference. It consists of only clinical parameters listed in Table 3.
In addition, the PCDAI index is well-established for defining remission and mild, moderate and severely active disease in pediatric disease, as described by Hyams J S, et al., “Development and Validation of a Pediatric Crohn's Disease Activity Index” J. Pediatric Gastroenterol. Nutr. 1991; 12:439-47, incorporated herein by reference.
Therapeutic Treatment of IBD
The current approach to the treatment of CD is sequential: first to treat acute disease, then to maintain remission. The initial treatment is directed towards treatment of infection and reduction of inflammation. Current options for induction of remission in IBD include 5-aminosalicylic acid (5-ASA) drugs, corticosteroids, methotrexate, and infliximab. Options for maintenance of remission include mesalamine, the immunomodulators 6-mercaptopurine/azathioprine (6-MP/AZA), methotrexate and infliximab. Once remission is induced, the goal of treatment becomes maintenance of remission, avoiding the return of active disease, or “flares.” Where drug therapy fails, surgery may be required.
The most common first line regimen includes induction of remission with prednisone, and maintenance of remission with 6-MP/AZA or 5-ASA. However, this treatment yields a steroid-free remission rate of only fifty percent at one year, and a significant portion of patients fail to respond to first line therapy. There are currently no established clinical tests for predicting response to first line therapy, and newly diagnosed patients must first be subjected to first line therapy, despite only a 50% chance of a successful outcome. In the absence of a reliable test to predict response to therapy, patients are empirically offered agents for induction and maintenance of remission largely based upon disease severity and location. As the effectiveness of any one agent is typically on the order of 50% to 80%, this leads to a substantial number of patients receiving a series of ineffective agents, with attendant side effects, before an effective regimen is identified.
The two most widely used drug families for IBD are steroids and 5-aminosalicylic acid (5-ASA) drugs, both of which reduce inflammation of the affected parts of the intestines. A non-limiting review of therapeutics commonly used for the treatment of IBD follows below.
Steroids
Corticosteroids are used primarily for treatment of moderate to severe flares of CD. The most commonly prescribed oral steroid is prednisone, which is typically dosed at 1.0 mg/kg for induction of remission. Intravenous steroids are used for cases refractory to oral steroids, or where the patient cannot take oral steroids. Budesonide (formulated as Entocort) is an oral corticosteroid with fewer systemic adverse effects due to 90% first-pass metabolism by the liver. Budesonide is effective as a conventional corticosteroid treatment for distal ileal and right colonic disease, but is less potent in transverse and distal colonic disease. Budesonide is also useful when used in combination with antibiotics for active CD.
Aminosalicylates
5-aminosalicylic acid (5-ASA) drugs are also effective in inducing and maintaining remission for patients with UC, and may have a modest effect in some patients with CD. The 5-ASAs include mesalazine or mesalamine, which is marketed in the forms Asacol, Pentasa, Salofalk, Dipentum and Rowasa, and sulfasalazine (Azulfidine, Azulfidine EN-Tabs; Salazopyrin EN-Tabs, SAS in Canada; salazosulfapyridine, salicylazosulfapyridine), which is converted to 5-ASA and sulfapyridine by intestinal bacteria. The sulfapyridine may also have some therapeutic effect in addition to the 5-ASA. Two other aminosalicylates, olsalazine sodium (Dipentum), consisting of two 5-ASA moieties connected by an azo bond, and balsalazide disodium (Colazal), a 5-ASA moiety attached to an inert molecule by an azo bond, may be used to treat CD or UC.
Immunosuppressive Medications
Immunosuppressive medications may also be used to treat patients with moderate to severe IBD. These include, for example, azathioprine and its active metabolite 6-mercaptopurine. Immunosuppressive drugs such as 6-mercaptopurine may be used for long-term treatment of IBD, and are particularly used for patients dependent on chronic high-dose steroid therapy. Azathioprine is a prodrug for 6-mercaptopurine, which is converted into 6-methylmercaptopurine by the enzyme thiopurine methyltransferase (TPMT) or 6-thioguanine by the enzyme hypoxanthine phosphoribosyltransferase.
Methotrexate is another immunosuppressive medication effective for induction and maintenance of remission in CD. Alternatively, cyclosporine may be used in patients with severe UC. Approximately 50% to 80% of patients refractory to intravenous corticosteroid treatment may avoid surgical treatment such as colectomy with intravenous cyclosporine treatment. Tacrolimus and mycophenolate mofetil may also be used as second-line immunosuppressive options.
TNF-Alpha Antagonists
Remicade is the first of a new class of agents for the treatment of Crohn's disease that block activity of a key biologic response mediator called tumour necrosis factor alpha (TNF-alpha). Overproduction of TNF-alpha leads to inflammation in autoimmune conditions such as Crohn's disease. It is believed that Remicade reduces intestinal inflammation in patients with Crohn's disease by binding to and neutralising TNF-alpha on the cell membrane and in the blood. Remicade is indicated for treatment of severe, active Crohn's disease in patients who have not responded despite a full and adequate course of therapy with a corticosteroid and/or an immunosuppressant, and as a treatment of fistulizing Crohn's disease in patients who have not responded despite a full and adequate course of therapy with conventional treatment.
Due to the side effects of first line therapy, the cost of treatment, and the delay in improving the quality of living among those suffering from IBD, there is an urgent and unmet need for determining the most effective course of treatment for IBD patients.
The instant disclosure generally relates to a method for classifying an individual having or suspected of having an inflammatory bowel disease as a responder or a non-responder to first-line therapy for the inflammatory bowel disease, wherein the first line therapy is one of 5-aminosalicylic acid (5-ASA) drugs, corticosteroids, methotrexate, or infliximab. The method generally comprises the steps of identifying an individual having or suspected of having an inflammatory bowel disease, such as Crohn's disease, obtaining a biological sample from the individual, isolating mRNA from the biological sample, determining the mRNA levels of one or more genes identified in any of Tables 4-8 to obtain a gene expression profile and comparing the gene expression profile to a suitable control such that the individual may be classified as a responder or a non-responder to first-line therapy. The control may be, for example, the gene expression profile of a sample obtained from known responders or non-responders.
In one embodiment, gene expression is determined by PCR. In another embodiment, gene expression is determined by a technique using hybridization, for example, to an oligonucleotide of a pre-determined sequence comprising DNA, RNA, cDNA, PNA, genomic DNA, or synthetic oligonucleotides.
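By way of non-limiting illustration, the comparison of a gene expression profile to controls obtained from known responders and non-responders may be sketched as follows. The disclosure does not prescribe a particular comparison rule; the nearest-mean-profile rule using Pearson correlation, the function names, and the tie-breaking choice shown here are assumptions made for illustration only.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two expression profiles of equal length."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("profiles must not be constant")
    return cov / (sx * sy)


def mean_profile(profiles: List[Sequence[float]]) -> List[float]:
    """Gene-by-gene average over a group of control profiles."""
    return [sum(col) / len(col) for col in zip(*profiles)]


def classify(profile: Sequence[float],
             responder_profiles: List[Sequence[float]],
             non_responder_profiles: List[Sequence[float]]) -> str:
    """Assign the class whose mean control profile correlates best."""
    r = pearson(profile, mean_profile(responder_profiles))
    n = pearson(profile, mean_profile(non_responder_profiles))
    # Ties fall to "non-responder"; this tie-break is an arbitrary assumption.
    return "responder" if r > n else "non-responder"
```

All profiles are assumed to list the same genes in the same order.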
In yet another embodiment, gene expression may be obtained by detection and/or measurement of the gene product, where the gene product is known or determined to reasonably correlate with gene expression.
The instant disclosure further relates to a gene expression system for identifying responders and non-responders to first line treatment for an inflammatory bowel disease in individuals having or suspected of having the disease, comprising a solid support having one or more oligonucleotides affixed to said solid support wherein the one or more oligonucleotides further comprise at least one sequence selected from those listed in Table 4, 5, 6, 7, or 8. The gene expression system may further comprise one or more normalization sequences and/or a reference standard. In one embodiment, the solid support comprises an array selected from the group consisting of a chip array, a plate array, a bead array, a pin array, a membrane array, a solid surface array, a liquid array, an oligonucleotide array, a polynucleotide array, a cDNA array, a microfilter plate, a membrane or a chip.
Definitions
Unless defined otherwise, technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Singleton et al., Dictionary of Microbiology and Molecular Biology 2nd ed., J. Wiley & Sons (New York, N.Y. 1994), provide one skilled in the art with a general guide to many of the terms used in the present application.
For purposes of the present invention, the following terms are defined below.
The term “array” or “microarray” in general refers to an ordered arrangement of hybridizable array elements such as polynucleotide probes on a substrate. An “array” is typically a spatially or logically organized collection, e.g., of oligonucleotide sequences or nucleotide sequence products such as RNA or proteins encoded by an oligonucleotide sequence. In some embodiments, an array includes antibodies or other binding reagents specific for products of a candidate library. The array element may be an oligonucleotide, DNA fragment, polynucleotide, or the like, as defined below. The array element may include any element immobilized on a solid support that is capable of binding with specificity to a target sequence such that gene expression may be determined, either qualitatively or quantitatively. When referring to a pattern of expression, a “qualitative” difference in gene expression refers to a difference that is not assigned a relative value. That is, such a difference is designated by an “all or nothing” valuation. Such an all or nothing variation can be, for example, expression above or below a threshold of detection (an on/off pattern of expression). Alternatively, a qualitative difference can refer to expression of different types of expression products, e.g., different alleles (e.g., a mutant or polymorphic allele), variants (including sequence variants as well as post-translationally modified variants), etc. In contrast, a “quantitative” difference, when referring to a pattern of gene expression, refers to a difference in expression that can be assigned a value on a graduated scale (e.g., a 0-5 or 1-10 scale, a + to +++ scale, a grade 1 to grade 5 scale, or the like; it will be understood that the numbers selected for illustration are entirely arbitrary and in no way are meant to be interpreted to limit the invention). Microarrays are useful in carrying out the methods disclosed herein because of the reproducibility between different experiments.
DNA microarrays provide one method for the simultaneous measurement of the expression levels of large numbers of genes. Each array consists of a reproducible pattern of capture probes attached to a solid support. Labeled RNA or DNA is hybridized to complementary probes on the array and then detected by laser scanning. Hybridization intensities for each probe on the array are determined and converted to a quantitative value representing relative gene expression levels. See U.S. Pat. Nos. 6,040,138, 5,800,992, 6,020,135, 6,033,860, and 6,344,316, which are incorporated herein by reference. High-density oligonucleotide arrays are particularly useful for determining the gene expression profile for a large number of RNAs in a sample.
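By way of non-limiting illustration, one possible conversion of hybridization intensities into relative expression values is sketched below. The particular steps (background subtraction, a log2 transform, and centering on the array median), the function name, and the default parameter values are assumptions made for illustration; they are not taken from the cited patents.

```python
from math import log2
from statistics import median
from typing import Dict


def relative_expression(intensities: Dict[str, float],
                        background: float = 0.0,
                        floor: float = 1.0) -> Dict[str, float]:
    """Convert raw probe intensities to median-centered log2 values.

    A result of 0 means the probe sits at the array median; +1 means
    twice the median intensity, and -1 means half of it.
    """
    # Subtract background and clamp to a floor so that log2 is defined.
    corrected = {p: max(v - background, floor) for p, v in intensities.items()}
    logs = {p: log2(v) for p, v in corrected.items()}
    center = median(logs.values())
    return {p: v - center for p, v in logs.items()}
```

Centering on the median is one simple way to make values from different arrays comparable; other normalization schemes may be substituted.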
A “DNA fragment” includes polynucleotides and/or oligonucleotides and refers to a plurality of joined nucleotide units formed from naturally-occurring bases and cyclofuranosyl groups joined by native phosphodiester bonds. This term effectively refers to naturally-occurring species or synthetic species formed from naturally-occurring subunits. “DNA fragment” also refers to purine and pyrimidine groups and moieties which function similarly but which have non naturally-occurring portions. Thus, DNA fragments may have altered sugar moieties or inter-sugar linkages. Exemplary among these are the phosphorothioate and other sulfur containing species. They may also contain altered base units or other modifications, provided that biological activity is retained. DNA fragments may also include species that include at least some modified base forms. Thus, purines and pyrimidines other than those normally found in nature may be so employed. Similarly, modifications on the cyclofuranose portions of the nucleotide subunits may also occur as long as biological function is not eliminated by such modifications.
The term “polynucleotide,” when used in singular or plural, generally refers to any polyribonucleotide or polydeoxyribonucleotide, which may be unmodified RNA or DNA or modified RNA or DNA. Thus, for instance, polynucleotides as defined herein include, without limitation, single- and double-stranded DNA, DNA including single- and double-stranded regions, single- and double-stranded RNA, and RNA including single- and double-stranded regions, hybrid molecules comprising DNA and RNA that may be single-stranded or, more typically, double-stranded or include single- and double-stranded regions. In addition, the term “polynucleotide” as used herein refers to triple-stranded regions comprising RNA or DNA or both RNA and DNA. The strands in such regions may be from the same molecule or from different molecules. The regions may include all of one or more of the molecules, but more typically involve only a region of some of the molecules. One of the molecules of a triple-helical region often is an oligonucleotide. Thus, DNAs or RNAs with backbones modified for stability or for other reasons are “polynucleotides” as that term is intended herein. Moreover, DNAs or RNAs comprising unusual bases, such as inosine, or modified bases, such as tritiated bases, are included within the term “polynucleotides” as defined herein. In general, the term “polynucleotide” embraces all chemically, enzymatically and/or metabolically modified forms of unmodified polynucleotides, as well as the chemical forms of DNA and RNA characteristic of viruses and cells, including simple and complex cells.
The term “oligonucleotide” refers to a relatively short polynucleotide, including, without limitation, single-stranded deoxyribonucleotides, single- or double-stranded ribonucleotides, RNA:DNA hybrids and double-stranded DNAs. Oligonucleotides, such as single-stranded DNA oligonucleotides, are often synthesized by chemical methods, for example using automated oligonucleotide synthesizers that are commercially available. However, oligonucleotides can be made by a variety of other methods, including in vitro recombinant DNA-mediated techniques and by expression of DNAs in cells and organisms.
The terms “differentially expressed gene,” “differential gene expression” and their synonyms, which are used interchangeably, refer to a gene whose expression is activated to a higher or lower level in a subject, relative to its expression in a normal or control subject. A differentially expressed gene may be either activated or inhibited at the nucleic acid level or protein level, or may be subject to alternative splicing to result in a different polypeptide product. Such differences may be evidenced by a change in mRNA levels, surface expression, secretion or other partitioning of a polypeptide, for example. Differential gene expression may include a comparison of expression between two or more genes, or a comparison of the ratios of the expression between two or more genes, or even a comparison of two differently processed products of the same gene, which differ between normal subjects and subjects suffering from a disease, or between various stages of the same disease. Differential expression includes both quantitative, as well as qualitative, differences in the temporal or cellular expression pattern in a gene or its expression products. As used herein, “differential gene expression” can be present when there is, for example, at least about a one- to about two-fold, or about two- to about four-fold, or about four- to about six-fold, or about six- to about eight-fold, or about eight- to about ten-fold, or greater than about 11-fold difference between the expression of a given gene in a patient of interest compared to a suitable control. However, a fold change less than one is not intended to be excluded, and to the extent such change can be accurately measured, a fold change less than one may be reasonably relied upon in carrying out the methods disclosed herein. In some embodiments, the fold change may be greater than about five, or about 10, or about 20, or about 30, or about 40.
The phrase “gene expression profile” as used herein, is intended to encompass the general usage of the term as used in the art, and generally means the collective data representing gene expression with respect to a selected group of two or more genes, wherein the gene expression may be upregulated, downregulated, or unchanged as compared to a reference standard. A gene expression profile is obtained via measurement of the expression level of many individual genes. The expression profiles can be prepared using different methods. Suitable methods for preparing a gene expression profile include, but are not limited to, quantitative RT-PCR, Northern blot, in situ hybridization, slot-blotting, nuclease protection assay, nucleic acid arrays, and immunoassays. The gene expression profile may also be determined indirectly via measurement of one or more gene products (whether a full or partial gene product) for a given gene sequence, where that gene product is known or determined to correlate with gene expression.
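By way of non-limiting illustration, a fold difference between a patient of interest and a suitable control may be computed as sketched below. The function names and the default two-fold cut-off are assumptions made for illustration, as is the reading of a fold change below one as lower expression in the patient than in the control.

```python
def fold_change(patient_level: float, control_level: float) -> float:
    """Ratio of patient expression to control expression (linear scale)."""
    if patient_level <= 0 or control_level <= 0:
        raise ValueError("expression levels must be positive")
    return patient_level / control_level


def fold_magnitude(fc: float) -> float:
    """Size of the change regardless of direction (0.25 and 4.0 both give 4.0)."""
    if fc <= 0:
        raise ValueError("fold change must be positive")
    return fc if fc >= 1.0 else 1.0 / fc


def is_differential(patient_level: float, control_level: float,
                    min_fold: float = 2.0) -> bool:
    """True when expression differs from control by at least min_fold.

    min_fold=2.0 is an arbitrary illustrative cut-off, not a required value.
    """
    return fold_magnitude(fold_change(patient_level, control_level)) >= min_fold
```

Using the magnitude treats a four-fold decrease and a four-fold increase symmetrically, so that both directions of change can be assessed against the same cut-off.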
The phrase “gene product” is intended to have the meaning as generally understood in the art and is intended to generally encompass the product(s) of RNA translation resulting in a protein and/or a protein fragment. The gene products of the genes identified herein may also be used for the purposes of diagnosis or treatment in accordance with the methods described herein.
A “reference gene expression profile” as used herein, is intended to indicate the gene expression profile, as defined above, for a pre-selected group which is useful for comparison to the gene expression profile of a subject of interest. For example, the reference gene expression profile may be the gene expression profile of a single individual known to not have an inflammatory bowel disease (i.e., a “normal” subject) or the gene expression profile represented by a collection of RNA samples from “normal” individuals that has been processed as a single sample. The “reference gene expression profile” may vary, and such variance will be readily appreciated by one of ordinary skill in the art.
The phrase “reference standard” as used herein may refer to the phrase “reference gene expression profile” or may more broadly encompass any suitable reference standard which may be used as a basis of comparison with respect to the measured variable. For example, a reference standard may be an internal control, the gene expression or a gene product of a “healthy” or “normal” subject, a housekeeping gene, or any unregulated gene or gene product. The phrase is intended to be generally non-limiting in that the choice of a reference standard is well within the level of skill in the art and is understood to vary based on the assay conditions and reagents available to one using the methods disclosed herein.
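By way of non-limiting illustration, where quantitative RT-PCR is used and a housekeeping gene serves as the reference standard, relative expression may be estimated with the widely used comparative threshold cycle (2^-ΔΔCt) calculation sketched below. This calculation is not recited in the present disclosure; it is offered as one assumed approach, and it presumes roughly 100% amplification efficiency for both the target gene and the housekeeping gene. The function and parameter names are hypothetical.

```python
def delta_ct(target_ct: float, reference_ct: float) -> float:
    """Target threshold cycle normalized to the housekeeping (reference) gene."""
    return target_ct - reference_ct


def relative_expression_ddct(sample_target_ct: float, sample_reference_ct: float,
                             control_target_ct: float, control_reference_ct: float) -> float:
    """Fold expression of the target in the sample relative to the control.

    Assumes the amount of product doubles with every cycle, so one cycle of
    difference corresponds to a two-fold difference in starting template.
    """
    ddct = (delta_ct(sample_target_ct, sample_reference_ct)
            - delta_ct(control_target_ct, control_reference_ct))
    return 2.0 ** (-ddct)
```

A lower threshold cycle indicates more starting template, which is why the exponent is negated.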
“Gene expression profiling” as used herein, refers to any method that can analyze the expression of selected genes in selected samples.
The phrase “gene expression system” as used herein, refers to any system, device or means to detect gene expression and includes diagnostic agents, candidate libraries, oligonucleotide sets or probe sets.
The terms “diagnostic oligonucleotide” or “diagnostic oligonucleotide set” generally refer to an oligonucleotide or to a set of two or more oligonucleotides that, when evaluated for differential expression of their corresponding diagnostic genes, collectively yield predictive data. Such predictive data typically relates to diagnosis, prognosis, selection of therapeutic agents, monitoring of therapeutic outcomes, and the like. In general, the components of a diagnostic oligonucleotide or a diagnostic oligonucleotide set are distinguished from oligonucleotide sequences that are evaluated by analysis of the DNA to directly determine the genotype of an individual as it correlates with a specified trait or phenotype, such as a disease, in that it is the pattern of expression of the components of the diagnostic oligonucleotide set, rather than mutation or polymorphism of the DNA sequence that provides predictive value. It will be understood that a particular component (or member) of a diagnostic oligonucleotide set can, in some cases, also present one or more mutations, or polymorphisms that are amenable to direct genotyping by any of a variety of well known analysis methods, e.g., Southern blotting, RFLP, AFLP, SSCP, SNP, and the like.
The phrase “gene amplification” refers to a process by which multiple copies of a gene or gene fragment are formed in a particular cell or cell line. The duplicated region (a stretch of amplified DNA) is often referred to as “amplicon.” Usually, the amount of the messenger RNA (mRNA) produced, i.e., the level of gene expression, also increases in the proportion of the number of copies made of the particular gene expressed.
A “gene expression system” refers to any system, device or means to detect gene expression and includes diagnostic agents, candidate libraries, oligonucleotides, diagnostic gene sets, oligonucleotide sets, array sets, or probe sets.
As used herein, a “probe” refers to the gene sequence arrayed on a substrate.
The terms “splicing” and “RNA splicing” are used interchangeably and refer to RNA processing that removes introns and joins exons to produce mature mRNA with continuous coding sequence that moves into the cytoplasm of a eukaryotic cell.
“Stringency” of hybridization reactions is readily determinable by one of ordinary skill in the art, and generally is an empirical calculation dependent upon probe length, washing temperature, and salt concentration. In general, longer probes require higher temperatures for proper annealing, while shorter probes need lower temperatures. Hybridization generally depends on the ability of denatured DNA to re-anneal when complementary strands are present in an environment below their melting temperature. The higher the degree of desired homology between the probe and hybridizable sequence, the higher the relative temperature which can be used. As a result, it follows that higher relative temperatures would tend to make the reaction conditions more stringent, while lower temperatures less so. For additional details and explanation of stringency of hybridization reactions, see Ausubel et al., Current Protocols in Molecular Biology, Wiley Interscience Publishers, (1995).
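By way of non-limiting illustration, the dependence of annealing temperature on probe length and composition may be approximated for short oligonucleotides with the Wallace rule of thumb, in which each A or T contributes about 2 degrees C and each G or C about 4 degrees C to the estimated melting temperature. This rule is not recited in the present disclosure and ignores salt concentration and other conditions discussed above; the function name is hypothetical and the estimate is a rough guide only.

```python
def wallace_tm(probe: str) -> int:
    """Estimate melting temperature (degrees C) of a short DNA probe.

    Wallace rule: Tm = 2 * (A + T) + 4 * (G + C). Intended only for short
    oligonucleotides; it does not account for salt or mismatches.
    """
    seq = probe.upper()
    if not seq or any(base not in "ACGT" for base in seq):
        raise ValueError("probe must be a non-empty string of A, C, G, T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc
```

Under this estimate a longer probe of the same composition has a higher melting temperature, consistent with the statement that longer probes require higher temperatures for proper annealing.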
As used herein, a “target” refers to the sequence derived from a biological sample that is labeled and suitable for hybridization to a probe affixed on a substrate.
The term “treatment” refers to both therapeutic treatment and prophylactic or preventative measures, wherein the object is to prevent or slow down (lessen) the targeted pathologic condition or disorder. Those in need of treatment include those already with the disorder as well as those prone to have the disorder or those in whom the disorder is to be prevented.
The practice of the present invention will employ, unless otherwise indicated, conventional techniques of molecular biology (including recombinant techniques), microbiology, cell biology and biochemistry, which are within the skill of the art.
Gene Expression Profiling
The present invention relates to a method of predicting the optimal course of therapy for patients having an inflammatory bowel disease (IBD), for example, Crohn's disease (CD) or ulcerative colitis (UC) using a diagnostic oligonucleotide set or gene expression profile as described herein, via classification of an individual having or suspected of having an inflammatory bowel disease as being either a “responder” or “non-responder” to first-line therapy. In one embodiment, the methods described herein may be used to predict the optimal course of therapy, or identify the efficacy of a given treatment in an individual having, or suspected of having an inflammatory bowel disease. In other embodiments, the methods described herein may be used to predict the optimal course of therapy post-diagnosis, for example, after treatment of an individual having an IBD has begun, such that the therapy may be changed or adjusted, in accordance with the outcome of the diagnostic methods.
The present invention also relates to diagnostic oligonucleotides and diagnostic oligonucleotide sets and methods of using the diagnostic oligonucleotides and oligonucleotide sets to diagnose or monitor disease, assess severity of disease, predict future occurrence of disease, predict future complications of disease, determine disease prognosis, evaluate the patient's risk, “stratify” or classify a group of patients, assess response to current drug therapy, assess response to current non-pharmacological therapy, identify novel therapeutic compounds, determine the most appropriate medication or treatment for the patient, predict whether a patient is likely to respond to a particular drug, and determine most appropriate additional diagnostic testing for the patient, as well as other clinically and epidemiologically relevant applications. As set forth above, the term “diagnostic oligonucleotide set” generally refers to a set of two or more oligonucleotides that, when evaluated for differential expression of their products, collectively yields predictive data. Such predictive data typically relates to diagnosis, prognosis, monitoring of therapeutic outcomes, and the like. In general, the components of a diagnostic oligonucleotide set are distinguished from nucleotide sequences that are evaluated by analysis of the DNA to directly determine the genotype of an individual as it correlates with a specified trait or phenotype, such as a disease, in that it is the pattern of expression of the components of the diagnostic nucleotide set, rather than mutation or polymorphism of the DNA sequence that provides predictive value. It will be understood that a particular component (or member) of a diagnostic nucleotide set can, in some cases, also present one or more mutations, or polymorphisms that are amenable to direct genotyping by any of a variety of well known analysis methods, e.g., Southern blotting, RFLP, AFLP, SSCP, SNP, and the like.
In another embodiment of the present invention, a gene expression system useful for carrying out the described methods is also provided. This gene expression system can be conveniently used for determining a diagnosis, prognosis, or selecting a treatment for patients having or suspected of having an IBD such as CD or UC.
In one embodiment, the methods disclosed herein allow one to classify an individual of interest as either a “responder” or a “non-responder” to first-line treatment using a gene expression profile. For purposes of the methods disclosed herein, the term “responder” refers to a patient that responds to first line therapy and does not require a second induction of remission during the year following the induction of remission. In contrast, the term “non-responder” refers to a patient having an IBD such as CD that will require a second induction of remission using any therapy. For example, treatment non-responders may require more than one course of corticosteroids, or anti-TNF, during the first year.
Thus, in accordance with the methods, a classification of an individual as a “responder” indicates that first-line treatment is likely to be successful in treating the IBD and, as such, may be the treatment of choice, while an individual identified as being a non-responder would generally not be an ideal candidate for traditional first-line therapies. Rather, an individual identified as a non-responder would likely benefit from more aggressive, or second-line, therapies typically reserved for individuals that have not responded to first-line treatment.
Classifying patients as either a “responder” or a “non-responder” is advantageous, in that it allows one to predict the optimal course of therapy for the patient. This classification may be useful at the outset of therapy (at the time of diagnosis) or later, when first-line therapy has already been initiated, such that treatment may be altered to the benefit of the patient.
In general, the method of using a gene expression profile or gene expression system for diagnosing an individual as a responder or a non-responder comprises measuring the gene expression of a gene identified in any of Tables 4-8 or the sequence listing. Gene expression, as used herein, may be determined using any method known in the art reasonably calculated to determine whether the expression of a gene is upregulated, down-regulated, or unchanged, and may include measurement of RNA or the gene product itself.
In one embodiment, an individual is characterized as a responder or nonresponder to first line therapy via measurement of the expression of one or more genes of Table 4 in the individual as compared to the expression of one or more genes of Table 4 in a suitable control (such as an individual previously determined to be a responder or non-responder). In another embodiment the one or more genes are selected from Table 5. In another embodiment the one or more genes are selected from Table 6. In another embodiment the one or more genes are selected from Table 7. In another embodiment the one or more genes are selected from Table 8. The genes selected for measurement of expression may be selected on the basis of fold difference. For example, the genes may be those having a fold-change of greater than about 2 or about 3, or about 4 or about 5 as identified in any of Tables 4, 5, 6, 7, or 8.
In yet another embodiment, the method of identifying an individual having or suspected of having an inflammatory bowel disease such as CD or UC comprises the steps of: 1) providing an array set immobilized on a substrate, wherein the array set comprises one or more oligonucleotides derived from the sequences listed in Tables 4-8 or the Sequence Listing; 2) providing a labeled target obtained from mRNA isolated from a biological sample from a patient having an IBD such as CD or UC; 3) hybridizing the labeled target to the array set under suitable hybridization conditions such that the labeled target hybridizes to the array elements; 4) determining the relative amounts of gene expression in the patient's biological sample as compared to a reference sample by detecting labeled target that is hybridized to the array set; 5) using the gene expression profile to classify the patient as a responder or a non-responder; and 6) predicting the optimal course of therapy based on said classification.
The one or more sequences that comprise the array elements may be selected from any of the sequences listed in Tables 4-8 or the Sequence Listing. In one embodiment, the gene expression system comprises one or more array elements wherein the one or more array elements correspond to sequences selected from those sequences listed in Tables 4-8, or the Sequence Listing. In one embodiment, the array set comprises the sequences listed in Table 5. In another embodiment, the array set comprises the sequences listed in Table 6.
The present invention also relates to an apparatus for predicting the optimal course of therapy in a patient having an inflammatory bowel disease such as CD or UC. The apparatus comprises a solid support having an array set immobilized thereon, wherein labeled target derived from mRNA from a patient of interest is hybridized to the one or more sequences of the array set on the solid support, such that a change in gene expression for each sequence compared to a reference sample or other suitable control may be determined, permitting a determination of the optimal course of therapy for the patient. The array set comprises one or more sequences selected from those listed in Tables 4-8 or the Sequence Listing described herein. In one embodiment, the array set comprises the sequences listed in Table 5. In another embodiment, the array set comprises the sequences listed in Table 6.
In yet another embodiment, the method of classifying an individual having or suspected of having an inflammatory bowel disease as a responder or non-responder comprises the steps of: 1) obtaining mRNA isolated from a biological sample from a patient having or suspected of having an inflammatory bowel disease; 2) reverse transcribing the mRNA to obtain the corresponding DNA; 3) selecting suitable oligonucleotide primers corresponding to one or more genes selected from Tables 4-8 or the Sequence Listing; 4) combining the DNA and oligonucleotide primers in a suitable hybridization solution; 5) incubating the solution under conditions that permit amplification of the sequences corresponding to the primers; and 6) determining the relative amounts of gene expression in the patient's biological sample as compared to a reference sample or other suitable control; wherein the resulting gene expression profile can be used to classify the patient as a responder or a non-responder.
In other embodiments, real time PCR methods or any other method useful in measuring mRNA levels as known in the art may also be used. Alternatively, measurement of one or more gene products using any standard method of measuring protein (such as radioimmunoassay methods or Western blot analysis) may be used to determine a gene expression profile.
The methods of gene expression profiling that may be used with the methods and apparatus described herein are well-known in the art. In general, methods of gene expression profiling can be divided into methods based on hybridization analysis of polynucleotides, and methods based on sequencing of polynucleotides. Commonly used methods known in the art for the quantification of mRNA expression in a sample include northern blotting and in situ hybridization (Parker & Barnes, Methods in Molecular Biology 106:247-283 (1999)), RNAse protection assays (Hod, Biotechniques 13:852-854 (1992)), and reverse transcription polymerase chain reaction (RT-PCR) (Weis et al., Trends in Genetics 8:263-264 (1992)), or modified RT-PCR methods, such as that described in U.S. Pat. No. 6,618,679. Alternatively, antibodies may be employed that can recognize specific duplexes, including DNA duplexes, RNA duplexes, and DNA-RNA hybrid duplexes or DNA-protein duplexes. Representative methods for sequencing-based gene expression analysis include Serial Analysis of Gene Expression (SAGE), and gene expression analysis by massively parallel signature sequencing (MPSS). In one embodiment described herein, gene array technology such as microarray technology is used to profile gene expression.
Arrays and Microarray Technologies
Array and microarray techniques known in the art to determine gene expression may be employed with the invention described herein. Where used herein, array refers to either an array or microarray. An array is commonly a solid-state grid on which polynucleotides or oligonucleotides of known sequence (array elements) are immobilized, each at a particular position (also referred to as an “address”) on the grid. Microarrays are a type of array termed as such due to the small size of the grid and the small amounts of nucleotide (such as nanogram, nanomolar or nanoliter quantities) that are usually present at each address. The immobilized array elements (collectively, the “array set”) serve as hybridization probes for cDNA or cRNA derived from messenger RNA (mRNA) isolated from a biological sample. An array set is defined herein as one or more DNA fragments or oligonucleotides, as defined above, that are immobilized on a solid support to form an array.
In one embodiment, for example, the array is a “chip” composed, e.g., of one of the above specified materials. Polynucleotide probes, e.g., RNA or DNA, such as cDNA, synthetic oligonucleotides, and the like, or binding proteins such as antibodies, that specifically interact with expression products of individual components of the candidate library are affixed to the chip in a logically ordered manner, i.e., in an array. In addition, any molecule with a specific affinity for either the sense or anti-sense sequence of the marker nucleotide sequence (depending on the design of the sample labeling), can be fixed to the array surface without loss of specific affinity for the marker and can be obtained and produced for array production, for example, proteins that specifically recognize the specific nucleic acid sequence of the marker, ribozymes, peptide nucleic acids (PNA), or other chemicals or molecules with specific affinity.
The techniques described herein, including array and microarray techniques, may be used to compare the gene expression profile of a biological sample from a patient of interest to the gene expression profile of a reference sample or other suitable control. The gene expression profile is determined by first extracting RNA from a biological sample of interest, such as from a patient diagnosed with an IBD. The RNA is then reverse transcribed into cDNA and labeled. In another embodiment, the cDNA may be transcribed into cRNA and labeled. The labeled cDNA or cRNA forms the target that may be hybridized to the array set comprising probes selected according to the methods described herein. The reference sample obtained from a control patient is prepared in the same way. In one embodiment, both a test sample and reference sample may be used, the targets from each sample being differentially labeled (for example, with fluorophores having different excitation properties), and then combined and hybridized to the array under controlled conditions. In general, the labeled target and immobilized array sets are permitted, under appropriate conditions known to one of ordinary skill in the art, to hybridize such that the targets hybridize to complementary sequences on the arrays. After the array is washed with solutions of appropriately determined stringency to remove or reduce non-specific binding of labeled target, gene expression may be determined. The ratio of gene expression between the test sample and reference sample for a given gene determines the color and/or intensity of each spot, which can then be measured using standard techniques as known in the art. Analysis of the differential gene expression of a given array set provides an “expression profile” or “gene signature” for that array set.
The expression profile is the pattern of gene expression produced by the experimental sample, wherein transcription of some genes is increased or decreased compared to the reference sample. Where sample quantities are limited, amplification methods using in vitro transcription may also be used to yield increased quantities of material for hybridization to the array. In one embodiment, the Nugen Ovation amplification system may be incorporated into the protocol, as described below.
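As a minimal illustration of how the test-to-reference ratio described above might be computed for each array element, the following Python sketch converts paired signal intensities into log2 ratios. The function name, the gene labels, and the intensity values are hypothetical and are not taken from the disclosure; background correction and normalization are assumed to have been performed beforehand.

```python
import math


def log2_ratios(test, reference):
    """Return log2(test / reference) for each gene measured in both samples."""
    ratios = {}
    for gene, test_signal in test.items():
        ref_signal = reference.get(gene)
        # Skip genes that lack a usable (positive) signal in either sample.
        if ref_signal is None or ref_signal <= 0 or test_signal <= 0:
            continue
        ratios[gene] = math.log2(test_signal / ref_signal)
    return ratios


# Hypothetical, already background-corrected intensities for three array elements.
test_sample = {"GENE_A": 800.0, "GENE_B": 100.0, "GENE_C": 250.0}
reference_sample = {"GENE_A": 200.0, "GENE_B": 400.0, "GENE_C": 250.0}

profile = log2_ratios(test_sample, reference_sample)
```

In this sketch a positive value indicates increased expression relative to the reference sample, a negative value indicates decreased expression, and zero indicates no change.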
Commercially-produced, high-density arrays containing synthesized oligonucleotides, such as the GeneChip arrays available from Affymetrix (Santa Clara, Calif.), may be used with the methods disclosed herein. In one embodiment, the HGU133 Plus Version 2 Affymetrix GeneChip may be used to determine gene expression of an array set comprising sequences listed in Tables 4-8 or the Sequence Listing.
In another embodiment, customized cDNA or oligonucleotide arrays may be manufactured by first selecting one or more array elements to be deposited on the array, selected from one or more sequences listed in Tables 4-8 or the Sequence Listing. Purified PCR products or other suitably derived oligonucleotides having the selected sequence may then be spotted or otherwise deposited onto a suitable matrix. The support may be selected from any suitable support known in the art, for example, microscope slides, glass, plastic or silicon chips, membranes such as nitrocellulose or paper, fibrous mesh arrangement, nylon filter arrays, glass-based arrays or the like. The array may be a chip array, a plate array, a bead array, a pin array, a membrane array, a solid surface array, a liquid array, an oligonucleotide array, a polynucleotide array, a cDNA array, a microfilter plate, a membrane or a chip. Where transparent surfaces such as microscope slides are used, the support provides the additional advantage of two-color fluorescent labeling with low inherent background fluorescence. The gene expression systems described above, such as arrays or microarrays, may be manufactured using any techniques known in the art, including, for example, printing with fine-pointed pins onto glass slides, photolithography using dynamic micromirror devices, ink-jet printing, or electrochemistry on microelectrode arrays. Oligonucleotide adherence to the slide may be enhanced, for example, by treatment with polylysine or other cross-linking chemical coating or by any other method known in the art. The DNA or oligonucleotide may then be cross-linked by ultraviolet irradiation and denatured by exposure to either heat or alkali. The microarray may then be hybridized with labeled target derived from mRNA from one or more samples to be analyzed.
For example, in one embodiment, cDNA or cRNA obtained from mRNA from colon samples derived from both a patient diagnosed with IBD and a healthy control is used. The samples may be labeled with different detectable labels such as, for example, fluorophores that exhibit different excitation properties. The samples may then be mixed and hybridized to a single microarray that is then scanned, allowing the visualization of up-regulated or down-regulated genes. The DualChip™ platform available from Eppendorf is an example of this type of array.
The probes affixed to the solid support in the gene expression system comprising the array elements may be a candidate library, a diagnostic agent, a diagnostic oligonucleotide set or a diagnostic probe set. In one embodiment of the present invention, the one or more array elements comprising the array set are selected from those sequences listed in Tables 4-8 or the Sequence Listing.
Determination of Array Sets
A global pattern of gene expression in colon biopsies from Crohn's Disease (CD) patients at diagnosis (CDD), treated CD patients refractory to first line corticosteroid/6-MP therapy (chronic refractory, CDT), and healthy controls has been determined and is disclosed herein. cRNA was prepared from biopsies obtained from endoscopically affected segments, predominantly the ascending colon, with control biopsies obtained from matched segments in healthy patients. cRNA was labelled and then hybridized to the HGU133 Plus Version 2 Affymetrix GeneChip. RNA obtained from a pool of RNA from one normal colon specimen was labelled and hybridized to the GeneChip with each batch of new samples to serve as an internal control for batch to batch variability in signal intensity. Results were interpreted utilizing GeneSpring™ 7.3 Software (Silicon Genetics). Differentially expressed genes were identified in two steps. First, levels of gene-specific signal intensity were filtered for statistically significant differences when grouped by clinical forms (e.g. healthy control versus CDD and healthy control versus CDT) using ANOVA, with p values of ≦0.05 considered significant, without multiple testing correction. Second, genes were filtered for a fold-change in expression level of at least 1.5-fold for CDD versus normal and at least 2-fold for CDT versus normal. The overall gene expression profile was generated by gene tree hierarchical cluster analysis based on similarity of Pearson correlation, separation ratio 1, and minimal distance of 0.001.
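The similarity measure named above, Pearson correlation, can be illustrated with a short Python sketch that compares the expression profiles of two genes across the same set of samples. This is only an illustration of the measure itself, not of the GeneSpring™ clustering implementation; the function name and the expression values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical expression values for three genes measured in four samples.
gene_1 = [1.0, 2.0, 3.0, 4.0]
gene_2 = [2.0, 4.0, 6.0, 8.0]   # rises with gene_1
gene_3 = [4.0, 3.0, 2.0, 1.0]   # falls as gene_1 rises

similar = pearson(gene_1, gene_2)
opposite = pearson(gene_1, gene_3)
```

Genes whose profiles are highly correlated would be placed close together in a gene tree, whereas genes with opposing profiles would be placed far apart.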
An array set of 779 genes was identified. These genes, referred to as the Crohn's Disease Genomic Signature (Table 8), were differentially expressed in both CD colon at diagnosis and in chronic refractory disease, relative to healthy controls, with at least a 1.5-fold difference in expression and a significance level of at least 0.05. The global pattern of gene expression was substantially homogeneous in the panel of chronic refractory patients, relative to a more heterogeneous pattern in the CD patients at diagnosis, suggesting a distinct sub-set of CD patients that could be identified at diagnosis relative to their ultimate response to therapy. A cohort of CD patients having a known genomic signature was then prospectively followed.
From that cohort, responder patients and non-responder patients were identified. Treatment “responders” are defined as requiring one course of corticosteroids during the first year. Treatment “non-responders” are defined as requiring more than one course of corticosteroids, or anti-TNF, during the first year. The only clinical distinction between the responder and non-responder groups was the response to first line therapy, as they otherwise possessed similar age (12±1.2 vs 12±1.3), disease distribution, and clinical (Pediatric Crohn's Disease Activity Index (PCDAI): 40±9 vs 45±6) and histological (Crohn's Disease Histological Index of Severity (CDHIS): 6±1.8 vs 5±2) disease activity, respectively 70, 71. They also did not differ in the frequency of immunomodulator or mesalamine use.
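The clinical definitions above can be written as a simple labeling rule, which may be useful when assembling training data for class prediction. The following Python sketch is a hypothetical encoding of those definitions; the function name, argument names, and the handling of patients who required no corticosteroid course (a case the definitions above do not address) are assumptions.

```python
def first_year_label(corticosteroid_courses, received_anti_tnf):
    """Label a patient from first-year treatment history.

    Responder: one course of corticosteroids during the first year.
    Non-responder: more than one course of corticosteroids, or anti-TNF.
    """
    if corticosteroid_courses < 0:
        raise ValueError("number of courses cannot be negative")
    if received_anti_tnf or corticosteroid_courses > 1:
        return "non-responder"
    if corticosteroid_courses == 1:
        return "responder"
    # Zero courses and no anti-TNF is not covered by the stated definitions.
    return "undefined"


labels = [
    first_year_label(1, False),
    first_year_label(2, False),
    first_year_label(1, True),
]
```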
Condition tree hierarchical cluster analysis using a distance correlation, in which the individual patients were grouped based upon similar patterns of gene expression and not pre-defined clinical sub-sets, has shown that most non-responders cluster together, with a pattern of gene expression intermediate between most responders and chronic refractory patients.
This gene set (the Crohn's Disease Genomic Signature, or CDGS, Table 8) was then reduced to smaller sets that can be used to distinguish responders from non-responders using the methods described herein. The smaller gene sets were identified via class prediction analysis using GeneSpring™ software, beginning with the CDGS gene set. The class prediction analysis used to arrive at the smaller gene sets is described in full below.
The smaller gene sets, referred to herein as “array sets,” comprise the sequences disclosed in Tables 4-8 or the Sequence Listing. These array sets can be used to identify distinct sub-sets of CD patients at diagnosis, relative to their ultimate response to therapy. In particular, the gene sets, in one embodiment, may be used to determine whether a patient diagnosed with IBD may be classified as a “responder” or “non-responder,” thus permitting the clinician to predict the optimal course of therapy.
Thus, in one embodiment, gene expression methods can be used to define clinically meaningful sub-sets of IBD patients with respect to treatment response, using intestinal samples obtained at the time of diagnosis. Further, the CDGS and the K-nearest neighbors class prediction algorithm, applied to additional training and test sets derived from additional patient samples, may be used to define novel array sets for predicting treatment response.
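For readers unfamiliar with K-nearest neighbors class prediction, the following Python sketch shows the general idea: a new sample is assigned the class held by the majority of its k closest training samples. It is a generic illustration under stated assumptions (Euclidean distance, k = 3, two hypothetical genes per sample, invented expression values) and is not the GeneSpring™ implementation or the parameter set used to derive the disclosed array sets.

```python
import math
from collections import Counter


def knn_predict(training, query, k=3):
    """Predict a class label by majority vote of the k nearest training samples.

    training: list of (expression_vector, label) pairs.
    query: expression vector with the same gene order as the training vectors.
    """
    if k < 1 or k > len(training):
        raise ValueError("k must be between 1 and the number of training samples")
    # Euclidean distance is an assumption; other distance measures may be used.
    neighbors = sorted(
        (math.dist(vector, query), label) for vector, label in training
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Hypothetical normalized expression of two genes in six training patients.
training_set = [
    ([1.0, 1.2], "responder"),
    ([0.8, 1.0], "responder"),
    ([1.1, 0.9], "responder"),
    ([3.0, 3.2], "non-responder"),
    ([2.9, 3.1], "non-responder"),
    ([3.2, 2.8], "non-responder"),
]

prediction_a = knn_predict(training_set, [1.0, 1.0])
prediction_b = knn_predict(training_set, [3.0, 3.0])
```

Using an odd k with two classes avoids tied votes; in practice k and the gene list would be chosen by evaluating prediction accuracy on separate training and test sets.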
Determination of a Gene Expression Profile
The present invention is related to methods of detecting gene expression using a gene expression system having one or more array elements, wherein the array elements comprise one or more sequences that correspond to a sequence selected from those sequences listed in Tables 4-8 or the Sequence Listing, forming an array set. From the gene lists disclosed in Tables 4-8 and the Sequence Listing, it should be understood by one of ordinary skill in the art that standard methods of data analysis, or the disclosed methods (such as cluster analysis, K-nearest neighbors class prediction algorithms, or class prediction analysis using appropriately selected parameters), can be used to identify a smaller number of array elements while still retaining the predictive characteristics of the array sets disclosed herein. Non-limiting examples of data analysis that may be used are listed below.
In one embodiment, an array may be used to determine gene expression as described above. For example, PCR amplified inserts of cDNA clones may be applied to a substrate in a dense array. These cDNA may be selected from one or more of those sequences listed in Tables 4-8 or the Sequence Listing. In one embodiment, the array comprises a gene set further comprising one or more sequences listed in Table 4. In another embodiment, the array comprises an array set comprising one or more sequences listed in Table 5.
In another embodiment, the array (or gene expression system) comprises at least 2, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 150, 200, 250 or more different polynucleotide probes, each different probe capable of hybridizing to a different gene sequence listed in Table 6.
In another embodiment, the array (or gene expression system) comprises at least 2, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 150, 200, 250 or more different polynucleotide probes, each different probe capable of hybridizing to a different gene sequence listed in Table 7.
In another embodiment, the array (or gene expression system) comprises at least 2, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 150, 200, 250 or more different polynucleotide probes, each different probe capable of hybridizing to a different gene sequence listed in Table 8.
In one embodiment of the present invention, the array (or gene expression system) comprises a gene set further comprising from about 1 to about 1000 gene sequences, or about 200 to about 800 gene sequences, or about 20 to about 60 gene sequences, or about 10 to about 20 gene sequences, selected from the sequences listed in Tables 4-8 or the Sequence Listing.
In yet another embodiment, the selected genes include at least two groups of genes. The first group includes genes upregulated in inflammatory bowel disease compared to normal controls wherein the upregulated genes have IBD/Normal ratios of at least 2, 3, 4, 5, 10, or more. The second group includes genes downregulated in inflammatory bowel disease which have IBD/Normal ratios of no greater than 0.5, 0.333, 0.25, 0.2, 0.1, or less. Each group may include at least 1, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, or more genes.
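The two groups described above can be derived from a table of IBD/Normal expression ratios with a simple threshold filter. The Python sketch below uses the least stringent thresholds mentioned (a ratio of at least 2 for upregulated genes and no greater than 0.5 for downregulated genes); the function name, gene labels, and ratio values are hypothetical.

```python
def split_by_ratio(ratios, up_threshold=2.0, down_threshold=0.5):
    """Split genes into upregulated and downregulated groups by IBD/Normal ratio."""
    upregulated = sorted(g for g, r in ratios.items() if r >= up_threshold)
    downregulated = sorted(g for g, r in ratios.items() if r <= down_threshold)
    return upregulated, downregulated


# Hypothetical IBD/Normal expression ratios for five genes.
ibd_over_normal = {
    "GENE_A": 4.2,
    "GENE_B": 0.3,
    "GENE_C": 1.1,
    "GENE_D": 2.0,
    "GENE_E": 0.5,
}

up_group, down_group = split_by_ratio(ibd_over_normal)
```

Raising `up_threshold` to 3, 4, 5, or 10 (and lowering `down_threshold` to 0.333, 0.25, 0.2, or 0.1) yields the more stringent groups recited above.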
It is also understood that each probe can correspond to one gene, or multiple probes can correspond to one gene, or both, or one probe can correspond to more than one gene. In some embodiments, DNA molecules are less than about any of the following lengths (in bases or base pairs): 10,000; 5,000; 2500; 2000; 1500; 1250; 1000; 750; 500; 300; 250; 200; 175; 150; 125; 100; 75; 50; 25; 10. In some embodiments, the DNA molecule is greater than about any of the following lengths (in bases or base pairs): 10; 15; 20; 25; 30; 40; 50; 60; 75; 100; 125; 150; 175; 200; 250; 300; 350; 400; 500; 750; 1000; 2000; 5000; 7500; 10000; 20000; 50000. Alternatively, a DNA molecule can be any of a range of sizes having an upper limit of 10,000; 5,000; 2500; 2000; 1500; 1250; 1000; 750; 500; 300; 250; 200; 175; 150; 125; 100; 75; 50; 25; or 10 and an independently selected lower limit of 10; 15; 20; 25; 30; 40; 50; 60; 75; 100; 125; 150; 175; 200; 250; 300; 350; 400; 500; 750; 1000; 2000; 5000; or 7500, wherein the lower limit is less than the upper limit.
Homologs and variants of the disclosed nucleic acid molecules in Tables 4-8 or the Sequence Listing may be used in the present invention. Homologs and variants of these nucleic acid molecules typically possess a relatively high degree of sequence identity when aligned using standard methods. Sequences suitable for use in the methods described herein have at least about 40-50, about 50-60, about 70-80, about 80-85, about 85-90, about 90-95 or about 95-100% sequence identity to the sequences disclosed herein.
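As a rough illustration of the sequence identity percentages referred to above, the Python sketch below computes percent identity between two sequences that have already been aligned (gaps written as "-"). Several conventions exist for the denominator; this sketch assumes identity is counted over all aligned columns except those in which both sequences have a gap. The function name and the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences of equal length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column carries no information for this pair
        compared += 1
        if a == b:
            matches += 1  # a gap opposite a base never matches
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# One substitution in ten aligned positions.
identity_substitution = percent_identity("ACGTACGTAC", "ACGTTCGTAC")

# One deletion in eight aligned positions.
identity_deletion = percent_identity("ACGT-CGT", "ACGTACGT")
```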
The probes, immobilized on the selected substrate, are suitable for hybridization under conditions with appropriately determined stringency, such that targets binding non-specifically to the substrate or array elements are substantially removed. Appropriately labeled targets are generated from mRNA using any standard method as known in the art. For example, the targets may be cDNA targets generated through incorporation of fluorescent nucleotides by reverse transcription of RNA extracted from tissues of interest. Alternatively, biotin-labeled targets may be used, such as using the method described herein. It should be clear that any suitable oligonucleotide-based target may be used. In another embodiment, suitably labeled cRNA targets may be used. Regardless of the type of target, the targets are such that the labeled targets applied to the chip hybridize to complementary probes on the array. After washing to minimize non-specific binding, the chip may be scanned by confocal laser microscopy or by any other suitable detection method known in the art, for example, a CCD camera. Quantification of hybridization at each spot in the array allows a determination of corresponding mRNA expression. With dual color fluorescence, separately labeled cDNA targets generated from two sources of RNA are hybridized pairwise to the array. The relative abundance of the transcripts from the two sources corresponding to each specified gene can then be determined simultaneously. (See Schena et al., Proc. Natl. Acad. Sci. USA 93(2):106-149 (1996)). Microarray analysis can be performed by commercially available equipment, following manufacturer's protocols, such as by using the Affymetrix GeneChip technology (for example, HGU133 Plus Version 2 Affymetrix GeneChip), or Incyte's microarray technology, or using any other methods as known in the art.
It is understood that for determination of a gene expression profile, variations in the disclosed sequences will still permit detection of gene expression. The degree of sequence identity required to detect gene expression varies depending on the length of the oligomer. For example, in a 60-mer (an oligonucleotide with about 60 nucleotides), about 6 to about 8 random mutations or about 6 to about 8 random deletions do not affect gene expression detection. Hughes, T R, et al. “Expression profiling using microarrays fabricated by an ink jet oligonucleotide synthesizer.” Nature Biotechnology, 19:343-347 (2001). As the length of the DNA sequence is increased, the number of mutations or deletions permitted while still allowing gene expression detection is increased.
As will be appreciated by those skilled in the art, the sequences of the present invention may contain sequencing errors. That is, there may be incorrect nucleotides, frameshifts, unknown nucleotides, or other types of sequencing errors in any of the sequences; however, the correct sequences will fall within the homology and stringency definitions herein.
Additional Methods of Determining Gene Expression
The array sets disclosed herein may also be used to determine a gene expression profile, such that a patient may be classified as a responder or a non-responder, using any other techniques that measure gene expression. For example, the expression of genes disclosed in the array sets herein may be detected using RT-PCR methods or modified RT-PCR methods. In this embodiment, RT-PCR is used to detect expression of one or more genes selected from the array sets listed in Tables 4-8 or the Sequence Listing.
Various methods using RT-PCR may be employed. For example, standard RT-PCR methods may be used. Using this method, well-known in the art, isolated RNA may be reverse transcribed into cDNA using standard methods as known in the art. This cDNA is then exponentially amplified in a PCR reaction using standard PCR techniques. The reverse transcription step is typically primed using specific primers, random hexamers, or oligo-dT primers, depending on the circumstances and the goal of expression profiling. For example, extracted RNA can be reverse-transcribed using a GeneAmp RNA PCR kit (Perkin Elmer, Calif., USA), following the manufacturer's instructions. The derived cDNA can then be used as a template in the subsequent PCR reaction. Although the PCR step can use a variety of thermostable DNA-dependent DNA polymerases, it typically employs the Taq DNA polymerase, which has a 5′-3′ nuclease activity but lacks a 3′-5′ proofreading exonuclease activity. Thus, TaqMan® PCR typically utilizes the 5′-nuclease activity of Taq or Tth polymerase to hydrolyze a hybridization probe bound to its target amplicon, but any enzyme with equivalent 5′ nuclease activity can be used. Two oligonucleotide primers are used to generate an amplicon typical of a PCR reaction. A third oligonucleotide is designed to detect nucleotide sequence located between the two PCR primers. The third oligonucleotide is non-extendible by Taq DNA polymerase enzyme, and is labeled with a reporter fluorescent dye and a quencher fluorescent dye. Any laser-induced emission from the reporter dye is quenched by the quenching dye when the two dyes are located close together as they are on the probe. During the amplification reaction, the Taq DNA polymerase enzyme cleaves the third oligonucleotide in a template-dependent manner. The resultant fragments disassociate in solution, and signal from the released reporter dye is free from the quenching effect of the second fluorophore.
One molecule of reporter dye is liberated for each new molecule synthesized, and detection of the unquenched reporter dye provides the basis for quantitative interpretation of the data. TaqMan® RT-PCR can be performed using commercially available equipment, such as, for example, ABI PRISM 7700™ Sequence Detection System™ (Perkin-Elmer-Applied Biosystems, Foster City, Calif., USA), or Lightcycler (Roche Molecular Biochemicals, Mannheim, Germany). In a preferred embodiment, the 5′ nuclease procedure is run on a real-time quantitative PCR device such as the ABI PRISM 7700™ Sequence Detection System™. The system consists of a thermocycler, laser, charge-coupled device (CCD), camera and computer. The system amplifies samples in a 96-well format on a thermocycler. During amplification, laser-induced fluorescent signal is collected in real-time through fiber optics cables for all 96 wells, and detected at the CCD. The system includes software for running the instrument and for analyzing the data. To minimize errors and the effect of sample-to-sample variation, RT-PCR is usually performed using an internal standard. The ideal internal standard is expressed at a constant level among different tissues, and is unaffected by the experimental treatment. RNAs most frequently used to normalize patterns of gene expression are mRNAs for the housekeeping genes glyceraldehyde-3-phosphate-dehydrogenase (GAPDH) and β-actin, although any other housekeeping gene or other gene established to be expressed at constant levels between comparison groups can be used.
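One widely used way to apply the housekeeping-gene normalization described above to real-time PCR data is the comparative threshold cycle calculation (often written 2^-ΔΔCt). That calculation is not named in this disclosure and is shown here only as an assumed example; it presumes approximately 100% amplification efficiency for both the target gene and the housekeeping gene. The function name and the threshold cycle (Ct) values in the Python sketch below are hypothetical.

```python
def relative_expression(ct_target_test, ct_housekeeping_test,
                        ct_target_control, ct_housekeeping_control):
    """Fold change of a target gene in a test sample relative to a control sample.

    Each sample's target Ct is first normalized to its own housekeeping gene
    (for example GAPDH or beta-actin), then the two samples are compared.
    """
    delta_test = ct_target_test - ct_housekeeping_test
    delta_control = ct_target_control - ct_housekeeping_control
    delta_delta = delta_test - delta_control
    # A lower Ct means more starting template, hence the negative exponent.
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: target gene and housekeeping gene in patient and control.
fold_change = relative_expression(22.0, 18.0, 25.0, 19.0)
```

With these example values the normalized target signal in the test sample is four-fold higher than in the control sample.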
Real time quantitative PCR techniques, which measure PCR product accumulation through a dual-labeled fluorogenic probe (i.e., TaqMan® probe), may also be used with the methods disclosed herein to determine a gene expression profile. The Stratagene Brilliant SYBR Green QPCR reagent, available from 11011 N. Torrey Pines Road, La Jolla, Calif. 92037, may also be used. The SYBR® Green dye binds specifically to double-stranded PCR products, without the need for sequence-specific probes. Real time PCR is compatible both with quantitative competitive PCR, where an internal competitor for each target sequence is used for normalization, and with quantitative comparative PCR using a normalization gene contained within the sample, or a housekeeping gene for RT-PCR. For further details see, e.g., Held et al., Genome Research 6:986-994 (1996).
Alternatively, a modified RT-PCR method such as eXpress Profiling™ (XP) technology for high-throughput gene expression analysis, available from Althea Technologies, Inc., 11040 Roselle Street, San Diego, Calif. 92121, U.S.A., may be used to determine a gene expression profile of a patient diagnosed with IBD. The gene expression analysis may be limited to one or more array sets as disclosed herein. This technology is described in U.S. Pat. No. 6,618,679, incorporated herein by reference. This technology uses a modified RT-PCR process that permits simultaneous, quantitative detection of expression levels of about 20 genes. This method may be complementary to or used in place of array technology or PCR and RT-PCR methods to determine or confirm a gene expression profile, for example, when classifying the status of a patient as a responder or non-responder.
Multiplex mRNA assays may also be used, for example, those described in Tian, et al., “Multiplex mRNA assay using Electrophoretic tags for high-throughput gene expression analysis,” Nucleic Acids Research 2004, Vol. 32, No. 16, published online Sep. 8, 2004 and Elnifro, et al. “Multiplex PCR: Optimization and Application in Diagnostic Virology,” Clinical Microbiology Reviews, October 2000, p. 559-570, both incorporated herein by reference. In multiplex PCR, more than one target sequence can be amplified by including more than one pair of primers in the reaction.
Collection and Preparation of Sample
The methods disclosed herein employ a biological sample derived from patients diagnosed with an IBD such as UC or CD. The samples may include, for example, tissue samples obtained by biopsy of endoscopically affected colonic segments including the cecum/ascending, transverse/descending or sigmoid/rectum; small intestine; ileum; intestine; cell lysates; serum; or blood samples. Colon epithelial cells and lamina propria cells may be used for mRNA isolation. Control biopsies are obtained from the same source. Sample collection will depend on the target tissue or sample to be assayed.
Immediately after collection of a biological sample, the sample may be placed in a medium appropriate for storage of the sample such that degradation of mRNA is minimized, and stored on ice. For example, a suitable medium for storage of the sample until processing is RNALater®, available from Applied Biosystems, 850 Lincoln Centre Drive, Foster City, Calif. 94404, U.S.A. Total RNA may then be prepared from a target sample using standard methods for RNA extraction known in the art and disclosed in standard textbooks of molecular biology, including Ausubel et al., Current Protocols of Molecular Biology, John Wiley and Sons (1997). For example, RNA isolation can be performed using a purification kit, buffer set and protease from commercial manufacturers, such as Qiagen, according to the manufacturer's instructions. In one embodiment, total RNA is prepared utilizing the Qiagen RNeasy mini-column, available from QIAGEN Inc., 27220 Turnberry Lane Suite 200, Valencia, Calif. 91355. Other commercially available RNA isolation kits include MasterPure™ Complete DNA and RNA Purification Kit (EPICENTRE®, Madison, Wis.), or Paraffin Block RNA Isolation Kit (Ambion, Inc.). Total RNA from tissue samples may also be isolated using RNA Stat-60 (Tel-Test). RNA may also be prepared, for example, by cesium chloride density gradient centrifugation. RNA quality may then be assessed using, for example, the Agilent 2100 Bioanalyzer. Acceptable RNA samples have distinctive 18S and 28S ribosomal RNA bands and a 28S/18S ribosomal RNA ratio of about 1.5 to about 2.0.
In one embodiment, about 400 to about 500 nanograms of total RNA per sample is used to prepare labeled mRNA as targets. The RNA may be labeled using any methods known in the art, including for example, the TargetAmp 1-Round Aminoallyl-aRNA Amplification Kit available from Epicentre to prepare cRNA, following the manufacturer's instructions. The TargetAmp 1-Round Aminoallyl-aRNA Amplification Kit (Epicentre) is used to make double-stranded cDNA from total RNA. An in vitro transcription reaction creates cRNA target. Biotin-X-X-NHS (Epicentre) is used to label the aminoallyl-aRNA with biotin following the manufacturer's instructions. In one embodiment, the biotin-labeled cRNA target is then chemically fragmented and a hybridization cocktail is prepared and hybridized to a suitable array set immobilized on a suitable substrate. For example, the labeled cRNA may be hybridized to an Affymetrix Genechip Array (HGU133 Plus Version 2 Affymetrix GeneChip, available from Affymetrix, 3420 Central Expressway, Santa Clara, Calif. 95051). In this embodiment, the hybridization cocktail contains 0.034 ug/uL fragmented cRNA, 50 pM Control Oligonucleotide B2 (Affymetrix), 20× Eukaryotic Hybridization Controls (1.5 pM bioB, 5 pM bioC, 25 pM bioD, 100 pM cre) (Affymetrix), 0.1 mg/mL Herring Sperm DNA (Promega), 0.5 mg/mL Acetylated BSA (Invitrogen), and 1× Hybridization Buffer, though it should be understood that any suitable hybridization cocktail may be used.
In another embodiment, the total RNA may be used to prepare cDNA targets. The targets may be labeled using any suitable labels known in the art. The labeled cDNA targets may then be hybridized under suitable conditions to any array set or subset of an array set described herein, such that a gene expression profile may be obtained.
Normalization
Normalization is an adjustment made to microarray gene expression values to correct for potential bias or error introduced into an experiment. With respect to array-type analyses, such errors may be the result of unequal amounts of cDNA probe, differences in dye properties, differences in dye incorporation etc. Where appropriate, the present methods include the step of normalizing data to minimize the effects of bias or error. The type of normalization used will depend on the experimental design and the type of array being used. The type of normalization used will be understood by one of ordinary skill in the art.
Levels of Normalization
There may be two types of normalization levels used with the methods disclosed herein: “within slide” (this compensates, for example, for variation introduced by using different printing pins, unevenness in hybridization or, in the case of two channel arrays, differences in dye incorporation between the two samples) or “between slides,” which is sometimes referred to as “scaling” and permits comparison of results of different slides in an experiment, replicates, or different experiments.
Normalization Methods
Within slide normalization can be accomplished using local or global methods as known in the art. Local normalization methods include the use of “housekeeping genes” and “spikes” or “internal controls”. “Housekeeping” genes are genes which are known, or expected, not to change in expression level despite changes in disease state or phenotype or between groups of interest (such as between known non-responders and responders). For example, common housekeeping genes used to normalize data are those that encode for ubiquitin, actin and elongation factors. Where housekeeping genes are used, expression intensities on a slide are adjusted such that the housekeeping genes have the same intensity in all sample assays.
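The housekeeping-gene adjustment described above can be sketched as follows. This is a minimal illustration only, not the method of any particular software package: the function name, the gene labels and the intensity values are hypothetical, and matching the mean housekeeping intensity across samples is one assumed way of giving the housekeeping genes the same intensity in all sample assays.

```python
from statistics import mean


def normalize_to_housekeeping(samples, housekeeping_genes):
    """Scale each sample so its mean housekeeping intensity matches the
    mean housekeeping intensity taken over all samples."""
    # Mean housekeeping intensity within each sample.
    hk_means = {
        name: mean(intensities[g] for g in housekeeping_genes)
        for name, intensities in samples.items()
    }
    # Common target level shared by every sample after scaling.
    target = mean(hk_means.values())
    return {
        name: {g: v * target / hk_means[name] for g, v in intensities.items()}
        for name, intensities in samples.items()
    }


# Hypothetical intensities: the second sample is uniformly twice as bright
# on the housekeeping genes, so its other genes are scaled down accordingly.
samples = {
    "patient_1": {"ACTB": 1000.0, "UBC": 2000.0, "GENE_X": 500.0},
    "patient_2": {"ACTB": 2000.0, "UBC": 4000.0, "GENE_X": 500.0},
}
normalized = normalize_to_housekeeping(samples, ["ACTB", "UBC"])
```

After scaling, both hypothetical samples report the same intensity for each housekeeping gene, and the difference in GENE_X between them reflects the adjusted values rather than the raw ones.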
Normalization may also be achieved using spikes or internal controls that rely on RNA corresponding to particular probes on the microarray slide being added to each sample. These probes may be from a different species than the sample RNAs and optimally should not cross-hybridize to sample RNAs. For two channel arrays, the same amount of spike RNA is added to each sample prior to labeling and normalization is determined via measurement of the spiked features. Spikes can also be used to normalize spatially across a slide if the controls have been printed by each pin—the same controls on different parts of the slide should hybridize equally. Spikes may also be used to normalize between slides.
Reference samples may be any suitable reference sample or control as will be readily understood by one of skill in the art. For example, the reference sample may be selected from normal patients, “responder” patients, “non-responder” patients, or “chronic refractory” patients. Normal patients are those not diagnosed with an IBD. “Responder” patients and “non-responder” patients are described above. “Chronic refractory” patients are patients with moderate to severe disease that require a second induction of remission using any drug. In one embodiment of the present invention, the control sample comprises cDNA from one or more patients that do not have an inflammatory bowel disease. In this embodiment, the cDNA of multiple normal samples is combined prior to labeling, and used as a control when determining gene expression of experimental samples. The data obtained from the gene expression analysis may then be normalized to the control cDNA.
A variety of global normalization methods may be used including, for example, linear regression. This method is suitable for two channel arrays and involves plotting the intensity values of one sample against the intensity values of the other sample. A regression line is then fitted to the data and the slope and intercept calculated. Intensity values in one channel are then adjusted so that the slope=1 and the intercept is 0. Linear regression can also be carried out using MA plots. These are plots of the log ratio between the Cy5 and Cy3 channel values against the average intensity of the two channels. Again regression lines are plotted and the normalized log ratios are calculated by subtracting the fitted value from the raw log ratio. In the alternative, lowess regression (locally weighted polynomial regression) may be used. This regression method again uses MA plots but is a non-linear regression method. This normalization method is suitable if the MA plots show that the intensity of gene expression is influencing the log ratio between the channels. Lowess essentially applies a large number of linear regressions using a sliding window of the data.
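The MA-plot form of linear regression normalization described above can be sketched as follows. The sketch assumes base-2 logarithms and takes the average intensity A as the mean of the two log intensities, a common convention that the text does not specify; the function name and the intensity values are hypothetical. Lowess normalization would replace the single straight-line fit with locally weighted fits and is not shown.

```python
import math


def ma_linear_normalize(cy5, cy3):
    """Return log2 ratios with the linear trend of the MA plot removed."""
    # M is the log ratio; A is the average log intensity of the two channels.
    m = [math.log2(r / g) for r, g in zip(cy5, cy3)]
    a = [0.5 * (math.log2(r) + math.log2(g)) for r, g in zip(cy5, cy3)]
    n = len(m)
    a_mean = sum(a) / n
    m_mean = sum(m) / n
    # Ordinary least-squares fit of M on A.
    sxx = sum((x - a_mean) ** 2 for x in a)
    sxy = sum((x - a_mean) * (y - m_mean) for x, y in zip(a, m))
    slope = sxy / sxx
    intercept = m_mean - slope * a_mean
    # Normalized log ratio = raw log ratio minus the fitted value.
    return [y - (intercept + slope * x) for x, y in zip(a, m)]


# Hypothetical two channel data with a constant two-fold dye bias.
cy5 = [200.0, 800.0, 3200.0, 12800.0]
cy3 = [100.0, 400.0, 1600.0, 6400.0]
normalized_m = ma_linear_normalize(cy5, cy3)
```

In this hypothetical case every raw log ratio equals 1, so the fitted line is flat and the normalized log ratios are all zero, which is the expected result when the only difference between channels is a uniform bias.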
Yet another alternative method of normalization is “print tip normalization.” This is a form of spatial normalization that relies on the assumption that the majority of genes printed with individual print tips do not show differential expression. Either linear or non-linear regression can be used to normalize the data. Data from features printed by different print tips are normalized independently. This type of normalization is especially important when using single channel arrays.
Yet another method of normalization is “2D lowess normalization.” This form of spatial normalization uses a 2D polynomial lowess regression that is fitted to the data using a false color plot of log ratio or intensity as a function of the position of the feature on the array. Values are adjusted according to this polynomial. “Between slide normalization” permits comparison of results from different slides, whether they are two channel or single channel arrays.
Centering and scaling may also be used. This adjusts the distributions of the data (either of log ratios or signal intensity) on different slides such that the distributions are more similar. These adjustments ensure that the mean of the data distribution on each slide is zero and the standard deviation is 1. For each value on a slide, the mean of that slide is subtracted and the resulting value divided by the standard deviation of the slide. This ensures that the “spread” of the data is the same in each slide being compared.
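Centering and scaling of a single slide can be sketched as follows. The function name and the values are hypothetical, and the use of the population standard deviation rather than the sample standard deviation is an assumption, since the text does not specify which is intended.

```python
from statistics import mean, pstdev


def center_and_scale(values):
    """Subtract the slide mean and divide by the slide standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    return [(v - mu) / sigma for v in values]


# Hypothetical log ratios from one slide (mean 5, standard deviation 2).
slide = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
scaled = center_and_scale(slide)
```

Applying the same function to every slide gives each slide a mean of zero and a standard deviation of 1, so the spread of the data is the same in each.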
Quantile normalization is yet another method that is particularly useful for comparing single channel arrays. Using this method, the data points in each slide are ranked from highest to lowest and the average computed for the highest values, second highest values and so on. The average value for that position is then assigned to each slide, i.e. the top ranked data point in each slide becomes the average of the original highest values and so on. This adjustment ensures that the data distributions on the different slides are identical.
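The ranking and averaging steps of quantile normalization can be sketched as follows. The function name and the values are hypothetical, the slides are assumed to contain the same number of features, and ties within a slide are simply broken by position, which is one of several possible conventions.

```python
def quantile_normalize(slides):
    """slides: equal-length lists of intensities, one list per slide."""
    n = len(slides[0])
    # Rank the data points in each slide from highest to lowest.
    ranked = [sorted(s, reverse=True) for s in slides]
    # Average the values that share the same rank across slides.
    rank_means = [sum(r[i] for r in ranked) / len(slides) for i in range(n)]
    normalized = []
    for s in slides:
        # Positions of this slide's features, ordered from highest to lowest.
        order = sorted(range(n), key=lambda i: s[i], reverse=True)
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]  # assign the average for that rank
        normalized.append(out)
    return normalized


# Hypothetical single channel intensities for two slides.
slides = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
result = quantile_normalize(slides)
```

Both hypothetical slides end up containing the same set of values (5.5, 3.5 and 1.5), placed according to the original rank of each feature, so their distributions are identical.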
Various tools for normalizing data are known in the art, and include GenePix, Excel, GEPAS, TMeV/MIDAS and R.
Hybridization Techniques
Where array techniques are used to determine a gene expression profile, the targets must be hybridized to the array sets under suitable hybridization conditions using hybridization and wash solutions having appropriate stringency, such that labeled targets may hybridize to complementary probe sequences on the array. Washes of appropriate stringency are then used to remove non-specific binding of target to the array elements or substrate. Determination of appropriate stringency is within the ordinary skill of one skilled in the art.
In one embodiment of the present invention, the array set is that of the Affymetrix Genechip Array (HGU133 Plus Version 2 Affymetrix GeneChip, available from Affymetrix, 3420 Central Expressway, Santa Clara, Calif. 95051). In this embodiment, suitably labeled cRNA and hybridization cocktail are first prepared. In this embodiment, the hybridization cocktail contains about 0.034 ug/uL fragmented cRNA, about 50 pM Control Oligonucleotide B2 (available from Affymetrix), 20× Eukaryotic Hybridization Controls (1.5 pM bioB, 5 pM bioC, 25 pM bioD, 100 pM cre) (available from Affymetrix), about 0.1 mg/mL Herring Sperm DNA (Promega), and about 0.5 mg/mL Acetylated BSA (Invitrogen). The hybridization cocktail is heated to 99° C. for 5 minutes, to 45° C. for 5 minutes, and spun at maximum speed in a microcentrifuge for 5 minutes. The probe array is then filled with 200 uL of 1× Hybridization Buffer (available from Affymetrix) and incubated at 45° C. for 10 minutes while rotating at 60 rpm. The 1× Hybridization Buffer is removed and the probe array filled with 200 uL of the hybridization cocktail. The probe array is then incubated at 45° C. for about 16 hours in a hybridization oven rotating at 60 rpm.
The array is then washed and stained using any method as known in the art. In one embodiment, the Fluidics Station 450 (Affymetrix) and the fluidics protocol EukGE-WS2v4_450 are used. This protocol comprises the steps of a first post-hybridization wash (10 cycles of 2 mixes/cycle with Affymetrix Wash Buffer A at 25° C.), a second post-hybridization wash (4 cycles of 15 mixes/cycle with Affymetrix Wash Buffer B at 50° C.), a first stain (staining the probe array for 10 minutes with Affymetrix Stain Cocktail 1 at 25° C.), a post-stain wash (10 cycles of 4 mixes/cycle with Affymetrix Wash Buffer A at 25° C.), a second stain (staining the probe array for 10 minutes with Stain Cocktail 2 at 25° C.), a third stain (staining the probe array for 10 minutes with Stain Cocktail 3 at 25° C.) and a final wash (15 cycles of 4 mixes/cycle with Wash Buffer A at 30° C. The holding temperature is 25° C.). All Wash Buffers and Stain Cocktails are those provided in the GeneChip® Hybridization, Wash and Stain Kit, manufactured for Affymetrix, Inc., by Ambion, Inc., available from Affymetrix. In one embodiment, the stain used is R-Phycoerythrin Streptavidin, available from Molecular Probes. The antibody used is anti-streptavidin antibody (goat) biotinylated, available from Vector Laboratories.
Data Collection and Processing
When using an array to determine a gene expression profile, the data from the array must be obtained and processed. The data may then be used for any of the purposes set forth herein, such as to predict the outcome of a therapeutic treatment or to classify a patient as a responder or nonresponder.
Following appropriate hybridization and wash steps, the substrate containing the array set and hybridized target is scanned. Data is then collected and may be saved as both an image and a text file. Precise databases and tracking of files should be maintained regarding the location of the array elements on the substrates. Information on the location and names of genes should also be maintained. The files may then be imported to software programs that perform image analysis and statistical analysis functions.
The gene expression profile of a patient of interest is then determined from the collected data. This may be done using any standard method that permits qualitative or quantitative measurements as described herein. Appropriate statistical methods may then be used to predict the significance of the variation in the gene expression profile, and the probability that the patient's gene expression profile is within the category of non-responder or responder. For example, in one embodiment, the data may be collected, then analyzed such that a class determination may be made (i.e., categorizing a patient as a responder or nonresponder) using a class prediction algorithm and GeneSpring™ software as described below.
Expression patterns can be evaluated by qualitative and/or quantitative measures. Qualitative methods detect differences in expression that classify expression into distinct modes without providing significant information regarding quantitative aspects of expression. For example, a technique can be described as a qualitative technique if it detects the presence or absence of expression of a candidate nucleotide sequence, i.e., an on/off pattern of expression. Alternatively, a qualitative technique measures the presence (and/or absence) of different alleles, or variants, of a gene product.
In contrast, some methods provide data that characterize expression in a quantitative manner. That is, the methods relate expression on a numerical scale, e.g., a scale of 0-5, a scale of 1-10, a scale of + to +++, from grade 1 to grade 5, a grade from a to z, or the like. It will be understood that the numerical and symbolic examples provided are arbitrary, and that any graduated scale (or any symbolic representation of a graduated scale) can be employed in the context of the present invention to describe quantitative differences in nucleotide sequence expression. Typically, such methods yield information corresponding to a relative increase or decrease in expression.
Any method that yields either quantitative or qualitative expression data is suitable for evaluating expression. In some cases, e.g., when multiple methods are employed to determine expression patterns for a plurality of candidate nucleotide sequences, the recovered data, e.g., the expression profile, for the nucleotide sequences is a combination of quantitative and qualitative data.
In some applications, expression of the plurality of candidate nucleotide sequences is evaluated sequentially. This is typically the case for methods that can be characterized as low- to moderate-throughput. In contrast, as the throughput of the elected assay increases, expression for the plurality of candidate nucleotide sequences in a sample or multiple samples is assayed simultaneously. Again, the methods (and throughput) are largely determined by the individual practitioner, although, typically, it is preferable to employ methods that permit rapid, e.g. automated or partially automated, preparation and detection, on a scale that is time-efficient and cost-effective.
It is understood that the preceding discussion is directed at both the assessment of expression of the members of candidate libraries and to the assessment of the expression of members of diagnostic nucleotide sets.
Many techniques have been applied to the problem of making sense of large amounts of gene expression data. Cluster analysis techniques (e.g., K-Means), self-organizing maps (SOM), principal components analysis (PCA), and other analysis techniques are all widely available in packaged software used in correlating this type of gene expression data.
Class Prediction
In one embodiment, the data obtained may be analyzed using a class prediction algorithm to predict whether a subject is a non-responder or a responder, as defined above. Class prediction is a supervised learning method in which the algorithm learns from samples with known class membership (the training set) and establishes a prediction rule to classify new samples (the test set). Class prediction consists of several steps. The first is feature selection, a process by which genes within a defined gene set are scored for their ability to distinguish between classes (responders and non-responders) in the training set. Genes may be selected for use as predictors by individual examination and ranking based on the power of the gene to discriminate responders from non-responders. Genes may then be scored on the basis of the best prediction point for responders or non-responders. The score function is the negative natural logarithm of the p-value for a hypergeometric test of predicted versus actual group membership for responder versus non-responder. A combined list for responders and non-responders for the most discriminating genes may then be produced, up to the number of predictor genes specified by the user. The Golub method may then be used to test each gene considered for the predictor gene set for its ability to discriminate responders from non-responders using a signal-to-noise ratio. Genes with the highest scores may then be kept for subsequent calculations. A subset of genes with high predictive strength may then be used in class prediction, with cross validation performed using the known groups from the training set. The K-nearest neighbors approach may be used to classify training set samples during cross validation, and to classify test set samples once the predictive rule has been established.
In this system, each sample is classified by finding the K-nearest neighboring training set samples (where K is the number of neighbors defined by the user) plotted in Euclidean space over normalized expression intensity for each of the genes in the predictor set. For example, a predictive gene set of twenty members may be selected using four nearest neighbors. Depending on the number of samples available, the K value may vary. The class membership of the selected number of nearest neighbors to each sample is enumerated and p-values computed to determine the likelihood of seeing at least the observed number of neighbors from each class relative to the whole training set by chance in a K-sized neighborhood. With this method, the confidence in class prediction is best determined by the ratio of the smallest p-value to the second smallest p-value, which is compared against a decision cut-off. If the ratio is lower than the cut-off, the test sample is classified as the class corresponding to the smallest p-value. If it is higher, a prediction is not made. In one embodiment, a decision cut-off p-value ratio of about 0.5 may be used. Cross validation in GeneSpring may then be done by a drop-one-out algorithm, in which the accuracy of the prediction rule is tested. This approach removes one sample from the training set and uses it as a test sample. By predicting the class of a given sample only after it is removed from the training set, the rule makes an unbiased prediction of the sample class. Once performance of the predictive rule has been optimized in this fashion, it may be tested using additional samples.
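The overall shape of this workflow (scoring genes by a signal-to-noise ratio, classifying by nearest neighbors in Euclidean space, and drop-one-out cross validation) can be sketched as follows. This is a simplified illustration and not the GeneSpring implementation: it uses a plain majority vote in place of the p-value ratio and decision cut-off described above, it assumes the signal-to-noise score is the difference of class means divided by the sum of the class standard deviations, and all function names, gene labels and values are hypothetical. Three neighbors are used instead of four so that a vote between two classes cannot tie.

```python
import math
from collections import Counter
from statistics import mean, stdev


def signal_to_noise(class_a, class_b):
    """Difference of class means divided by the sum of class std devs."""
    return (mean(class_a) - mean(class_b)) / (stdev(class_a) + stdev(class_b))


def select_predictors(expression, labels, n_genes):
    """Rank genes by absolute signal-to-noise score and keep the top n_genes.

    expression maps a gene name to one value per sample; labels holds the
    known class ("R" or "NR") of each sample in the same order.
    """
    scores = {}
    for gene, values in expression.items():
        r = [v for v, lab in zip(values, labels) if lab == "R"]
        nr = [v for v, lab in zip(values, labels) if lab == "NR"]
        scores[gene] = abs(signal_to_noise(r, nr))
    return sorted(scores, key=scores.get, reverse=True)[:n_genes]


def knn_predict(train_x, train_y, sample, k):
    """Majority vote among the k training samples nearest in Euclidean space."""
    nearest = sorted(
        (math.dist(x, sample), label) for x, label in zip(train_x, train_y)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def leave_one_out_accuracy(xs, ys, k):
    """Drop each sample in turn, predict it from the rest, and score the rule."""
    hits = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        hits += knn_predict(rest_x, rest_y, xs[i], k) == ys[i]
    return hits / len(xs)


# Hypothetical training set: three responders and three non-responders.
labels = ["R", "R", "R", "NR", "NR", "NR"]
expression = {
    "GENE_A": [10.0, 11.0, 12.0, 1.0, 2.0, 3.0],
    "GENE_B": [5.0, 6.0, 7.0, 5.0, 6.0, 7.0],
    "GENE_C": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
}
predictors = select_predictors(expression, labels, n_genes=2)
# One point per sample, with one coordinate per predictor gene.
samples = [tuple(expression[g][i] for g in predictors) for i in range(len(labels))]
accuracy = leave_one_out_accuracy(samples, labels, k=3)
```

In this hypothetical data set GENE_B carries no class information and is dropped by the feature selection step, and every held-out sample is assigned to its known class by the remaining five.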
Cluster Analysis
Cluster analysis is a loose term covering many different algorithms for grouping data. Clustering can be divided into two main types: top-down and bottom-up. Top-down clustering starts with a given number of clusters or classes and proceeds to partition the data into these classes. Bottom-up clustering starts by grouping data at the lowest level and builds larger groups by bringing the smaller groups together at the next highest level.
K-Means is an example of top-down clustering. K-means groups data into K number of best-fit clusters. Before using the algorithm, the user defines the number of clusters that are to be used to classify the data (K clusters). The algorithm randomly assigns centers to each cluster and then partitions the nearest data into clusters with those centers. The algorithm then iteratively finds new centers by averaging over the data in the cluster and reassigning data to new clusters as the centers change. The analysis iteratively continues until the centers no longer move (Sherlock, G., Current Opinion in Immunology, 12:201, 2000).
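The K-means iteration described above can be sketched as follows. The function name, the seed parameter and the data points are hypothetical. Drawing the initial centers at random from the data points and stopping when the assignments no longer change are assumptions made for the sketch; the latter is equivalent to the centers no longer moving.

```python
import math
import random


def k_means(points, k, seed=0, max_iter=100):
    """Partition points into k clusters; returns (assignments, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initial centers
    assignment = None
    for _ in range(max_iter):
        # Assign every point to its nearest center.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # assignments are stable, so the centers no longer move
        assignment = new_assignment
        # Recompute each center as the average of its member points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignment, centers


# Hypothetical expression values for six samples over two genes.
points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
assignment, centers = k_means(points, k=2)
```

The cluster numbers themselves are arbitrary and depend on the random initial centers; only the grouping of the points is meaningful.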
Tree clustering is an example of bottom-up clustering. Tree clustering joins data together by assigning nearest pairs as leaves on the tree. When all pairs have been assigned (often according to either information-theoretical criteria or regression methods), the algorithm progresses up to the next level joining the two nearest groups from the prior level as one group. Thus, the number and size of the clusters depends on the level. Often, the fewer clusters, the larger each cluster will be. The stoppage criteria for such algorithms varies, but often is determined by an analysis of the similarity of the members inside the cluster compared to the difference across the clusters.
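Bottom-up merging of this kind can be sketched as follows. The sketch assumes Euclidean distance and average linkage (the mean pairwise distance between members of two groups) as the measure of nearness, and it stops at a requested number of clusters; the text leaves both the criterion and the stopping rule open. The function name and data points are hypothetical.

```python
import math


def agglomerate(points, n_clusters):
    """Merge the two nearest groups repeatedly until n_clusters remain.

    Returns a list of clusters, each a list of indices into points.
    """
    clusters = [[i] for i in range(len(points))]  # every point starts alone

    def linkage(a, b):
        # Average pairwise distance between the members of two groups.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the pair of groups with the smallest linkage distance.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]  # join them at the next level
        del clusters[j]
    return clusters


# Hypothetical expression values for six samples over two genes.
points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
clusters = agglomerate(points, n_clusters=2)
```

Stopping at a different level of the tree yields a different number and size of clusters, as noted above.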
Self-organizing maps (SOMs) are competitive neural networks that group input data into nearest neighbors (Torkkola, K., et al., Information Sciences, 139:79, 2001; Toronen, P., et al., FEBS Letters, 451:142-146, 1999). As data is presented to the neural network, neurons whose weights currently are capable of capturing that data (the winner neuron) are updated toward the input. Updating the weights, or training the neural net, shifts the recognition space of each neuron toward a center of similar data. SOMs are similar to K-means with the added constraint that all centers are on a 1 or 2 dimensional manifold (i.e., the feature space is mapped into a 1 or 2 dimensional array, where new neighborhoods are formed). In SOM, the number of neurons is chosen to be much larger than the possible number of the clusters. It is hoped that the clusters of trained neurons will provide a good estimation of the number of the clusters. In many cases, however, a number of small clusters are formed around the larger clusters, and there is no practical way of distinguishing such smaller clusters from, or of merging them into, the larger clusters. In addition, there is no guarantee that the resulting clusters of genes actually exhibit statistically independent expression profiles. Thus, the members of two different clusters may exhibit similar patterns of gene expression.
Principal component analysis (PCA), although not a clustering technique in its nature (Jolliffe, I. T., Principal Component Analysis, New York: Springer-Verlag, 1986) can also be used for clustering (Yeung, K. Y., et al., Bioinformatics, 17:763, 2001). PCA is a stepwise analysis that attempts to create a new component axis at each step that contains most of the variation seen for the data. Thus, the first component explains the most important basis for the variation in the data, the second component explains the second most important basis for the variation in the data, the third component the third most important basis, and so on. PCA projects the data into a new space spanned by the principal components. Each successive principal component is selected to be orthogonal to the previous ones, and to capture the maximum information that is not already present in the previous components. The principal components are therefore linear combinations (or eigenarrays) of the original data. These principal components are the classes of data in the new coordinate generated by PCA. If the data is highly non-correlated, then the number of significant principal components can be as high as the number of original data values. If, as in the case of DNA microarray experiments, the data is expected to correlate among groups, then the data should be described by a set of components which is fewer than the full complement of data points.
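Extraction of the first principal component can be sketched as follows. The sketch assumes the component is computed from the sample covariance matrix by power iteration, which is one of several ways to obtain it; the function name and data are hypothetical, and the data are assumed not to be constant. Later components would be found by repeating the procedure on what remains after the first component is removed.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (unit direction, fraction of total variance explained)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance along v, compared with the total variance (trace of cov).
    eigenvalue = sum(
        v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)
    )
    total_variance = sum(cov[a][a] for a in range(d))
    return v, eigenvalue / total_variance


# Hypothetical, perfectly correlated expression values for two genes.
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
direction, explained = first_principal_component(rows)
```

Because the two hypothetical genes are perfectly correlated, a single component accounts for all of the variation, illustrating how correlated data can be described by fewer components than original variables.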
A variety of systems known in the art may be used for image analysis and compiling the data. For example, where the mRNA is labeled with a fluorescent tag, a fluorescence imaging system (such as the microarray processor commercially available from AFFYMETRIX®, Santa Clara, Calif.) may be used to capture and quantify the extent of hybridization at each address. Or, in the case where the mRNA is radioactive, the array may be exposed to X-ray film and a photographic image made. Once the data is collected, it may be compiled to quantify the extent of hybridization at each address, for example, using software to convert the measured signal to a numerical value.
Any publicly available imaging software may be used. Examples include BioDiscovery (ImaGene), Axon Instruments (GenePix Pro 6.0), Eisen Lab, Stanford University (ScanAlyze), Spotfinder (TIGR), Imaxia (ArrayFox), F-Scan (Analytical Biostatistics Section, NIH), MicroDiscovery (GeneSpotter), CLONDIAG (IconoClust), Koada Technology (Koadarray), Vigene Tech (MicroVigene), Nonlinear Dynamics (Phoretix), CSIRO Mathematical and Information Sciences (SPOT), and Niles Scientific (SpotReader).
Any commercially available data analysis software may also be used. Examples include BRB Array Tools (Biometric Research Branch, NCI), caGEDA (University of Pittsburgh), Cleaver 1.0 (Stanford Biomedical Informatics), ChipSC2C (Peterson Lab, Baylor College of Medicine), Cluster (Eisen Lab, Stanford/UC Berkeley), DNA-Chip Analyzer (dChip) (Wong Laboratory, Harvard University), Expression Profiler (European Bioinformatics Institute), FuzzyK (Eisen Lab, Stanford/UC Berkeley), GeneCluster 2.0 (Broad Institute), GenePattern (Broad Institute), GeneXPress (Stanford University), Genesis (Alexander Sturn, Graz University of Technology), GEPAS (Spanish National Cancer Center), GLR (University of Utah), GQL (Max Planck Institute for Molecular Genetics), INCLUSive (Katholieke Universiteit Leuven), Maple Tree (Eisen Lab, Stanford/UC Berkeley), MeV (TIGR), MIDAS (TIGR), Onto-Tools (Sorin Draghici, Wayne State University), Short Time-series Expression Miner (Carnegie Mellon University), Significance Analysis of Microarrays (Rob Tibshirani, Stanford University), SNOMAD (Johns Hopkins Schools of Medicine and Public Health), SparseLOGREG (Shevade & Keerthi, National University of Singapore), SuperPC Microarrays (Rob Tibshirani, Stanford University), TableView (University of Minnesota), TreeView (Eisen Lab, Stanford/UC Berkeley), Venn Mapper (Universitair Medisch Centrum Rotterdam), Applied Maths (GeneMaths XT), Array Genetics (AffyMate), Axon Instruments (Acuity 4.0), BioDiscovery (GeneSight), BioSieve (ExpressionSieve), CytoGenomics (SilicoCyte), Microarray Data Analysis (GeneSifter), MediaCybernetics (ArrayPro Analyzer), Microarray Fuzzy Clustering (BioRainbow), Molmine (J-Express Pro), Optimal Design (ArrayMiner), Partek (Partek Pro), Predictive Patterns Software (GeneLinker), Promoter Extractor (BioRainbow), SAS Microarray, Silicon Genetics (GeneSpring), Spotfire (Spotfire), Strand Genomics (Avadis), and Vialogy Corp.
It should also be understood that confounding factors may exist in individual subjects that may affect the ability of a given gene set to predict responders versus non-responders. These confounding variables include variation in medications, such as cases in which concurrent 6-MP with infliximab overcomes the adverse effects of an unfavorable FasL polymorphism on response, the CARD15 genotype status, or the location of the biopsy, due to variation of gene expression along the colon. To account for this variation, outliers may be identified, and it may subsequently be determined whether the outliers can be accounted for by variations in medication use, CARD15 genotype, or the location of the colon biopsy.
Kits
In an additional aspect, the present invention provides kits embodying the methods, compositions, and systems for analysis of gene expression as described herein. Kits of the present invention may comprise one or more of the following: a) at least one pair of universal primers; b) at least one pair of target-specific primers, wherein the primers are specific to one or more sequences listed in Tables 4-8 or the sequence listing; c) at least one pair of reference gene-specific primers; and d) one or more amplification reaction enzymes, reagents, or buffers. The universal primers provided in the kit may include labeled primers. The target-specific primers may vary from kit to kit, depending upon the specified target gene(s) to be investigated, and may also be labeled. Exemplary reference gene-specific primers (e.g., target-specific primers for directing transcription of one or more reference genes) include, but are not limited to, primers for β-actin, cyclophilin, GAPDH, and various rRNA molecules.
The kits of the invention optionally include one or more preselected primer sets that are specific for the genes to be amplified. The preselected primer sets optionally comprise one or more labeled nucleic acid primers, contained in suitable receptacles or containers. Exemplary labels include, but are not limited to, a fluorophore, a dye, a radiolabel, an enzyme tag, etc., that is linked to a nucleic acid primer itself.
In addition, one or more materials and/or reagents required for preparing a biological sample for gene expression analysis are optionally included in the kit. Furthermore, optionally included in the kits are one or more enzymes suitable for amplifying nucleic acids, including various polymerases (RT, Taq, etc.), one or more deoxynucleotides, and buffers to provide the necessary reaction mixture for amplification.
In one embodiment of the invention, the kits are employed for analyzing gene expression patterns using mRNA as the starting template. The mRNA template may be presented as either total cellular RNA or isolated mRNA. In other embodiments, the methods and kits described in the present invention allow quantification of other products of gene expression, including tRNA, rRNA, or other transcription products. In still further embodiments, other types of nucleic acids may serve as the template in the assay, including genomic or extragenomic DNA, viral RNA or DNA, or nucleic acid polymers generated by non-replicative or artificial mechanisms, including PNA or RNA/DNA copolymers.
Optionally, the kits of the present invention further include software to expedite the generation, analysis and/or storage of data, and to facilitate access to databases. The software includes logical instructions, instruction sets, or suitable computer programs that can be used in the collection, storage and/or analysis of the data. Comparative and relational analysis of the data is possible using the software provided.
Array Sets 1-5 are listed below in Tables 4-8.
A biological sample is obtained via standard biopsy techniques from the ascending colon of a patient diagnosed with Crohn's Disease. A control biopsy is obtained from a matched segment of the colon from a normal subject (not diagnosed with an IBD). The biopsy is obtained at the time of diagnosis. The biological sample is placed in RNAlater™ and stored on ice until processing. Total RNA is prepared utilizing the Qiagen RNeasy mini-column. RNA quality is then assessed using the Agilent 2100 Bioanalyzer. About 400 to about 500 nanograms of total RNA are used. The RNA is then labeled using the TargetAmp 1-Round Aminoallyl-aRNA Amplification Kit available from Epicentre (726 Post Road, Madison, Wis. 53713 U.S.A.) to prepare cRNA, following the manufacturer's instructions. The TargetAmp 1-Round Aminoallyl-aRNA Amplification Kit (Epicentre) is used to make double-stranded cDNA from total RNA. An in vitro transcription reaction creates cRNA target. Biotin-X-X-NHS (Epicentre) is used to label the aminoallyl-aRNA with biotin following the manufacturer's instructions.
The biotin-labeled cRNA target is then chemically fragmented and hybridized to an Affymetrix GeneChip array (HGU133 Plus Version 2), available from Affymetrix (3420 Central Expressway, Santa Clara, Calif. 95051). A hybridization cocktail is prepared, containing 0.034 ug/uL fragmented cRNA, 50 pM Control Oligonucleotide B2 (Affymetrix), 20× Eukaryotic Hybridization Controls (1.5 pM bioB, 5 pM bioC, 25 pM bioD, 100 pM cre) (Affymetrix), 0.1 mg/mL Herring Sperm DNA (Promega, 2800 Woods Hollow Road, Madison, Wis. 53711 USA), 0.5 mg/mL Acetylated BSA (Invitrogen), and 1× Hybridization Buffer. The hybridization cocktail is heated to 99° C. for 5 minutes, then to 45° C. for 5 minutes, and spun at maximum speed in a microcentrifuge for 5 minutes. The probe array is then filled with 200 uL of 1× Hybridization Buffer and incubated at 45° C. for 10 minutes in the GeneChip Hybridization Oven 640 (Affymetrix) while rotating at 60 rpm. The 1× Hybridization Buffer is removed and the probe array is filled with 200 uL of the hybridization cocktail. The probe array is then incubated at 45° C. for 16 hours in the Hybridization Oven while rotating at 60 rpm.
The array is then washed and stained using the Fluidics Station 450 (Affymetrix) and the fluidics protocol EukGE-WS2v4_450 (Affymetrix). The stain used is R-Phycoerythrin Streptavidin, available from Molecular Probes. The antibody used is biotinylated goat anti-streptavidin antibody, available from Vector Laboratories.
A labeled sample obtained from a single control is used in each batch of microarray experiments. The gene expression results for the new samples within that batch are normalized to the gene expression results for the common control within that batch to provide normalized results that can then be compared between batches.
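By way of illustration only, the per-batch normalization described above may be sketched as follows. The function name `normalize_to_batch_control`, the gene labels, and the signal values are hypothetical and are chosen solely to show how dividing each sample by the common control run in the same batch makes results from different batches comparable.

```python
def normalize_to_batch_control(samples, control):
    """Divide each gene's signal by the common control's signal for that gene.

    samples: {sample_id: {gene: signal}} for one batch (hypothetical layout)
    control: {gene: signal} for the common control run in the same batch
    """
    normalized = {}
    for sample_id, expression in samples.items():
        normalized[sample_id] = {
            gene: value / control[gene]
            for gene, value in expression.items()
            if control.get(gene)  # skip genes missing or zero in the control
        }
    return normalized


# Two batches whose raw intensities differ threefold (illustrative values only)
batch_1 = {"patient_A": {"GENE1": 200.0, "GENE2": 50.0}}
control_1 = {"GENE1": 100.0, "GENE2": 100.0}
batch_2 = {"patient_B": {"GENE1": 600.0, "GENE2": 150.0}}
control_2 = {"GENE1": 300.0, "GENE2": 300.0}

norm_1 = normalize_to_batch_control(batch_1, control_1)
norm_2 = normalize_to_batch_control(batch_2, control_2)
```

In this toy data the two patients have the same expression relative to the common control, so their normalized values agree even though the raw intensities of the two batches differ.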
The probe arrays are then scanned using the Affymetrix GeneChip Scanner 3000, using the GeneChip Operating Software 1 v4, available from Affymetrix.
Results are interpreted using GeneSpring 7.3 Software, available from Silicon Genetics. Raw data is filtered on an expression level of 10, and then normalized to a uniform internal control RNA from a single healthy control. Each array is then normalized in the same manner. Global scaling is used to adjust the average intensity or signal value of each probe array to the same Target Intensity value (TGT) of about 1500. The internal control genes, GAPDH and β-actin, are used to check the quality of the RNA. The assay quality is determined by comparing the signals of the 3′ probe set to the 5′ probe set of the internal control genes. Acceptable 3′ to 5′ ratios are between about 1 and about 3.
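By way of illustration only, the global scaling and 3′ to 5′ quality check described above may be sketched as follows. The function names and probe labels are hypothetical. A plain arithmetic mean is assumed here for simplicity; the averaging method actually applied by the analysis software may differ.

```python
from statistics import mean

TARGET_INTENSITY = 1500.0  # TGT value of about 1500 stated in the text


def global_scale(signals, target=TARGET_INTENSITY):
    """Scale all probe signals so that their average equals the target.

    Assumption: a simple arithmetic mean is used as the array average.
    """
    factor = target / mean(signals.values())
    return {probe: value * factor for probe, value in signals.items()}


def three_to_five_ratio_ok(signal_3p, signal_5p, low=1.0, high=3.0):
    """Return True if the 3'/5' signal ratio lies in the acceptable range."""
    ratio = signal_3p / signal_5p
    return low <= ratio <= high


# Illustrative values only
scaled = global_scale({"probe_a": 100.0, "probe_b": 300.0})
gapdh_ok = three_to_five_ratio_ok(signal_3p=300.0, signal_5p=150.0)
```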
Prokaryotic spike controls are used to determine whether the hybridization of target RNA to the array occurred properly. To control for chip-to-chip variation in expression intensities, a common RNA specimen is used, which is labeled and hybridized together with each new batch of biopsy samples.
A biological sample is obtained via standard biopsy techniques from the intestines of a patient diagnosed with an inflammatory bowel disease. A control biopsy is obtained from a matched segment of the colon from a subject diagnosed with an IBD, but known to be a “responder” to first line therapy. The biological sample is placed in RNAlater™ and stored on ice until processing. Total RNA is prepared utilizing the Qiagen RNeasy mini-column. RNA quality is then assessed using the Agilent 2100 Bioanalyzer. About 400 to about 500 nanograms of total RNA are used.
PCR primers corresponding to the genes listed in Table 5 and the housekeeping gene GAPDH are synthesized using techniques known in the art. The PCR primers are radiolabeled and selected such that the primers have a length of about 18 to about 24 base pairs, and a GC content of about 35% to about 60%, thus having an annealing temperature of about 55° C. to about 58° C. Longer primers of about 28-30 base pairs may be used at higher annealing temperatures. Melting point and primer-primer interactions may be determined using commercially available software such as Primer Premier, available from Premier Biosoft International, 3786 Corina Way, Palo Alto, Calif. 94303-4504. The PCR reaction mixture includes 1× PCR buffer, 0.4 uM of each primer, 5% DMSO, and 1 unit Taq polymerase (Life Technologies, Gaithersburg, Md., USA) per 24 uL reaction volume. Nucleotides (dNTPs) (Pharmacia Biotech, Piscataway, N.J., USA) are stored as a 100 mM stock solution (25 mM each dATP, dCTP, dGTP and dTTP). The standard 10× PCR buffer is made as described (Perkin-Elmer, Norwalk, Conn., USA) and contains 400 mM KCl, 100 mM Tris-HCl, pH 8.3 (at 24° C.) and 14 mM MgCl2. DMSO, BSA and glycerol may be purchased from Sigma Chemical, St. Louis, Mo., USA. The reaction mixtures are then subjected to the following cycling conditions: a first denaturing step of 94° C. for 4 minutes, a denaturing step at 94° C. for 30 seconds, an annealing step at 54° C. for 30 seconds, then an extension step at 65° C. for one minute. The samples are subjected to 32 cycles, with a final extension step at 65° C. for 3 minutes.
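By way of illustration only, the primer selection criteria stated above (a length of about 18 to about 24 and a GC content of about 35% to about 60%) may be checked with a short sketch such as the following. The function names and the example sequences are hypothetical and do not correspond to any primer of Table 5. Melting point and primer-primer interactions are not modeled here.

```python
def gc_content(primer):
    """Return the percentage of G and C bases in a primer sequence."""
    seq = primer.upper()
    return 100.0 * sum(seq.count(base) for base in "GC") / len(seq)


def primer_meets_criteria(primer, min_len=18, max_len=24,
                          min_gc=35.0, max_gc=60.0):
    """Check primer length and GC content against the stated ranges."""
    return (min_len <= len(primer) <= max_len
            and min_gc <= gc_content(primer) <= max_gc)


# Hypothetical sequences, for illustration only
candidates = {
    "balanced_20mer": "ATGCATGCATGCATGCATGC",  # 20 bases, 50% GC
    "at_rich_20mer": "ATATATATATATATATATAT",   # 20 bases, 0% GC
    "too_short": "GCGCATAT",                   # 8 bases
}
passing = [name for name, seq in candidates.items()
           if primer_meets_criteria(seq)]
```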
Multiplex PCR products are then separated by size on a standard sequencing gel composed of 5% polyacrylamide and containing 6 M urea, 890 mM Tris-borate, and 2 mM EDTA. A radiolabeled DNA ladder is used for size determination of each product. The sample is loaded on the gel and the multiplex reaction mixture is electrophoretically separated by size according to standard conditions, for example, 1.5 hours at 2000 V, 50 mA current, 20 W power, and a gel temperature of 51° C. Gene expression of the genes listed in Table 5 is then determined by computer imaging (using GeneScan™ software) of the resultant bands corresponding to PCR products for each gene of interest, quantifying the intensity of each band, and comparing relative quantities of each band of the patient of interest to gene expression in a control subject (the “responder” patient). Both the experimental sample and the control subject results are normalized to GAPDH expression in each sample.
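By way of illustration only, the comparison of band intensities described above may be sketched as follows. Each band is first normalized to the GAPDH band of the same sample, and the patient value is then expressed relative to the control (“responder”) value. The function name, the gene label `GENE_X`, and the intensities are hypothetical.

```python
def relative_expression(patient_bands, control_bands, reference="GAPDH"):
    """Return patient expression relative to control for each gene.

    Both samples are normalized to their own reference-gene band first.
    """
    result = {}
    for gene, intensity in patient_bands.items():
        if gene == reference:
            continue  # the reference gene is used only for normalization
        patient_norm = intensity / patient_bands[reference]
        control_norm = control_bands[gene] / control_bands[reference]
        result[gene] = patient_norm / control_norm
    return result


# Illustrative band intensities only
patient = {"GAPDH": 1000.0, "GENE_X": 500.0}
control = {"GAPDH": 2000.0, "GENE_X": 500.0}
fold_change = relative_expression(patient, control)
```

In this toy case the raw `GENE_X` bands are equal, but after GAPDH normalization the patient shows twice the control level, which is why normalization precedes the comparison.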
The expression pattern of the patient sample is then compared to the training set of 20 responders and 20 non-responders, using the k-nearest neighbors algorithm, to predict whether the patient is likely to be a “responder” or “non-responder” patient, as described above.
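By way of illustration only, a k-nearest neighbors prediction of the kind referred to above may be sketched as follows. The choice of Euclidean distance, the value k=3, the function name, and the two-gene toy training set are assumptions made for this sketch; they stand in for the training set of 20 responders and 20 non-responders and are not taken from it.

```python
import math
from collections import Counter


def knn_predict(training, query, k=3):
    """Predict a label by majority vote among the k nearest training samples.

    training: list of (expression_vector, label) pairs
    query: expression vector of the patient sample
    Assumption: Euclidean distance; an odd k avoids ties between two classes.
    """
    neighbors = sorted(training,
                       key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Toy training set (normalized expression of two hypothetical genes)
training_set = [
    ((1.0, 1.0), "responder"),
    ((1.2, 0.9), "responder"),
    ((0.9, 1.1), "responder"),
    ((3.0, 3.1), "non-responder"),
    ((3.2, 2.9), "non-responder"),
    ((2.9, 3.0), "non-responder"),
]
prediction = knn_predict(training_set, (1.1, 1.0))
```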
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/US2007/081597 | 10/17/2007 | WO | 00 | 6/16/2010 |
Number | Date | Country | |
---|---|---|---|
60852364 | Oct 2006 | US |