Irritable bowel syndrome (IBS) is the most common of all gastrointestinal disorders, affecting 10-20% of the general population and accounting for more than 50% of all patients with digestive complaints. However, studies suggest that only about 10% to 50% of those afflicted with IBS actually seek medical attention. Patients with IBS present with disparate symptoms such as abdominal pain predominantly related to defecation, diarrhea, constipation or alternating diarrhea and constipation, abdominal distention, gas, and excessive mucus in the stool. More than 40% of IBS patients have symptoms so severe that they have to take time off from work, curtail their social life, avoid sexual intercourse, cancel appointments, stop traveling, take medication, and even stay confined to their house for fear of embarrassment. The estimated health care cost of IBS in the United States is $8 billion per year (Talley et al., Gastroenterol., 109:1736-1741 (1995)).
The precise pathophysiology of IBS is not well understood. Nevertheless, there is a heightened sensitivity to visceral pain perception, known as peripheral sensitization. This sensitization involves a reduction in the threshold and an increase in the gain of the transduction processes of primary afferent neurons, attributable to a variety of mediators including monoamines (e.g., catecholamines and indoleamines), substance P, and a variety of cytokines and prostanoids such as E-type prostaglandins (see, e.g., Mayer et al., Gastroenterol., 107:271-293 (1994)). Also implicated in the etiopathology of IBS is intestinal motor dysfunction, which leads to abnormal handling of intraluminal contents and/or gas (see, e.g., Kellow et al., Gastroenterol., 92:1885-1893 (1987); Levitt et al., Ann. Int. Med., 124:422-424 (1996)). Psychological factors may also contribute to IBS symptoms appearing in conjunction with, if not triggered by, disturbances including depression and anxiety (see, e.g., Drossman et al., Gastroenterol. Int., 8:47-90 (1995)).
The causes of IBS are not well understood. The walls of the intestines are lined with layers of muscle that contract and relax as they move food from the stomach through the intestinal tract to the rectum. Normally, these muscles contract and relax in a coordinated rhythm. In IBS patients, these contractions are typically stronger and last longer than normal. As a result, in some cases food is forced through the intestines more quickly, causing gas, bloating, and diarrhea. In other cases, the opposite occurs: food passage slows and stools become hard and dry, causing constipation.
The precise pathophysiology of IBS remains to be elucidated. While gut dysmotility and altered visceral perception are considered important contributors to symptom pathogenesis (Quigley, Scand. J. Gastroenterol., 38(Suppl. 237):1-8 (2003); Mayer et al., Gastroenterol., 122:2032-2048 (2002)), this condition is now generally viewed as a disorder of the brain-gut axis. Recently, roles for enteric infection and intestinal inflammation have also been proposed. Studies have documented the onset of IBS following bacteriologically confirmed gastroenteritis, while others have provided evidence of low-grade mucosal inflammation (Spiller et al., Gut, 47:804-811 (2000); Dunlop et al., Gastroenterol., 125:1651-1659 (2003); Cumberland et al., Epidemiol. Infect., 130:453-460 (2003)) and immune activation (Gwee et al., Gut, 52:523-526 (2003); Pimentel et al., Am. J. Gastroenterol., 95:3503-3506 (2000)) in IBS. The enteric flora has also been implicated, and a recent study demonstrated the efficacy of the probiotic organism Bifidobacterium in treating the disorder through modulation of immune activity (O'Mahony et al., Gastroenterol., 128:541-551 (2005)).
The hypothalamic-pituitary-adrenal axis (HPA) is the core endocrine stress system in humans (De Wied et al., Front. Neuroendocrinol., 14:251-302 (1993)) and provides an important link between the brain and the gut immune system. Activation of the axis takes place in response to both physical and psychological stressors (Dinan, Br. J. Psychiatry, 164:365-371 (1994)), both of which have been implicated in the pathophysiology of IBS (Cumberland et al., Epidemiol. Infect., 130:453-460 (2003)). Patients with IBS have been reported as having an increased rate of sexual and physical abuse in childhood together with higher rates of stressful life events in adulthood (Gaynes et al., Baillieres Clin. Gastroenterol., 13:437-452 (1999)). Such psychosocial trauma or poor cognitive coping strategy profoundly affects symptom severity, daily functioning, and health outcome.
Although the etiology of IBS is not fully characterized, the medical community has developed a consensus definition and criteria, known as the Rome II criteria, to aid in the diagnosis of IBS based upon patient history. The Rome II criteria require three months of continuous or recurrent abdominal pain or discomfort over a one-year period that is relieved by defecation and/or associated with a change in stool frequency or consistency, as well as two or more of the following: altered stool frequency, altered stool form, altered stool passage, passage of mucus, or bloating and abdominal distention. The absence of any structural or biochemical disorders that could be causing the symptoms is also a necessary condition. As a result, the Rome II criteria can be used only when there is a substantial patient history and are reliable only when there is no abnormal intestinal anatomy or metabolic process that would otherwise explain the symptoms. Similarly, the Rome III criteria recently developed by the medical community can be used only when there is presentation of a specific set of symptoms, a detailed patient history, and a physical examination.
It is well documented that diagnosing a patient as having IBS can be challenging due to the similarity in symptoms between IBS and other diseases or disorders. In fact, because the symptoms of IBS are similar or identical to the symptoms of so many other intestinal illnesses, it can take years before a correct diagnosis is made. For example, patients who have inflammatory bowel disease (IBD), but who exhibit mild signs and symptoms such as bloating, diarrhea, constipation, and abdominal pain, may be difficult to distinguish from patients with IBS. As a result, the similarity in symptoms between IBS and IBD renders rapid and accurate diagnosis difficult. The difficulty in differentially diagnosing IBS and IBD hampers early and effective treatment of these diseases. Unfortunately, rapid and accurate diagnostic methods for definitively distinguishing IBS from other intestinal diseases or disorders presenting with similar symptoms are currently not available. The present invention satisfies this need and provides related advantages as well.
The present invention provides methods, systems, and code for accurately classifying whether a sample from an individual is associated with irritable bowel syndrome (IBS). As a non-limiting example, the present invention is useful for classifying a sample from an individual as an IBS sample using a statistical algorithm and/or empirical data. The present invention is also useful for ruling out one or more diseases or disorders that present with IBS-like symptoms and ruling in IBS using a combination of statistical algorithms and/or empirical data. Thus, the present invention provides an accurate diagnostic prediction of IBS and prognostic information useful for guiding treatment decisions.
In one aspect, the present invention provides a method for classifying whether a sample from an individual is associated with IBS, the method comprising:
In some embodiments, the diagnostic marker profile is determined by detecting the presence or level of at least one diagnostic marker selected from the group consisting of a cytokine, growth factor, anti-neutrophil antibody, anti-Saccharomyces cerevisiae antibody (ASCA), antimicrobial antibody, lactoferrin, anti-tissue transglutaminase (tTG) antibody, lipocalin, matrix metalloproteinase (MMP), tissue inhibitor of metalloproteinase (TIMP), alpha-globulin, actin-severing protein, S100 protein, fibrinopeptide, calcitonin gene-related peptide (CGRP), tachykinin, ghrelin, neurotensin, corticotropin-releasing hormone, IBS1, MUC20, VSIG2, CKB, M160, VSIG4, CASP1, NCF4, LYZ, KCNS3, PSME2, MS4A4A, HELLS, COP1, FCGR2A, RFC4, MCM5, TAP2, LRAP, L2DTL and combinations thereof.
In a preferred aspect, the present invention provides a method for classifying whether a sample from an individual is associated with IBS, the method comprising:
In preferred embodiments, the presence or level of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, or more of the biomarkers shown in Table 1 is detected to generate a diagnostic marker profile that is useful for predicting IBS. In certain instances, the biomarkers described herein are analyzed using an immunoassay such as an enzyme-linked immunosorbent assay (ELISA) or an immunohistochemical assay.
In some embodiments, the present invention provides a method for classifying whether a sample from an individual is associated with IBS, the method comprising:
In other embodiments, the method of ruling in IBS comprises determining a diagnostic marker profile optionally in combination with a symptom profile, wherein the symptom profile is determined by identifying the presence or severity of at least one symptom in the individual; and classifying the sample as an IBS sample or non-IBS sample using an algorithm based upon the diagnostic marker profile and the symptom profile.
The symptom profile is typically determined by identifying the presence or severity of at least one symptom selected from the group consisting of chest pain, chest discomfort, heartburn, uncomfortable fullness after having a regular-sized meal, inability to finish a regular-sized meal, abdominal pain, abdominal discomfort, constipation, diarrhea, bloating, abdominal distension, negative thoughts or feelings associated with having pain or discomfort, and combinations thereof.
In preferred embodiments, the presence or severity of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more of the symptoms described herein is identified to generate a symptom profile that is useful for predicting IBS. In certain instances, a questionnaire or other form of written, verbal, or telephone survey is used to produce the symptom profile. The questionnaire or survey typically comprises a standardized set of questions and answers for the purpose of gathering information from respondents regarding their current and/or recent IBS-related symptoms.
In some embodiments, the symptom profile is produced by compiling and/or analyzing all or a subset of the answers to the questions set forth in the questionnaire or survey. In other embodiments, the symptom profile is produced based upon the individual's response to the following question: “Are you currently experiencing any symptoms?” The symptom profile generated in accordance with either of these embodiments can be used in combination with a diagnostic marker profile in the algorithmic-based methods described herein to improve the accuracy of predicting IBS.
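By way of non-limiting illustration, the two ways of producing a symptom profile described above can be sketched in Python. The symptom names are taken from the list above; the function names, the 0-4 severity scale, and the choice to record unanswered questions as 0 are assumptions made solely for this sketch and are not part of the disclosure.

```python
from typing import Dict, Iterable, Optional

# Symptom names come from the text; the 0-4 severity scale is an assumption.
SYMPTOMS = ("abdominal pain", "bloating", "constipation", "diarrhea")


def build_symptom_profile(answers: Dict[str, int],
                          subset: Optional[Iterable[str]] = None) -> Dict[str, int]:
    """Compile all answers, or only a chosen subset, into a symptom profile."""
    wanted = tuple(subset) if subset is not None else SYMPTOMS
    # Assumption: an unanswered question is recorded as 0 (symptom absent).
    return {name: int(answers.get(name, 0)) for name in wanted}


def single_question_profile(currently_experiencing: bool) -> Dict[str, int]:
    """Profile based only on the yes/no question quoted in the text."""
    return {"any_current_symptom": 1 if currently_experiencing else 0}


# Hypothetical questionnaire answers for one respondent.
answers = {"abdominal pain": 3, "bloating": 1}
full_profile = build_symptom_profile(answers)
partial_profile = build_symptom_profile(answers, subset=["abdominal pain", "diarrhea"])
```

Either resulting dictionary could then be combined with a diagnostic marker profile as input to an algorithm.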
In another aspect, the present invention provides a method for classifying whether a sample from an individual is associated with IBS, the method comprising:
In some embodiments, the diagnostic marker profile is determined by detecting the presence or level of at least one diagnostic marker selected from the group consisting of a cytokine, growth factor, anti-neutrophil antibody, ASCA, antimicrobial antibody, lactoferrin, anti-tTG antibody, lipocalin, MMP, TIMP, alpha-globulin, actin-severing protein, S100 protein, fibrinopeptide, CGRP, tachykinin, ghrelin, neurotensin, corticotropin-releasing hormone, IBS1, MUC20, VSIG2, CKB, M160, VSIG4, CASP1, NCF4, LYZ, KCNS3, PSME2, MS4A4A, HELLS, COP1, FCGR2A, RFC4, MCM5, TAP2, LRAP, L2DTL and combinations thereof.
In other embodiments, the method of first ruling out IBD and then ruling in IBS comprises determining a diagnostic marker profile in combination with a symptom profile, wherein the symptom profile is determined by identifying the presence or severity of at least one symptom in the individual; classifying the sample as an IBD sample or non-IBD sample using a first statistical algorithm based upon the diagnostic marker profile and the symptom profile; and if the sample is classified as a non-IBD sample, classifying the non-IBD sample as an IBS sample or non-IBS sample using a second statistical algorithm based upon the same profiles as determined in step (a) or different profiles.
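As a non-limiting sketch, the two-stage flow described above (first ruling out IBD, then ruling in IBS) can be expressed in Python as follows. The two rule functions are placeholders with made-up thresholds that merely stand in for trained statistical algorithms; the key names and cutoff values are hypothetical.

```python
from typing import Callable, Dict

Profile = Dict[str, float]
Classifier = Callable[[Profile], bool]


def classify_sample(profile: Profile, is_ibd: Classifier, is_ibs: Classifier) -> str:
    """Rule out IBD first; only non-IBD samples reach the IBS classifier."""
    if is_ibd(profile):
        return "IBD"
    return "IBS" if is_ibs(profile) else "non-IBS"


def toy_ibd_rule(p: Profile) -> bool:
    # Hypothetical cutoff, for illustration only.
    return p.get("lactoferrin", 0.0) > 10.0


def toy_ibs_rule(p: Profile) -> bool:
    # Hypothetical cutoffs combining a marker level and a symptom score.
    return p.get("il8", 0.0) > 5.0 and p.get("symptom_score", 0.0) >= 1.0


result = classify_sample(
    {"lactoferrin": 2.0, "il8": 9.0, "symptom_score": 2.0},
    toy_ibd_rule,
    toy_ibs_rule,
)
```

Passing the classifiers in as arguments keeps the cascade independent of whichever statistical algorithms are actually used at each stage.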
In yet another aspect, the present invention provides a method for monitoring the progression or regression of IBS in an individual, the method comprising:
In some embodiments, the diagnostic marker profile is determined by detecting the presence or level of at least one diagnostic marker selected from the group consisting of a cytokine, growth factor, anti-neutrophil antibody, ASCA, antimicrobial antibody, lactoferrin, anti-tTG antibody, lipocalin, MMP, TIMP, alpha-globulin, actin-severing protein, S100 protein, fibrinopeptide, CGRP, tachykinin, ghrelin, neurotensin, corticotropin-releasing hormone, IBS1, MUC20, VSIG2, CKB, M160, VSIG4, CASP1, NCF4, LYZ, KCNS3, PSME2, MS4A4A, HELLS, COP1, FCGR2A, RFC4, MCM5, TAP2, LRAP, L2DTL and combinations thereof.
In other embodiments, the method of monitoring the progression or regression of IBS comprises determining a diagnostic marker profile optionally in combination with a symptom profile, wherein the symptom profile is determined by identifying the presence or severity of at least one symptom in the individual; and determining the presence or severity of IBS in the individual using an algorithm based upon the diagnostic marker profile and the symptom profile.
In a related aspect, the present invention provides a method for monitoring drug efficacy in an individual receiving a drug useful for treating IBS, the method comprising:
In some embodiments, the diagnostic marker profile is determined by detecting the presence or level of at least one diagnostic marker selected from the group consisting of a cytokine, growth factor, anti-neutrophil antibody, ASCA, antimicrobial antibody, lactoferrin, anti-tTG antibody, lipocalin, MMP, TIMP, alpha-globulin, actin-severing protein, S100 protein, fibrinopeptide, CGRP, tachykinin, ghrelin, neurotensin, corticotropin-releasing hormone, IBS1, MUC20, VSIG2, CKB, M160, VSIG4, CASP1, NCF4, LYZ, KCNS3, PSME2, MS4A4A, HELLS, COP1, FCGR2A, RFC4, MCM5, TAP2, LRAP, L2DTL and combinations thereof.
In other embodiments, the method of monitoring IBS drug efficacy comprises determining a diagnostic marker profile optionally in combination with a symptom profile, wherein the symptom profile is determined by identifying the presence or severity of at least one symptom in the individual; and determining the effectiveness of the drug using an algorithm based upon the diagnostic marker profile and the symptom profile.
In a further aspect, the present invention provides a computer-readable medium including code for controlling one or more processors to classify whether a sample from an individual is associated with IBS, the code comprising:
In some embodiments, the diagnostic marker profile indicates the presence or level of at least one diagnostic marker selected from the group consisting of a cytokine, growth factor, anti-neutrophil antibody, ASCA, antimicrobial antibody, lactoferrin, anti-tTG antibody, lipocalin, MMP, TIMP, alpha-globulin, actin-severing protein, S100 protein, fibrinopeptide, CGRP, tachykinin, ghrelin, neurotensin, corticotropin-releasing hormone, IBS1, MUC20, VSIG2, CKB, M160, VSIG4, CASP1, NCF4, LYZ, KCNS3, PSME2, MS4A4A, HELLS, COP1, FCGR2A, RFC4, MCM5, TAP2, LRAP, L2DTL and combinations thereof.
In other embodiments, the computer-readable medium for ruling in IBS comprises instructions to apply a statistical process to a data set comprising a diagnostic marker profile optionally in combination with a symptom profile which indicates the presence or severity of at least one symptom in the individual to produce a statistically derived decision classifying the sample as an IBS sample or non-IBS sample based upon the diagnostic marker profile and the symptom profile.
In a related aspect, the present invention provides a computer-readable medium including code for controlling one or more processors to classify whether a sample from an individual is associated with IBS, the code comprising:
In some embodiments, the diagnostic marker profile indicates the presence or level of at least one diagnostic marker selected from the group consisting of a cytokine, growth factor, anti-neutrophil antibody, ASCA, antimicrobial antibody, lactoferrin, anti-tTG antibody, lipocalin, MMP, TIMP, alpha-globulin, actin-severing protein, S100 protein, fibrinopeptide, CGRP, tachykinin, ghrelin, neurotensin, corticotropin-releasing hormone, IBS1, MUC20, VSIG2, CKB, M160, VSIG4, CASP1, NCF4, LYZ, KCNS3, PSME2, MS4A4A, HELLS, COP1, FCGR2A, RFC4, MCM5, TAP2, LRAP, L2DTL and combinations thereof.
In other embodiments, the computer-readable medium for first ruling out IBD and then ruling in IBS comprises instructions to apply a first statistical process to a data set comprising a diagnostic marker profile optionally in combination with a symptom profile which indicates the presence or severity of at least one symptom in the individual to produce a statistically derived decision classifying the sample as an IBD sample or non-IBD sample based upon the diagnostic marker profile and the symptom profile; and if the sample is classified as a non-IBD sample, instructions to apply a second statistical process to the same or different data set to produce a second statistically derived decision classifying the non-IBD sample as an IBS sample or non-IBS sample.
In an additional aspect, the present invention provides a system for classifying whether a sample from an individual is associated with IBS, the system comprising:
In some embodiments, the diagnostic marker profile indicates the presence or level of at least one diagnostic marker selected from the group consisting of a cytokine, growth factor, anti-neutrophil antibody, ASCA, antimicrobial antibody, lactoferrin, anti-tTG antibody, lipocalin, MMP, TIMP, alpha-globulin, actin-severing protein, S100 protein, fibrinopeptide, CGRP, tachykinin, ghrelin, neurotensin, corticotropin-releasing hormone, IBS1, MUC20, VSIG2, CKB, M160, VSIG4, CASP1, NCF4, LYZ, KCNS3, PSME2, MS4A4A, HELLS, COP1, FCGR2A, RFC4, MCM5, TAP2, LRAP, L2DTL and combinations thereof.
In other embodiments, the system for ruling in IBS comprises a data acquisition module configured to produce a data set comprising a diagnostic marker profile optionally in combination with a symptom profile which indicates the presence or severity of at least one symptom in the individual; a data processing module configured to process the data set by applying a statistical process to the data set to produce a statistically derived decision classifying the sample as an IBS sample or non-IBS sample based upon the diagnostic marker profile and the symptom profile; and a display module configured to display the statistically derived decision.
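The three-module arrangement described above can be illustrated, without limitation, by the following Python sketch. The class names, the key prefixes, and the single-threshold rule used as the statistical process are all assumptions introduced for illustration; an actual system could use any suitable statistical process.

```python
from typing import Callable, Dict, Optional

DataSet = Dict[str, float]


class DataAcquisitionModule:
    """Produces a data set from a marker profile and an optional symptom profile."""

    def produce(self, markers: Dict[str, float],
                symptoms: Optional[Dict[str, float]] = None) -> DataSet:
        data = {f"marker:{name}": float(value) for name, value in markers.items()}
        for name, value in (symptoms or {}).items():
            data[f"symptom:{name}"] = float(value)
        return data


class DataProcessingModule:
    """Applies a statistical process to the data set to reach a decision."""

    def __init__(self, statistical_process: Callable[[DataSet], bool]) -> None:
        self._process = statistical_process

    def decide(self, data: DataSet) -> str:
        return "IBS sample" if self._process(data) else "non-IBS sample"


class DisplayModule:
    """Displays the statistically derived decision."""

    def render(self, decision: str) -> str:
        line = f"Classification: {decision}"
        print(line)
        return line


def toy_process(data: DataSet) -> bool:
    # Hypothetical cutoff standing in for a real statistical process.
    return data.get("marker:il8", 0.0) > 5.0


data_set = DataAcquisitionModule().produce({"il8": 7.2}, {"bloating": 2})
decision = DataProcessingModule(toy_process).decide(data_set)
shown = DisplayModule().render(decision)
```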
In a related aspect, the present invention provides a system for classifying whether a sample from an individual is associated with IBS, the system comprising:
In some embodiments, the diagnostic marker profile indicates the presence or level of at least one diagnostic marker selected from the group consisting of a cytokine, growth factor, anti-neutrophil antibody, ASCA, antimicrobial antibody, lactoferrin, anti-tTG antibody, lipocalin, MMP, TIMP, alpha-globulin, actin-severing protein, S100 protein, fibrinopeptide, CGRP, tachykinin, ghrelin, neurotensin, corticotropin-releasing hormone, IBS1, MUC20, VSIG2, CKB, M160, VSIG4, CASP1, NCF4, LYZ, KCNS3, PSME2, MS4A4A, HELLS, COP1, FCGR2A, RFC4, MCM5, TAP2, LRAP, L2DTL and combinations thereof.
In other embodiments, the system for first ruling out IBD and then ruling in IBS comprises a data acquisition module configured to produce a data set comprising a diagnostic marker profile optionally in combination with a symptom profile which indicates the presence or severity of at least one symptom in the individual; a data processing module configured to process the data set by applying a first statistical process to the data set to produce a first statistically derived decision classifying the sample as an IBD sample or non-IBD sample based upon the diagnostic marker profile and the symptom profile; if the sample is classified as a non-IBD sample, a data processing module configured to apply a second statistical process to the same or different data set to produce a second statistically derived decision classifying the non-IBD sample as an IBS sample or non-IBS sample; and a display module configured to display the first and/or the second statistically derived decision.
Other objects, features, and advantages of the present invention will be apparent to one of skill in the art from the following detailed description and figures.
Diagnosing a patient as having irritable bowel syndrome (IBS) can be challenging due to the similarity in symptoms between IBS and other diseases or disorders. For example, patients who have inflammatory bowel disease (IBD), but who exhibit mild signs and symptoms such as bloating, diarrhea, constipation, and abdominal pain can be difficult to distinguish from patients with IBS. As a result, the similarity in symptoms between IBS and IBD renders rapid and accurate diagnosis difficult and hampers early and effective treatment of the disease.
The present invention is based, in part, upon the surprising discovery that the accuracy of classifying a biological sample from an individual as an IBS sample can be substantially improved by detecting the presence or level of certain diagnostic markers (e.g., cytokines, growth factors, anti-neutrophil antibodies, anti-Saccharomyces cerevisiae antibodies, antimicrobial antibodies, lactoferrin, etc.), alone or in combination with identifying the presence or severity of IBS-related symptoms based upon the individual's response to one or more questions (e.g., “Are you currently experiencing any symptoms?”).
As used herein, the following terms have the meanings ascribed to them unless specified otherwise.
The term “classifying” includes “to associate” or “to categorize” a sample with a disease state. In certain instances, “classifying” is based on statistical evidence, empirical evidence, or both. In certain embodiments, the methods and systems of classifying use a so-called training set of samples having known disease states. Once established, the training data set serves as a basis, model, or template against which the features of an unknown sample are compared, in order to classify the unknown disease state of the sample. In certain instances, classifying the sample is akin to diagnosing the disease state of the sample. In certain other instances, classifying the sample is akin to differentiating the disease state of the sample from another disease state.
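For illustration only, the use of a training set as a template against which an unknown sample is compared can be sketched in Python with a nearest-centroid rule. This rule is chosen here purely for brevity and is not asserted to be the algorithm of the invention; the feature vectors and labels are invented toy values.

```python
from collections import defaultdict
from math import dist
from typing import Dict, List, Sequence, Tuple

TrainingSet = Sequence[Tuple[Sequence[float], str]]


def fit_centroids(training_set: TrainingSet) -> Dict[str, List[float]]:
    """Average the feature vectors of the training samples for each known disease state."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for features, label in training_set:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def classify(features: Sequence[float], centroids: Dict[str, List[float]]) -> str:
    """Assign the unknown sample to the disease state with the closest centroid."""
    return min(centroids, key=lambda label: dist(features, centroids[label]))


# Toy training set: (marker levels, known disease state). Values are invented.
training = [
    ([1.0, 1.0], "IBS"),
    ([2.0, 2.0], "IBS"),
    ([8.0, 9.0], "non-IBS"),
    ([9.0, 8.0], "non-IBS"),
]
centroids = fit_centroids(training)
label = classify([2.0, 1.0], centroids)
```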
The term “irritable bowel syndrome” or “IBS” includes a group of functional bowel disorders characterized by one or more symptoms including, but not limited to, abdominal pain, abdominal discomfort, change in bowel pattern, loose or more frequent bowel movements, diarrhea, and constipation, typically in the absence of any apparent structural abnormality. There are at least three forms of IBS, depending on which symptom predominates: (1) diarrhea-predominant (IBS-D); (2) constipation-predominant (IBS-C); and (3) IBS with alternating stool pattern (IBS-A). IBS can also occur in the form of a mixture of symptoms (IBS-M). There are also various clinical subtypes of IBS, such as post-infectious IBS (IBS-PI).
The term “sample” includes any biological specimen obtained from an individual. Suitable samples for use in the present invention include, without limitation, whole blood, plasma, serum, saliva, urine, stool, sputum, tears, any other bodily fluid, tissue samples (e.g., biopsy), and cellular extracts thereof (e.g., red blood cellular extract). In a preferred embodiment, the sample is a serum sample. The use of samples such as serum, saliva, and urine is well known in the art (see, e.g., Hashida et al., J. Clin. Lab. Anal., 11:267-86 (1997)). One skilled in the art will appreciate that samples such as serum samples can be diluted prior to the analysis of marker levels.
The term “biomarker” or “marker” includes any diagnostic marker such as a biochemical marker, serological marker, genetic marker, or other clinical or echographic characteristic that can be used to classify a sample from an individual as an IBS sample or to rule out one or more diseases or disorders associated with IBS-like symptoms in a sample from an individual. The term “biomarker” or “marker” also encompasses any classification marker such as a biochemical marker, serological marker, genetic marker, or other clinical or echographic characteristic that can be used to classify IBS into one of its various forms or clinical subtypes. Non-limiting examples of diagnostic markers suitable for use in the present invention are described below and include cytokines, growth factors, anti-neutrophil antibodies, anti-Saccharomyces cerevisiae antibodies, antimicrobial antibodies, anti-tissue transglutaminase (tTG) antibodies, lipocalins, matrix metalloproteinases (MMPs), tissue inhibitor of metalloproteinases (TIMPs), alpha-globulins, actin-severing proteins, S100 proteins, fibrinopeptides, calcitonin gene-related peptide (CGRP), tachykinins, ghrelin, neurotensin, corticotropin-releasing hormone (CRH), elastase, C-reactive protein (CRP), lactoferrin, anti-lactoferrin antibodies, calprotectin, hemoglobin, NOD2/CARD15, serotonin reuptake transporter (SERT), tryptophan hydroxylase-1, 5-hydroxytryptamine (5-HT), lactulose, IBS1, MUC20, VSIG2, CKB, M160, VSIG4, CASP1, NCF4, LYZ, KCNS3, PSME2, MS4A4A, HELLS, COP1, FCGR2A, RFC4, MCM5, TAP2, LRAP, L2DTL and the like. Examples of classification markers include, without limitation, leptin, SERT, tryptophan hydroxylase-1, 5-HT, antrum mucosal protein 8, keratin-8, claudin-8, zonulin, corticotropin releasing hormone receptor-1 (CRHR1), corticotropin releasing hormone receptor-2 (CRHR2) and the like. In some embodiments, diagnostic markers can be used to classify IBS into one of its various forms or clinical subtypes. In other embodiments, classification markers can be used to classify a sample as an IBS sample or to rule out one or more diseases or disorders associated with IBS-like symptoms. One skilled in the art will know of additional diagnostic and classification markers suitable for use in the present invention.
As used herein, the term “profile” includes any set of data that represents the distinctive features or characteristics associated with a disease or disorder such as IBS or IBD. The term encompasses a “diagnostic marker profile” that analyzes one or more diagnostic markers in a sample, a “symptom profile” that identifies one or more IBS-related clinical factors (i.e., symptoms) an individual is experiencing or has experienced, and combinations thereof. For example, a “diagnostic marker profile” can include a set of data that represents the presence or level of one or more diagnostic markers associated with IBS and/or IBD. Likewise, a “symptom profile” can include a set of data that represents the presence, severity, frequency, and/or duration of one or more symptoms associated with IBS and/or IBD.
The term “individual,” “subject,” or “patient” typically refers to humans, but also to other animals including, e.g., other primates, rodents, canines, felines, equines, ovines, porcines, and the like.
As used herein, the term “substantially the same amino acid sequence” includes an amino acid sequence that is similar, but not identical to, the naturally-occurring amino acid sequence. For example, an amino acid sequence that has substantially the same amino acid sequence as a naturally-occurring peptide, polypeptide, or protein can have one or more modifications such as amino acid additions, deletions, or substitutions relative to the amino acid sequence of the naturally-occurring peptide, polypeptide, or protein, provided that the modified sequence retains substantially at least one biological activity of the naturally-occurring peptide, polypeptide, or protein such as immunoreactivity. Comparison for substantial similarity between amino acid sequences is usually performed with sequences between about 6 and 100 residues, preferably between about 10 and 100 residues, and more preferably between about 25 and 35 residues. A particularly useful modification of a peptide, polypeptide, or protein of the present invention, or a fragment thereof, is a modification that confers, for example, increased stability. Incorporation of one or more D-amino acids is a modification useful in increasing stability of a polypeptide or polypeptide fragment. Similarly, deletion or substitution of lysine residues can increase stability by protecting the polypeptide or polypeptide fragment against degradation.
The term “monitoring the progression or regression of IBS” includes the use of the methods, systems, and code of the present invention to determine the disease state (e.g., presence or severity of IBS) of an individual. In certain instances, the results of an algorithm (e.g., a learning statistical classifier system) are compared to those results obtained for the same individual at an earlier time. In some embodiments, the methods, systems, and code of the present invention can be used to predict the progression of IBS, e.g., by determining a likelihood for IBS to progress either rapidly or slowly in an individual based on an analysis of diagnostic markers and/or the identification of IBS-related symptoms. In other embodiments, the methods, systems, and code of the present invention can be used to predict the regression of IBS, e.g., by determining a likelihood for IBS to regress either rapidly or slowly in an individual based on an analysis of diagnostic markers and/or the identification of IBS-related symptoms.
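As a minimal, non-limiting sketch, comparing the result of an algorithm with the result obtained for the same individual at an earlier time can be written in Python as shown below. The sketch assumes that the algorithm yields a numeric severity score in which a higher value indicates more severe IBS, and that a small tolerance band is treated as no change; both assumptions, and the function name, are introduced only for illustration.

```python
def compare_to_earlier(earlier_score: float, current_score: float,
                       tolerance: float = 0.05) -> str:
    """Label the change between two algorithm scores for the same individual.

    Assumption: a higher score means more severe IBS, and changes within
    the tolerance band are treated as no change.
    """
    change = current_score - earlier_score
    if change > tolerance:
        return "progression"
    if change < -tolerance:
        return "regression"
    return "stable"


# Hypothetical scores obtained at two time points.
trend = compare_to_earlier(0.40, 0.70)
```

The same comparison could be applied to scores obtained before and after initiation of a therapeutic agent.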
The term “monitoring drug efficacy in an individual receiving a drug useful for treating IBS” includes the use of the methods, systems, and code of the present invention to determine the effectiveness of a therapeutic agent for treating IBS after it has been administered. In certain instances, the results of an algorithm (e.g., a learning statistical classifier system) are compared to those results obtained for the same individual before initiation of use of the therapeutic agent or at an earlier time in therapy. As used herein, a drug useful for treating IBS is any compound or drug used to improve the health of the individual and includes, without limitation, IBS drugs such as serotonergic agents, antidepressants, chloride channel activators, chloride channel blockers, guanylate cyclase agonists, antibiotics, opioids, neurokinin antagonists, antispasmodic or anticholinergic agents, belladonna alkaloids, barbiturates, glucagon-like peptide-1 (GLP-1) analogs, corticotropin releasing factor (CRF) antagonists, probiotics, free bases thereof, pharmaceutically acceptable salts thereof, derivatives thereof, analogs thereof, and combinations thereof.
The term “therapeutically effective amount or dose” includes a dose of a drug that is capable of achieving a therapeutic effect in a subject in need thereof. For example, a therapeutically effective amount of a drug useful for treating IBS can be the amount that is capable of preventing or relieving one or more symptoms associated with IBS. The exact amount is ascertainable by one skilled in the art using known techniques (see, e.g., Lieberman, Pharmaceutical Dosage Forms, Vols. 1-3 (1992); Lloyd, The Art, Science and Technology of Pharmaceutical Compounding (1999); Pickar, Dosage Calculations (1999); and Remington: The Science and Practice of Pharmacy, 20th Edition, Gennaro, Ed., Lippincott, Williams & Wilkins (2003)).
The present invention provides methods, systems, and code for accurately classifying whether a sample from an individual is associated with irritable bowel syndrome (IBS). In some embodiments, the present invention is useful for classifying a sample from an individual as an IBS sample by applying a statistical algorithm (e.g., a learning statistical classifier system) and/or empirical data (e.g., the presence or level of an IBS marker). The present invention is also useful for ruling out one or more diseases or disorders that present with IBS-like symptoms and ruling in IBS by applying a combination of statistical algorithms and/or empirical data. Accordingly, the present invention provides an accurate diagnostic prediction of IBS and prognostic information useful for guiding treatment decisions.
In one aspect, the present invention provides a method for classifying whether a sample from an individual is associated with IBS, the method comprising:
In some embodiments, the diagnostic marker profile is determined by detecting the presence or level of at least one diagnostic marker selected from the group consisting of a cytokine, growth factor, anti-neutrophil antibody, anti-Saccharomyces cerevisiae antibody (ASCA), antimicrobial antibody, lactoferrin, anti-tissue transglutaminase (tTG) antibody, lipocalin, matrix metalloproteinase (MMP), tissue inhibitor of metalloproteinase (TIMP), alpha-globulin, actin-severing protein, S100 protein, fibrinopeptide, calcitonin gene-related peptide (CGRP), tachykinin, ghrelin, neurotensin, corticotropin-releasing hormone, IBS1, MUC20, VSIG2, CKB, M160, VSIG4, CASP1, NCF4, LYZ, KCNS3, PSME2, MS4A4A, HELLS, COP1, FCGR2A, RFC4, MCM5, TAP2, LRAP, L2DTL and combinations thereof.
In other embodiments, the presence or level of at least two, three, four, five, six, seven, eight, nine, ten, or more diagnostic markers is determined in the individual's sample. In certain instances, the cytokine comprises one or more of the cytokines described below. Preferably, the presence or level of IL-8, IL-1β, TNF-related weak inducer of apoptosis (TWEAK), leptin, osteoprotegerin (OPG), GROα, CXCL4/PF-4, and/or CXCL7/NAP-2 is determined in the individual's sample. In certain other instances, the growth factor comprises one or more of the growth factors described below. Preferably, the presence or level of epidermal growth factor (EGF), vascular endothelial growth factor (VEGF), pigment epithelium-derived factor (PEDF), brain-derived neurotrophic factor (BDNF), and/or amphiregulin (SDGF) is determined in the individual's sample.
In some instances, the anti-neutrophil antibody comprises ANCA, pANCA, cANCA, NSNA, SAPPA, and combinations thereof. In other instances, the ASCA comprises ASCA-IgA, ASCA-IgG, ASCA-IgM, and combinations thereof. In further instances, the antimicrobial antibody comprises an anti-OmpC antibody, anti-flagellin antibody, anti-I2 antibody, and combinations thereof.
In certain instances, the lipocalin comprises one or more of the lipocalins described below. Preferably, the presence or level of neutrophil gelatinase-associated lipocalin (NGAL) and/or a complex of NGAL and a matrix metalloproteinase (e.g., NGAL/MMP-9 complex) is determined in the individual's sample. In other instances, the matrix metalloproteinase (MMP) comprises one or more of the MMPs described below. Preferably, the presence or level of MMP-9 is determined in the individual's sample. In further instances, the tissue inhibitor of metalloproteinase (TIMP) comprises one or more of the TIMPs described below. Preferably, the presence or level of TIMP-1 is determined in the individual's sample. In yet further instances, the alpha-globulin comprises one or more of the alpha-globulins described below. Preferably, the presence or level of alpha-2-macroglobulin, haptoglobin, and/or orosomucoid is determined in the individual's sample.
In certain other instances, the actin-severing protein comprises one or more of the actin-severing proteins described below. Preferably, the presence or level of gelsolin is determined in the individual's sample. In additional instances, the S100 protein comprises one or more of the S100 proteins described below including, for example, calgranulin. In yet other instances, the fibrinopeptide comprises one or more of the fibrinopeptides described below. Preferably, the presence or level of fibrinopeptide A (FIBA) is determined in the individual's sample. In further instances, the presence or level of a tachykinin such as Substance P, neurokinin A, and/or neurokinin B is determined in the individual's sample. The presence or level of other diagnostic markers such as, for example, anti-lactoferrin antibody, L-selectin/CD62L, elastase, C-reactive protein (CRP), calprotectin, anti-U1-70 kDa autoantibody, zona occludens 1 (ZO-1), vasoactive intestinal peptide (VIP), serum amyloid A, and/or gastrin can also be determined.
In preferred embodiments, the present invention provides a method for classifying whether a sample from an individual is associated with IBS, the method comprising:
The sample used for detecting or determining the presence or level of at least one diagnostic marker is typically whole blood, plasma, serum, saliva, urine, stool (i.e., feces), tears, and any other bodily fluid, or a tissue sample (i.e., biopsy) such as a small intestine or colon sample. Preferably, the sample is serum, whole blood, plasma, stool, urine, or a tissue biopsy. In certain instances, the methods of the present invention further comprise obtaining the sample from the individual prior to detecting or determining the presence or level of at least one diagnostic marker in the sample.
In some embodiments, a panel for measuring one or more of the diagnostic markers described above may be constructed and used for classifying the sample as an IBS sample or non-IBS sample. One skilled in the art will appreciate that the presence or level of a plurality of diagnostic markers can be determined simultaneously or sequentially, using, for example, an aliquot or dilution of the individual's sample. In certain instances, the level of a particular diagnostic marker in the individual's sample is considered to be elevated when it is at least about 25%, 50%, 75%, 100%, 125%, 150%, 175%, 200%, 250%, 300%, 350%, 400%, 450%, 500%, 600%, 700%, 800%, 900%, or 1000% greater than the level of the same marker in a comparative sample (e.g., a normal, GI control, IBD, and/or Celiac disease sample) or population of samples (e.g., greater than a median level of the same marker in a comparative population of normal, GI control, IBD, and/or Celiac disease samples). In certain other instances, the level of a particular diagnostic marker in the individual's sample is considered to be lowered when it is at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95% less than the level of the same marker in a comparative sample (e.g., a normal, GI control, IBD, and/or Celiac disease sample) or population of samples (e.g., less than a median level of the same marker in a comparative population of normal, GI control, IBD, and/or Celiac disease samples).
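As a non-limiting illustration, the comparison of a marker level against the median of a comparative population can be sketched as follows. The function names and the default cutoffs (25% for elevated, 5% for lowered, i.e., the smallest values in the ranges recited above) are choices made for this sketch only; any other cutoff from those ranges could be passed instead.

```python
from statistics import median
from typing import Sequence


def percent_change(level: float, comparator_levels: Sequence[float]) -> float:
    """Percent difference of `level` from the median of a comparative population."""
    reference = median(comparator_levels)
    if reference <= 0:
        raise ValueError("comparator median must be positive")
    return 100.0 * (level - reference) / reference


def marker_status(level: float,
                  comparator_levels: Sequence[float],
                  elevated_pct: float = 25.0,
                  lowered_pct: float = 5.0) -> str:
    """Return 'elevated', 'lowered', or 'unchanged' relative to the comparator median."""
    change = percent_change(level, comparator_levels)
    if change >= elevated_pct:
        return "elevated"
    if change <= -lowered_pct:
        return "lowered"
    return "unchanged"


# Made-up comparator levels in arbitrary units; the median is 10.0.
controls = [8.0, 10.0, 12.0]
print(marker_status(15.0, controls))  # 50% above the median
print(marker_status(9.0, controls))   # 10% below the median
```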
In certain embodiments, the presence or level of at least one diagnostic marker is determined using an assay such as a hybridization assay or an amplification-based assay. Examples of hybridization assays suitable for use in the methods of the present invention include, but are not limited to, Northern blotting, dot blotting, RNase protection, and a combination thereof. A non-limiting example of an amplification-based assay suitable for use in the methods of the present invention includes a reverse transcriptase-polymerase chain reaction (RT-PCR).
In certain other embodiments, the presence or level of at least one diagnostic marker is determined using an immunoassay or an immunohistochemical assay. A non-limiting example of an immunoassay suitable for use in the methods of the present invention includes an enzyme-linked immunosorbent assay (ELISA). Examples of immunohistochemical assays suitable for use in the methods of the present invention include, but are not limited to, immunofluorescence assays such as direct fluorescent antibody assays, indirect fluorescent antibody (IFA) assays, anticomplement immunofluorescence assays, and avidin-biotin immunofluorescence assays. Other types of immunohistochemical assays include immunoperoxidase assays.
In some embodiments, the method of ruling in IBS comprises determining a diagnostic marker profile optionally in combination with a symptom profile, wherein the symptom profile is determined by identifying the presence or severity of at least one symptom in the individual; and classifying the sample as an IBS sample or non-IBS sample using an algorithm based upon the diagnostic marker profile and the symptom profile. One skilled in the art will appreciate that the diagnostic marker profile and the symptom profile can be determined simultaneously or sequentially in any order.
The symptom profile is typically determined by identifying the presence or severity of at least one symptom selected from the group consisting of chest pain, chest discomfort, heartburn, uncomfortable fullness after having a regular-sized meal, inability to finish a regular-sized meal, abdominal pain, abdominal discomfort, constipation, diarrhea, bloating, abdominal distension, negative thoughts or feelings associated with having pain or discomfort, and combinations thereof.
In preferred embodiments, the presence or severity of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more of the symptoms described herein is identified to generate a symptom profile that is useful for predicting IBS. In certain instances, a questionnaire or other form of written, verbal, or telephone survey is used to produce the symptom profile. The questionnaire or survey typically comprises a standardized set of questions and answers for the purpose of gathering information from respondents regarding their current and/or recent IBS-related symptoms. For instance, Example 13 provides exemplary questions that can be included in a questionnaire for identifying the presence or severity of one or more IBS-related symptoms in the individual.
In certain embodiments, the symptom profile is produced by compiling and/or analyzing all or a subset of the answers to the questions set forth in the questionnaire or survey. In certain other embodiments, the symptom profile is produced based upon the individual's response to the following question: “Are you currently experiencing any symptoms?” The symptom profile generated in accordance with either of these embodiments can be used in combination with a diagnostic marker profile in the algorithmic-based methods described herein to improve the accuracy of predicting IBS.
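As a non-limiting illustration, compiling questionnaire answers into a symptom profile can be sketched as follows. The 0 to 4 severity scale, the subset of symptoms, and the function names are assumptions made for this sketch; the questionnaire itself may use any scoring scheme.

```python
from typing import Dict, List, Mapping

# A subset of the symptoms listed above, used here only as an example.
SYMPTOMS = (
    "abdominal pain",
    "abdominal discomfort",
    "constipation",
    "diarrhea",
    "bloating",
    "abdominal distension",
)
MAX_SEVERITY = 4  # hypothetical scale: 0 = absent, 4 = most severe


def build_symptom_profile(answers: Mapping[str, int]) -> Dict[str, int]:
    """Map every tracked symptom to a severity score; unanswered items count as absent."""
    profile = {}
    for symptom in SYMPTOMS:
        severity = answers.get(symptom, 0)
        if not 0 <= severity <= MAX_SEVERITY:
            raise ValueError(f"severity for {symptom!r} must be 0..{MAX_SEVERITY}")
        profile[symptom] = severity
    return profile


def symptoms_present(profile: Mapping[str, int]) -> List[str]:
    """List the symptoms reported with a non-zero severity."""
    return [symptom for symptom, severity in profile.items() if severity > 0]


profile = build_symptom_profile({"bloating": 2, "diarrhea": 3})
print(symptoms_present(profile))
```

A profile of this form can be supplied to a classifier as additional input values alongside the diagnostic marker levels.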
In some embodiments, classifying a sample as an IBS sample or non-IBS sample is based upon the diagnostic marker profile, alone or in combination with a symptom profile, in conjunction with a statistical algorithm. In certain instances, the statistical algorithm is a learning statistical classifier system. The learning statistical classifier system can be selected from the group consisting of a random forest (RF), classification and regression tree (C&RT), boosted tree, neural network (NN), support vector machine (SVM), general chi-squared automatic interaction detector model, interactive tree, multiadaptive regression spline, machine learning classifier, and combinations thereof. Preferably, the learning statistical classifier system is a tree-based statistical algorithm (e.g., RF, C&RT, etc.) and/or a NN (e.g., artificial NN, etc.). Additional examples of learning statistical classifier systems suitable for use in the present invention are described in U.S. patent application Ser. No. 11/368,285.
In certain instances, the statistical algorithm is a single learning statistical classifier system. Preferably, the single learning statistical classifier system comprises a tree-based statistical algorithm such as a RF or C&RT. As a non-limiting example, a single learning statistical classifier system can be applied to classify the sample as an IBS sample or non-IBS sample based upon a prediction or probability value and the presence or level of at least one diagnostic marker (i.e., diagnostic marker profile), alone or in combination with the presence or severity of at least one symptom (i.e., symptom profile). The application of a single learning statistical classifier system typically classifies the sample as an IBS sample with a sensitivity, specificity, positive predictive value, negative predictive value, and/or overall accuracy of at least about 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99%.
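For reference, the performance measures recited above follow from the counts of true and false positives and negatives obtained when a classifier is applied to samples of known status. The following minimal sketch computes them; the function name and the 1 = IBS, 0 = non-IBS label encoding are assumptions made for this sketch.

```python
from typing import Dict, Sequence


def classifier_metrics(true_labels: Sequence[int],
                       predicted_labels: Sequence[int]) -> Dict[str, float]:
    """Compute sensitivity, specificity, PPV, NPV, and accuracy (1 = IBS, 0 = non-IBS)."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    tp = fp = tn = fn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == 1 and predicted == 1:
            tp += 1
        elif truth == 0 and predicted == 1:
            fp += 1
        elif truth == 0 and predicted == 0:
            tn += 1
        else:
            fn += 1

    def ratio(numerator: int, denominator: int) -> float:
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),   # IBS samples correctly called IBS
        "specificity": ratio(tn, tn + fp),   # non-IBS samples correctly called non-IBS
        "ppv": ratio(tp, tp + fp),           # IBS calls that are truly IBS
        "npv": ratio(tn, tn + fn),           # non-IBS calls that are truly non-IBS
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }


truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
calls = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(classifier_metrics(truth, calls))
```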
In certain other instances, the statistical algorithm is a combination of at least two learning statistical classifier systems. Preferably, the combination of learning statistical classifier systems comprises a RF and a NN, e.g., applied in tandem or parallel. As a non-limiting example, a RF can first be applied to generate a prediction or probability value based upon the diagnostic marker profile, alone or in combination with a symptom profile, and a NN can then be applied to classify the sample as an IBS sample or non-IBS sample based upon the prediction or probability value and the same or different diagnostic marker profile or combination of profiles. Advantageously, the hybrid RF/NN learning statistical classifier system of the present invention classifies the sample as an IBS sample with a sensitivity, specificity, positive predictive value, negative predictive value, and/or overall accuracy of at least about 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99%.
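As a non-limiting illustration of the tandem arrangement, the sketch below passes a first-stage probability value, together with the same marker profile, to a second-stage classifier. To remain self-contained it substitutes much simpler components: bagged one-split decision stumps stand in for a random forest, and a single sigmoid unit stands in for a neural network. All function names, hyperparameters, and the toy data are assumptions made for this sketch and do not represent the classifier systems of the present invention.

```python
import math
import random


def fit_stump(rows, labels):
    """Return (feature, threshold, sign) of the one-split rule with best training accuracy."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            for sign in (1, -1):
                hits = sum((1 if sign * (r[f] - t) >= 0 else 0) == y
                           for r, y in zip(rows, labels))
                if best is None or hits > best[0]:
                    best = (hits, f, t, sign)
    return best[1:]


def stump_predict(stump, row):
    f, t, sign = stump
    return 1 if sign * (row[f] - t) >= 0 else 0


def fit_bagged_stumps(rows, labels, n_trees=25, seed=0):
    """Fit each stump on a bootstrap resample (simplified stand-in for a random forest)."""
    rng = random.Random(seed)
    n = len(rows)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return stumps


def ensemble_probability(stumps, row):
    """Fraction of stumps voting for the positive (IBS) class."""
    return sum(stump_predict(s, row) for s in stumps) / len(stumps)


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_unit(inputs, labels, epochs=500, lr=0.5):
    """Train a single sigmoid unit by stochastic gradient descent (stand-in for a NN)."""
    w = [0.0] * len(inputs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(inputs, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def fit_hybrid(rows, labels):
    stumps = fit_bagged_stumps(rows, labels)
    # The second stage sees the first-stage probability plus the same marker profile.
    stage2 = [[ensemble_probability(stumps, r)] + list(r) for r in rows]
    return stumps, fit_logistic_unit(stage2, labels)


def hybrid_classify(model, row, cutoff=0.5):
    stumps, (w, b) = model
    x = [ensemble_probability(stumps, row)] + list(row)
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return ("IBS" if p >= cutoff else "non-IBS"), p


# Made-up, normalized marker levels (two markers per sample); 1 = IBS, 0 = non-IBS.
TRAIN_ROWS = [[0.9, 0.2], [0.8, 0.4], [0.7, 0.1], [0.85, 0.3],
              [0.2, 0.3], [0.1, 0.5], [0.3, 0.2], [0.15, 0.4]]
TRAIN_LABELS = [1, 1, 1, 1, 0, 0, 0, 0]

model = fit_hybrid(TRAIN_ROWS, TRAIN_LABELS)
print(hybrid_classify(model, [0.88, 0.3])[0])
```

In this arrangement the second stage can weigh the first-stage probability against the raw marker levels, which is the sense in which the two classifiers are applied in tandem rather than in parallel.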
In some instances, the data obtained from applying the learning statistical classifier system or systems can be processed using a processing algorithm. Such a processing algorithm can be selected, for example, from the group consisting of a multilayer perceptron, backpropagation network, and Levenberg-Marquardt algorithm. In other instances, a combination of such processing algorithms can be used, such as in a parallel or serial fashion.
In certain embodiments, the methods of the present invention further comprise classifying the non-IBS sample as a normal, inflammatory bowel disease (IBD), or non-IBD sample. Classification of the non-IBS sample can be performed, for example, using at least one of the diagnostic markers described above.
In certain other embodiments, the methods of the present invention further comprise sending the IBS classification results to a clinician, e.g., a gastroenterologist or a general practitioner. In another embodiment, the methods of the present invention provide a diagnosis in the form of a probability that the individual has IBS. For example, the individual can have about a 0%, 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or greater probability of having IBS. In yet another embodiment, the methods of the present invention further provide a prognosis of IBS in the individual. For example, the prognosis can be surgery, development of a category or clinical subtype of IBS, development of one or more symptoms, or recovery from the disease.
In some embodiments, the diagnosis of an individual as having IBS is followed by administering to the individual a therapeutically effective amount of a drug useful for treating one or more symptoms associated with IBS. Suitable IBS drugs include, but are not limited to, serotonergic agents, antidepressants, chloride channel activators, chloride channel blockers, guanylate cyclase agonists, antibiotics, opioid agonists, neurokinin antagonists, antispasmodic or anticholinergic agents, belladonna alkaloids, barbiturates, GLP-1 analogs, CRF antagonists, probiotics, free bases thereof, pharmaceutically acceptable salts thereof, derivatives thereof, analogs thereof, and combinations thereof. Other IBS drugs include bulking agents, dopamine antagonists, carminatives, tranquilizers, dextofisopam, phenytoin, timolol, and diltiazem. Additionally, amino acids like glutamine and glutamic acid which regulate intestinal permeability by affecting neuronal or glial cell signaling can be administered to treat patients with IBS.
In other embodiments, the methods of the present invention further comprise classifying the IBS sample as an IBS-constipation (IBS-C), IBS-diarrhea (IBS-D), IBS-mixed (IBS-M), IBS-alternating (IBS-A), or post-infectious IBS (IBS-PI) sample. In certain instances, the classification of the IBS sample into a category, form, or clinical subtype of IBS is based upon the presence or level of at least one, two, three, four, five, six, seven, eight, nine, ten, or more classification markers. Non-limiting examples of classification markers are described below. Preferably, at least one form of IBS is distinguished from at least one other form of IBS based upon the presence or level of leptin. In certain instances, the methods of the present invention can be used to differentiate an IBS-C sample from an IBS-A and/or IBS-D sample in an individual previously identified as having IBS. In certain other instances, the methods of the present invention can be used to classify a sample from an individual not previously diagnosed with IBS as an IBS-A sample, IBS-C sample, IBS-D sample, or non-IBS sample.
In certain embodiments, the methods further comprise sending the results from the classification to a clinician. In certain other embodiments, the methods further provide a diagnosis in the form of a probability that the individual has IBS-A, IBS-C, IBS-D, IBS-M, or IBS-PI. The methods of the present invention can further comprise administering to the individual a therapeutically effective amount of a drug useful for treating IBS-A, IBS-C, IBS-D, IBS-M, or IBS-PI. Suitable drugs include, but are not limited to, tegaserod (Zelnorm™), alosetron (Lotronex®), lubiprostone (Amitiza™), rifaximin (Xifaxan™), MD-1100, probiotics, and a combination thereof. In instances where the sample is classified as an IBS-A or IBS-C sample and/or the individual is diagnosed with IBS-A or IBS-C, a therapeutically effective dose of tegaserod or other 5-HT4 agonist (e.g., mosapride, renzapride, AG1-001, etc.) can be administered to the individual. In some instances, when the sample is classified as IBS-C and/or the individual is diagnosed with IBS-C, a therapeutically effective amount of lubiprostone or other chloride channel activator, rifaximin or other antibiotic capable of controlling intestinal bacterial overgrowth, MD-1100 or other guanylate cyclase agonist, asimadoline or other opioid agonist, or talnetant or other neurokinin antagonist can be administered to the individual. In other instances, when the sample is classified as IBS-D and/or the individual is diagnosed with IBS-D, a therapeutically effective amount of alosetron or other 5-HT3 antagonist (e.g., ramosetron, DDP-225, etc.), crofelemer or other chloride channel blocker, talnetant or other neurokinin antagonist (e.g., saredutant, etc.), or an antidepressant such as a tricyclic antidepressant can be administered to the individual.
In additional embodiments, the methods of the present invention further comprise ruling out intestinal inflammation. Non-limiting examples of intestinal inflammation include acute inflammation, diverticulitis, ileal pouch-anal anastomosis, microscopic colitis, infectious diarrhea, and combinations thereof. In some instances, the intestinal inflammation is ruled out based upon the presence or level of C-reactive protein (CRP), lactoferrin, calprotectin, or combinations thereof.
In another aspect, the present invention provides a method for classifying whether a sample from an individual is associated with IBS, the method comprising:
In some embodiments, the diagnostic marker profile is determined by detecting the presence or level of at least one, two, three, four, five, six, seven, eight, nine, ten, or more diagnostic markers selected from the group consisting of a cytokine (e.g., IL-8, IL-1β, TWEAK, leptin, OPG, MIP-3β, GROα, CXCL4/PF-4, and/or CXCL7/NAP-2), growth factor (e.g., EGF, VEGF, PEDF, BDNF, and/or SDGF), anti-neutrophil antibody (e.g., ANCA, pANCA, cANCA, NSNA, and/or SAPPA), ASCA (e.g., ASCA-IgA, ASCA-IgG, and/or ASCA-IgM), antimicrobial antibody (e.g., anti-OmpC antibody, anti-flagellin antibody, and/or anti-I2 antibody), lactoferrin, anti-tTG antibody, lipocalin (e.g., NGAL, NGAL/MMP-9 complex), MMP (e.g., MMP-9), TIMP (e.g., TIMP-1), alpha-globulin (e.g., alpha-2-macroglobulin, haptoglobin, and/or orosomucoid), actin-severing protein (e.g., gelsolin), S100 protein (e.g., calgranulin), fibrinopeptide (e.g., FIBA), CGRP, tachykinin (e.g., Substance P), ghrelin, neurotensin, corticotropin-releasing hormone, IBS1, MUC20, VSIG2, CKB, M160, VSIG4, CASP1, NCF4, LYZ, KCNS3, PSME2, MS4A4A, HELLS, COP1, FCGR2A, RFC4, MCM5, TAP2, LRAP, L2DTL and combinations thereof. The presence or level of other diagnostic markers such as, for example, anti-lactoferrin antibody, L-selectin/CD62L, elastase, C-reactive protein (CRP), calprotectin, anti-U1-70 kDa autoantibody, zona occludens 1 (ZO-1), vasoactive intestinal peptide (VIP), serum amyloid A, and/or gastrin can also be determined.
In preferred embodiments, the diagnostic marker profile is determined by detecting the presence or level of IL-1β, NGAL, anti-Cbir1 antibodies, ANCA, BDNF, TWEAK, anti-tTG antibodies, GROα, TIMP-1, and ASCA in the individual's sample.
The diagnostic markers used for ruling out IBD can be the same as the diagnostic markers used for ruling in IBS. Alternatively, the diagnostic markers used for ruling out IBD can be different than the diagnostic markers used for ruling in IBS.
The sample used for detecting or determining the presence or level of at least one diagnostic marker is typically whole blood, plasma, serum, saliva, urine, stool (i.e., feces), tears, and any other bodily fluid, or a tissue sample (i.e., biopsy) such as a small intestine or colon sample. Preferably, the sample is serum, whole blood, plasma, stool, urine, or a tissue biopsy. In certain instances, the methods of the present invention further comprise obtaining the sample from the individual prior to detecting or determining the presence or level of at least one diagnostic marker in the sample.
In some embodiments, a panel for measuring one or more of the diagnostic markers described above may be constructed and used for ruling out IBD and/or ruling in IBS. One skilled in the art will appreciate that the presence or level of a plurality of diagnostic markers can be determined simultaneously or sequentially, using, for example, an aliquot or dilution of the individual's sample. As described above, the level of a particular diagnostic marker in the individual's sample is generally considered to be elevated when it is at least about 25%, 50%, 75%, 100%, 125%, 150%, 175%, 200%, 250%, 300%, 350%, 400%, 450%, 500%, 600%, 700%, 800%, 900%, or 1000% greater than the level of the same marker in a comparative sample or population of samples (e.g., greater than a median level). Similarly, the level of a particular diagnostic marker in the individual's sample is typically considered to be lowered when it is at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95% less than the level of the same marker in a comparative sample or population of samples (e.g., less than a median level).
In certain instances, the presence or level of at least one diagnostic marker is determined using an assay such as a hybridization assay or an amplification-based assay. Examples of hybridization assays and amplification-based assays suitable for use in the methods of the present invention are described above. In certain other instances, the presence or level of at least one diagnostic marker is determined using an immunoassay or an immunohistochemical assay. Non-limiting examples of immunoassays and immunohistochemical assays suitable for use in the methods of the present invention are described above.
In some embodiments, the method of first ruling out IBD (i.e., classifying the sample as an IBD sample or non-IBD sample) and then ruling in IBS (i.e., classifying the non-IBD sample as an IBS sample or non-IBS sample) comprises determining a diagnostic marker profile optionally in combination with a symptom profile, wherein the symptom profile is determined by identifying the presence or severity of at least one symptom in the individual; classifying the sample as an IBD sample or non-IBD sample using a first statistical algorithm based upon the diagnostic marker profile and the symptom profile; and if the sample is classified as a non-IBD sample, classifying the non-IBD sample as an IBS sample or non-IBS sample using a second statistical algorithm based upon the same profiles as determined in step (a) or different profiles. One skilled in the art will appreciate that the diagnostic marker profile and the symptom profile can be determined simultaneously or sequentially in any order.
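As a non-limiting illustration, the sequential arrangement of a first and a second statistical algorithm can be sketched as follows. The two stage classifiers shown are placeholders with arbitrary cutoffs on generically named markers (`marker_a`, `marker_b`); they are assumptions made for this sketch and carry no clinical meaning. In practice each stage would be a trained learning statistical classifier system.

```python
from typing import Callable, Mapping

Profile = Mapping[str, float]
Classifier = Callable[[Profile], bool]  # returns True for a positive call at that stage


def classify_two_stage(profile: Profile, is_ibd: Classifier, is_ibs: Classifier) -> str:
    """Rule out IBD first; apply the IBS classifier only to non-IBD samples."""
    if is_ibd(profile):
        return "IBD"
    return "IBS" if is_ibs(profile) else "non-IBS"


# Placeholder stage classifiers; the cutoffs are arbitrary and purely illustrative.
def toy_ibd_rule(profile: Profile) -> bool:
    return profile["marker_a"] > 2.0


def toy_ibs_rule(profile: Profile) -> bool:
    return profile["marker_b"] > 1.0


sample = {"marker_a": 1.0, "marker_b": 1.5}
print(classify_two_stage(sample, toy_ibd_rule, toy_ibs_rule))
```

Because the second algorithm is invoked only when the first returns a non-IBD call, the two stages can use the same profiles or different profiles without affecting one another.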
In other embodiments, the first statistical algorithm is a learning statistical classifier system selected from the group consisting of a random forest (RF), classification and regression tree (C&RT), boosted tree, neural network (NN), support vector machine (SVM), general chi-squared automatic interaction detector model, interactive tree, multiadaptive regression spline, machine learning classifier, and combinations thereof. In certain instances, the first statistical algorithm is a single learning statistical classifier system. Preferably, the single learning statistical classifier system comprises a tree-based statistical algorithm such as a RF or C&RT. In certain other instances, the first statistical algorithm is a combination of at least two learning statistical classifier systems, e.g., applied in tandem or parallel. As a non-limiting example, a RF can first be applied to generate a prediction or probability value based upon the diagnostic marker profile, alone or in combination with a symptom profile, and a NN (e.g., artificial NN) can then be applied to classify the sample as a non-IBD sample or IBD sample based upon the prediction or probability value and the same or different diagnostic marker profile or combination of profiles. The hybrid RF/NN learning statistical classifier system of the present invention typically classifies the sample as a non-IBD sample with a sensitivity, specificity, positive predictive value, negative predictive value, and/or overall accuracy of at least about 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99%.
In yet other embodiments, the second statistical algorithm comprises any of the learning statistical classifier systems described above. In certain instances, the second statistical algorithm is a single learning statistical classifier system such as, for example, a tree-based statistical algorithm (e.g., RF or C&RT). In certain other instances, the second statistical algorithm is a combination of at least two learning statistical classifier systems, e.g., applied in tandem or parallel. As a non-limiting example, a RF can first be applied to generate a prediction or probability value based upon the diagnostic marker profile, alone or in combination with a symptom profile, and a NN (e.g., artificial NN) or SVM can then be applied to classify the non-IBD sample as a non-IBS sample or IBS sample based upon the prediction or probability value and the same or different diagnostic marker profile or combination of profiles. The hybrid RF/NN or RF/SVM learning statistical classifier system described herein typically classifies the sample as an IBS sample with a sensitivity, specificity, positive predictive value, negative predictive value, and/or overall accuracy of at least about 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99%.
In some instances, the data obtained from applying the learning statistical classifier system or systems can be processed using a processing algorithm. Such a processing algorithm can be selected, for example, from the group consisting of a multilayer perceptron, backpropagation network, and Levenberg-Marquardt algorithm. In other instances, a combination of such processing algorithms can be used, such as in a parallel or serial fashion.
As described above, the methods of the present invention can further comprise sending the IBS classification results to a clinician, e.g., a gastroenterologist or a general practitioner. The methods can also provide a diagnosis in the form of a probability that the individual has IBS. For example, the individual can have about a 0%, 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or greater probability of having IBS. In some instances, the methods of the present invention further provide a prognosis of IBS in the individual. For example, the prognosis can be surgery, development of a category or clinical subtype of IBS, development of one or more symptoms, or recovery from the disease.
In some embodiments, the diagnosis of an individual as having IBS is followed by administering to the individual a therapeutically effective amount of a drug useful for treating one or more symptoms associated with IBS. Suitable IBS drugs are described above.
In other embodiments, the methods of the present invention further comprise classifying the IBS sample as an IBS-A, IBS-C, IBS-D, IBS-M, or IBS-PI sample. In certain instances, the classification of the IBS sample into a category, form, or clinical subtype of IBS is based upon the presence or level of at least one classification marker. Non-limiting examples of classification markers are described below. Preferably, at least one form of IBS is distinguished from at least one other form of IBS based upon the presence or level of leptin. The results from the classification can be sent to a clinician. In some instances, the methods can further provide a diagnosis in the form of a probability that the individual has IBS-A, IBS-C, IBS-D, IBS-M, or IBS-PI. In other instances, the methods can further comprise administering to the individual a therapeutically effective amount of a drug useful for treating IBS-A, IBS-C, IBS-D, IBS-M, or IBS-PI such as, for example, tegaserod (Zelnorm™), alosetron (Lotronex®), lubiprostone (Amitiza™), rifaximin (Xifaxan™), MD-1100, probiotics, and combinations thereof.
In additional embodiments, the methods of the present invention further comprise ruling out intestinal inflammation. Non-limiting examples of intestinal inflammation are described above. In certain instances, the intestinal inflammation is ruled out based upon the presence or level of CRP, lactoferrin, and/or calprotectin.
In yet another aspect, the present invention provides a method for monitoring the progression or regression of IBS in an individual, the method comprising:
In a related aspect, the present invention provides a method for monitoring drug efficacy in an individual receiving a drug useful for treating IBS, the method comprising:
In some embodiments, the diagnostic marker profile is determined by detecting the presence or level of at least one, two, three, four, five, six, seven, eight, nine, ten, or more diagnostic markers selected from the group consisting of a cytokine (e.g., IL-8, IL-1β, TWEAK, leptin, OPG, MIP-3β, GROα, CXCL4/PF-4, and/or CXCL7/NAP-2), growth factor (e.g., EGF, VEGF, PEDF, BDNF, and/or SDGF), anti-neutrophil antibody (e.g., ANCA, pANCA, cANCA, NSNA, and/or SAPPA), ASCA (e.g., ASCA-IgA, ASCA-IgG, and/or ASCA-IgM), antimicrobial antibody (e.g., anti-OmpC antibody, anti-flagellin antibody, and/or anti-I2 antibody), lactoferrin, anti-tTG antibody, lipocalin (e.g., NGAL, NGAL/MMP-9 complex), MMP (e.g., MMP-9), TIMP (e.g., TIMP-1), alpha-globulin (e.g., alpha-2-macroglobulin, haptoglobin, and/or orosomucoid), actin-severing protein (e.g., gelsolin), S100 protein (e.g., calgranulin), fibrinopeptide (e.g., FIBA), CGRP, tachykinin (e.g., Substance P), ghrelin, neurotensin, corticotropin-releasing hormone, IBS1, MUC20, VSIG2, CKB, M160, VSIG4, CASP1, NCF4, LYZ, KCNS3, PSME2, MS4A4A, HELLS, COP1, FCGR2A, RFC4, MCM5, TAP2, LRAP, L2DTL and combinations thereof. The presence or level of other diagnostic markers such as, for example, anti-lactoferrin antibody, L-selectin/CD62L, elastase, C-reactive protein (CRP), calprotectin, anti-U1-70 kDa autoantibody, zona occludens 1 (ZO-1), vasoactive intestinal peptide (VIP), serum amyloid A, and/or gastrin can also be determined.
In preferred embodiments, the diagnostic marker profile is determined by detecting the presence or level of IL-10, NGAL, anti-Cbir1 antibodies, ANCA, BDNF, TWEAK, anti-tTG antibodies, GROα, TIMP-1, and ASCA in the individual's sample.
The sample used for detecting or determining the presence or level of at least one diagnostic marker is typically whole blood, plasma, serum, saliva, urine, stool (i.e., feces), tears, and any other bodily fluid, or a tissue sample (i.e., biopsy) such as a small intestine or colon sample. Preferably, the sample is serum, whole blood, plasma, stool, urine, or a tissue biopsy. In certain instances, the methods of the present invention further comprise obtaining the sample from the individual prior to detecting or determining the presence or level of at least one diagnostic marker in the sample.
In some embodiments, a panel for measuring one or more of the diagnostic markers described above may be constructed and used for determining the presence or severity of IBS or for determining the effectiveness of an IBS drug. One skilled in the art will appreciate that the presence or level of a plurality of diagnostic markers can be determined simultaneously or sequentially, using, for example, an aliquot or dilution of the individual's sample. As described above, the level of a particular diagnostic marker in the individual's sample is generally considered to be elevated when it is at least about 25%, 50%, 75%, 100%, 125%, 150%, 175%, 200%, 250%, 300%, 350%, 400%, 450%, 500%, 600%, 700%, 800%, 900%, or 1000% greater than the level of the same marker in a comparative sample or population of samples (e.g., greater than a median level). Similarly, the level of a particular diagnostic marker in the individual's sample is typically considered to be lowered when it is at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95% less than the level of the same marker in a comparative sample or population of samples (e.g., less than a median level).
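As a rough illustration of the threshold comparison described in the preceding paragraph, the following Python sketch labels a marker level as elevated or lowered relative to the median of a comparative population of samples. The function name classify_level, the default cutoffs of 25% and 5% (the smallest values in the listed ranges), and the sample numbers are illustrative assumptions and are not values prescribed by the methods described herein.

```python
from statistics import median


def classify_level(level, reference_levels, elevated_pct=25.0, lowered_pct=5.0):
    """Label a marker level relative to the median of a comparative population.

    elevated_pct and lowered_pct are hypothetical cutoffs taken from the low
    end of the ranges in the text; any listed percentage could be substituted.
    """
    ref = median(reference_levels)
    if ref <= 0:
        raise ValueError("reference median must be positive")
    pct_change = (level - ref) / ref * 100.0
    if pct_change >= elevated_pct:
        return "elevated"
    if pct_change <= -lowered_pct:
        return "lowered"
    return "unchanged"


# Made-up reference values for a single marker (arbitrary units).
reference = [8.0, 10.0, 12.0]
print(classify_level(13.0, reference))  # 30% above the median of 10.0
print(classify_level(9.0, reference))   # 10% below the median
print(classify_level(10.2, reference))  # within both cutoffs
```

In practice the cutoffs would be chosen per marker, and the comparative population could be summarized by a statistic other than the median.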
In certain instances, the presence or level of at least one diagnostic marker is determined using an assay such as a hybridization assay or an amplification-based assay. Examples of hybridization assays and amplification-based assays suitable for use in the methods of the present invention are described above. Alternatively, the presence or level of at least one diagnostic marker is determined using an immunoassay or an immunohistochemical assay. Non-limiting examples of immunoassays and immunohistochemical assays suitable for use in the methods of the present invention are described above.
In certain embodiments, the method of monitoring the progression or regression of IBS comprises determining a diagnostic marker profile optionally in combination with a symptom profile, wherein the symptom profile is determined by identifying the presence or severity of at least one symptom in the individual; and determining the presence or severity of IBS in the individual using an algorithm based upon the diagnostic marker profile and the symptom profile. In certain other embodiments, the method of monitoring IBS drug efficacy comprises determining a diagnostic marker profile optionally in combination with a symptom profile, wherein the symptom profile is determined by identifying the presence or severity of at least one symptom in the individual; and determining the effectiveness of the drug using an algorithm based upon the diagnostic marker profile and the symptom profile. One skilled in the art will appreciate that the diagnostic marker profile and the symptom profile can be determined simultaneously or sequentially in any order.
In some embodiments, determining the presence or severity of IBS or the effectiveness of an IBS drug is based upon the diagnostic marker profile, alone or in combination with a symptom profile, in conjunction with a statistical algorithm. In certain instances, the statistical algorithm is a learning statistical classifier system. The learning statistical classifier system comprises any of the learning statistical classifier systems described above.
In certain instances, the statistical algorithm is a single learning statistical classifier system. Preferably, the single learning statistical classifier system is a tree-based statistical algorithm (e.g., RF, C&RT, etc.). In certain other instances, the statistical algorithm is a combination of at least two learning statistical classifier systems. Preferably, the combination of learning statistical classifier systems comprises a RF and NN (e.g., artificial NN, etc.), e.g., applied in tandem or parallel. As a non-limiting example, a RF can first be applied to generate a prediction or probability value based upon the diagnostic marker profile, alone or in combination with a symptom profile, and a NN can then be applied to determine the presence or severity of IBS in the individual or IBS drug efficacy based upon the prediction or probability value and the same or different diagnostic marker profile or combination of profiles.
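The tandem arrangement described above, in which a first classifier produces a probability value that is then passed to a second classifier together with the marker profile, can be sketched in Python as follows. This is only a structural sketch: a vote over fixed decision stumps stands in for a trained RF, a single logistic unit stands in for a trained NN, and every function name, threshold, weight, and marker value is invented for illustration.

```python
import math


def forest_probability(profile, stumps):
    """Stand-in for a random forest: the fraction of decision stumps voting IBS.

    Each stump is (marker_name, threshold); it votes IBS when the marker
    level exceeds the threshold. A real RF would learn its trees from data.
    """
    votes = [profile[marker] > threshold for marker, threshold in stumps]
    return sum(votes) / len(votes)


def neural_stage(probability, profile, weights, bias):
    """Stand-in for a neural network: a single logistic unit.

    Inputs are the first-stage probability plus the marker levels.
    """
    z = bias + weights["rf_probability"] * probability
    z += sum(weights[m] * level for m, level in profile.items())
    return 1.0 / (1.0 + math.exp(-z))


def classify_in_tandem(profile, stumps, weights, bias, cutoff=0.5):
    """Apply the forest stand-in first, then feed its output to the second stage."""
    p_first = forest_probability(profile, stumps)
    p_final = neural_stage(p_first, profile, weights, bias)
    return ("IBS" if p_final >= cutoff else "non-IBS"), p_first, p_final


# All numbers below are invented for illustration only.
profile = {"IL-8": 1.4, "BDNF": 0.6}
stumps = [("IL-8", 1.0), ("BDNF", 1.0), ("IL-8", 1.2)]
weights = {"rf_probability": 2.0, "IL-8": 0.5, "BDNF": -0.5}
label, p_first, p_final = classify_in_tandem(profile, stumps, weights, bias=-1.0)
print(label, round(p_first, 3), round(p_final, 3))
```

A parallel arrangement would instead run both classifiers on the profile independently and combine their outputs afterwards.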
In some instances, the data obtained from applying the learning statistical classifier system or systems can be processed using a processing algorithm. Such a processing algorithm can be selected, for example, from the group consisting of a multilayer perceptron, backpropagation network, and Levenberg-Marquardt algorithm. In other instances, a combination of such processing algorithms can be used, such as in a parallel or serial fashion.
In certain embodiments, the methods of the present invention can further comprise comparing the presence or severity of IBS in the individual determined in step (b) to the presence or severity of IBS in the individual at an earlier time. As a non-limiting example, the presence or severity of IBS determined for an individual receiving an IBS drug can be compared to the presence or severity of IBS determined for the same individual before initiation of use of the IBS drug or at an earlier time in therapy. In certain other embodiments, the methods of the present invention can comprise determining the effectiveness of the IBS drug by comparing the effectiveness of the IBS drug determined in step (b) to the effectiveness of the IBS drug in the individual at an earlier time in therapy. In additional embodiments, the methods can further comprise sending the IBS monitoring results to a clinician, e.g., a gastroenterologist or a general practitioner.
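The comparison against an earlier time point can be sketched in Python as shown below. The sketch assumes that the algorithm's output has been reduced to a single numeric severity score that increases with severity; that assumption, the function name compare_to_baseline, and the tolerance parameter are all hypothetical.

```python
def compare_to_baseline(current_score, earlier_score, tolerance=0.0):
    """Compare a current severity score with one from an earlier time.

    Scores are assumed to increase with severity; tolerance is a hypothetical
    dead band for treating small differences as no change.
    """
    delta = current_score - earlier_score
    if delta > tolerance:
        return "progression"
    if delta < -tolerance:
        return "regression"
    return "stable"


# Invented scores: before therapy and at a later visit.
print(compare_to_baseline(0.35, 0.60, tolerance=0.05))
```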
In a further aspect, the present invention provides a computer-readable medium including code for controlling one or more processors to classify whether a sample from an individual is associated with IBS, the code comprising:
In some embodiments, the diagnostic marker profile indicates the presence or level of at least one, two, three, four, five, six, seven, eight, nine, ten, or more diagnostic markers selected from the group consisting of a cytokine (e.g., IL-8, IL-1β, TWEAK, leptin, OPG, MIP-3β, GROα, CXCL4/PF-4, and/or CXCL7/NAP-2), growth factor (e.g., EGF, VEGF, PEDF, BDNF, and/or SDGF), anti-neutrophil antibody (e.g., ANCA, pANCA, cANCA, NSNA, and/or SAPPA), ASCA (e.g., ASCA-IgA, ASCA-IgG, and/or ASCA-IgM), antimicrobial antibody (e.g., anti-OmpC antibody, anti-flagellin antibody, and/or anti-I2 antibody), lactoferrin, anti-tTG antibody, lipocalin (e.g., NGAL, NGAL/MMP-9 complex), MMP (e.g., MMP-9), TIMP (e.g., TIMP-1), alpha-globulin (e.g., alpha-2-macroglobulin, haptoglobin, and/or orosomucoid), actin-severing protein (e.g., gelsolin), S100 protein (e.g., calgranulin), fibrinopeptide (e.g., FIBA), CGRP, tachykinin (e.g., Substance P), ghrelin, neurotensin, corticotropin-releasing hormone, IBS1, MUC20, VSIG2, CKB, M160, VSIG4, CASP1, NCF4, LYZ, KCNS3, PSME2, MS4A4A, HELLS, COP1, FCGR2A, RFC4, MCM5, TAP2, LRAP, L2DTL and combinations thereof. The presence or level of other diagnostic markers such as, for example, anti-lactoferrin antibody, L-selectin/CD62L, elastase, C-reactive protein (CRP), calprotectin, anti-U1-70 kDa autoantibody, zona occludens 1 (ZO-1), vasoactive intestinal peptide (VIP), serum amyloid A, and/or gastrin can also be indicative of the diagnostic marker profile.
In preferred embodiments, the diagnostic marker profile indicates the presence or level of IL-1β, NGAL, anti-Cbir1 antibodies, ANCA, BDNF, TWEAK, anti-tTG antibodies, GROα, TIMP-1, and ASCA in the individual's sample.
In other embodiments, the computer-readable medium for ruling in IBS comprises instructions to apply a statistical process to a data set comprising a diagnostic marker profile optionally in combination with a symptom profile which indicates the presence or severity of at least one symptom in the individual to produce a statistically derived decision classifying the sample as an IBS sample or non-IBS sample based upon the diagnostic marker profile and the symptom profile. One skilled in the art will appreciate that the statistical process can be applied to the diagnostic marker profile and the symptom profile simultaneously or sequentially in any order.
In one embodiment, the statistical process is a learning statistical classifier system. Examples of learning statistical classifier systems suitable for use in the present invention are described above. In certain instances, the statistical process is a single learning statistical classifier system such as, for example, a RF or C&RT. In certain other instances, the statistical process is a combination of at least two learning statistical classifier systems. As a non-limiting example, the combination of learning statistical classifier systems comprises a RF and a NN, e.g., applied in tandem. In some instances, the data obtained from applying the learning statistical classifier system or systems can be processed using a processing algorithm.
In a related aspect, the present invention provides a computer-readable medium including code for controlling one or more processors to classify whether a sample from an individual is associated with IBS, the code comprising:
In some embodiments, the diagnostic marker profile indicates the presence or level of at least one, two, three, four, five, six, seven, eight, nine, ten, or more diagnostic markers selected from the group consisting of a cytokine (e.g., IL-8, IL-1β, TWEAK, leptin, OPG, MIP-3β, GROα, CXCL4/PF-4, and/or CXCL7/NAP-2), growth factor (e.g., EGF, VEGF, PEDF, BDNF, and/or SDGF), anti-neutrophil antibody (e.g., ANCA, pANCA, cANCA, NSNA, and/or SAPPA), ASCA (e.g., ASCA-IgA, ASCA-IgG, and/or ASCA-IgM), antimicrobial antibody (e.g., anti-OmpC antibody, anti-flagellin antibody, and/or anti-I2 antibody), lactoferrin, anti-tTG antibody, lipocalin (e.g., NGAL, NGAL/MMP-9 complex), MMP (e.g., MMP-9), TIMP (e.g., TIMP-1), alpha-globulin (e.g., alpha-2-macroglobulin, haptoglobin, and/or orosomucoid), actin-severing protein (e.g., gelsolin), S100 protein (e.g., calgranulin), fibrinopeptide (e.g., FIBA), CGRP, tachykinin (e.g., Substance P), ghrelin, neurotensin, corticotropin-releasing hormone, IBS1, MUC20, VSIG2, CKB, M160, VSIG4, CASP1, NCF4, LYZ, KCNS3, PSME2, MS4A4A, HELLS, COP1, FCGR2A, RFC4, MCM5, TAP2, LRAP, L2DTL and combinations thereof. The presence or level of other diagnostic markers such as, for example, anti-lactoferrin antibody, L-selectin/CD62L, elastase, C-reactive protein (CRP), calprotectin, anti-U1-70 kDa autoantibody, zona occludens 1 (ZO-1), vasoactive intestinal peptide (VIP), serum amyloid A, and/or gastrin can also be indicative of the diagnostic marker profile.
In preferred embodiments, the diagnostic marker profile indicates the presence or level of IL-1β, NGAL, anti-Cbir1 antibodies, ANCA, BDNF, TWEAK, anti-tTG antibodies, GROα, TIMP-1, and ASCA in the individual's sample.
In other embodiments, the computer-readable medium for first ruling out IBD and then ruling in IBS comprises instructions to apply a first statistical process to a data set comprising a diagnostic marker profile optionally in combination with a symptom profile which indicates the presence or severity of at least one symptom in the individual to produce a statistically derived decision classifying the sample as an IBD sample or non-IBD sample based upon the diagnostic marker profile and the symptom profile; and if the sample is classified as a non-IBD sample, instructions to apply a second statistical process to the same or different data set to produce a second statistically derived decision classifying the non-IBD sample as an IBS sample or non-IBS sample. One skilled in the art will appreciate that the first and/or second statistical process can be applied to the diagnostic marker profile and the symptom profile simultaneously or sequentially in any order.
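The two-stage flow described above, in which IBD is first ruled out and IBS is then ruled in only for non-IBD samples, can be sketched in Python as follows. The classifiers are passed in as plain callables; the two threshold rules shown are hypothetical stand-ins for trained statistical processes, and the marker values are invented.

```python
from typing import Callable, Dict, Optional, Tuple

Profile = Dict[str, float]
Classifier = Callable[[Profile], bool]


def rule_out_then_rule_in(
    data: Profile,
    is_ibd: Classifier,
    is_ibs: Classifier,
    second_stage_data: Optional[Profile] = None,
) -> Tuple[str, Optional[str]]:
    """Return (first decision, second decision or None).

    The second process runs only on samples classified as non-IBD, and may use
    the same data set or a different one.
    """
    if is_ibd(data):
        return "IBD", None
    stage_two_input = data if second_stage_data is None else second_stage_data
    return "non-IBD", ("IBS" if is_ibs(stage_two_input) else "non-IBS")


# Hypothetical threshold rules standing in for trained statistical classifiers.
def toy_ibd_rule(p: Profile) -> bool:
    return p.get("ANCA", 0.0) > 2.0


def toy_ibs_rule(p: Profile) -> bool:
    return p.get("BDNF", 0.0) > 1.0


print(rule_out_then_rule_in({"ANCA": 3.1, "BDNF": 1.5}, toy_ibd_rule, toy_ibs_rule))
print(rule_out_then_rule_in({"ANCA": 0.4, "BDNF": 1.5}, toy_ibd_rule, toy_ibs_rule))
```

Because the stages are independent callables, they could run on a single processor or be dispatched to different processors without changing the control flow.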
In one embodiment, the first and second statistical processes are implemented in different processors. Alternatively, the first and second statistical processes are implemented in a single processor. In another embodiment, the first statistical process is a learning statistical classifier system. Examples of learning statistical classifier systems suitable for use in the present invention are described above. In certain instances, the first and/or second statistical process is a single learning statistical classifier system such as, for example, a RF or C&RT. In certain other instances, the first and/or second statistical process is a combination of at least two learning statistical classifier systems. As a non-limiting example, the combination of learning statistical classifier systems comprises a RF and a NN or SVM, e.g., applied in tandem. In some instances, the data obtained from applying the learning statistical classifier system or systems can be processed using a processing algorithm.
In an additional aspect, the present invention provides a system for classifying whether a sample from an individual is associated with IBS, the system comprising:
In some embodiments, the diagnostic marker profile indicates the presence or level of at least one, two, three, four, five, six, seven, eight, nine, ten, or more diagnostic markers selected from the group consisting of a cytokine (e.g., IL-8, IL-1β, TWEAK, leptin, OPG, MIP-3β, GROα, CXCL4/PF-4, and/or CXCL7/NAP-2), growth factor (e.g., EGF, VEGF, PEDF, BDNF, and/or SDGF), anti-neutrophil antibody (e.g., ANCA, pANCA, cANCA, NSNA, and/or SAPPA), ASCA (e.g., ASCA-IgA, ASCA-IgG, and/or ASCA-IgM), antimicrobial antibody (e.g., anti-OmpC antibody, anti-flagellin antibody, and/or anti-I2 antibody), lactoferrin, anti-tTG antibody, lipocalin (e.g., NGAL, NGAL/MMP-9 complex), MMP (e.g., MMP-9), TIMP (e.g., TIMP-1), alpha-globulin (e.g., alpha-2-macroglobulin, haptoglobin, and/or orosomucoid), actin-severing protein (e.g., gelsolin), S100 protein (e.g., calgranulin), fibrinopeptide (e.g., FIBA), CGRP, tachykinin (e.g., Substance P), ghrelin, neurotensin, corticotropin-releasing hormone, IBS1, MUC20, VSIG2, CKB, M160, VSIG4, CASP1, NCF4, LYZ, KCNS3, PSME2, MS4A4A, HELLS, COP1, FCGR2A, RFC4, MCM5, TAP2, LRAP, L2DTL and combinations thereof. The presence or level of other diagnostic markers such as, for example, anti-lactoferrin antibody, L-selectin/CD62L, elastase, C-reactive protein (CRP), calprotectin, anti-U1-70 kDa autoantibody, zona occludens 1 (ZO-1), vasoactive intestinal peptide (VIP), serum amyloid A, and/or gastrin can also be indicative of the diagnostic marker profile.
In preferred embodiments, the diagnostic marker profile indicates the presence or level of IL-1β, NGAL, anti-Cbir1 antibodies, ANCA, BDNF, TWEAK, anti-tTG antibodies, GROα, TIMP-1, and ASCA in the individual's sample.
In other embodiments, the system for ruling in IBS comprises a data acquisition module configured to produce a data set comprising a diagnostic marker profile optionally in combination with a symptom profile which indicates the presence or severity of at least one symptom in the individual; a data processing module configured to process the data set by applying a statistical process to the data set to produce a statistically derived decision classifying the sample as an IBS sample or non-IBS sample based upon the diagnostic marker profile and the symptom profile; and a display module configured to display the statistically derived decision.
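One possible arrangement of the three modules described above is sketched below in Python. The class and function names (DataSet, acquire, process, display), the toy decision rule, and the input values are assumptions made for illustration; they are not drawn from any particular implementation of the system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class DataSet:
    marker_profile: Dict[str, float]
    # The symptom profile is optional, so it defaults to empty.
    symptom_profile: Dict[str, int] = field(default_factory=dict)


def acquire(marker_levels: Dict[str, float],
            symptoms: Optional[Dict[str, int]] = None) -> DataSet:
    """Data acquisition module: assemble the data set."""
    return DataSet(dict(marker_levels), dict(symptoms or {}))


def process(data_set: DataSet,
            statistical_process: Callable[[DataSet], bool]) -> str:
    """Data processing module: apply the statistical process to the data set."""
    return "IBS sample" if statistical_process(data_set) else "non-IBS sample"


def display(decision: str) -> str:
    """Display module: print the decision and return the displayed line."""
    line = f"Classification: {decision}"
    print(line)
    return line


# Hypothetical rule standing in for a trained statistical process.
def toy_process(ds: DataSet) -> bool:
    return (ds.marker_profile.get("NGAL", 0.0) > 1.0
            and ds.symptom_profile.get("bloating", 0) >= 1)


shown = display(process(acquire({"NGAL": 1.8}, {"bloating": 2}), toy_process))
```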
In one embodiment, the statistical process is a learning statistical classifier system. Examples of learning statistical classifier systems suitable for use in the present invention are described above. In certain instances, the statistical process is a single learning statistical classifier system such as, for example, a RF or C&RT. In certain other instances, the statistical process is a combination of at least two learning statistical classifier systems, e.g., applied in tandem or parallel. In some embodiments, the data obtained from applying the learning statistical classifier system or systems can be processed using a processing algorithm.
In a related aspect, the present invention provides a system for classifying whether a sample from an individual is associated with IBS, the system comprising:
In some embodiments, the diagnostic marker profile indicates the presence or level of at least one, two, three, four, five, six, seven, eight, nine, ten, or more diagnostic markers selected from the group consisting of a cytokine (e.g., IL-8, IL-1β, TWEAK, leptin, OPG, MIP-3β, GROα, CXCL4/PF-4, and/or CXCL7/NAP-2), growth factor (e.g., EGF, VEGF, PEDF, BDNF, and/or SDGF), anti-neutrophil antibody (e.g., ANCA, pANCA, cANCA, NSNA, and/or SAPPA), ASCA (e.g., ASCA-IgA, ASCA-IgG, and/or ASCA-IgM), antimicrobial antibody (e.g., anti-OmpC antibody, anti-flagellin antibody, and/or anti-I2 antibody), lactoferrin, anti-tTG antibody, lipocalin (e.g., NGAL, NGAL/MMP-9 complex), MMP (e.g., MMP-9), TIMP (e.g., TIMP-1), alpha-globulin (e.g., alpha-2-macroglobulin, haptoglobin, and/or orosomucoid), actin-severing protein (e.g., gelsolin), S100 protein (e.g., calgranulin), fibrinopeptide (e.g., FIBA), CGRP, tachykinin (e.g., Substance P), ghrelin, neurotensin, corticotropin-releasing hormone, IBS1, MUC20, VSIG2, CKB, M160, VSIG4, CASP1, NCF4, LYZ, KCNS3, PSME2, MS4A4A, HELLS, COP1, FCGR2A, RFC4, MCM5, TAP2, LRAP, L2DTL and combinations thereof. The presence or level of other diagnostic markers such as, for example, anti-lactoferrin antibody, L-selectin/CD62L, elastase, C-reactive protein (CRP), calprotectin, anti-U1-70 kDa autoantibody, zona occludens 1 (ZO-1), vasoactive intestinal peptide (VIP), serum amyloid A, and/or gastrin can also be indicative of the diagnostic marker profile.
In preferred embodiments, the diagnostic marker profile indicates the presence or level of IL-1β, NGAL, anti-Cbir1 antibodies, ANCA, BDNF, TWEAK, anti-tTG antibodies, GROα, TIMP-1, and ASCA in the individual's sample.
In other embodiments, the system for first ruling out IBD and then ruling in IBS comprises a data acquisition module configured to produce a data set comprising a diagnostic marker profile optionally in combination with a symptom profile which indicates the presence or severity of at least one symptom in the individual; a data processing module configured to process the data set by applying a first statistical process to the data set to produce a first statistically derived decision classifying the sample as an IBD sample or non-IBD sample based upon the diagnostic marker profile and the symptom profile; if the sample is classified as a non-IBD sample, a data processing module configured to apply a second statistical process to the same or different data set to produce a second statistically derived decision classifying the non-IBD sample as an IBS sample or non-IBS sample; and a display module configured to display the first and/or the second statistically derived decision.
In one embodiment, the first and/or second statistical process is a learning statistical classifier system. Examples of learning statistical classifier systems suitable for use in the present invention are described above. In certain instances, the first and/or second statistical process is a single learning statistical classifier system such as, for example, a RF or C&RT. In certain other instances, the first and/or second statistical process is a combination of at least two learning statistical classifier systems, e.g., applied in tandem or parallel. In some instances, the data obtained from applying the learning statistical classifier system or systems can be processed using a processing algorithm. In another embodiment, the first and second statistical processes are implemented in different processors. Alternatively, the first and second statistical processes are implemented in a single processor.
A variety of structural or metabolic diseases and disorders can cause signs or symptoms that are similar to IBS. As non-limiting examples, patients with diseases and disorders such as inflammatory bowel disease (IBD), Celiac disease (CD), acute inflammation, diverticulitis, ileal pouch-anal anastomosis, microscopic colitis, chronic infectious diarrhea, lactase deficiency, cancer (e.g., colorectal cancer), a mechanical obstruction of the small intestine or colon, an enteric infection, ischemia, maldigestion, malabsorption, endometriosis, and unidentified inflammatory disorders of the intestinal tract can present with abdominal discomfort associated with mild to moderate pain and a change in the consistency and/or frequency of stools that are similar to IBS. Additional IBS-like symptoms can include chronic diarrhea or constipation or an alternating form of each, weight loss, abdominal distention or bloating, and mucus in the stool.
Most IBD patients can be classified into one of two distinct clinical subtypes, Crohn's disease and ulcerative colitis. Crohn's disease is an inflammatory disease affecting the lower part of the ileum and often involving the colon and other regions of the intestinal tract. Ulcerative colitis is characterized by an inflammation localized mostly in the mucosa and submucosa of the large intestine. Patients suffering from these clinical subtypes of IBD typically have IBS-like symptoms such as, for example, abdominal pain, chronic diarrhea, weight loss, and cramping.
The clinical presentation of Celiac disease is also characterized by IBS-like symptoms such as abdominal discomfort associated with chronic diarrhea, weight loss, and abdominal distension. Celiac disease is an immune-mediated disorder of the intestinal mucosa that is typically associated with villous atrophy, crypt hyperplasia, and/or inflammation of the mucosal lining of the small intestine. In addition to the malabsorption of nutrients, individuals with Celiac disease are at risk for mineral deficiency, vitamin deficiency, osteoporosis, autoimmune diseases, and intestinal malignancies (e.g., lymphoma and carcinoma). It is thought that exposure to proteins such as gluten (e.g., glutenin and prolamine proteins which are present in wheat, rye, barley, oats, millet, triticale, spelt, and kamut), in the appropriate genetic and environmental context, is responsible for causing Celiac disease.
Other diseases and disorders characterized by intestinal inflammation that present with IBS-like symptoms include, for example, acute inflammation, diverticulitis, ileal pouch-anal anastomosis, microscopic colitis, and chronic infectious diarrhea, as well as unidentified inflammatory disorders of the intestinal tract. Patients experiencing episodes of acute inflammation typically have elevated C-reactive protein (CRP) levels in addition to IBS-like symptoms. CRP is produced by the liver during the acute phase of the inflammatory process and is usually released about 24 hours post-commencement of the inflammatory process. Patients suffering from diverticulitis, ileal pouch-anal anastomosis, microscopic colitis, and chronic infectious diarrhea typically have elevated fecal lactoferrin and/or calprotectin levels in addition to IBS-like symptoms. Lactoferrin is a glycoprotein secreted by mucosal membranes and is the major protein in the secondary granules of leukocytes. Leukocytes are commonly recruited to inflammatory sites where they are activated, releasing granule content to the surrounding area. This process increases the concentration of lactoferrin in the stool.
Increased lactoferrin levels are observed in patients with ileal pouch-anal anastomosis (i.e., a pouch is created following complete resection of colon in severe cases of Crohn's disease) when compared to other non-inflammatory conditions of the pouch, like irritable pouch syndrome. Elevated levels of lactoferrin are also observed in patients with diverticulitis, a condition in which bulging pouches (i.e., diverticula) in the digestive tract become inflamed and/or infected, causing severe abdominal pain, fever, nausea, and a marked change in bowel habits. Microscopic colitis is a chronic inflammatory disorder that is also associated with increased fecal lactoferrin levels. Microscopic colitis is characterized by persistent watery diarrhea (non-bloody), abdominal pain usually associated with weight loss, a normal mucosa during colonoscopy and radiological examination, and very specific histopathological changes. Microscopic colitis consists of two diseases, collagenous colitis and lymphocytic colitis. Collagenous colitis is of unknown etiology and is found in patients with long-term watery diarrhea and a normal colonoscopy examination. Both collagenous colitis and lymphocytic colitis are characterized by increased lymphocytes in the lining of the colon. Collagenous colitis is further characterized by a thickening of the sub-epithelial collagen layer of the colon. Chronic infectious diarrhea is an illness that is also associated with increased fecal lactoferrin levels. Chronic infectious diarrhea is usually caused by a bacterial, viral, or protozoan infection, with patients presenting with IBS-like symptoms such as diarrhea and abdominal pain. Increased lactoferrin levels are also observed in patients with IBD.
In addition to determining CRP and/or lactoferrin and/or calprotectin levels, diseases and disorders associated with intestinal inflammation can also be ruled out by detecting the presence of blood in the stool, such as fecal hemoglobin. Intestinal bleeding that occurs without the patient's knowledge is called occult or hidden bleeding. The presence of occult bleeding (e.g., fecal hemoglobin) is typically observed in a stool sample from the patient. Other conditions such as ulcers (e.g., gastric, duodenal), cancer (e.g., stomach cancer, colorectal cancer), and hemorrhoids can also present with IBS-like symptoms including abdominal pain and a change in the consistency and/or frequency of stools.
In addition, fecal calprotectin levels can also be assessed. Calprotectin is a calcium binding protein with antimicrobial activity derived predominantly from neutrophils and monocytes. Calprotectin has been found to have clinical relevance in cystic fibrosis, rheumatoid arthritis, IBD, colorectal cancer, HIV, and other inflammatory diseases. Its level has been measured in serum, plasma, oral, cerebrospinal and synovial fluids, urine, and feces. Advantages of fecal calprotectin in GI disorders have been recognized: stable for 3-7 days at room temperature enabling sample shipping through regular mail; correlated to fecal alpha 1-antitrypsin in patients with Crohn's disease; and elevated in a great majority of patients with gastrointestinal carcinomas and IBD. It was found that fecal calprotectin correlates well with endoscopic and histological gradings of disease activity in ulcerative colitis, and with fecal excretion of indium-111-labelled neutrophilic granulocytes, which is a standard of disease activity in IBD.
In view of the foregoing, it is clear that a wide array of diseases and disorders can cause IBS-like symptoms, thereby creating a substantial obstacle for definitively classifying a sample as an IBS sample. However, the present invention overcomes this limitation by classifying a sample from an individual as an IBS sample using, for example, a statistical algorithm, or by excluding (i.e., ruling out) those diseases and disorders that share a similar clinical presentation as IBS and identifying (i.e., ruling in) IBS in a sample using, for example, a combination of statistical algorithms.
A variety of diagnostic markers are suitable for use in the methods, systems, and code of the present invention for classifying a sample from an individual as an IBS sample or for ruling out one or more diseases or disorders associated with IBS-like symptoms in a sample from an individual. Examples of diagnostic markers include, without limitation, cytokines, growth factors, anti-neutrophil antibodies, anti-Saccharomyces cerevisiae antibodies, antimicrobial antibodies, anti-tissue transglutaminase (tTG) antibodies, lipocalins, matrix metalloproteinases (MMPs), complexes of lipocalin and MMP, tissue inhibitor of metalloproteinases (TIMPs), globulins (e.g., alpha-globulins), actin-severing proteins, S100 proteins, fibrinopeptides, calcitonin gene-related peptide (CGRP), tachykinins, ghrelin, neurotensin, corticotropin-releasing hormone (CRH), IBS1, MUC20, VSIG2, CKB, M160, VSIG4, CASP1, NCF4, LYZ, KCNS3, PSME2, MS4A4A, HELLS, COP1, FCGR2A, RFC4, MCM5, TAP2, LRAP, L2DTL, elastase, C-reactive protein (CRP), lactoferrin, anti-lactoferrin antibodies, calprotectin, hemoglobin, NOD2/CARD15, serotonin reuptake transporter (SERT), tryptophan hydroxylase-1, 5-hydroxytryptamine (5-HT), lactulose, and combinations thereof. Additional diagnostic markers for predicting IBS in accordance with the present invention can be selected using the techniques described in Example 14. One skilled in the art will also know of other diagnostic markers suitable for use in the present invention.
A. Cytokines
The determination of the presence or level of at least one cytokine in a sample is particularly useful in the present invention. As used herein, the term “cytokine” includes any of a variety of polypeptides or proteins secreted by immune cells that regulate a range of immune system functions and encompasses small cytokines such as chemokines. The term “cytokine” also includes adipocytokines, which comprise a group of cytokines secreted by adipocytes that function, for example, in the regulation of body weight, hematopoiesis, angiogenesis, wound healing, insulin resistance, the immune response, and the inflammatory response.
In certain aspects, the presence or level of at least one cytokine including, but not limited to, TNF-α, TNF-related weak inducer of apoptosis (TWEAK), osteoprotegerin (OPG), IFN-α, IFN-β, IFN-γ, IL-1α, IL-1β, IL-1 receptor antagonist (IL-1ra), IL-2, IL-4, IL-5, IL-6, soluble IL-6 receptor (sIL-6R), IL-7, IL-8, IL-9, IL-10, IL-12, IL-13, IL-15, IL-17, IL-23, and IL-27 is determined in a sample. In certain other aspects, the presence or level of at least one chemokine such as, for example, CXCL1/GRO1/GROα, CXCL2/GRO2, CXCL3/GRO3, CXCL4/PF-4, CXCL5/ENA-78, CXCL6/GCP-2, CXCL7/NAP-2, CXCL9/MIG, CXCL10/IP-10, CXCL11/I-TAC, CXCL12/SDF-1, CXCL13/BCA-1, CXCL14/BRAK, CXCL15, CXCL16, CXCL17/DMC, CCL1, CCL2/MCP-1, CCL3/MIP-1α, CCL4/MIP-1β, CCL5/RANTES, CCL6/C10, CCL7/MCP-3, CCL8/MCP-2, CCL9/CCL10, CCL11/Eotaxin, CCL12/MCP-5, CCL13/MCP-4, CCL14/HCC-1, CCL15/MIP-5, CCL16/LEC, CCL17/TARC, CCL18/MIP-4, CCL19/MIP-3β, CCL20/MIP-3α, CCL21/SLC, CCL22/MDC, CCL23/MPIF1, CCL24/Eotaxin-2, CCL25/TECK, CCL26/Eotaxin-3, CCL27/CTACK, CCL28/MEC, CL1, CL2, and CX3CL1 is determined in a sample. In certain further aspects, the presence or level of at least one adipocytokine including, but not limited to, leptin, adiponectin, resistin, active or total plasminogen activator inhibitor-1 (PAI-1), visfatin, and retinol binding protein 4 (RBP4) is determined in a sample. Preferably, the presence or level of IL-8, IL-1β, TWEAK, leptin, OPG, MIP-3β, GROα, CXCL4/PF-4, and/or CXCL7/NAP-2 is determined.
In certain instances, the presence or level of a particular cytokine is detected at the level of mRNA expression with an assay such as, for example, a hybridization assay or an amplification-based assay. In certain other instances, the presence or level of a particular cytokine is detected at the level of protein expression using, for example, an immunoassay (e.g., ELISA) or an immunohistochemical assay. Suitable ELISA kits for determining the presence or level of a cytokine such as IL-8, IL-1β, MIP-3β, GROα, CXCL4/PF-4, or CXCL7/NAP-2 in a serum, plasma, saliva, or urine sample are available from, e.g., R&D Systems, Inc. (Minneapolis, Minn.), Neogen Corp. (Lexington, Ky.), Alpco Diagnostics (Salem, N.H.), Assay Designs, Inc. (Ann Arbor, Mich.), BD Biosciences Pharmingen (San Diego, Calif.), Invitrogen (Camarillo, Calif.), Calbiochem (San Diego, Calif.), CHEMICON International, Inc. (Temecula, Calif.), Antigenix America Inc. (Huntington Station, N.Y.), QIAGEN Inc. (Valencia, Calif.), Bio-Rad Laboratories, Inc. (Hercules, Calif.), and/or Bender MedSystems Inc. (Burlingame, Calif.).
The human IL-8 polypeptide sequence is set forth in, e.g., Genbank Accession No. NP_000575 (SEQ ID NO:1). The human IL-8 mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. NM_000584 (SEQ ID NO:2). One skilled in the art will appreciate that IL-8 is also known as CXCL8, K60, NAF, GCP1, LECT, LUCT, NAP1, 3-10C, GCP-1, LYNAP, MDNCF, MONAP, NAP-1, SCYB8, TSG-1, AMCF-I, and b-ENAP.
The human IL-1β polypeptide sequence is set forth in, e.g., Genbank Accession No. NP_000567 (SEQ ID NO:3). The human IL-1β mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. NM_000576 (SEQ ID NO:4). One skilled in the art will appreciate that IL-1β is also known as IL1F2 and IL-1beta.
The human TWEAK polypeptide sequence is set forth in, e.g., Genbank Accession Nos. NP_003800 (SEQ ID NO:5) and AAC51923. The human TWEAK mRNA (coding) sequence is set forth in, e.g., Genbank Accession Nos. NM_003809 (SEQ ID NO:6) and BC104420. One skilled in the art will appreciate that TWEAK is also known as tumor necrosis factor ligand superfamily member 12 (TNFSF12), APO3 ligand (APO3L), CD255, DR3 ligand, growth factor-inducible 14 (Fn14) ligand, and UNQ181/PRO207.
The human leptin polypeptide sequence is set forth in, e.g., Genbank Accession No. NP_000221 (SEQ ID NO:7). The human leptin mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. NM_000230 (SEQ ID NO:8). One skilled in the art will appreciate that leptin is also known as OB, OBS, and FLJ94114.
The human osteoprotegerin polypeptide sequence is set forth in, e.g., Genbank Accession No. NP_002537 (SEQ ID NO:9). The human osteoprotegerin mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. NM_002546 (SEQ ID NO:10). One skilled in the art will appreciate that osteoprotegerin is also known as OPG, tumor necrosis factor receptor superfamily member 11b (TNFRSF11B), TR1, OCIF, and MGC29565.
The human MIP-3β polypeptide sequence is set forth in, e.g., Genbank Accession No. NP_006265 (SEQ ID NO:11). The human MIP-3β mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. NM_006274 (SEQ ID NO:12). One skilled in the art will appreciate that MIP-3β is also known as CCL19, ELC, CKb11, MIP3B, MIP-3b, SCYA19, and MGC34433.
The human GROα polypeptide sequence is set forth in, e.g., Genbank Accession No. NP_001502 (SEQ ID NO:13). The human GROα mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. NM_001511 (SEQ ID NO:14). One skilled in the art will appreciate that GROα is also known as CXCL1, GRO1, FSP, GROα, melanoma growth stimulating activity (MGSA), NAP-3, SCYB1, MGSA-a, and MGSA alpha.
The human platelet factor-4 (PF-4) polypeptide sequence is set forth in, e.g., Genbank Accession No. NP_002610 (SEQ ID NO:15). The human PF-4 mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. NM_002619 (SEQ ID NO:16). One skilled in the art will appreciate that PF-4 is also known as CXCL4, SCYB4, and MGC138298.
The human NAP-2 polypeptide sequence is set forth in, e.g., Genbank Accession No. NP_002695 (SEQ ID NO:17). The human NAP-2 mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. NM_002704 (SEQ ID NO:18). One skilled in the art will appreciate that NAP-2 is also known as pro-platelet basic protein (PPBP), CXCL7, PBP, TC1, TC2, TGB, LDGF, MDGF, TGB1, B-TG1, CTAP3, SCYB7, THBGB, LA-PF4, THBGB1, Beta-TG, CTAPIII, and CTAP-III.
B. Growth Factors
The determination of the presence or level of one or more growth factors in a sample is also useful in the present invention. As used herein, the term “growth factor” includes any of a variety of peptides, polypeptides, or proteins that are capable of stimulating cellular proliferation and/or cellular differentiation.
In certain aspects, the presence or level of at least one growth factor including, but not limited to, epidermal growth factor (EGF), heparin-binding epidermal growth factor (HB-EGF), vascular endothelial growth factor (VEGF), pigment epithelium-derived factor (PEDF; also known as SERPINF1), amphiregulin (AREG; also known as schwannoma-derived growth factor (SDGF)), basic fibroblast growth factor (bFGF), hepatocyte growth factor (HGF), transforming growth factor-α (TGF-α), transforming growth factor-β (TGF-β), bone morphogenetic proteins (e.g., BMP1-BMP15), platelet-derived growth factor (PDGF), nerve growth factor (NGF), β-nerve growth factor (β-NGF), neurotrophic factors (e.g., brain-derived neurotrophic factor (BDNF), neurotrophin 3 (NT3), neurotrophin 4 (NT4), etc.), growth differentiation factor-9 (GDF-9), granulocyte-colony stimulating factor (G-CSF), granulocyte-macrophage colony stimulating factor (GM-CSF), myostatin (GDF-8), erythropoietin (EPO), and thrombopoietin (TPO) is determined in a sample. Preferably, the presence or level of EGF, VEGF, PEDF, amphiregulin (SDGF), and/or BDNF is determined.
In certain instances, the presence or level of a particular growth factor is detected at the level of mRNA expression with an assay such as, for example, a hybridization assay or an amplification-based assay. In certain other instances, the presence or level of a particular growth factor is detected at the level of protein expression using, for example, an immunoassay (e.g., ELISA) or an immunohistochemical assay. Suitable ELISA kits for determining the presence or level of a growth factor such as EGF, VEGF, PEDF, SDGF, or BDNF in a serum, plasma, saliva, or urine sample are available from, e.g., Antigenix America Inc. (Huntington Station, N.Y.), Promega (Madison, Wis.), R&D Systems, Inc. (Minneapolis, Minn.), Invitrogen (Camarillo, Calif.), CHEMICON International, Inc. (Temecula, Calif.), Neogen Corp. (Lexington, Ky.), PeproTech (Rocky Hill, N.J.), Alpco Diagnostics (Salem, N.H.), Pierce Biotechnology, Inc. (Rockford, Ill.), and/or Abazyme (Needham, Mass.).
The human epidermal growth factor (EGF) polypeptide sequence is set forth in, e.g., Genbank Accession No. NP_001954 (SEQ ID NO:19). The human EGF mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. NM_001963 (SEQ ID NO:20). One skilled in the art will appreciate that EGF is also known as beta-urogastrone, URG, and HOMG4.
The human vascular endothelial growth factor (VEGF) polypeptide sequence is set forth in, e.g., Genbank Accession Nos. NP_001020537 (SEQ ID NO:21), NP_001020538, NP_001020539, NP_001020540, NP_001020541, NP_001028928, and NP_003367. The human VEGF mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. NM_001025366 (SEQ ID NO:22), NM_001025367, NM_001025368, NM_001025369, NM_001025370, NM_001033756, and NM_003376. One skilled in the art will appreciate that VEGF is also known as VPF, VEGFA, VEGF-A, and MGC70609.
The human pigment epithelium-derived factor (PEDF) polypeptide sequence is set forth in, e.g., Genbank Accession No. NP_002606 (SEQ ID NO:23). The human PEDF mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. NM_002615 (SEQ ID NO:24). One skilled in the art will appreciate that PEDF is also known as serpin peptidase inhibitor clade F (alpha-2 antiplasmin, pigment epithelium derived factor) member 1, SERPINF1, EPC-1, and PIG35.
The human brain-derived neurotrophic factor (BDNF) polypeptide sequence is set forth in, e.g., Genbank Accession No. NP_733931 (SEQ ID NO:25), NP_733928, NP_733927, NP_001700, NP_733929, and NP_733930. The human BDNF mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. NM_170735 (SEQ ID NO:26), NM_170732, NM_170731, NM_001709, NM_170733, and NM_170734. One skilled in the art will appreciate that BDNF is also known as MGC34632.
The human schwannoma-derived growth factor (SDGF) polypeptide sequence is set forth in, e.g., Genbank Accession No. NP_001648 (SEQ ID NO:27). The human SDGF mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. NM_001657 (SEQ ID NO:28). One skilled in the art will appreciate that SDGF is also known as amphiregulin, AREG, AR, CRDGF, and MGC13647.
C. Lipocalins
The determination of the presence or level of one or more lipocalins in a sample is also useful in the present invention. As used herein, the term “lipocalin” includes any of a variety of small extracellular proteins that are characterized by several common molecular recognition properties: the ability to bind a range of small hydrophobic molecules; binding to specific cell-surface receptors; and the formation of complexes with soluble macromolecules (see, e.g., Flower, Biochem. J., 318:1-14 (1996)). The varied biological functions of lipocalins are mediated by one or more of these properties. The lipocalin protein family exhibits great functional diversity, with roles in retinol transport, invertebrate cryptic coloration, olfaction and pheromone transport, and prostaglandin synthesis. Lipocalins have also been implicated in the regulation of cell homeostasis and the modulation of the immune response and, as carrier proteins, in the general clearance of endogenous and exogenous compounds. Although lipocalins have great diversity at the sequence level, their three-dimensional structure is a unifying characteristic. Lipocalin crystal structures are highly conserved and comprise a single eight-stranded continuously hydrogen-bonded antiparallel beta-barrel, which encloses an internal ligand-binding site.
In certain aspects, the presence or level of at least one lipocalin including, but not limited to, neutrophil gelatinase-associated lipocalin (NGAL; also known as human neutrophil lipocalin (HNL) or lipocalin-2), von Ebner's gland protein (VEGP; also known as lipocalin-1), retinol-binding protein (RBP), purpurin (PURP), retinoic acid-binding protein (RABP), α2u-globulin (A2U), major urinary protein (MUP), bilin-binding protein (BBP), α-crustacyanin, pregnancy protein 14 (PP14), β-lactoglobulin (Blg), α1-microglobulin (A1M), the gamma chain of C8 (C8γ), Apolipoprotein D (ApoD), lazarillo (LAZ), prostaglandin D2 synthase (PGDS), quiescence-specific protein (QSP), choroid plexus protein, odorant-binding protein (OBP), α1-acid glycoprotein (AGP), probasin (PBAS), aphrodisin, orosomucoid, and progestagen-associated endometrial protein (PAEP) is determined in a sample. In certain other aspects, the presence or level of at least one lipocalin complex including, for example, a complex of NGAL and a matrix metalloproteinase (e.g., NGAL/MMP-9 complex) is determined. Preferably, the presence or level of NGAL or a complex thereof with MMP-9 is determined.
In certain instances, the presence or level of a particular lipocalin is detected at the level of mRNA expression with an assay such as, for example, a hybridization assay or an amplification-based assay. In certain other instances, the presence or level of a particular lipocalin is detected at the level of protein expression using, for example, an immunoassay (e.g., ELISA) or an immunohistochemical assay. Suitable ELISA kits for determining the presence or level of a lipocalin such as NGAL in a serum, plasma, or urine sample are available from, e.g., AntibodyShop A/S (Gentofte, Denmark), LabClinics SA (Barcelona, Spain), Lucerna-Chem AG (Luzern, Switzerland), R&D Systems, Inc. (Minneapolis, Minn.), and Assay Designs, Inc. (Ann Arbor, Mich.). Suitable ELISA kits for determining the presence or level of the NGAL/MMP-9 complex are available from, e.g., R&D Systems, Inc. (Minneapolis, Minn.). Additional NGAL and NGAL/MMP-9 complex ELISA techniques are described in, e.g., Kjeldsen et al., Blood, 83:799-807 (1994); and Kjeldsen et al., J. Immunol. Methods, 198:155-164 (1996).
The human neutrophil gelatinase-associated lipocalin (NGAL) polypeptide sequence is set forth in, e.g., Genbank Accession No. NP_005555 (SEQ ID NO:29). The human NGAL mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. NM_005564 (SEQ ID NO:30). One skilled in the art will appreciate that NGAL is also known as lipocalin 2 and LCN2.
D. Matrix Metalloproteinases
The determination of the presence or level of at least one matrix metalloproteinase (MMP) in a sample is also useful in the present invention. As used herein, the term “matrix metalloproteinase” or “MMP” includes zinc-dependent endopeptidases capable of degrading a variety of extracellular matrix proteins, cleaving cell surface receptors, releasing apoptotic ligands, and/or regulating chemokines. MMPs are also thought to play a major role in cell behaviors such as cell proliferation, migration (adhesion/dispersion), differentiation, angiogenesis, and host defense.
In certain aspects, the presence or level of at least one MMP including, but not limited to, MMP-1 (interstitial collagenase), MMP-2 (gelatinase-A), MMP-3 (stromelysin-1), MMP-7 (matrilysin), MMP-8 (neutrophil collagenase), MMP-9 (gelatinase-B), MMP-10 (stromelysin-2), MMP-11 (stromelysin-3), MMP-12 (macrophage metalloelastase), MMP-13 (collagenase-3), MMP-14, MMP-15, MMP-16, MMP-17, MMP-18 (collagenase-4), MMP-19, MMP-20 (enamelysin), MMP-21, MMP-23, MMP-24, MMP-25, MMP-26 (matrilysin-2), MMP-27, and MMP-28 (epilysin) is determined in a sample. Preferably, the presence or level of MMP-9 is determined.
In certain instances, the presence or level of a particular MMP is detected at the level of mRNA expression with an assay such as, for example, a hybridization assay or an amplification-based assay. In certain other instances, the presence or level of a particular MMP is detected at the level of protein expression using, for example, an immunoassay (e.g., ELISA) or an immunohistochemical assay. Suitable ELISA kits for determining the presence or level of an MMP such as MMP-9 in a serum or plasma sample are available from, e.g., Calbiochem (San Diego, Calif.), CHEMICON International, Inc. (Temecula, Calif.), and R&D Systems, Inc. (Minneapolis, Minn.).
The human matrix metalloproteinase-9 (MMP-9) polypeptide sequence is set forth in, e.g., Genbank Accession No. NP_004985 (SEQ ID NO:31). The human MMP-9 mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. NM_004994 (SEQ ID NO:32). One skilled in the art will appreciate that MMP-9 is also known as matrix metallopeptidase-9, gelatinase B, 92 kDa gelatinase, 92 kDa type IV collagenase, GELB, and CLG4B.
E. Tissue Inhibitor of Metalloproteinases
The determination of the presence or level of at least one tissue inhibitor of metalloproteinase (TIMP) in a sample is also useful in the present invention. As used herein, the term “tissue inhibitor of metalloproteinase” or “TIMP” includes proteins capable of inhibiting MMPs.
In certain aspects, the presence or level of at least one TIMP including, but not limited to, TIMP-1, TIMP-2, TIMP-3, and TIMP-4 is determined in a sample. Preferably, the presence or level of TIMP-1 is determined.
In certain instances, the presence or level of a particular TIMP is detected at the level of mRNA expression with an assay such as, for example, a hybridization assay or an amplification-based assay. In certain other instances, the presence or level of a particular TIMP is detected at the level of protein expression using, for example, an immunoassay (e.g., ELISA) or an immunohistochemical assay. Suitable ELISA kits for determining the presence or level of a TIMP such as TIMP-1 in a serum or plasma sample are available from, e.g., Alpco Diagnostics (Salem, N.H.), Calbiochem (San Diego, Calif.), Invitrogen (Camarillo, Calif.), CHEMICON International, Inc. (Temecula, Calif.), and R&D Systems, Inc. (Minneapolis, Minn.).
The human tissue inhibitor of metalloproteinase-1 (TIMP-1) polypeptide sequence is set forth in, e.g., Genbank Accession No. NP_003245 (SEQ ID NO:33). The human TIMP-1 mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. NM_003254 (SEQ ID NO:34). One skilled in the art will appreciate that TIMP-1 is also known as EPA, EPO, HCI, CLGI, TIMP, and FLJ90373.
F. Globulins
The determination of the presence or level of at least one globulin in a sample is also useful in the present invention. As used herein, the term “globulin” includes any member of a heterogeneous series of families of serum proteins which migrate less than albumin during serum electrophoresis. Protein electrophoresis is typically used to categorize globulins into the following three categories: alpha-globulins (i.e., alpha-1-globulins or alpha-2-globulins); beta-globulins; and gamma-globulins.
Alpha-globulins comprise a group of globular proteins in plasma which are highly mobile in alkaline or electrically-charged solutions. They generally function to inhibit certain blood protease and inhibitor activity. Examples of alpha-globulins include, but are not limited to, alpha-2-macroglobulin (α2-MG), haptoglobin (Hp), orosomucoid, alpha-1-antitrypsin, alpha-1-antichymotrypsin, alpha-2-antiplasmin, antithrombin, ceruloplasmin, heparin cofactor II, retinol binding protein, and transcortin. Preferably, the presence or level of α2-MG, haptoglobin, and/or orosomucoid is determined. In certain instances, one or more haptoglobin allotypes such as, for example, Hp precursor, Hpβ, Hpα1, and Hpα2, are determined.
In certain instances, the presence or level of a particular globulin is detected at the level of mRNA expression with an assay such as, for example, a hybridization assay or an amplification-based assay. In certain other instances, the presence or level of a particular globulin is detected at the level of protein expression using, for example, an immunoassay (e.g., ELISA) or an immunohistochemical assay. Suitable ELISA kits for determining the presence or level of a globulin such as α2-MG, haptoglobin, or orosomucoid in a serum, plasma, or urine sample are available from, e.g., GenWay Biotech, Inc. (San Diego, Calif.) and/or Immundiagnostik AG (Bensheim, Germany).
The human alpha-2-macroglobulin (α2-MG) polypeptide sequence is set forth in, e.g., Genbank Accession No. NP_000005 (SEQ ID NO:35). The human α2-MG mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. NM_000014 (SEQ ID NO:36). One skilled in the art will appreciate that α2-MG is also known as A2M, CPAMD5, FWP007, S863-7, alpha 2M, and DKFZp779B086.
The human haptoglobin precursor alpha-2 (Hpα2) polypeptide sequence is set forth in, e.g., Genbank Accession No. NP_005134 (SEQ ID NO:37) and NP_001119574. The human Hpα2 mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. NM_005143 (SEQ ID NO:38) and NM_001126102. One skilled in the art will appreciate that Hpα2 is also known as haptoglobin, HP, BP, HPA1S, MGC111141, and HP2-alpha-2.
The human orosomucoid polypeptide sequence is set forth in, e.g., Genbank Accession No. NP_000598 (SEQ ID NO:39). The human orosomucoid mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. NM_000607 (SEQ ID NO:40). One skilled in the art will appreciate that orosomucoid is also known as ORM, orosomucoid 1, ORM1, AGP1, and AGP-A.
G. Actin-Severing Proteins
The determination of the presence or level of at least one actin-severing protein in a sample is also useful in the present invention. As used herein, the term “actin-severing protein” includes any member of a family of proteins involved in actin remodeling and regulation of cell motility. Non-limiting examples of actin-severing proteins include gelsolin (also known as brevin or actin-depolymerizing factor), villin, fragmin, and adseverin. For example, gelsolin is a protein of leukocytes, platelets, and other cells which severs actin filaments in the presence of submicromolar calcium, thereby solating cytoplasmic actin gels.
In certain instances, the presence or level of a particular actin-severing protein is detected at the level of mRNA expression with an assay such as, for example, a hybridization assay or an amplification-based assay. In certain other instances, the presence or level of a particular actin-severing protein is detected at the level of protein expression using, for example, an immunoassay (e.g., ELISA) or an immunohistochemical assay. Suitable ELISA techniques for determining the presence or level of an actin-severing protein such as gelsolin in a plasma sample are described in, e.g., Smith et al., J. Lab. Clin. Med., 110:189-195 (1987); and Hiyoshi et al., Biochem. Mol. Biol. Int., 32:755-762 (1994).
The human gelsolin polypeptide sequence is set forth in, e.g., Genbank Accession No. NP_000168 (SEQ ID NO:41) and NP_937895. The human gelsolin mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. NM_000177 (SEQ ID NO:42) and NM_198252. One skilled in the art will appreciate that gelsolin is also known as GSN and DKFZp313L0718.
H. S100 Proteins
The determination of the presence or level of at least one S100 protein in a sample is also useful in the present invention. As used herein, the term “S100 protein” includes any member of a family of low molecular mass acidic proteins characterized by cell-type-specific expression and the presence of 2 EF-hand calcium-binding domains. There are at least 21 different types of S100 proteins in humans. The name is derived from the fact that S100 proteins are 100% soluble in ammonium sulfate at neutral pH. Most S100 proteins are homodimeric, consisting of two identical polypeptides held together by non-covalent bonds. Although S100 proteins are structurally similar to calmodulin, they differ in that they are cell-specific, expressed in particular cells at different levels depending on environmental factors. S100 proteins are normally present in cells derived from the neural crest (e.g., Schwann cells, melanocytes, glial cells), chondrocytes, adipocytes, myoepithelial cells, macrophages, Langerhans cells, dendritic cells, and keratinocytes. S100 proteins have been implicated in a variety of intracellular and extracellular functions such as the regulation of protein phosphorylation, transcription factors, Ca2+ homeostasis, the dynamics of cytoskeleton constituents, enzyme activities, cell growth and differentiation, and the inflammatory response.
Calgranulins are S100 proteins that are expressed in multiple cell types, including renal epithelial cells and neutrophils, and are abundant in infiltrating monocytes and granulocytes under conditions of chronic inflammation. Examples of calgranulins include, without limitation, calgranulin A (also known as S100A8 or MRP-8), calgranulin B (also known as S100A9 or MRP-14), and calgranulin C (also known as S100A12).
In certain instances, the presence or level of a particular S100 protein is detected at the level of mRNA expression with an assay such as, for example, a hybridization assay or an amplification-based assay. In certain other instances, the presence or level of a particular S100 protein is detected at the level of protein expression using, for example, an immunoassay (e.g., ELISA) or an immunohistochemical assay. Suitable ELISA kits for determining the presence or level of an S100 protein such as calgranulin A (S100A8) or calgranulin B (S100A9) in a serum, plasma, or urine sample are available from, e.g., Peninsula Laboratories Inc. (San Carlos, Calif.) and Hycult biotechnology b.v. (Uden, The Netherlands).
Calprotectin, the complex of S100A8 and S100A9, is a calcium- and zinc-binding protein in the cytosol of neutrophils, monocytes, and keratinocytes. Calprotectin is a major protein in neutrophilic granulocytes and macrophages and accounts for as much as 60% of the total protein in the cytosol fraction in these cells. It is therefore a surrogate marker of neutrophil turnover. Its concentration in stool correlates with the intensity of neutrophil infiltration of the intestinal mucosa and with the severity of inflammation. In some instances, calprotectin can be measured with an ELISA using small (50-100 mg) fecal samples (see, e.g., Johne et al., Scand J Gastroenterol., 36:291-296 (2001)).
The human S100 calcium binding protein A8 (S100A8) polypeptide sequence is set forth in, e.g., Genbank Accession No. NP_002955 (SEQ ID NO:43). The human S100A8 mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. NM_002964 (SEQ ID NO:44). One skilled in the art will appreciate that S100A8 is also known as calgranulin A, MRP-8, P8, MIF, NIF, CAGA, CFAG, CGLA, L1Ag, CP-10, MA387, and 60B8AG.
I. Anti-Neutrophil Antibodies
The determination of ANCA levels and/or the presence or absence of pANCA in a sample is also useful in the present invention. As used herein, the term “anti-neutrophil cytoplasmic antibody” or “ANCA” includes antibodies directed to cytoplasmic and/or nuclear components of neutrophils. ANCA activity can be divided into several broad categories based upon the ANCA staining pattern in neutrophils: (1) cytoplasmic neutrophil staining without perinuclear highlighting (cANCA); (2) perinuclear staining around the outside edge of the nucleus (pANCA); (3) perinuclear staining around the inside edge of the nucleus (NSNA); and (4) diffuse staining with speckling across the entire neutrophil (SAPPA). In certain instances, pANCA staining is sensitive to DNase treatment. The term ANCA encompasses all varieties of anti-neutrophil reactivity, including, but not limited to, cANCA, pANCA, NSNA, and SAPPA. Similarly, the term ANCA encompasses all immunoglobulin isotypes including, without limitation, immunoglobulin A and G.
ANCA levels in a sample from an individual can be determined, for example, using an immunoassay such as an enzyme-linked immunosorbent assay (ELISA) with alcohol-fixed neutrophils. The presence or absence of a particular category of ANCA such as pANCA can be determined, for example, using an immunohistochemical assay such as an indirect fluorescent antibody (IFA) assay. Preferably, the presence or absence of pANCA in a sample is determined using an immunofluorescence assay with DNase-treated, fixed neutrophils. In addition to fixed neutrophils, antibodies directed against human antibodies can be used for detection. Antigens specific for ANCA are also suitable for determining ANCA levels, including, without limitation, unpurified or partially purified neutrophil extracts; purified proteins, protein fragments, or synthetic peptides such as histone H1 or ANCA-reactive fragments thereof (see, e.g., U.S. Pat. No. 6,074,835); histone H1-like antigens, porin antigens, Bacteroides antigens, or ANCA-reactive fragments thereof (see, e.g., U.S. Pat. No. 6,033,864); secretory vesicle antigens or ANCA-reactive fragments thereof (see, e.g., U.S. patent application Ser. No. 08/804,106); and anti-ANCA idiotypic antibodies. One skilled in the art will appreciate that the use of additional antigens specific for ANCA is within the scope of the present invention.
J. Anti-Saccharomyces Cerevisiae Antibodies
The determination of ASCA (e.g., ASCA-IgA and/or ASCA-IgG) levels in a sample is also useful in the present invention. As used herein, the term “anti-Saccharomyces cerevisiae immunoglobulin A” or “ASCA-IgA” includes antibodies of the immunoglobulin A isotype that react specifically with S. cerevisiae. Similarly, the term “anti-Saccharomyces cerevisiae immunoglobulin G” or “ASCA-IgG” includes antibodies of the immunoglobulin G isotype that react specifically with S. cerevisiae.
The determination of whether a sample is positive for ASCA-IgA or ASCA-IgG is made using an antibody specific for human antibody sequences or an antigen specific for ASCA. Such an antigen can be any antigen or mixture of antigens that is bound specifically by ASCA-IgA and/or ASCA-IgG. Although ASCA antibodies were initially characterized by their ability to bind S. cerevisiae, those of skill in the art will understand that an antigen that is bound specifically by ASCA can be obtained from S. cerevisiae or from a variety of other sources so long as the antigen is capable of binding specifically to ASCA antibodies. Accordingly, exemplary sources of an antigen specific for ASCA, which can be used to determine the levels of ASCA-IgA and/or ASCA-IgG in a sample, include, without limitation, whole killed yeast cells such as Saccharomyces or Candida cells; yeast cell wall mannan such as phosphopeptidomannan (PPM); oligosaccharides such as oligomannosides; neoglycolipids; anti-ASCA idiotypic antibodies; and the like. Different species and strains of yeast, such as S. cerevisiae strain Su1, Su2, CBS1315, or BM 156, or Candida albicans strain VW32, are suitable for use as an antigen specific for ASCA-IgA and/or ASCA-IgG. Purified and synthetic antigens specific for ASCA are also suitable for use in determining the levels of ASCA-IgA and/or ASCA-IgG in a sample. Examples of purified antigens include, without limitation, purified oligosaccharide antigens such as oligomannosides. Examples of synthetic antigens include, without limitation, synthetic oligomannosides such as those described in U.S. Patent Publication No. 20030105060, e.g., D-Man β(1-2) D-Man β(1-2) D-Man β(1-2) D-Man-OR, D-Man α(1-2) D-Man α(1-2) D-Man α(1-2) D-Man-OR, and D-Man α(1-3) D-Man α(1-2) D-Man α(1-2) D-Man-OR, wherein R is a hydrogen atom, a C1 to C20 alkyl, or an optionally labeled connector group.
Preparations of yeast cell wall mannans, e.g., PPM, can be used in determining the levels of ASCA-IgA and/or ASCA-IgG in a sample. Such water-soluble surface antigens can be prepared by any appropriate extraction technique known in the art, including, for example, by autoclaving, or can be obtained commercially (see, e.g., Lindberg et al., Gut, 33:909-913 (1992)). The acid-stable fraction of PPM is also useful in the statistical algorithms of the present invention (Sendid et al., Clin. Diag. Lab. Immunol., 3:219-226 (1996)). An exemplary PPM that is useful in determining ASCA levels in a sample is derived from S. uvarum strain ATCC #38926.
Purified oligosaccharide antigens such as oligomannosides can also be useful in determining the levels of ASCA-IgA and/or ASCA-IgG in a sample. The purified oligomannoside antigens are preferably converted into neoglycolipids as described in, for example, Faille et al., Eur. J. Microbiol. Infect. Dis., 11:438-446 (1992). One skilled in the art understands that the reactivity of such an oligomannoside antigen with ASCA can be optimized by varying the mannosyl chain length (Frosh et al., Proc Natl. Acad. Sci. USA, 82:1194-1198 (1985)); the anomeric configuration (Fukazawa et al., In “Immunology of Fungal Disease,” E. Kurstak (ed.), Marcel Dekker Inc., New York, pp. 37-62 (1989); Nishikawa et al., Microbiol. Immunol., 34:825-840 (1990); Poulain et al., Eur. J. Clin. Microbiol., 23:46-52 (1993); Shibata et al., Arch. Biochem. Biophys., 243:338-348 (1985); Trinel et al., Infect. Immun., 60:3845-3851 (1992)); or the position of the linkage (Kikuchi et al., Planta, 190:525-535 (1993)).
Suitable oligomannosides for use in the methods of the present invention include, without limitation, an oligomannoside having the mannotetraose Man(1-3) Man(1-2) Man(1-2) Man. Such an oligomannoside can be purified from PPM as described in, e.g., Faille et al., supra. An exemplary neoglycolipid specific for ASCA can be constructed by releasing the oligomannoside from its respective PPM and subsequently coupling the released oligomannoside to 4-hexadecylaniline or the like.
K. Anti-Microbial Antibodies
The determination of anti-OmpC antibody levels in a sample is also useful in the present invention. As used herein, the term “anti-outer membrane protein C antibody” or “anti-OmpC antibody” includes antibodies directed to a bacterial outer membrane porin as described in, e.g., PCT Patent Publication No. WO 01/89361. The term “outer membrane protein C” or “OmpC” refers to a bacterial porin that is immunoreactive with an anti-OmpC antibody.
The level of anti-OmpC antibody present in a sample from an individual can be determined using an OmpC protein or a fragment thereof such as an immunoreactive fragment thereof. Suitable OmpC antigens useful in determining anti-OmpC antibody levels in a sample include, without limitation, an OmpC protein, an OmpC polypeptide having substantially the same amino acid sequence as the OmpC protein, or a fragment thereof such as an immunoreactive fragment thereof. As used herein, an OmpC polypeptide generally describes polypeptides having an amino acid sequence with greater than about 50% identity, preferably greater than about 60% identity, more preferably greater than about 70% identity, still more preferably greater than about 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% amino acid sequence identity with an OmpC protein, with the amino acid identity determined using a sequence alignment program such as CLUSTALW. Such antigens can be prepared, for example, by purification from enteric bacteria such as E. coli, by recombinant expression of a nucleic acid such as Genbank Accession No. K00541, by synthetic means such as solution or solid phase peptide synthesis, or by using phage display.
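The identity thresholds recited above can be illustrated with a minimal sketch. It assumes the candidate and reference sequences have already been aligned by a program such as CLUSTALW and are supplied as equal-length strings with "-" marking gaps. The function names `percent_identity` and `exceeds_identity_threshold` are hypothetical, and counting only columns in which neither sequence has a gap is one of several possible conventions, not one prescribed by the present description.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: both inputs come from the same pairwise alignment, so they
    have equal length and use '-' as the gap character.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    matches = 0
    compared = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue  # skip gapped columns (illustrative convention)
        compared += 1
        if res_a.upper() == res_b.upper():
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def exceeds_identity_threshold(aligned_a: str, aligned_b: str,
                               threshold: float = 50.0) -> bool:
    """True if identity is strictly greater than the threshold (in percent)."""
    return percent_identity(aligned_a, aligned_b) > threshold
```

For instance, calling `exceeds_identity_threshold(candidate, reference, 80.0)` on two aligned strings would test the "greater than about 80%" tier. Real alignments would normally be produced and parsed with dedicated alignment software rather than typed by hand.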
The determination of anti-I2 antibody levels in a sample is also useful in the present invention. As used herein, the term “anti-I2 antibody” includes antibodies directed to a microbial antigen sharing homology to bacterial transcriptional regulators as described in, e.g., U.S. Pat. No. 6,309,643. The term “I2” refers to a microbial antigen that is immunoreactive with an anti-I2 antibody. The microbial I2 protein is a polypeptide of 100 amino acids sharing some similarity and weak homology with the predicted protein 4 from C. pasteurianum, Rv3557c from Mycobacterium tuberculosis, and a transcriptional regulator from Aquifex aeolicus. The nucleic acid and protein sequences for the I2 protein are described in, e.g., U.S. Pat. No. 6,309,643.
The level of anti-I2 antibody present in a sample from an individual can be determined using an I2 protein or a fragment thereof such as an immunoreactive fragment thereof. Suitable I2 antigens useful in determining anti-I2 antibody levels in a sample include, without limitation, an I2 protein, an I2 polypeptide having substantially the same amino acid sequence as the I2 protein, or a fragment thereof such as an immunoreactive fragment thereof. Such I2 polypeptides exhibit greater sequence similarity to the I2 protein than to the C. pasteurianum protein 4 and include isotype variants and homologs thereof. As used herein, an I2 polypeptide generally describes polypeptides having an amino acid sequence with greater than about 50% identity, preferably greater than about 60% identity, more preferably greater than about 70% identity, still more preferably greater than about 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% amino acid sequence identity with a naturally-occurring I2 protein, with the amino acid identity determined using a sequence alignment program such as CLUSTALW. Such I2 antigens can be prepared, for example, by purification from microbes, by recombinant expression of a nucleic acid encoding an I2 antigen, by synthetic means such as solution or solid phase peptide synthesis, or by using phage display.
The determination of anti-flagellin antibody levels in a sample is also useful in the present invention. As used herein, the term “anti-flagellin antibody” includes antibodies directed to a protein component of bacterial flagella as described in, e.g., PCT Patent Publication No. WO 03/053220 and U.S. Patent Publication No. 20040043931. The term “flagellin” refers to a bacterial flagellum protein that is immunoreactive with an anti-flagellin antibody. Microbial flagellins are proteins found in bacterial flagellum that arrange themselves in a hollow cylinder to form the filament.
The level of anti-flagellin antibody present in a sample from an individual can be determined using a flagellin protein or a fragment thereof such as an immunoreactive fragment thereof. Suitable flagellin antigens useful in determining anti-flagellin antibody levels in a sample include, without limitation, a flagellin protein such as Cbir-1 flagellin, flagellin X, flagellin A, flagellin B, fragments thereof, and combinations thereof, a flagellin polypeptide having substantially the same amino acid sequence as the flagellin protein, or a fragment thereof such as an immunoreactive fragment thereof. As used herein, a flagellin polypeptide generally describes polypeptides having an amino acid sequence with greater than about 50% identity, preferably greater than about 60% identity, more preferably greater than about 70% identity, still more preferably greater than about 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% amino acid sequence identity with a naturally-occurring flagellin protein, with the amino acid identity determined using a sequence alignment program such as CLUSTALW. Such flagellin antigens can be prepared, e.g., by purification from bacteria such as Helicobacter bilis, Helicobacter mustelae, Helicobacter pylori, Butyrivibrio fibrisolvens, and bacteria found in the cecum, by recombinant expression of a nucleic acid encoding a flagellin antigen, by synthetic means such as solution or solid phase peptide synthesis, or by using phage display.
L. Other Diagnostic Markers
The determination of the presence or level of fibrinogen or a proteolytic product thereof such as a fibrinopeptide in a sample is also useful in the present invention. Fibrinogen is a plasma glycoprotein synthesized in the liver composed of 3 structurally different subunits: alpha (FGA); beta (FGB); and gamma (FGG). Thrombin causes a limited proteolysis of the fibrinogen molecule, during which fibrinopeptides A and B are released from the N-terminal regions of the alpha and beta chains, respectively. Fibrinopeptides A and B, which have been sequenced in many species, may have a physiological role as vasoconstrictors and may aid in local hemostasis during blood clotting. In one embodiment, human fibrinopeptide A comprises the sequence: Ala-Asp-Ser-Gly-Glu-Gly-Asp-Phe-Leu-Ala-Glu-Gly-Gly-Gly-Val-Arg (SEQ ID NO:91). In another embodiment, human fibrinopeptide B comprises the sequence: Glp-Gly-Val-Asn-Asp-Asn-Glu-Glu-Gly-Phe-Phe-Ser-Ala-Arg (SEQ ID NO:92). An ELISA kit available from American Diagnostica Inc. (Stamford, Conn.) can be used to detect the presence or level of human fibrinopeptide A in plasma or other biological fluids.
The human fibrinogen (FGA) polypeptide sequence is set forth in, e.g., Genbank Accession No. NP_000499 (SEQ ID NO:45). A human FGA variant mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. NM_000508 (SEQ ID NO:46), NM_001033952, and NM_001033953. One skilled in the art will appreciate that FGA is also known as fibrinopeptide, Fib2, MGC 119422, MGC 119423, and MGC 119425.
The determination of the presence or level of lactoferrin in a sample is also useful in the present invention. In certain instances, the presence or level of lactoferrin is detected at the level of mRNA expression with an assay such as, for example, a hybridization assay or an amplification-based assay. In certain other instances, the presence or level of lactoferrin is detected at the level of protein expression using, for example, an immunoassay (e.g., ELISA) or an immunohistochemical assay. A lactoferrin ELISA kit available from Calbiochem (San Diego, Calif.) can be used to detect human lactoferrin in a plasma, urine, bronchoalveolar lavage, or cerebrospinal fluid sample. Similarly, an ELISA kit available from U.S. Biological (Swampscott, Mass.) can be used to determine the level of lactoferrin in a plasma sample. U.S. Patent Publication No. 20040137536 describes an ELISA assay for determining the presence of elevated lactoferrin levels in a stool sample. Likewise, U.S. Patent Publication No. 20040033537 describes an ELISA assay for determining the concentration of endogenous lactoferrin in a stool, mucus, or bile sample. In some embodiments, the presence or level of anti-lactoferrin antibodies can be detected in a sample using, e.g., lactoferrin protein or a fragment thereof.
The human lactoferrin polypeptide sequence is set forth in, e.g., Genbank Accession No. NP_002334 (SEQ ID NO:47). The human lactoferrin mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. NM_002343 (SEQ ID NO:48). One skilled in the art will appreciate that lactoferrin is also known as LF, lactotransferrin, LTF, HLF2, and GIG12.
In certain embodiments, the determination of the presence or level of calcitonin gene-related peptide (CGRP) in a sample is useful in the present invention. Calcitonin is a 32-amino acid peptide hormone synthesized by the parafollicular cells of the thyroid. It causes reduction in serum calcium, an effect opposite to that of parathyroid hormone. CGRP is derived, with calcitonin, from the CT/CGRP gene located on chromosome 11. CGRP is a 37-amino acid peptide and is a potent endogenous vasodilator. CGRP is primarily produced in nervous tissue; however, its receptors are expressed throughout the body. An ELISA kit available from Cayman Chemical Co. (Ann Arbor, Mich.) can be used to detect the presence or level of human CGRP in a variety of samples including plasma, serum, nervous tissue, CSF, and culture media.
The human calcitonin gene-related peptide (CGRP) polypeptide sequence is set forth in, e.g., Genbank Accession No. NP_001732 (SEQ ID NO:49), NP_001029124, and NP_001029125. The human CGRP mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. NM_001741 (SEQ ID NO:50), NM_001033952, and NM_001033953. One skilled in the art will appreciate that CGRP is also known as calcitonin-related polypeptide alpha, CALCA, CT, KC, CALC1, CGRP1, CGRP-I, and MGC126648.
In other embodiments, the determination of the presence or level of an anti-tissue transglutaminase (tTG) antibody in a sample is useful in the present invention. As used herein, the term “anti-tTG antibody” includes any antibody that recognizes tissue transglutaminase (tTG) or a fragment thereof. Transglutaminases are a diverse family of Ca2+-dependent enzymes that are ubiquitous and highly conserved across species. Of all the transglutaminases, tTG is the most widely distributed. In certain instances, the anti-tTG antibody is an anti-tTG IgA antibody, anti-tTG IgG antibody, or mixtures thereof. An ELISA kit available from ScheBo Biotech USA Inc. (Marietta, Ga.) can be used to detect the presence or level of human anti-tTG IgA antibodies in a blood sample.
The determination of the presence of polymorphisms in the NOD2/CARD15 gene in a sample is also useful in the present invention. For example, polymorphisms in the NOD2 gene such as a C2107T nucleotide variant that results in a R703W protein variant can be identified in a sample from an individual (see, e.g., U.S. Patent Publication No. 20030190639). In an alternative embodiment, NOD2 mRNA levels can be used as a diagnostic marker of the present invention to aid in classifying IBS.
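As a worked illustration of the relationship between the C2107T nucleotide variant and the R703W protein variant, the sketch below maps a coding-sequence position to its codon and applies the substitution. It assumes nucleotide numbering that begins at the first base of the start codon. The reference codon CGG is an inference rather than a value taken from this description: in the standard genetic code TGG is the only tryptophan codon, so a C-to-T change at the first codon position implies CGG (arginine) as the reference. The function names are hypothetical, and the codon table is deliberately partial.

```python
# Partial codon table: only the two codons needed for this illustration.
CODON_TO_AA = {"CGG": "R", "TGG": "W"}


def codon_number(nt_position: int) -> int:
    """1-based codon number containing a 1-based coding-sequence position."""
    return (nt_position - 1) // 3 + 1


def offset_in_codon(nt_position: int) -> int:
    """0-based offset of the position within its codon (0, 1, or 2)."""
    return (nt_position - 1) % 3


def apply_substitution(codon: str, offset: int, ref: str, alt: str) -> str:
    """Return the codon after replacing the reference base at the given offset."""
    if codon[offset] != ref:
        raise ValueError("reference base does not match the codon")
    return codon[:offset] + alt + codon[offset + 1:]


position = 2107
reference_codon = "CGG"  # inferred, see the assumptions stated above
variant_codon = apply_substitution(
    reference_codon, offset_in_codon(position), "C", "T"
)
protein_change = (
    f"{CODON_TO_AA[reference_codon]}{codon_number(position)}"
    f"{CODON_TO_AA[variant_codon]}"
)
print(protein_change)
```

Position 2107 is the first base of codon 703, since (2107 - 1) // 3 + 1 = 703, which is consistent with the R703W designation.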
The determination of the presence of polymorphisms in the serotonin reuptake transporter (SERT) gene in a sample is also useful in the present invention. For example, polymorphisms in the promoter region of the SERT gene have effects on transcriptional activity, resulting in altered 5-HT reuptake efficiency. A strong genotypic association has been observed between the SERT-P deletion/deletion genotype and the IBS phenotype (see, e.g., Yeo, Gut, 53:1396-1399 (2004)). In an alternative embodiment, SERT mRNA levels can be used as a diagnostic marker of the present invention to aid in classifying IBS (see, e.g., Gershon, J. Clin. Gastroenterol., 39(5 Suppl.):S184-193 (2005)).
In certain aspects, the level of tryptophan hydroxylase-1 mRNA is a diagnostic marker. For example, tryptophan hydroxylase-1 mRNA has been shown to be significantly reduced in IBS (see, e.g., Coats, Gastroenterology, 126:1897-1899 (2004)). In certain other aspects, a lactulose breath test to measure methane, which is indicative of bacterial overgrowth, can be used as a diagnostic marker for IBS.
Additional diagnostic markers include, but are not limited to, IBS1, MUC20, VSIG2, CKB, M160, VSIG4, CASP1, NCF4, LYZ, KCNS3, PSME2, MS4A4A, HELLS, COP1, FCGR2A, RFC4, MCM5, TAP2, LRAP, L2DTL and combinations thereof. Non-limiting examples of other diagnostic markers include L-selectin/CD62L, anti-U1-70 kDa autoantibodies, zona occludens 1 (ZO-1), vasoactive intestinal peptide (VIP), serum amyloid A, gastrin, NB3 gene polymorphisms, NCH gene polymorphisms, fecal leukocytes, α2A and α2C adrenoreceptor gene polymorphisms, IL-10 gene polymorphisms, TNF-α gene polymorphisms, TGF-β1 gene polymorphisms, α-adrenergic receptors, G-proteins, 5-HT2A gene polymorphisms, 5-HTT LPR gene polymorphisms, 5-HT4 receptor gene polymorphisms, zonulin, the 33-mer peptide (Shan et al., Science, 297:2275-2279 (2002); PCT Patent Publication No. WO 03/068170) and combinations thereof.
The human IBS1 polypeptide sequence is set forth in, e.g., Genbank Accession No. NP_056208 (SEQ ID NO:51). The human IBS1 mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. NM_015393 (SEQ ID NO:52). One skilled in the art will appreciate that IBS1 is also known as DKFZP564O0823.
The human mucin 20 (MUC20) polypeptide sequence is set forth in, e.g., Genbank Accession No. NP_689886 (SEQ ID NO:53) and NP_001091986. The human MUC20 mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. NM_152673 (SEQ ID NO:54) and NM_001098516. One skilled in the art will appreciate that MUC20 is also known as FLJ14408 and KIAA1359.
The human V-set and immunoglobulin domain containing 2 (VSIG2) polypeptide sequence is set forth in, e.g., Genbank Accession No. NP_055127 (SEQ ID NO:55). The human VSIG2 mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. NM_014312 (SEQ ID NO:56). One skilled in the art will appreciate that VSIG2 is also known as CTH, CTXL, and 2210413P10Rik.
The human creatine kinase, brain (CKB) polypeptide sequence is set forth in, e.g., Genbank Accession No. NP_001814 (SEQ ID NO:57). The human CKB mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. NM_001823 (SEQ ID NO:58). One skilled in the art will appreciate that CKB is also known as B-CK and CKBB.
The human CD163 molecule-like 1 (CD163L1) polypeptide sequence is set forth in, e.g., Genbank Accession No. NP_777601 (SEQ ID NO:59). The human CD163L1 mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. NM_174941 (SEQ ID NO:60). One skilled in the art will appreciate that CD163L1 is also known as M160, scavenger receptor cysteine-rich type 1 protein M160, and CD163B.
The human V-set and immunoglobulin domain containing 4 (VSIG4) polypeptide sequence is set forth in, e.g., Genbank Accession No. NP_009199 (SEQ ID NO:61) and NP_001093901. The human VSIG4 mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. NM_007268 (SEQ ID NO:62) and NM_001100431. One skilled in the art will appreciate that VSIG4 is also known as CRIg and Z391G.
The human caspase 1, apoptosis-related cysteine peptidase (CASP1) polypeptide sequence is set forth in, e.g., Genbank Accession No. NP_001214 (SEQ ID NO:63), NP_150634, NP_150635, NP_150636, and NP_150637. The human CASP1 mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. NM_001223 (SEQ ID NO:64), NM_033292, NM_033293, NM_033294, and NM_033295. One skilled in the art will appreciate that CASP1 is also known as interleukin 1 beta convertase, IL1BC, ICE, and P45.
The human neutrophil cytosolic factor 4 (NCF4) polypeptide sequence is set forth in, e.g., Genbank Accession No. NP_000622 (SEQ ID NO:65) and . The human NCF4 mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. (SEQ ID NO:66) and. One skilled in the art will appreciate that NCF4 is also known as neutrophil NADPH oxidase factor 4, NCF, MGC3810, P40PHOX, and SH3PXD4.
The human lysozyme polypeptide sequence is set forth in, e.g., Genbank Accession No. AAH04147.1 (SEQ ID NO:67), AAA59535.1, AAA59536.1, AAA36188.1, AAC63078.1. The human lysozyme mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. BC004147.2 (SEQ ID NO:68), AK130127.1, AK130149.1, CR607267.1, CR615077.1, J03801.1, M19045.1, M21119.1, U25677.1. One skilled in the art will appreciate that lysozyme is also known as lysozyme C and 1,4-beta-N-acetylmuramidase C.
The human potassium voltage-gated channel, delayed-rectifier, subfamily S, member 3 (KCNS3) polypeptide sequence is set forth in, e.g., Genbank Accession No. AAC13164.1 (SEQ ID NO:69), AAH04148.1, AAH04987.1, and AAH15947.1. The human KCNS3 mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. AF043472.1 (SEQ ID NO:70), AK075088.1, AK225833.1, BC004148.2, BC004987.1, BC015947.2, and CR615536.1. One skilled in the art will appreciate that KCNS3 is also known as KV9.3 and MGC9481.
The human proteasome activator subunit 2 (PSME2) polypeptide sequence is set forth in, e.g., Genbank Accession No. AAX11425.1 (SEQ ID NO:71), AAH04368.1, AAH19885.1, AAH72025.1, CAD61943.1, CAG46458.1, CAG46543.1, and BAA08205.1. The human PSME2 mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. AY771595.1 (SEQ ID NO:72), AK026580.1, AK225876.1, AY771595.1, BC004368.1, BC072025.1, BX161498.1, CR541657, CR541743.1, CR594185.1, CR600073.1, CR601043.1, CR615548.1, CR618033.1, CR620148.1, D45258.1. One skilled in the art will appreciate that PSME2 is also known as PA28B, REGbeta, and PA28beta.
The human membrane-spanning 4-domains, subfamily A, member 4 (MS4A4A) polypeptide sequence is set forth in, e.g., Genbank Accession No. BAB18738.1 (SEQ ID NO:73), BAB61018.1, AAF65507.1, AAK37594.1, AAL56220.1, AAL08486.1, BAC11389.1, BAF84778.1, AAH20648.1. The human MS4A4A mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. AB013102.1 (SEQ ID NO:74), AB002821.1, AF068288.1, AF237912.1, AF350500.1, AF354928.1, AK075081.1, AK292089.1, BC020648.1, CR605689.1, CR622830.1. One skilled in the art will appreciate that MS4A4A is also known as MS4A4, MS4A7, 4SPAN1, CD20L1, CD20-L1, HDCME31P, and MGC22311.
The human helicase, lymphoid-specific (HELLS) polypeptide sequence is set forth in, e.g., Genbank Accession No. BAE45737.1 (SEQ ID NO:75), BAD10844.1, BAD10845.1, BAD10846.1, BAD10847.1, BAD10848.1, BAD10849.1, BAD10850.1, BAD10851.1, BAD24804.1, BAD24805.1, AAF82262.1, BAA91550.1, AAG01987.1, AAH15477.1, AAH29381.1, AAH30963.1, AAH31004.1, AAI05607.1, and CAD97978.1. The human HELLS mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. AB074174.1 (SEQ ID NO:76), AB102716.1, AB102717.1, AB102718.1, AB102719.1, AB102720.1, AB102721.1, AB102722.1, AB102723.1, AB113248.1, AB113249.1, AF155827.1, AK001201.1, AK022928.1, AY007108.1, BC015477.1, BC029381.1, BC030963.1, BC031004.1, BC068440.1, BC105606.1, BC111789.1, and BX538033.1. One skilled in the art will appreciate that HELLS is also known as LSH, PASG, SMARCA6, FLJ10339, and Nbla10143.
The human caspase-1 dominant-negative inhibitor pseudo-ICE (COP1) polypeptide sequence is set forth in, e.g., Genbank Accession No. AAK71682.1 (SEQ ID NO:77), AAW78563.1, AAI17479.1, and AAI17481.1. The human COP1 mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. AF367017.1 (SEQ ID NO:78), AK125640.1, AY885669.1, BC033638.2, BC070196.1, BC104635.1, BC117478.1, and BC117480.1. One skilled in the art will appreciate that COP1 is also known as COP and PSEUDO-ICE.
The human Fc fragment of IgG, low affinity IIa, receptor (FCGR2A) polypeptide sequence is set forth in, e.g., Genbank Accession No. AAL78867.1 (SEQ ID NO:79), AAH19931.1, AAH20823.1, AAA35932.1, AAA36050.1, AAA35827.1, and CAA68672.1. The human FCGR2A mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. AF416711.1 (SEQ ID NO:80), AI250177.1, AK225438.1, AK225601.1, AK226059.1, BC019931.1, BC020823.1, CR593871.1, CR624955.1, J03619.1, M28697.1, M31932.1, X62572.1, and Y00644.1. One skilled in the art will appreciate that FCGR2A is also known as CD32, FCG2, FcGR, CD32A, CDw32, FCGR2, IGFR2, FCGR2A1, MGC23887, and MGC30032.
The human replication factor C (activator 1) 4 (RFC4) polypeptide sequence is set forth in, e.g., Genbank Accession No. AAH17452.1 (SEQ ID NO:81), AAH24022.1, AAP35633.1, and CAG38798.1. The human RFC4 mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. BC017452.1 (SEQ ID NO:82), AA521171.1, BC024022.1, BM837975.1, BT006987.1, CR536561.1, CR594581.1, CR604460.1, CR608475.1, CR616552.1, and CR625223.1. One skilled in the art will appreciate that RFC4 is also known as A1, RFC37, and MGC27291.
The human minichromosome maintenance complex component 5 (MCM5) polypeptide sequence is set forth in, e.g., Genbank Accession No. BAD92849.1 (SEQ ID NO:83), BAD97043.1, BAF83825.1, AAH00142.1, AAH03656.1, CAG30403.1, BAA12176.1, and CAA52802.1. The human MCM5 mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. AB209612.1 (SEQ ID NO:84), AK223323.1, AK291136.1, BC000142.1, BC003656.2, CR456517.1, D83986.1, and X74795.2. One skilled in the art will appreciate that MCM5 is also known as CDC46, MGC5315, and P1-CDC46.
The human transporter 2, ATP-binding cassette, sub-family B (TAP2) polypeptide sequence is set forth in, e.g., Genbank Accession No. BAB71769.1 (SEQ ID NO:85), BAD92190.1, AAD31384.1, AAD12059.1, AAD32715.1, AAD50509.1, BAD96543.1, BAD97020.1, BAF85652.1, AAP88908.1, AAA58648.1, AAA58649.1, AAA59841.1, AAA79901.1, CAA80522.1, and CAA80523.1. The human TAP2 mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. AB073779.1 (SEQ ID NO:86), AB208953.1, AF078671, AF105151., AF152583.1, AF176984.1, AK222823.1, AK223300.1, AK292963.1, BT009906.1, L09191.1, L10287.1, M74447.1, U07844.1, Z22935.1, and Z22936.1. One skilled in the art will appreciate that TAP2 is also known as MDR/TAP, APT2, PSF2, ABC18, ABCB3, RING11, and D6S217E.
The human endoplasmic reticulum aminopeptidase 2 (ERAP2) polypeptide sequence is set forth in, e.g., Genbank Accession No. BAC78818.1 (SEQ ID NO:87), BAD90015.1, AAG28383.1, AAK37776.1, AAH17927.1, and AAH65240.1. The human ERAP2 mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. AB109031.1 (SEQ ID NO:88), AB163917.1, AF191545.1, AY028805.1, BC017927.2, and BC065240.1. One skilled in the art will appreciate that ERAP2 is also known as LRAP, L-RAP, FLJ23633, FLJ23701, and FLJ23807.
The human denticleless homolog (L2DTL) polypeptide sequence is set forth in, e.g., Genbank Accession No. AAF35182.1 (SEQ ID NO:89), AAK54706.1, BAA91355.1, BAA91552.1, BAA91586.1, BAB55267.1, BAF85032.1, AAH33297.1, AAH33540.1, and ABG23317.1. The human L2DTL mRNA (coding) sequence is set forth in, e.g., Genbank Accession No. AF195765.1 (SEQ ID NO:90), AF345896.1, AK000742.1, AK001206.1, AK001261.1, AK027651.1, AK292343.1, BC033297.1, BC033540.1, and DQ641253. One skilled in the art will appreciate that L2DTL is also known as CDT2, RAMP, DCAF2, and DTL.
A variety of classification markers are suitable for use in the methods, systems, and code of the present invention for classifying IBS into a category, form, or clinical subtype such as, for example, IBS-constipation (IBS-C), IBS-diarrhea (IBS-D), IBS-mixed (IBS-M), IBS-alternating (IBS-A), or post-infectious IBS (IBS-PI). Examples of classification markers include, without limitation, any of the diagnostic markers described above (e.g., leptin, serotonin reuptake transporter (SERT), tryptophan hydroxylase-1, 5-hydroxytryptamine (5-HT), and the like), as well as antrum mucosal protein 8, keratin-8, claudin-8, zonulin, corticotropin-releasing hormone receptor-1 (CRHR1), corticotropin-releasing hormone receptor-2 (CRHR2), and the like.
For instance, Example 1 illustrates that measuring leptin levels is particularly useful for distinguishing IBS-C patient samples from IBS-A and IBS-D patient samples. In addition, mucosal SERT and tryptophan hydroxylase-1 expression have been shown to be decreased in IBS-C and IBS-D (see, e.g., Gershon, J. Clin. Gastroenterol., 39(5 Suppl):S184-193 (2005)). Furthermore, IBS-C patients show impaired postprandial 5-HT release, whereas IBS-PI patients have higher peak levels of 5-HT (see, e.g., Dunlop, Clin Gastroenterol Hepatol., 3:349-357 (2005)).
Any of a variety of assays, techniques, and kits known in the art can be used to determine the presence or level of one or more markers in a sample to classify whether the sample is associated with IBS.
The present invention relies, in part, on determining the presence or level of at least one marker in a sample obtained from an individual. As used herein, the term “determining the presence of at least one marker” includes determining the presence of each marker of interest by using any quantitative or qualitative assay known to one of skill in the art. In certain instances, qualitative assays that determine the presence or absence of a particular trait, variable, or biochemical or serological substance (e.g., protein or antibody) are suitable for detecting each marker of interest. In certain other instances, quantitative assays that determine the presence or absence of RNA, protein, antibody, or activity are suitable for detecting each marker of interest. As used herein, the term “determining the level of at least one marker” includes determining the level of each marker of interest by using any direct or indirect quantitative assay known to one of skill in the art. In certain instances, quantitative assays that determine, for example, the relative or absolute amount of RNA, protein, antibody, or activity are suitable for determining the level of each marker of interest. One skilled in the art will appreciate that any assay useful for determining the level of a marker is also useful for determining the presence or absence of the marker.
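The statement that an assay for determining the level of a marker can also be used to determine its presence or absence can be sketched, by way of non-limiting example, as a comparison of the measured level against a cutoff. The cutoff values, units, marker levels, and names below are hypothetical and chosen for illustration only; no particular cutoff is specified in this description.

```python
from dataclasses import dataclass


@dataclass
class MarkerResult:
    marker: str
    level: float   # quantitative result, in arbitrary assay units
    present: bool  # qualitative call derived from the level


def call_presence(marker: str, level: float, cutoff: float) -> MarkerResult:
    """Derive a presence/absence call from a quantitative level.

    Assumption: a level at or above the cutoff is scored as present.
    """
    return MarkerResult(marker=marker, level=level, present=level >= cutoff)


# Hypothetical levels and cutoffs for two of the antibody markers named above.
results = [
    call_presence("anti-OmpC antibody", level=12.5, cutoff=10.0),
    call_presence("anti-flagellin antibody", level=3.2, cutoff=10.0),
]
for r in results:
    print(f"{r.marker}: level={r.level}, present={r.present}")
```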
As used herein, the term “antibody” includes a population of immunoglobulin molecules, which can be polyclonal or monoclonal and of any isotype, or an immunologically active fragment of an immunoglobulin molecule. Such an immunologically active fragment contains the heavy and light chain variable regions, which make up the portion of the antibody molecule that specifically binds an antigen. For example, an immunologically active fragment of an immunoglobulin molecule known in the art as Fab, Fab′ or F(ab′)2 is included within the meaning of the term antibody.
Flow cytometry can be used to determine the presence or level of one or more markers in a sample. Such flow cytometric assays, including bead based immunoassays, can be used to determine, e.g., antibody marker levels in the same manner as described for detecting serum antibodies to Candida albicans and HIV proteins (see, e.g., Bishop and Davis, J. Immunol. Methods, 210:79-87 (1997); McHugh et al., J. Immunol. Methods, 116:213 (1989); Scillian et al., Blood, 73:2041 (1989)).
Phage display technology for expressing a recombinant antigen specific for a marker can also be used to determine the presence or level of one or more markers in a sample. Phage particles expressing an antigen specific for, e.g., an antibody marker can be anchored, if desired, to a multi-well plate using an antibody such as an anti-phage monoclonal antibody (Felici et al., “Phage-Displayed Peptides as Tools for Characterization of Human Sera” in Abelson (Ed.), Methods in Enzymol., 267, San Diego: Academic Press, Inc. (1996)).
A variety of immunoassay techniques, including competitive and non-competitive immunoassays, can be used to determine the presence or level of one or more markers in a sample (see, e.g., Self and Cook, Curr. Opin. Biotechnol., 7:60-65 (1996)). The term immunoassay encompasses techniques including, without limitation, enzyme immunoassays (EIA) such as enzyme multiplied immunoassay technique (EMIT), enzyme-linked immunosorbent assay (ELISA), antigen capture ELISA, sandwich ELISA, IgM antibody capture ELISA (MAC ELISA), and microparticle enzyme immunoassay (MEIA); capillary electrophoresis immunoassays (CEIA); radioimmunoassays (RIA); immunoradiometric assays (IRMA); fluorescence polarization immunoassays (FPIA); and chemiluminescence assays (CL). If desired, such immunoassays can be automated. Immunoassays can also be used in conjunction with laser induced fluorescence (see, e.g., Schmalzing and Nashabeh, Electrophoresis, 18:2184-2193 (1997); Bao, J. Chromatogr. B. Biomed. Sci., 699:463-480 (1997)). Liposome immunoassays, such as flow-injection liposome immunoassays and liposome immunosensors, are also suitable for use in the present invention (see, e.g., Rongen et al., J. Immunol. Methods, 204:105-133 (1997)). In addition, nephelometry assays, in which the formation of protein/antibody complexes results in increased light scatter that is converted to a peak rate signal as a function of the marker concentration, are suitable for use in the present invention. Nephelometry assays are commercially available from Beckman Coulter (Brea, Calif.; Kit #449430) and can be performed using a Behring Nephelometer Analyzer (Fink et al., J. Clin. Chem. Clin. Biochem., 27:261-276 (1989)).
Antigen capture ELISA can be useful for determining the presence or level of one or more markers in a sample. For example, in an antigen capture ELISA, an antibody directed to a marker of interest is bound to a solid phase and sample is added such that the marker is bound by the antibody. After unbound proteins are removed by washing, the amount of bound marker can be quantitated using, e.g., a radioimmunoassay (see, e.g., Harlow and Lane, Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory, New York, 1988). Sandwich ELISA can also be suitable for use in the present invention. For example, in a two-antibody sandwich assay, a first antibody is bound to a solid support, and the marker of interest is allowed to bind to the first antibody. The amount of the marker is quantitated by measuring the amount of a second antibody that binds the marker. The antibodies can be immobilized onto a variety of solid supports, such as magnetic or chromatographic matrix particles, the surface of an assay plate (e.g., microtiter wells), pieces of a solid substrate material or membrane (e.g., plastic, nylon, paper), and the like. An assay strip can be prepared by coating the antibody or a plurality of antibodies in an array on a solid support. This strip can then be dipped into the test sample and processed quickly through washes and detection steps to generate a measurable signal, such as a colored spot.
A radioimmunoassay using, for example, an iodine-125 (125I) labeled secondary antibody (Harlow and Lane, supra) is also suitable for determining the presence or level of one or more markers in a sample. A secondary antibody labeled with a chemiluminescent marker can also be suitable for use in the present invention. A chemiluminescence assay using a chemiluminescent secondary antibody is suitable for sensitive, non-radioactive detection of marker levels. Such secondary antibodies can be obtained commercially from various sources, e.g., Amersham Lifesciences, Inc. (Arlington Heights, Ill.).
The immunoassays described above are particularly useful for determining the presence or level of one or more markers in a sample. As a non-limiting example, an ELISA using an IL-8-binding molecule such as an anti-IL-8 antibody or an extracellular IL-8-binding protein (e.g., IL-8 receptor) is useful for determining whether a sample is positive for IL-8 protein or for determining IL-8 protein levels in a sample. A fixed neutrophil ELISA is useful for determining whether a sample is positive for ANCA or for determining ANCA levels in a sample. Similarly, an ELISA using yeast cell wall phosphopeptidomannan is useful for determining whether a sample is positive for ASCA-IgA and/or ASCA-IgG, or for determining ASCA-IgA and/or ASCA-IgG levels in a sample. An ELISA using OmpC protein or a fragment thereof is useful for determining whether a sample is positive for anti-OmpC antibodies, or for determining anti-OmpC antibody levels in a sample. An ELISA using I2 protein or a fragment thereof is useful for determining whether a sample is positive for anti-I2 antibodies, or for determining anti-I2 antibody levels in a sample. An ELISA using flagellin protein (e.g., Cbir-1 flagellin) or a fragment thereof is useful for determining whether a sample is positive for anti-flagellin antibodies, or for determining anti-flagellin antibody levels in a sample. In addition, the immunoassays described above are particularly useful for determining the presence or level of other diagnostic markers in a sample.
Specific immunological binding of the antibody to the marker of interest can be detected directly or indirectly. Direct labels include fluorescent or luminescent tags, metals, dyes, radionuclides, and the like, attached to the antibody. An antibody labeled with iodine-125 (125I) can be used for determining the levels of one or more markers in a sample. A chemiluminescence assay using a chemiluminescent antibody specific for the marker is suitable for sensitive, non-radioactive detection of marker levels. An antibody labeled with fluorochrome is also suitable for determining the levels of one or more markers in a sample. Examples of fluorochromes include, without limitation, DAPI, fluorescein, Hoechst 33258, R-phycocyanin, B-phycoerythrin, R-phycoerythrin, rhodamine, Texas red, and lissamine. Secondary antibodies linked to fluorochromes can be obtained commercially, e.g., goat F(ab′)2 anti-human IgG-FITC is available from Tago Immunologicals (Burlingame, Calif.).
Indirect labels include various enzymes well-known in the art, such as horseradish peroxidase (HRP), alkaline phosphatase (AP), β-galactosidase, urease, and the like. A horseradish-peroxidase detection system can be used, for example, with the chromogenic substrate tetramethylbenzidine (TMB), which yields a soluble product in the presence of hydrogen peroxide that is detectable at 450 nm. An alkaline phosphatase detection system can be used with the chromogenic substrate p-nitrophenyl phosphate, for example, which yields a soluble product readily detectable at 405 nm. Similarly, a β-galactosidase detection system can be used with the chromogenic substrate o-nitrophenyl-β-D-galactopyranoside (ONPG), which yields a soluble product detectable at 410 nm. A urease detection system can be used with a substrate such as urea-bromocresol purple (Sigma Immunochemicals; St. Louis, Mo.). A useful secondary antibody linked to an enzyme can be obtained from a number of commercial sources, e.g., goat F(ab′)2 anti-human IgG-alkaline phosphatase can be purchased from Jackson ImmunoResearch (West Grove, Pa.).
A signal from the direct or indirect label can be analyzed, for example, using a spectrophotometer to detect color from a chromogenic substrate; a radiation counter to detect radiation, such as a gamma counter for detection of 125I; or a fluorometer to detect fluorescence in the presence of light of a certain wavelength. For detection of enzyme-linked antibodies, a quantitative analysis of marker levels can be made using a spectrophotometer such as an EMAX Microplate Reader (Molecular Devices; Menlo Park, Calif.) in accordance with the manufacturer's instructions. If desired, the assays of the present invention can be automated or performed robotically, and the signal from multiple samples can be detected simultaneously.
Quantitative western blotting can also be used to detect or determine the presence or level of one or more markers in a sample. Western blots can be quantitated by well-known methods such as scanning densitometry or phosphorimaging. As a non-limiting example, protein samples are electrophoresed on 10% SDS-PAGE Laemmli gels. Primary murine monoclonal antibodies are reacted with the blot, and antibody binding can be confirmed to be linear using a preliminary slot blot experiment. Goat anti-mouse horseradish peroxidase-coupled antibodies (BioRad) are used as the secondary antibody, and signal detection is performed using chemiluminescence, for example, with the Renaissance chemiluminescence kit (New England Nuclear; Boston, Mass.) according to the manufacturer's instructions. Autoradiographs of the blots are analyzed using a scanning densitometer (Molecular Dynamics; Sunnyvale, Calif.) and normalized to a positive control. Values are reported, for example, as the ratio of the actual value to the positive control (densitometric index). Such methods are well known in the art as described, for example, in Parra et al., J. Vasc. Surg., 28:669-675 (1998).
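As a non-limiting illustration of the densitometric index, the normalization to a positive control can be sketched as follows. The function name, sample names, and signal values are hypothetical and are used only to show the arithmetic; they are not taken from any experiment described herein.

```python
def densitometric_index(band_signal, positive_control_signal):
    """Return the ratio of a band's densitometry signal to the positive control."""
    if positive_control_signal <= 0:
        raise ValueError("positive control signal must be greater than zero")
    return band_signal / positive_control_signal


# Hypothetical band intensities in arbitrary densitometry units.
positive_control = 1000.0
band_signals = {"sample_A": 1250.0, "sample_B": 480.0}

# Normalize every sample on the blot to the same positive control.
indices = {
    name: densitometric_index(signal, positive_control)
    for name, signal in band_signals.items()
}
```

An index above 1 indicates a band stronger than the positive control on the same blot, and an index below 1 indicates a weaker band.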
Alternatively, a variety of immunohistochemical assay techniques can be used to determine the presence or level of one or more markers in a sample. The term immunohistochemical assay encompasses techniques that utilize the visual detection of fluorescent dyes or enzymes coupled (i.e., conjugated) to antibodies that react with the marker of interest using fluorescent microscopy or light microscopy and includes, without limitation, direct fluorescent antibody assay, indirect fluorescent antibody (IFA) assay, anticomplement immunofluorescence, avidin-biotin immunofluorescence, and immunoperoxidase assays. An IFA assay, for example, is useful for determining whether a sample is positive for ANCA, the level of ANCA in a sample, whether a sample is positive for pANCA, the level of pANCA in a sample, and/or an ANCA staining pattern (e.g., cANCA, pANCA, NSNA, and/or SAPPA staining pattern). The concentration of ANCA in a sample can be quantitated, e.g., through endpoint titration or through measuring the visual intensity of fluorescence compared to a known reference standard.
Alternatively, the presence or level of a marker of interest can be determined by detecting or quantifying the amount of the purified marker. Purification of the marker can be achieved, for example, by high pressure liquid chromatography (HPLC), alone or in combination with mass spectrometry (e.g., MALDI/MS, MALDI-TOF/MS, SELDI-TOF/MS, tandem MS, etc.). Qualitative or quantitative detection of a marker of interest can also be determined by well-known methods including, without limitation, Bradford assays, Coomassie blue staining, silver staining, assays for radiolabeled protein, and mass spectrometry.
The analysis of a plurality of markers may be carried out separately or simultaneously with one test sample. For separate or sequential assay of markers, suitable apparatuses include clinical laboratory analyzers such as the ElecSys (Roche), the AxSym (Abbott), the Access (Beckman), the ADVIA®, the CENTAUR® (Bayer), and the NICHOLS ADVANTAGE® (Nichols Institute) immunoassay systems. Preferred apparatuses or protein chips perform simultaneous assays of a plurality of markers on a single surface. Particularly useful physical formats comprise surfaces having a plurality of discrete, addressable locations for the detection of a plurality of different markers. Such formats include protein microarrays, or “protein chips” (see, e.g., Ng et al., J. Cell Mol. Med., 6:329-340 (2002)) and certain capillary devices (see, e.g., U.S. Pat. No. 6,019,944). In these embodiments, each discrete surface location may comprise antibodies to immobilize one or more markers for detection at each location. Surfaces may alternatively comprise one or more discrete particles (e.g., microparticles or nanoparticles) immobilized at discrete locations of a surface, where the microparticles comprise antibodies to immobilize one or more markers for detection.
In addition to the above-described assays for determining the presence or level of various markers of interest, analysis of marker mRNA levels using routine techniques such as Northern analysis, reverse-transcriptase polymerase chain reaction (RT-PCR), or any other methods based on hybridization to a nucleic acid sequence that is complementary to a portion of the marker coding sequence (e.g., slot blot hybridization) is also within the scope of the present invention. Applicable PCR amplification techniques are described in, e.g., Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons, Inc. New York (1999), Chapter 7 and Supplement 47; Theophilus et al., “PCR Mutation Detection Protocols,” Humana Press, (2002); and Innis et al., PCR Protocols, San Diego, Academic Press, Inc. (1990). General nucleic acid hybridization methods are described in Anderson, “Nucleic Acid Hybridization,” BIOS Scientific Publishers, 1999. Amplification or hybridization of a plurality of transcribed nucleic acid sequences (e.g., mRNA or cDNA) can also be performed from mRNA or cDNA sequences arranged in a microarray. Microarray methods are generally described in Hardiman, “Microarrays Methods and Applications: Nuts & Bolts,” DNA Press, 2003; and Baldi et al., “DNA Microarrays and Gene Expression: From Experiments to Data Analysis and Modeling,” Cambridge University Press, 2002.
Analysis of the genotype of a marker such as a genetic marker can be performed using techniques known in the art including, without limitation, polymerase chain reaction (PCR)-based analysis, sequence analysis, and electrophoretic analysis. A non-limiting example of a PCR-based analysis includes a Taqman® allelic discrimination assay available from Applied Biosystems. Non-limiting examples of sequence analysis include Maxam-Gilbert sequencing, Sanger sequencing, capillary array DNA sequencing, thermal cycle sequencing (Sears et al., Biotechniques, 13:626-633 (1992)), solid-phase sequencing (Zimmerman et al., Methods Mol. Cell Biol., 3:39-42 (1992)), sequencing with mass spectrometry such as matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF/MS; Fu et al., Nature Biotech., 16:381-384 (1998)), and sequencing by hybridization (Chee et al., Science, 274:610-614 (1996); Drmanac et al., Science, 260:1649-1652 (1993); Drmanac et al., Nature Biotech., 16:54-58 (1998)). Non-limiting examples of electrophoretic analysis include slab gel electrophoresis such as agarose or polyacrylamide gel electrophoresis, capillary electrophoresis, and denaturing gradient gel electrophoresis. Other methods for genotyping an individual at a polymorphic site in a marker include, e.g., the INVADER® assay from Third Wave Technologies, Inc., restriction fragment length polymorphism (RFLP) analysis, allele-specific oligonucleotide hybridization, a heteroduplex mobility assay, and single strand conformational polymorphism (SSCP) analysis.
Several markers of interest may be combined into one test for efficient processing of multiple samples. In addition, one skilled in the art would recognize the value of testing multiple samples (e.g., at successive time points, etc.) from the same subject. Such testing of serial samples can allow the identification of changes in marker levels over time. Increases or decreases in marker levels, as well as the absence of change in marker levels, can also provide useful information to classify IBS or to rule out diseases and disorders associated with IBS-like symptoms.
A panel for measuring one or more of the markers described above may be constructed to provide relevant information related to the approach of the present invention for classifying a sample as being associated with IBS. Such a panel may be constructed to determine the presence or level of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, or more individual markers. The analysis of a single marker or subsets of markers can also be carried out by one skilled in the art in various clinical settings. These include, but are not limited to, ambulatory, urgent care, critical care, intensive care, monitoring unit, inpatient, outpatient, physician office, medical clinic, and health screening settings.
The analysis of markers could be carried out in a variety of physical formats as well. For example, microtiter plates or automation could be used to facilitate the processing of large numbers of test samples. Alternatively, single sample formats could be developed to facilitate treatment and diagnosis in a timely fashion.
In some aspects, the present invention provides methods, systems, and code for classifying whether a sample is associated with IBS by applying a statistical algorithm or process to classify the sample as an IBS sample or non-IBS sample. In other aspects, the present invention provides methods, systems, and code for classifying whether a sample is associated with IBS by applying a first statistical algorithm or process to classify the sample as a non-IBD sample or IBD sample (i.e., IBD rule-out step), followed by a second statistical algorithm or process to classify the non-IBD sample as an IBS sample or non-IBS sample (i.e., IBS rule-in step). Preferably, the statistical algorithms or processes independently comprise one or more learning statistical classifier systems. As described herein, a single learning statistical classifier system or a combination thereof advantageously provides improved sensitivity, specificity, negative predictive value, positive predictive value, and/or overall accuracy for classifying whether a sample is associated with IBS.
The term “statistical algorithm” or “statistical process” includes any of a variety of statistical analyses used to determine relationships between variables. In the present invention, the variables are the presence or level of at least one marker of interest and/or the presence or severity of at least one IBS-related symptom. Any number of markers and/or symptoms can be analyzed by applying a statistical algorithm described herein. For example, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 55, 60, or more biomarkers and/or symptoms can be included in a statistical algorithm. In one embodiment, logistic regression is applied. In another embodiment, linear regression is applied. In certain instances, the statistical algorithms of the present invention can apply a quantile measurement of a particular marker within a given population as a variable. Quantiles are a set of “cut points” that divide a sample of data into groups containing (as far as possible) equal numbers of observations. For example, quartiles are values that divide a sample of data into four groups containing (as far as possible) equal numbers of observations. The lower quartile is the data value a quarter of the way up through the ordered data set; the upper quartile is the data value a quarter of the way down through the ordered data set. Quintiles are values that divide a sample of data into five groups containing (as far as possible) equal numbers of observations. The present invention can also include the application of percentile ranges of marker levels (e.g., tertiles, quartiles, quintiles, etc.), or their cumulative indices (e.g., quartile sums of marker levels, etc.) as variables in the algorithms (just as with continuous variables).
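As a non-limiting illustration, a quartile measurement and a quartile sum of marker levels can be sketched as follows. The marker names, reference levels, and patient levels are hypothetical, and the cut points are computed with the default method of the Python standard library function statistics.quantiles, which is only one of several accepted conventions for locating quantiles.

```python
from bisect import bisect_right
from statistics import quantiles


def quartile_score(value, reference_levels):
    """Return 1, 2, 3, or 4: the quartile of the reference population containing value."""
    cut_points = quantiles(reference_levels, n=4)  # three cut points
    return bisect_right(cut_points, value) + 1


def quartile_sum_score(patient_levels, reference_population):
    """Sum the quartile scores of all markers measured for one patient."""
    return sum(
        quartile_score(level, reference_population[marker])
        for marker, level in patient_levels.items()
    )


# Hypothetical marker levels measured in a reference population.
reference_population = {
    "marker_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "marker_b": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0],
}

# Hypothetical levels for a single patient sample.
patient_levels = {"marker_a": 7.0, "marker_b": 50.0}

score = quartile_sum_score(patient_levels, reference_population)
```

The resulting quartile sum can then be supplied to a statistical algorithm as a single variable in place of, or in addition to, the continuous marker levels.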
Preferably, the statistical algorithms of the present invention comprise one or more learning statistical classifier systems. As used herein, the term “learning statistical classifier system” includes a machine learning algorithmic technique capable of adapting to complex data sets (e.g., panel of markers of interest and/or list of IBS-related symptoms) and making decisions based upon such data sets. In some embodiments, a single learning statistical classifier system such as a classification tree (e.g., random forest) is applied. In other embodiments, a combination of 2, 3, 4, 5, 6, 7, 8, 9, 10, or more learning statistical classifier systems are applied, preferably in tandem. Examples of learning statistical classifier systems include, but are not limited to, those using inductive learning (e.g., decision/classification trees such as random forests, classification and regression trees (C&RT), boosted trees, etc.), Probably Approximately Correct (PAC) learning, connectionist learning (e.g., neural networks (NN), artificial neural networks (ANN), neuro fuzzy networks (NFN), network structures, perceptrons such as multi-layer perceptrons, multi-layer feed-forward networks, applications of neural networks, Bayesian learning in belief networks, etc.), reinforcement learning (e.g., passive learning in a known environment such as naïve learning, adaptive dynamic learning, and temporal difference learning, passive learning in an unknown environment, active learning in an unknown environment, learning action-value functions, applications of reinforcement learning, etc.), and genetic algorithms and evolutionary programming. Other learning statistical classifier systems include support vector machines (e.g., Kernel methods), multivariate adaptive regression splines (MARS), Levenberg-Marquardt algorithms, Gauss-Newton algorithms, mixtures of Gaussians, gradient descent algorithms, and learning vector quantization (LVQ).
Random forests are learning statistical classifier systems that are constructed using an algorithm developed by Leo Breiman and Adele Cutler. Random forests use a large number of individual decision trees and decide the class by choosing the mode (i.e., most frequently occurring) of the classes as determined by the individual trees. Random forest analysis can be performed, e.g., using the RandomForests software available from Salford Systems (San Diego, Calif.). See, e.g., Breiman, Machine Learning, 45:5-32 (2001); and http://stat-www.berkeley.edu/users/breiman/RandomForests/cc_home.htm, for a description of random forests.
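As a non-limiting illustration, the step in which a random forest decides the class by choosing the mode of the classes determined by the individual trees can be sketched as follows. The three threshold rules stand in for trained decision trees and are hypothetical, as are the marker names and cutoffs. The sketch does not reproduce how the trees are constructed, which in the Breiman algorithm involves bootstrap sampling and random selection of predictors.

```python
from collections import Counter


def forest_predict(trees, sample):
    """Return the most frequent class among the trees, plus the individual votes."""
    votes = [tree(sample) for tree in trees]
    winner, _count = Counter(votes).most_common(1)[0]
    return winner, votes


# Hypothetical stand-ins for trained trees: each maps a sample to a class label.
def tree_1(sample):
    return "IBS" if sample["marker_a"] > 2.0 else "non-IBS"


def tree_2(sample):
    return "IBS" if sample["marker_b"] > 1.0 else "non-IBS"


def tree_3(sample):
    return "IBS" if sample["symptom_score"] >= 3 else "non-IBS"


# An odd number of trees avoids tied votes in a two-class problem.
trees = [tree_1, tree_2, tree_3]
sample = {"marker_a": 3.0, "marker_b": 0.5, "symptom_score": 4}

predicted_class, votes = forest_predict(trees, sample)
```

In practice the forest contains a large number of trees, so that the vote is stable even when individual trees disagree.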
Classification and regression trees represent a computer intensive alternative to fitting classical regression models and are typically used to determine the best possible model for a categorical or continuous response of interest based upon one or more predictors. Classification and regression tree analysis can be performed, e.g., using the CART software available from Salford Systems or the Statistica data analysis software available from StatSoft, Inc. (Tulsa, OK). A description of classification and regression trees is found, e.g., in Breiman et al. “Classification and Regression Trees,” Chapman and Hall, New York (1984); and Steinberg et al., “CART: Tree-Structured Non-Parametric Data Analysis,” Salford Systems, San Diego, (1995).
Neural networks are interconnected groups of artificial neurons that use a mathematical or computational model for information processing based on a connectionist approach to computation. Typically, neural networks are adaptive systems that change their structure based on external or internal information that flows through the network. Specific examples of neural networks include feed-forward neural networks such as perceptrons, single-layer perceptrons, multi-layer perceptrons, backpropagation networks, ADALINE networks, MADALINE networks, Learnmatrix networks, radial basis function (RBF) networks, and self-organizing maps or Kohonen self-organizing networks; recurrent neural networks such as simple recurrent networks and Hopfield networks; stochastic neural networks such as Boltzmann machines; modular neural networks such as committee of machines and associative neural networks; and other types of networks such as instantaneously trained neural networks, spiking neural networks, dynamic neural networks, and cascading neural networks. Neural network analysis can be performed, e.g., using the Statistica data analysis software available from StatSoft, Inc. See, e.g., Freeman et al., In “Neural Networks: Algorithms, Applications and Programming Techniques,” Addison-Wesley Publishing Company (1991); Zadeh, Information and Control, 8:338-353 (1965); Zadeh, IEEE Trans. on Systems, Man and Cybernetics, 3:28-44 (1973); Gersho et al., In “Vector Quantization and Signal Compression,” Kluwer Academic Publishers, Boston, Dordrecht, London (1992); and Hassoun, “Fundamentals of Artificial Neural Networks,” MIT Press, Cambridge, Mass., London (1995), for a description of neural networks.
Support vector machines are a set of related supervised learning techniques used for classification and regression and are described, e.g., in Cristianini et al., “An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods,” Cambridge University Press (2000). Support vector machine analysis can be performed, e.g., using the SVMlight software developed by Thorsten Joachims (Cornell University) or using the LIBSVM software developed by Chih-Chung Chang and Chih-Jen Lin (National Taiwan University).
The learning statistical classifier systems described herein can be trained and tested using a cohort of samples (e.g., serological samples) from healthy individuals, IBS patients, IBD patients, and/or Celiac disease patients. For example, samples from patients diagnosed by a physician, and preferably by a gastroenterologist, as having IBD using a biopsy, colonoscopy, or an immunoassay as described in, e.g., U.S. Pat. No. 6,218,129, are suitable for use in training and testing the learning statistical classifier systems of the present invention. Samples from patients diagnosed with IBD can also be stratified into Crohn's disease or ulcerative colitis using an immunoassay as described in, e.g., U.S. Pat. Nos. 5,750,355 and 5,830,675. Samples from patients diagnosed with IBS using published criteria such as the Manning, Rome I, Rome II, or Rome III diagnostic criteria are suitable for use in training and testing the learning statistical classifier systems of the present invention. Samples from healthy individuals can include those that were not identified as IBD and/or IBS samples. One skilled in the art will know of additional techniques and diagnostic criteria for obtaining a cohort of patient samples that can be used in training and testing the learning statistical classifier systems of the present invention.
As used herein, the term “sensitivity” refers to the probability that a diagnostic method, system, or code of the present invention gives a positive result when the sample is positive, e.g., having IBS. Sensitivity is calculated as the number of true positive results divided by the sum of the true positives and false negatives. Sensitivity essentially is a measure of how well a method, system, or code of the present invention correctly identifies those with IBS from those without the disease. The statistical algorithms can be selected such that the sensitivity of classifying IBS is at least about 40%, and can be, for example, at least about 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99%. In preferred embodiments, the sensitivity of classifying IBS is at least about 50% when a single learning statistical classifier system is used (see, Example 16).
The term “specificity” refers to the probability that a diagnostic method, system, or code of the present invention gives a negative result when the sample is not positive, e.g., not having IBS. Specificity is calculated as the number of true negative results divided by the sum of the true negatives and false positives. Specificity essentially is a measure of how well a method, system, or code of the present invention excludes those who do not have IBS from those who have the disease. The statistical algorithms can be selected such that the specificity of classifying IBS is at least about 40%, for example, at least about 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99%. In preferred embodiments, the specificity of classifying IBS is at least about 88% when a single learning statistical classifier system is used (see, Example 16).
As used herein, the term “negative predictive value” or “NPV” refers to the probability that an individual identified as not having IBS actually does not have the disease. Negative predictive value can be calculated as the number of true negatives divided by the sum of the true negatives and false negatives. Negative predictive value is determined by the characteristics of the diagnostic method, system, or code as well as the prevalence of the disease in the population analyzed. The statistical algorithms can be selected such that the negative predictive value in a population having a disease prevalence is in the range of about 40% to about 99% and can be, for example, at least about 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99%. In preferred embodiments, the negative predictive value (NPV) of classifying IBS is at least about 64% when a single learning statistical classifier system is used (see, Example 16).
The term “positive predictive value” or “PPV” refers to the probability that an individual identified as having IBS actually has the disease. Positive predictive value can be calculated as the number of true positives divided by the sum of the true positives and false positives. Positive predictive value is determined by the characteristics of the diagnostic method, system, or code as well as the prevalence of the disease in the population analyzed. The statistical algorithms can be selected such that the positive predictive value in a population having a disease prevalence is in the range of about 40% to about 99% and can be, for example, at least about 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99%. In preferred embodiments, the positive predictive value (PPV) of classifying IBS is at least about 81% when a single learning statistical classifier system is used (see, Example 16).
Predictive values, including negative and positive predictive values, are influenced by the prevalence of the disease in the population analyzed. In the methods, systems, and code of the present invention, the statistical algorithms can be selected to produce a desired clinical parameter for a clinical population with a particular IBS prevalence. For example, learning statistical classifier systems can be selected for an IBS prevalence of up to about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, or 70%, which can be seen, e.g., in a clinician's office such as a gastroenterologist's office or a general practitioner's office.
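As a non-limiting illustration, the influence of prevalence on predictive values can be sketched by computing the positive and negative predictive values from a given sensitivity, specificity, and prevalence. The function name is hypothetical, and the sensitivity, specificity, and prevalence values below are chosen only to show the calculation.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied to a population with the given prevalence."""
    for name, p in (("sensitivity", sensitivity), ("specificity", specificity)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")

    # Expected fraction of the population falling in each outcome.
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)

    # A predictive value is undefined when no result of that kind can occur.
    ppv = true_pos / (true_pos + false_pos) if (true_pos + false_pos) > 0 else float("nan")
    npv = true_neg / (true_neg + false_neg) if (true_neg + false_neg) > 0 else float("nan")
    return ppv, npv


# The same illustrative test evaluated at two different prevalences.
ppv_high, npv_high = predictive_values(0.50, 0.88, prevalence=0.50)
ppv_low, npv_low = predictive_values(0.50, 0.88, prevalence=0.10)
```

With the sensitivity and specificity held fixed, lowering the prevalence lowers the positive predictive value and raises the negative predictive value, which is why a classifier is selected with a particular clinical population in mind.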
As used herein, the term “overall agreement” or “overall accuracy” refers to the accuracy with which a method, system, or code of the present invention classifies a disease state. Overall accuracy is calculated as the sum of the true positives and true negatives divided by the total number of sample results and is affected by the prevalence of the disease in the population analyzed. For example, the statistical algorithms can be selected such that the overall accuracy in a patient population having a disease prevalence is at least about 40%, and can be, for example, at least about 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99%. In preferred embodiments, the overall accuracy of classifying IBS is at least about 70% when a single learning statistical classifier system is used (see, Example 16).
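As a non-limiting illustration, sensitivity, specificity, positive predictive value, negative predictive value, and overall accuracy can be computed from the counts of true positive, false positive, true negative, and false negative results as sketched below. The counts are hypothetical and do not correspond to any Example described herein.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute clinical performance parameters from the four outcome counts."""
    if min(tp, fp, tn, fn) < 0:
        raise ValueError("counts must be non-negative")
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all positive samples
        "specificity": tn / (tn + fp),  # true negatives among all negative samples
        "ppv": tp / (tp + fp),          # true positives among all positive results
        "npv": tn / (tn + fn),          # true negatives among all negative results
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical results for 100 samples: 50 IBS samples and 50 non-IBS samples.
metrics = classification_metrics(tp=40, fp=5, tn=45, fn=10)
```

The sketch assumes that every denominator is nonzero, i.e., that the test cohort contains both positive and negative samples and that the classifier returns both positive and negative results.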
The network can be a LAN (local area network), WAN (wide area network), wireless network, point-to-point network, star network, token ring network, hub network, or other configuration. Because the most common type of network in current use is a TCP/IP (Transmission Control Protocol and Internet Protocol) network, such as the global internetwork of networks often referred to as the “Internet” with a capital “I,” that network will be used in many of the examples herein. It should be understood, however, that the networks that the present invention might use are not so limited, although TCP/IP is the currently preferred protocol.
Several elements in the system shown in
According to one embodiment, each client system and all of its components are operator configurable using applications, such as a browser, including computer code run using a central processing unit such as an Intel® Pentium® processor or the like. Similarly, the intelligence module and all of its components might be operator configurable using application(s) including computer code run using a central processing unit (215) such as an Intel Pentium processor or the like, or multiple processor units. Computer code for operating and configuring the intelligence module to process data and test results as described herein is preferably downloaded and stored on a hard disk, but the entire program code, or portions thereof, may also be stored in any other volatile or non-volatile memory medium or device as is well known, such as a ROM or RAM, or provided on any other computer readable medium (260) capable of storing program code, such as a compact disk (CD) medium, digital versatile disk (DVD) medium, a floppy disk, ROM, RAM, and the like.
The computer code for implementing various aspects and embodiments of the present invention can be implemented in any programming language that can be executed on a computer system such as, for example, in C, C++, C#, HTML, Java, JavaScript, or any other scripting language, such as VBScript. Additionally, the entire program code, or portions thereof, may be embodied as a carrier signal, which may be transmitted and downloaded from a software source (e.g., server) over the Internet, or over any other conventional network connection as is well known (e.g., extranet, VPN, LAN, etc.) using any communication medium and protocols (e.g., TCP/IP, HTTP, HTTPS, Ethernet, etc.) as are well known.
According to one embodiment, the intelligence module implements a disease classification process for analyzing patient test results and/or questionnaire responses to determine whether a patient sample is associated with IBS. The data may be stored in one or more data tables or other logical data structures in memory (210) or in a separate storage or database system coupled with the intelligence module. One or more statistical processes are typically applied to a data set including test data for a particular patient. For example, the test data might include a diagnostic marker profile, which comprises data indicating the presence or level of at least one marker in a sample from the patient. The test data might also include a symptom profile, which comprises data indicating the presence or severity of at least one symptom associated with IBS that the patient is experiencing or has recently experienced. In one aspect, a statistical process produces a statistically derived decision classifying the patient sample as an IBS sample or non-IBS sample based upon the diagnostic marker profile and/or symptom profile. In another aspect, a first statistical process produces a first statistically derived decision classifying the patient sample as an IBD sample or non-IBD sample based upon the diagnostic marker profile and/or symptom profile. If the patient sample is classified as a non-IBD sample, a second statistical process is applied to the same or a different data set to produce a second statistically derived decision classifying the non-IBD sample as an IBS sample or non-IBS sample. The first and/or the second statistically derived decision may be displayed on a display device associated with or coupled to the intelligence module, or the decision(s) may be provided to and displayed at a separate system, e.g., a client system (230). The displayed results allow a physician to make a reasoned diagnosis or prognosis.
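As a non-limiting illustration, the two-step disease classification process can be sketched as follows, with a first statistical process applied as an IBD rule-out step and a second statistical process applied only to non-IBD samples. The data structure, the function names, the marker and symptom names, and the threshold rules standing in for trained statistical processes are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

Profile = Dict[str, float]             # marker levels and/or symptom severities
Classifier = Callable[[Profile], bool]


@dataclass
class Decision:
    ibd: bool                 # first statistically derived decision
    ibs: Optional[bool]       # second decision; None if the sample was ruled IBD

    @property
    def label(self) -> str:
        if self.ibd:
            return "IBD"
        return "IBS" if self.ibs else "non-IBS"


def classify_sample(profile: Profile, is_ibd: Classifier, is_ibs: Classifier) -> Decision:
    """Apply the IBD rule-out step, then the IBS rule-in step to non-IBD samples."""
    if is_ibd(profile):
        return Decision(ibd=True, ibs=None)
    return Decision(ibd=False, ibs=is_ibs(profile))


# Hypothetical threshold rules standing in for trained statistical processes.
def ibd_rule(profile: Profile) -> bool:
    return profile.get("marker_a", 0.0) > 5.0


def ibs_rule(profile: Profile) -> bool:
    return profile.get("marker_b", 0.0) > 2.0 or profile.get("symptom_severity", 0.0) >= 3.0


decision = classify_sample({"marker_a": 1.0, "marker_b": 3.0}, ibd_rule, ibs_rule)
```

Keeping both decisions in a single record allows either the first decision, the second decision, or both to be displayed to the physician.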
Once a sample from an individual has been classified as an IBS sample, the methods, systems, and code of the present invention can further comprise administering to the individual a therapeutically effective amount of a drug useful for treating one or more symptoms associated with IBS (i.e., an IBS drug). For therapeutic applications, the IBS drug can be administered alone or co-administered in combination with one or more additional IBS drugs and/or one or more drugs that reduce the side-effects associated with the IBS drug.
IBS drugs can be administered with a suitable pharmaceutical excipient as necessary and can be carried out via any of the accepted modes of administration. Thus, administration can be, for example, intravenous, topical, subcutaneous, transcutaneous, transdermal, intramuscular, oral, buccal, sublingual, gingival, palatal, intra-joint, parenteral, intra-arteriole, intradermal, intraventricular, intracranial, intraperitoneal, intralesional, intranasal, rectal, vaginal, or by inhalation. By “co-administer” it is meant that an IBS drug is administered at the same time, just prior to, or just after the administration of a second drug (e.g., another IBS drug, a drug useful for reducing the side-effects of the IBS drug, etc.).
A therapeutically effective amount of an IBS drug may be administered repeatedly, e.g., at least 2, 3, 4, 5, 6, 7, 8, or more times, or the dose may be administered by continuous infusion. The dose may take the form of solid, semi-solid, lyophilized powder, or liquid dosage forms, such as, for example, tablets, pills, pellets, capsules, powders, solutions, suspensions, emulsions, suppositories, retention enemas, creams, ointments, lotions, gels, aerosols, foams, or the like, preferably in unit dosage forms suitable for simple administration of precise dosages.
As used herein, the term “unit dosage form” refers to physically discrete units suitable as unitary dosages for human subjects and other mammals, each unit containing a predetermined quantity of an IBS drug calculated to produce the desired onset, tolerability, and/or therapeutic effects, in association with a suitable pharmaceutical excipient (e.g., an ampoule). In addition, more concentrated dosage forms may be prepared, from which the more dilute unit dosage forms may then be produced. The more concentrated dosage forms thus will contain substantially more than, e.g., at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more times the amount of the IBS drug.
Methods for preparing such dosage forms are known to those skilled in the art (see, e.g., R
Examples of suitable excipients include, but are not limited to, lactose, dextrose, sucrose, sorbitol, mannitol, starches, gum acacia, calcium phosphate, alginates, tragacanth, gelatin, calcium silicate, microcrystalline cellulose, polyvinylpyrrolidone, cellulose, water, saline, syrup, methylcellulose, ethylcellulose, hydroxypropylmethylcellulose, and polyacrylic acids such as Carbopols, e.g., Carbopol 941, Carbopol 980, Carbopol 981, etc. The dosage forms can additionally include lubricating agents such as talc, magnesium stearate, and mineral oil; wetting agents; emulsifying agents; suspending agents; preserving agents such as methyl-, ethyl-, and propyl-hydroxy-benzoates (i.e., the parabens); pH adjusting agents such as inorganic and organic acids and bases; sweetening agents; and flavoring agents. The dosage forms may also comprise biodegradable polymer beads, dextran, and cyclodextrin inclusion complexes.
For oral administration, the therapeutically effective dose can be in the form of tablets, capsules, emulsions, suspensions, solutions, syrups, sprays, lozenges, powders, and sustained-release formulations. Suitable excipients for oral administration include pharmaceutical grades of mannitol, lactose, starch, magnesium stearate, sodium saccharin, talcum, cellulose, glucose, gelatin, sucrose, magnesium carbonate, and the like.
In some embodiments, the therapeutically effective dose takes the form of a pill, tablet, or capsule, and thus, the dosage form can contain, along with an IBS drug, any of the following: a diluent such as lactose, sucrose, dicalcium phosphate, and the like; a disintegrant such as starch or derivatives thereof; a lubricant such as magnesium stearate and the like; and a binder such as starch, gum acacia, polyvinylpyrrolidone, gelatin, cellulose and derivatives thereof. An IBS drug can also be formulated into a suppository disposed, for example, in a polyethylene glycol (PEG) carrier.
Liquid dosage forms can be prepared by dissolving or dispersing an IBS drug and optionally one or more pharmaceutically acceptable adjuvants in a carrier such as, for example, aqueous saline (e.g., 0.9% w/v sodium chloride), aqueous dextrose, glycerol, ethanol, and the like, to form a solution or suspension, e.g., for oral, topical, or intravenous administration. An IBS drug can also be formulated into a retention enema.
For topical administration, the therapeutically effective dose can be in the form of emulsions, lotions, gels, foams, creams, jellies, solutions, suspensions, ointments, and transdermal patches. For administration by inhalation, an IBS drug can be delivered as a dry powder or in liquid form via a nebulizer. For parenteral administration, the therapeutically effective dose can be in the form of sterile injectable solutions and sterile packaged powders. Preferably, injectable solutions are formulated at a pH of from about 4.5 to about 7.5.
The therapeutically effective dose can also be provided in a lyophilized form. Such dosage forms may include a buffer, e.g., bicarbonate, for reconstitution prior to administration, or the buffer may be included in the lyophilized dosage form for reconstitution with, e.g., water. The lyophilized dosage form may further comprise a suitable vasoconstrictor, e.g., epinephrine. The lyophilized dosage form can be provided in a syringe, optionally packaged in combination with the buffer for reconstitution, such that the reconstituted dosage form can be immediately administered to an individual.
In therapeutic use for the treatment of IBS, an IBS drug can be administered at the initial dosage of from about 0.001 mg/kg to about 1000 mg/kg daily. A daily dose range of from about 0.01 mg/kg to about 500 mg/kg, from about 0.1 mg/kg to about 200 mg/kg, from about 1 mg/kg to about 100 mg/kg, or from about 10 mg/kg to about 50 mg/kg, can be used. The dosages, however, may be varied depending upon the requirements of the individual, the severity of IBS symptoms, and the IBS drug being employed. For example, dosages can be empirically determined considering the severity of IBS symptoms in an individual classified as having IBS according to the methods described herein. The dose administered to an individual, in the context of the present invention, should be sufficient to effect a beneficial therapeutic response in the individual over time. The size of the dose can also be determined by the existence, nature, and extent of any adverse side-effects that accompany the administration of a particular IBS drug in an individual. Determination of the proper dosage for a particular situation is within the skill of the practitioner. Generally, treatment is initiated with smaller dosages which are less than the optimum dose of the IBS drug. Thereafter, the dosage is increased by small increments until the optimum effect under the circumstances is reached. For convenience, the total daily dosage may be divided and administered in portions during the day, if desired.
As used herein, the term “IBS drug” includes all pharmaceutically acceptable forms of a drug that is useful for treating one or more symptoms associated with IBS. For example, the IBS drug can be in a racemic or isomeric mixture, a solid complex bound to an ion exchange resin, or the like. In addition, the IBS drug can be in a solvated form. The term “IBS drug” is also intended to include all pharmaceutically acceptable salts, derivatives, and analogs of the IBS drug being described, as well as combinations thereof. For example, the pharmaceutically acceptable salts of an IBS drug include, without limitation, the tartrate, succinate, bitartrate, dihydrochloride, salicylate, hemisuccinate, citrate, maleate, hydrochloride, carbamate, sulfate, nitrate, and benzoate salt forms thereof, as well as combinations thereof and the like. Any form of an IBS drug is suitable for use in the methods of the present invention, e.g., a pharmaceutically acceptable salt of an IBS drug, a free base of an IBS drug, or a mixture thereof.
Suitable drugs that are useful for treating one or more symptoms associated with IBS include, but are not limited to, serotonergic agents, antidepressants, chloride channel activators, chloride channel blockers, guanylate cyclase agonists, antibiotics, opioids, neurokinin antagonists, antispasmodic or anticholinergic agents, belladonna alkaloids, barbiturates, glucagon-like peptide-1 (GLP-1) analogs, corticotropin releasing factor (CRF) antagonists, probiotics, free bases thereof, pharmaceutically acceptable salts thereof, derivatives thereof, analogs thereof, and combinations thereof. Other IBS drugs include bulking agents, dopamine antagonists, carminatives, tranquilizers, dextofisopam, phenytoin, timolol, and diltiazem.
Serotonergic agents are useful for the treatment of IBS symptoms such as constipation, diarrhea, and/or alternating constipation and diarrhea. Non-limiting examples of serotonergic agents are described in Cash et al., Aliment. Pharmacol. Ther., 22:1047-1060 (2005), and include 5-HT3 receptor agonists (e.g., MKC-733, etc.), 5-HT4 receptor agonists (e.g., tegaserod (Zelnorm™), prucalopride, AG1-001, etc.), 5-HT3 receptor antagonists (e.g., alosetron (Lotronex®), cilansetron, ondansetron, granisetron, dolasetron, ramosetron, palonosetron, E-3620, DDP-225, DDP-733, etc.), mixed 5-HT3 receptor antagonists/5-HT4 receptor agonists (e.g., cisapride, mosapride, renzapride, etc.), free bases thereof, pharmaceutically acceptable salts thereof, derivatives thereof, analogs thereof, and combinations thereof. Additionally, amino acids like glutamine and glutamic acid which regulate intestinal permeability by affecting neuronal or glial cell signaling can be administered to treat patients with IBS.
Antidepressants such as selective serotonin reuptake inhibitors (SSRIs) or tricyclic antidepressants are particularly useful for the treatment of IBS symptoms such as abdominal pain, constipation, and/or diarrhea. Non-limiting examples of SSRI antidepressants include citalopram, fluvoxamine, paroxetine, fluoxetine, sertraline, free bases thereof, pharmaceutically acceptable salts thereof, derivatives thereof, analogs thereof, and combinations thereof. Examples of tricyclic antidepressants include, but are not limited to, desipramine, nortriptyline, protriptyline, amitriptyline, clomipramine, doxepin, imipramine, trimipramine, maprotiline, amoxapine, free bases thereof, pharmaceutically acceptable salts thereof, derivatives thereof, analogs thereof, and combinations thereof.
Chloride channel activators are useful for the treatment of IBS symptoms such as constipation. A non-limiting example of a chloride channel activator is lubiprostone (Amitiza™), a free base thereof, a pharmaceutically acceptable salt thereof, a derivative thereof, or an analog thereof. In addition, chloride channel blockers such as crofelemer are useful for the treatment of IBS symptoms such as diarrhea. Guanylate cyclase agonists such as MD-1100 are useful for the treatment of constipation associated with IBS (see, e.g., Bryant et al., Gastroenterol., 128:A-257 (2005)). Antibiotics such as neomycin can also be suitable for use in treating constipation associated with IBS (see, e.g., Park et al., Gastroenterol., 128:A-258 (2005)). Non-absorbable antibiotics like rifaximin (Xifaxan™) are suitable to treat small bowel bacterial overgrowth and/or constipation associated with IBS (see, e.g., Sharara et al., Am. J. Gastroenterol., 101:326-333 (2006)).
Opioids such as kappa opioids (e.g., asimadoline) may be useful for treating pain and/or constipation associated with IBS. Neurokinin (NK) antagonists such as talnetant, saredutant, and other NK2 and/or NK3 antagonists may be useful for treating IBS symptoms such as oversensitivity of the muscles in the colon, constipation, and/or diarrhea. Antispasmodic or anticholinergic agents such as dicyclomine may be useful for treating IBS symptoms such as spasms in the muscles of the gut and bladder. Other antispasmodic or anticholinergic agents such as belladonna alkaloids (e.g., atropine, scopolamine, hyoscyamine, etc.) can be used in combination with barbiturates such as phenobarbital to reduce bowel spasms associated with IBS. GLP-1 analogs such as GTP-010 may be useful for treating IBS symptoms such as constipation. CRF antagonists such as astressin and probiotics such as VSL#3® may be useful for treating one or more IBS symptoms. One skilled in the art will know of additional IBS drugs currently in use or in development that are suitable for treating one or more symptoms associated with IBS.
An individual can also be monitored at periodic time intervals to assess the efficacy of a certain therapeutic regimen once a sample from the individual has been classified as an IBS sample. For example, the levels of certain markers change based on the therapeutic effect of a treatment such as a drug. The patient is monitored to assess response and understand the effects of certain drugs or treatments in an individualized approach. Additionally, patients may not respond to a drug, but the markers may change, suggesting that these patients belong to a special population (not responsive) that can be identified by their marker levels. These patients can be discontinued on their current therapy and alternative treatments prescribed.
The following examples are offered to illustrate, but not to limit, the claimed invention.
This example illustrates that determining the presence or level of leptin is useful for classifying a patient sample as an IBS sample, e.g., by ruling in IBS. The concentration of leptin was measured in serum samples from normal, IBS, IBD (i.e., CD, UC), and Celiac disease patients using an immunoassay (i.e., ELISA). As shown in
Leptin is also useful for distinguishing between various forms of IBS.
This example illustrates that determining the presence or level of TWEAK is useful for classifying a patient sample as an IBS sample, e.g., by ruling in IBS. The concentration of TWEAK was measured in samples from normal, GI control, IBS, and IBD (i.e., CD, UC) patients using an immunoassay (i.e., ELISA). As shown in
This example illustrates that determining the presence or level of IL-8 is useful for classifying a patient sample as an IBS sample, e.g., by ruling in IBS. The concentration of IL-8 was measured in samples from normal, GI control, IBS, IBD (i.e., CD, UC), and Celiac disease patients using an immunoassay (i.e., ELISA). As shown in
This example illustrates that determining the presence or level of EGF is useful for classifying a patient sample as an IBS sample, e.g., by ruling in IBS or ruling out IBD. The concentration of EGF was measured in samples from normal, GI control, IBS, IBD (i.e., CD, UC), and Celiac disease patients using an immunoassay (i.e., ELISA). As shown in
This example illustrates that determining the presence or level of NGAL is useful for classifying a patient sample as an IBS sample, e.g., by ruling in IBS. The concentration of NGAL was measured in samples from normal, IBS, IBD, and Celiac disease patients using an immunoassay (i.e., ELISA). As shown in
This example illustrates that determining the presence or level of MMP-9 is useful for classifying a patient sample as an IBS sample, e.g., by ruling in IBS or ruling out IBD. The concentration of MMP-9 was measured in samples from normal, GI control, IBS, and IBD patients using an immunoassay (i.e., ELISA). As shown in
This example illustrates that determining the presence or level of a complex of NGAL and MMP-9 (i.e., NGAL/MMP-9 complex) is useful for classifying a patient sample as an IBS sample, e.g., by ruling in IBS or ruling out IBD. The concentration of NGAL/MMP-9 complex was measured in samples from normal, IBS, and IBD patients using an immunoassay (i.e., ELISA). As shown in
This example illustrates that determining the presence or level of Substance P is useful for classifying a patient sample as an IBS sample, e.g., by ruling in IBS. The concentration of Substance P was measured in samples from normal, IBS, IBD (i.e., CD, UC), and Celiac disease patients using an immunoassay (i.e., ELISA). As shown in
Serum samples from patients are obtained retrospectively from multiple centers. Diagnoses are provided for all samples by the Principal Investigator at each site following biopsies and/or colonoscopy results. Approximately 1 ml samples are drawn into SST or serum separators at the sites. The tubes are spun and frozen at −70° C. until shipment. Samples are shipped with cold packs and upon receipt are spun again and frozen at −70° C. until testing.
Serum levels of ANCA, ASCA-G, anti-Omp-C antibodies, anti-Cbir1 antibodies, and IL-8 are measured using an ELISA or an immunofluorescence assay. The analytical performance of these assays has previously been validated. IL-8 levels are measured with a commercial ELISA kit (Invitrogen).
In this study, a novel approach is developed that applies two different learning statistical classifiers (e.g., random forests (RF) and artificial neural networks (ANN)) to predict IBS based upon the levels and/or presence of a panel of serological markers. These learning statistical classifiers use multivariate statistical methods such as, for example, multilayer perceptrons with feed-forward backpropagation, that can adapt to complex data and make decisions based strictly on the data presented, without the constraints of regular statistical classifiers. In particular, a combinatorial approach that makes use of multiple discriminant functions by analyzing marker levels with more than one learning statistical classifier is created to further improve the sensitivity and specificity of the diagnostic test. One preferred method is a combination of RF and ANN applied in tandem. Overall accuracy is used to determine the clinical performance of the test in the validation population.
Marker values from patient samples are first split into training, testing, and validating cohorts. Different patient samples are used for training, testing, and for validation purposes.
The antibody levels from each of the 4 ELISA assays (predictors) and the diagnosis (0=Non-IBS, 1=IBS, 2=IBD, Dependent Variable) from a cohort of patient samples are used as input for the RF software module. Multiple RF models are created and analyzed for accuracy of IBS prediction using the test cohort. The best predictive RF models are selected and tested for accuracy of IBS prediction using data from the validation cohort.
Several RF models are used to predict IBS, IBD, or non-IBS from the training set. The output data are used as input for training neural networks. The outputs from the RF software module include a prediction value (i.e., 0 [non-IBS], 1 [IBS], or 2 [IBD]) and 3 probability or confidence values (one for each prediction). The three probability values are used together with the levels of the markers, as predictor values for further statistical analysis using ANN. A schematic representation of data processing is illustrated in
The values of the markers and the probabilities of non-IBS, IBS, and IBD predictions obtained from the RF model (Salford Systems; San Diego, Calif.) are used as predictors and the diagnosis as a dependent variable to create multiple ANN with the use of the neural networks software. The Intelligent Problem Solver module of the neural networks software package (Statistica; StatSoft, Inc.; Tulsa, Okla.) is used to create ANN models in a feed-forward, backpropagation, and classification mode with the training cohort. More than 1,000 ANN are created using the input from various RF models. The best models are selected based on the lowest error of IBS prediction on the test dataset.
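As a non-limiting sketch of the tandem arrangement described above, the predictor rows supplied to the ANN can be assembled by appending the three RF class probabilities (non-IBS, IBS, IBD) to the marker levels for each sample. The function name `build_ann_inputs` and the example values are hypothetical. The sketch shows only the data preparation step and does not reproduce the RF or ANN software packages named above.

```python
from typing import List, Sequence


def build_ann_inputs(marker_levels: Sequence[Sequence[float]],
                     rf_probabilities: Sequence[Sequence[float]]) -> List[List[float]]:
    """Concatenate each sample's marker levels with its three RF class
    probabilities (non-IBS, IBS, IBD) to form one ANN predictor row."""
    if len(marker_levels) != len(rf_probabilities):
        raise ValueError("one probability triple is required per sample")
    rows = []
    for markers, probs in zip(marker_levels, rf_probabilities):
        if len(probs) != 3:
            raise ValueError("expected probabilities for non-IBS, IBS, and IBD")
        if abs(sum(probs) - 1.0) > 1e-6:
            raise ValueError("class probabilities should sum to 1")
        rows.append(list(markers) + list(probs))
    return rows


if __name__ == "__main__":
    # Invented values: four marker levels plus three RF probabilities.
    markers = [[12.5, 3.1, 0.8, 20.0]]
    probs = [[0.2, 0.7, 0.1]]
    print(build_ann_inputs(markers, probs))
```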
A diagram of an ANN is shown in
The selected algorithm is then validated with a cohort of samples that has not been used in the training and testing sets (i.e., the validation set). The data obtained from this test is used to calculate all accuracy parameters for the algorithm.
Additionally, final validation and calculation of accuracy are performed on data from a sample cohort non-overlapping with the training and testing sets.
The sensitivity and specificity of IBS prediction are high. Accurate identification of IBS is revealed by sensitivities and specificities near or above 90%. The hybrid RF/ANN model predicts IBS with a high level of accuracy.
Patient samples are analyzed using a random forest (RF) statistical algorithm. The samples are split into training, testing, and validating cohorts. Different patient samples are used for training, testing, and for validation purposes.
Serum levels of IL-8, lactoferrin, ANCA, ASCA-G, and anti-Omp-C antibodies are measured using an ELISA as described above.
In this study, a novel approach is developed that applies a single learning statistical classifier (i.e., random forests) to predict IBS based upon the levels and/or presence of a panel of serological markers. The antibody levels from each of the ELISA assays and the diagnosis from the train/test cohort of patient samples are used as input for the RF software module (Salford Systems; San Diego, Calif.). Multiple RF models are created and analyzed for accuracy of IBS prediction using the train/test cohort. The best predictive RF models are selected and tested for accuracy of IBS prediction using data from the validation cohort.
The selected RF algorithm is then validated with a cohort of samples that has not been used in the training and testing sets (i.e., the validation set). The data obtained from this test is used to calculate all accuracy parameters for the algorithm.
The sensitivity and specificity of IBS prediction are high. Accurate identification of IBS is revealed by sensitivities and specificities near or above 85%. The RF model predicts IBS with a high level of accuracy.
Samples are analyzed using a classification tree statistical algorithm. These cases can have serological marker information for IL-8, ANCA ELISA, anti-Omp-C antibodies, ASCA-A, ASCA-G, anti-Cbir1 antibodies, pANCA, and/or lactoferrin.
In this study, a novel approach is developed that uses a single learning statistical classifier (i.e., classification trees) to predict IBS based upon the levels and/or presence of a panel of serological markers. In order to generate robust estimates of the efficacy of each classification method, a simulation with 500 iterations is performed. For each iteration, the data is divided into a training set and a validation set. Each time, 80% of the observations are randomly assigned to the training set and 20% of the observations are randomly assigned to the validation set. Using the training set, classification models are built using classification trees.
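The resampling scheme described above (500 iterations, each with a random 80%/20% assignment of observations to training and validation sets) can be sketched as follows. The function names and the fixed random seed are assumptions made for reproducibility of the illustration and are not specified in the study description.

```python
import random
from typing import Iterator, List, Tuple


def split_80_20(n_obs: int, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Randomly assign 80% of observation indices to the training set
    and the remaining 20% to the validation set."""
    indices = list(range(n_obs))
    rng.shuffle(indices)
    cut = int(round(0.8 * n_obs))
    return indices[:cut], indices[cut:]


def repeated_splits(n_obs: int, iterations: int = 500,
                    seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield one independent random 80/20 split per simulation iteration."""
    rng = random.Random(seed)  # seed is an assumption, for reproducibility
    for _ in range(iterations):
        yield split_80_20(n_obs, rng)


if __name__ == "__main__":
    for train_idx, valid_idx in repeated_splits(n_obs=100, iterations=3):
        print(len(train_idx), len(valid_idx))
```

In each iteration a classification model would be built on the training indices and scored on the validation indices, and the scores would then be summarized across iterations.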
Classification trees are constructed by repeated binary splits of subsets of the data, beginning with the complete dataset. Each time a binary split is performed, there is an attempt to create descendent subsets that are “purer,” or more homogeneous, than the parent subset. This is done by computationally finding a split that achieves the largest decrease in the average impurity of the descendent subsets. Impurity is usually defined in operational terms by one of three metrics: the misclassification rate, the Gini index, or the entropy criterion.
Though minimizing the misclassification rate is the overall goal, it is considered a poor criterion for the split search because it produces only a one-step optimization. The Gini index and entropy criterion produce similar results for two-class problems (Hastie et al., The Elements of Statistical Learning, New York; Springer (2001)). The nodes created by each binary split are recursively split until one of the following three conditions becomes true:
Once a terminal point has been reached for every node, the tree is pruned upward. This procedure creates a sequence of smaller and smaller trees. The overall impurity of each of these trees can be measured and the one with the smallest total impurity selected. This may be regarded as the “best” classification tree (Breiman et al., Classification and Regression Trees, Wadsworth; Belmont, Calif. (1984)).
Once the “best” tree is selected, the predicted class of each of the terminal nodes is determined by a simple majority “vote” of each observation in the node. In order to classify a new case, the new observation is simply sent down the tree. The predicted class of the new observation is the predicted class of the terminal node in which it is placed. Further discussion and examples may be found, e.g., in Hastie et al., supra; and Venables et al., Modern Applied Statistics with S-Plus, 4th edition; New York; Springer (2002).
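As a non-limiting illustration of the tree-building steps discussed above, the following sketch computes the three impurity measures named in the text (misclassification rate, Gini index, and entropy) using their standard textbook definitions, searches a single quantitative feature for the binary split with the lowest size-weighted impurity of the two descendent subsets, and assigns a terminal node's class by majority vote. The function names are hypothetical, and recursion, stopping conditions, and pruning are deliberately omitted.

```python
import math
from collections import Counter
from typing import Callable, List, Optional, Sequence, Tuple


def class_proportions(labels: Sequence[int]) -> List[float]:
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def misclassification(labels: Sequence[int]) -> float:
    return 1.0 - max(class_proportions(labels))


def gini(labels: Sequence[int]) -> float:
    return 1.0 - sum(p * p for p in class_proportions(labels))


def entropy(labels: Sequence[int]) -> float:
    return -sum(p * math.log2(p) for p in class_proportions(labels))


def best_threshold(values: Sequence[float], labels: Sequence[int],
                   impurity: Callable[[Sequence[int]], float] = gini
                   ) -> Optional[Tuple[float, float]]:
    """Return (threshold, weighted child impurity) for the split
    'value <= threshold' that minimizes the size-weighted impurity of
    the two descendent subsets, or None if no split is possible."""
    n = len(values)
    best = None
    # The largest value is excluded so that both subsets are non-empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * impurity(left) + len(right) * impurity(right)) / n
        if best is None or score < best[1]:
            best = (t, score)
    return best


def majority_vote(labels: Sequence[int]) -> int:
    """Predicted class of a terminal node."""
    return Counter(labels).most_common(1)[0][0]


if __name__ == "__main__":
    print(best_threshold([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1]))
```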
This example illustrates a questionnaire that is useful for identifying the presence or severity of one or more IBS-related symptoms in an individual. The questionnaire can be completed by the individual at the clinic or physician's office, or can be brought home and submitted when the individual returns to the clinic or physician's office, e.g., to have his or her blood drawn.
In some embodiments, the questionnaire comprises a first section containing a set of questions asking the individual to provide answers regarding the presence or severity of one or more symptoms associated with IBS. The questionnaire generally includes questions directed to identifying the presence, severity, frequency, and/or duration of IBS-related symptoms such as chest pain, chest discomfort, heartburn, uncomfortable fullness after having a regular-sized meal, inability to finish a regular-sized meal, abdominal pain, abdominal discomfort, constipation, diarrhea, bloating, and/or abdominal distension.
In certain instances, the first section of the questionnaire includes all or a subset of the questions from a questionnaire developed by the Rome Foundation Board based on the Rome III criteria, available at romecriteria.org/questionnaires. For example, the questionnaire can include all or a subset of the 93 questions set forth on pages 920-936 of the Rome III Diagnostic Questionnaire for the Adult Functional GI Disorders (Appendix C), available on the world wide web at romecriteria.org/pdfs/AdultFunctGlQ.pdf. Preferably, the first section of the questionnaire contains 16 of the 93 questions set forth in the Rome III Diagnostic Questionnaire (see, Table 2). Alternatively, the first section of the questionnaire can contain a subset (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15) of the 16 questions shown in Table 2. As a non-limiting example, the following 10 questions set forth in Table 2 can be included in the questionnaire: Question Nos. 2, 3, 5, 6, 9, 10, 11, 13, 15, and 16. One skilled in the art will appreciate that the first section of the questionnaire can comprise questions similar to the questions shown in Table 2 regarding pain, discomfort, and/or changes in stool consistency.
In other embodiments, the questionnaire comprises a second section containing a set of questions asking the individual to provide answers regarding the presence or severity of negative thoughts or feelings associated with having IBS-related pain or discomfort. For example, the questionnaire can include questions directed to identifying the presence, severity, frequency, and/or duration of anxiety, fear, nervousness, concern, apprehension, worry, stress, depression, hopelessness, despair, pessimism, doubt, and/or negativity when the individual is experiencing pain or discomfort associated with one or more symptoms of IBS.
In certain instances, the second section of the questionnaire includes all or a subset of the questions from a questionnaire described in Sullivan et al., The Pain Catastrophizing Scale: Development and Validation, Psychol. Assess., 7:524-532 (1995). For example, the questionnaire can include a set of questions to be answered by an individual according to a Pain Catastrophizing Scale (PCS), which indicates the degree to which the individual has certain negative thoughts and feelings when experiencing pain: 0=not at all; 1=to a slight degree; 2=to a moderate degree; 3=to a great degree; 4=all the time. The second section of the questionnaire can contain 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or more questions or statements related to identifying the presence or severity of negative thoughts or feelings associated with having IBS-related pain or discomfort. As a non-limiting example, an individual can be asked to rate the degree to which he or she has one or more of the following thoughts and feelings when experiencing pain: “I worry all the time about whether the pain will end”; “I feel I can't stand it anymore”; “I become afraid that the pain will get worse”; “I anxiously want the pain to go away”; and “I keep thinking about how much it hurts.” One skilled in the art will understand that the questionnaire can comprise similar questions regarding negative thoughts or feelings associated with having IBS-related pain or discomfort.
In some embodiments, the questionnaire includes only questions from the first section of the questionnaire or a subset thereof (see, e.g., Table 2). In other embodiments, the questionnaire includes only questions from the second section of the questionnaire or a subset thereof.
Upon completion of the questionnaire by the individual, the numbers corresponding to the answers to each question can be summed and the resulting value can be combined with the analysis of one or more diagnostic markers in a sample from the individual and processed using the statistical algorithms described herein to increase the accuracy of predicting IBS.
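The scoring step described above can be sketched as follows, assuming for illustration that each answer is coded on the 0 to 4 Pain Catastrophizing Scale described earlier. The function names, the feature key `symptom_score`, and the marker names in the usage example are hypothetical. The combined feature set would then be passed to whichever statistical algorithm is in use.

```python
from typing import Dict, Sequence


def questionnaire_total(answers: Sequence[int]) -> int:
    """Sum the numeric answers, assuming each is coded 0 to 4."""
    for answer in answers:
        if answer not in (0, 1, 2, 3, 4):
            raise ValueError(f"answer out of range: {answer!r}")
    return sum(answers)


def combine_with_markers(marker_levels: Dict[str, float],
                         symptom_score: int) -> Dict[str, float]:
    """Return one feature set holding the marker levels together with
    the summed questionnaire score, without modifying the input."""
    features = dict(marker_levels)
    features["symptom_score"] = float(symptom_score)
    return features


if __name__ == "__main__":
    total = questionnaire_total([0, 4, 2, 3, 1])
    # Marker names and values are invented for illustration.
    print(combine_with_markers({"marker_a": 12.5, "marker_b": 3.1}, total))
```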
Alternatively, a “Yes” or “No” answer from the individual to the following question: “Are you currently experiencing any symptoms?” can be combined with the analysis of one or more of the biomarkers described herein and processed using a single statistical algorithm or a combination of statistical algorithms to increase the accuracy of predicting IBS.
This example illustrates techniques for the selection of features that can be included in the diagnostic marker and symptom profiles of the present invention for predicting IBS.
The goal of classification is to take an input vector X and assign it to one or more of K distinct classes Cj, where j is in the range 1, . . . , K (Bishop, Pattern Recognition and Machine Learning, Springer, p. 179 (2006)). In the context of a diagnostic test algorithm, the input vector may consist of a combination of quantitative measurements (e.g., biomarkers), nominal variables (e.g., gender), and ordinal variables (e.g., symptom presence or severity from survey responses). These components of the input vector may collectively be termed features. The input vector describes a patient for whom a diagnosis is desired. The output of the model is the diagnosis, a categorical variable (e.g., a binary variable, where 0=healthy and 1=disease).
A diagnostic test involves specifying the features of the input vector, and the algorithm used to predict the classifications. While it is possible to use a maximal model, in which all input features and their interactions are included, this is not preferred, for reasons of economy and parsimony (Crawley, Statistical Computing: An Introduction to Data Analysis using S-Plus, Wiley, p. 211 (2002)). Economy suggests that since gathering inputs entails costs, the cost of obtaining an input must be weighed against its benefit. Parsimony suggests that simpler models are preferable, and that inputs and/or terms which are insignificant should not be included, in order to optimize the clarity and reliability of the test.
A number of techniques may be used to select the features of the input vector which will be used in a diagnostic test. These techniques are discussed in the following paragraphs. Some input selection techniques are algorithm-independent, and may be used with any classification algorithm. Others are algorithm-specific. Examples of several algorithm-independent techniques, followed by techniques which are specifically applicable to random forest, logistic regression, or discriminant analysis algorithms are provided.
In considering generally applicable techniques, two families of approaches are available: statistical and stepwise-exploratory. If the input data fits certain assumptions (regarding normality and equality of variance), statistical techniques may be used, as described below. Stepwise methods may be used whether or not those assumptions are met by the data.
A number of classic standard tests may be used on features, both individually (univariate tests) and in groups (multivariate tests). For example, for quantitative biomarkers, the diagnostic classifications in the input data lead to group means which can be compared using t-tests. This requires that two assumptions are valid: the variable is normally distributed in each group; and the variances of the two groups are the same (Petrie & Sabin, Medical Statistics at a Glance, 2nd ed., Blackwell Publishing, p. 52 (2005)). This test has a multivariate analog: in a multivariate comparison, Hotelling's T² test may be used (Flury, A First Course in Multivariate Statistics, Springer-Verlag, p. 402 (1997)).
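As a non-limiting sketch of the univariate comparison, the pooled-variance two-sample t statistic for one quantitative biomarker can be computed as follows. It relies on the two assumptions stated above (normality within each group and equal group variances). The function name and the example values are hypothetical, and obtaining a p-value additionally requires the t distribution with the returned degrees of freedom, which is not computed here.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def pooled_t_statistic(group_a: Sequence[float],
                       group_b: Sequence[float]) -> Tuple[float, int]:
    """Return (t, degrees of freedom) for the equal-variance
    two-sample t-test comparing the two group means."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled estimate of the common variance (sample variances, n - 1).
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    std_err = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err, df


if __name__ == "__main__":
    # Invented biomarker values for two diagnostic groups.
    print(pooled_t_statistic([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))
```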
If the required assumptions are not met, a number of nonparametric tests are available, such as the Mann-Whitney rank-sum test (equivalent to the Wilcoxon rank-sum test) for two groups, and the Kruskal-Wallis statistic for three or more groups (Glantz, Primer of Biostatistics, 4th ed., McGraw-Hill, Chapter 10 (1997)).
For both the parametric and nonparametric tests, the results may be used to suggest which biomarkers (or groups of features) do or do not have significantly different mean scores for the diagnostic groups.
The following stepwise methods assume that an algorithm has been chosen (e.g., random forest, logistic regression), but these methods may be used with any algorithm, and they are in that sense algorithm-independent. In the context of the selected algorithm, it is desirable to choose a set of features from those available in the input vector. In order to use an exploratory technique, a scoring metric and a search method must be defined.
The first step is to choose a metric by which competing feature sets may be scored. One possible metric is accuracy, the percentage of correct predictions made by the classifier (both true positive and true negative). Alternatively, the scoring metric may be defined in terms of sensitivity (the percentage of individuals with disease who are classified as having the disease) and/or specificity (the percentage of individuals without disease who are classified as not having the disease) (Fisher & Belle, Biostatistics: A Methodology for the Health Sciences, Wiley-Interscience, p. 206 (1993)). Less commonly, the metric may also involve positive predictive value (ppv, the percentage of individuals with a positive test who have the disease) and negative predictive value (npv, the percentage of individuals with a negative test who do not have the disease).
The following is a list of available metrics: accuracy; sensitivity (alone); specificity (alone); the arithmetic mean of sensitivity and specificity; the geometric mean of sensitivity and specificity; the minimum of sensitivity and specificity; and the maximum of sensitivity and specificity. A similar set of metrics may be used with ppv and npv: ppv/npv alone; arithmetic mean; geometric mean; max; and min. It is also possible to define metrics which combine sensitivity, specificity, ppv, and npv (e.g., the arithmetic mean of those four values). It is also possible to define specific penalties for false positives and false negatives, in which case the score is to be minimized rather than maximized.
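The metrics listed above can all be derived from the four cells of a confusion matrix. The sketch below is illustrative only: the function names, dictionary keys, and counts are hypothetical, and the penalty weights are placeholders to be set by the test designer.

```python
from math import sqrt


def diagnostic_metrics(tp, fp, tn, fn):
    """Candidate scoring metrics computed from confusion-matrix counts."""
    sens = tp / (tp + fn)  # diseased individuals classified as diseased
    spec = tn / (tn + fp)  # non-diseased individuals classified as non-diseased
    ppv = tp / (tp + fp)   # positive tests that are truly diseased
    npv = tn / (tn + fn)   # negative tests that are truly non-diseased
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sens,
        "specificity": spec,
        "ppv": ppv,
        "npv": npv,
        "arith_mean_sens_spec": (sens + spec) / 2,
        "geom_mean_sens_spec": sqrt(sens * spec),
        "min_sens_spec": min(sens, spec),
        "max_sens_spec": max(sens, spec),
        "arith_mean_all_four": (sens + spec + ppv + npv) / 4,
    }


def penalty_score(fp, fn, fp_cost=1.0, fn_cost=1.0):
    """Cost-based score; unlike the metrics above, lower is better."""
    return fp_cost * fp + fn_cost * fn


# Hypothetical counts for a test evaluated on 200 individuals.
m = diagnostic_metrics(tp=40, fp=10, tn=90, fn=60)
cost = penalty_score(fp=10, fn=60, fp_cost=5.0, fn_cost=1.0)
```

A feature-set search would then maximize one of the dictionary entries, or minimize the penalty score.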
For any of the scoring metrics defined above, it is possible to evaluate any algorithm (including random forest, logistic regression, discriminant analysis, and others) by exhaustively enumerating every possible subset of features in the input vector. In cases where this is unacceptably computationally intensive, it is possible to conduct a stepwise search in which individual features are added (a forward search) or removed (a backwards search) one by one, in a series of rounds (Petrie & Sabin, Medical Statistics at a Glance, 2nd ed., Blackwell Publishing, p. 89 (2005)).
In a forward search, features (e.g., biomarkers, symptoms, etc.) are added one by one, in rounds. In the first round, each input vector consisting of a single feature is evaluated on the training data, and the best feature (as defined by the chosen metric) is identified. In the second round, a new set of input vectors is constructed and evaluated. Each vector has two features, one of which is the “best” feature from the first round of evaluation. The best pair of features from the second round is chosen, and becomes the basis for the third round, in which all input vectors have three features, two of which are the ones identified in the second round, and so forth. This procedure is carried out iteratively, with the number of rounds equal to the number of possible features in the input vector. At the conclusion, the best input vector (i.e., set of features) across all rounds, as defined by the metric, is selected.
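The forward search can be sketched as follows. This is a minimal illustration, not the implementation used in the study: `forward_search` and `toy_score` are hypothetical names, and the toy scoring function (a fixed per-feature benefit minus a per-feature cost) merely stands in for training a classifier and scoring it with the chosen metric.

```python
def forward_search(features, score):
    """Greedy forward selection; `score` maps a tuple of features to a
    number, with higher values being better."""
    selected = []
    remaining = list(features)
    best_set, best_score = (), float("-inf")
    while remaining:
        # Evaluate every one-feature extension of the current set.
        round_best = max(remaining, key=lambda f: score(tuple(selected + [f])))
        selected.append(round_best)
        remaining.remove(round_best)
        current = score(tuple(selected))
        # Remember the best set seen in any round.
        if current > best_score:
            best_set, best_score = tuple(selected), current
    return best_set, best_score


# Hypothetical per-feature benefits; each feature also costs 0.05 to gather.
benefit = {"A": 0.30, "B": 0.20, "C": 0.01, "D": 0.00}


def toy_score(subset):
    return sum(benefit[f] for f in subset) - 0.05 * len(subset)


best_set, best_score = forward_search(["A", "B", "C", "D"], toy_score)
print(best_set)
```

With n features, this evaluates on the order of n²/2 candidate sets rather than the 2ⁿ required by exhaustive enumeration.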
A backward search is similar, but follows a process of model simplification rather than model expansion (Crawley, Statistics: An Introduction Using R, Wiley, p. 105 (2005)). The starting point is the input vector with a complete set of features. In each round, one feature is chosen for deletion, as evaluated by the metric described above.
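A corresponding sketch of backward elimination is given below. The text does not specify a stopping rule, so this illustration assumes that deletion continues until one feature remains and that the best-scoring set seen in any round is returned, with ties resolved in favor of the smaller set. The names and the toy scoring function are hypothetical.

```python
def backward_search(features, score):
    """Greedy backward elimination; `score` maps a tuple of features to a
    number, with higher values being better."""
    current = list(features)
    best_set, best_score = tuple(current), score(tuple(current))
    while len(current) > 1:
        # Delete the feature whose removal leaves the highest-scoring set.
        to_drop = max(
            current,
            key=lambda f: score(tuple(x for x in current if x != f)),
        )
        current.remove(to_drop)
        s = score(tuple(current))
        if s >= best_score:  # on ties, prefer the smaller set (parsimony)
            best_set, best_score = tuple(current), s
    return best_set, best_score


# Hypothetical per-feature benefits; each feature also costs 0.05 to gather.
benefit = {"A": 0.30, "B": 0.20, "C": 0.01, "D": 0.00}


def toy_score(subset):
    return sum(benefit[f] for f in subset) - 0.05 * len(subset)


kept, kept_score = backward_search(["A", "B", "C", "D"], toy_score)
print(kept)
```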
In addition to deterministic forward and backward searches, it is possible to search stochastically. One method is to randomly generate a set of features, which are used as seeds. Each seed may then be evaluated both forward and backward, and the best resulting set of inputs may be used. An alternative method is to carry out multiple forward and/or backward searches, but in each round, rather than deterministically choosing the best feature addition or deletion, to choose the feature to include or delete probabilistically, using a formula in which the probability of selection varies monotonically with the feature's ranking in the last round.
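One way to realize the probabilistic choice is to weight candidates by rank. The sketch below assumes a geometric decay, which is only one of many monotonic formulas that would fit the description; the function name and the `decay` parameter are hypothetical.

```python
import random


def rank_weighted_choice(ranked_features, decay=0.5, rng=random):
    """Pick one feature at random, with probability decreasing
    geometrically in its rank (index 0 is the best-ranked candidate)."""
    weights = [decay ** rank for rank in range(len(ranked_features))]
    return rng.choices(ranked_features, weights=weights, k=1)[0]


# Candidates ordered from best to worst by their score in the last round.
rng = random.Random(0)  # seeded only to make the illustration repeatable
picks = [rank_weighted_choice(["a", "b", "c"], rng=rng) for _ in range(1000)]
print({f: picks.count(f) for f in "abc"})
```

For a backward search, the same function can be applied to a list ordered from the most to the least dispensable feature.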
Having discussed methods for feature selection which are applicable to any algorithm, this section describes methods which are specific to particular algorithms. Three representative algorithms are discussed: random forests; logistic regression; and discriminant analysis.
For random forests, two metrics are available to describe the importance of features: permutation importance (Strobl et al., BMC Bioinformatics, 8:25 (2007)) and gini importance (Breiman et al., Classification and Regression Trees, Chapman & Hall/CRC, p. 146 (1984)).
For permutation importance, the idea is to compare the scoring of a full forest to the scoring produced by a forest in which the input values for one feature have been scrambled. Intuitively, the more important the feature, the more the scoring will be reduced if the values of that feature have been randomly permuted. The decrease in score is the permutation importance; by evaluating all the features in this way, their importance may be ranked.
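The permutation procedure can be sketched independently of any particular forest implementation. In the illustration below, `predict` is a hypothetical stand-in for a trained forest, accuracy is assumed as the scoring metric, and the data are synthetic; in practice the comparison is typically made on samples not used to fit the trees.

```python
import random


def permutation_importance(predict, rows, labels, feature_index, rng):
    """Decrease in accuracy after shuffling one feature's values across rows."""
    def accuracy(data):
        return sum(predict(r) == y for r, y in zip(data, labels)) / len(labels)

    baseline = accuracy(rows)
    # Scramble the chosen feature while leaving all other features intact.
    column = [r[feature_index] for r in rows]
    rng.shuffle(column)
    permuted = [
        r[:feature_index] + (v,) + r[feature_index + 1:]
        for r, v in zip(rows, column)
    ]
    return baseline - accuracy(permuted)


# Synthetic data: feature 0 determines the label, feature 1 is irrelevant.
rows = [(i / 10, ((i * 7) % 10) / 10) for i in range(10)]
labels = [int(r[0] > 0.5) for r in rows]
predict = lambda row: int(row[0] > 0.5)  # stand-in for a trained classifier

rng = random.Random(0)
imp0 = permutation_importance(predict, rows, labels, 0, rng)
imp1 = permutation_importance(predict, rows, labels, 1, rng)
print(imp0, imp1)
```

Because the stand-in classifier ignores feature 1, its importance is exactly zero, whereas scrambling feature 0 can only lower the score.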
For gini importance, the idea is to take a weighted mean of the individual trees' improvement in the “gini gain” splitting criterion produced by each feature. Every time a split of a node is made on a certain feature, the gini impurity criterion for the two descendent nodes is less than the parent node. Adding up the gini decreases for each individual feature over all trees in the forest gives a measure of feature importance.
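The quantity accumulated for gini importance is the impurity decrease at each split. A minimal sketch for a single split follows; the function names and the example labels are hypothetical.

```python
def gini_impurity(labels):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def gini_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    children = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - children


parent = [1, 1, 1, 0, 0, 0]
clean_split = gini_decrease(parent, [1, 1, 1], [0, 0, 0])  # perfectly separating
weak_split = gini_decrease(parent, [1, 1, 0], [1, 0, 0])   # barely separating
print(clean_split, weak_split)
```

Summing these decreases over every split made on a given feature, across all trees, yields that feature's gini importance.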
Logistic regression is used in cases where the dependent variable (e.g., diagnosis) is categorical/nominal (Agresti, An Introduction to Categorical Data Analysis, 2nd ed., Wiley-Interscience, Chapter 4 (2007)). An extensive literature describes techniques for feature/model selection in multiple regression (Maindonald & Braun, Data Analysis and Graphics Using R, 2nd ed., Cambridge University Press, Chapter 6 (2003)).
In logistic and other types of regression, the significance of individual features may be assessed by testing the hypothesis that the corresponding regression coefficient is zero (Kachigan, Multivariate Statistical Analysis, A Conceptual Introduction, 2nd ed., Radius Press, p. 178 (1991)). It is also possible to assess a group of features on the basis of a deletion test, e.g., using an F test to assess the significance of the increase in deviance that results when a given term is removed from a regression model (Crawley, Statistics: An Introduction Using R, Wiley, p. 103 (2005); Devore, Probability and Statistics for Engineering and the Sciences, 4th ed., Brooks/Cole, p. 560 (1995)).
Discriminant analysis describes a set of techniques in which the parametric form of a discriminant function is assumed, and the parameters of the discriminant function are fitted. This is in contrast to techniques in which the parametric form of the underlying probability densities, rather than that of the discriminant function, is assumed and fitted. The canonical example in this family of techniques is Fisher's linear discriminant analysis (LDA); related techniques and extensions include quadratic discriminant analysis (QDA), regularized discriminant analysis, mixture discriminant analysis, and others (Venables & Ripley, Modern Applied Statistics with S, 4th ed., Springer, Chapter 12 (2002)). Feature selection for LDA is discussed below; the discussion is also applicable to related techniques in this family.
In LDA, the coefficients of the linear discriminant are chosen to maximize the class separation, as measured by the ratio of the between-class variance and the within-class variance (Everitt & Dunn, Applied Multivariate Data Analysis, 2nd ed., Oxford University Press, p. 253 (2001)). In this context, the redundancy of features may be formally inferred (Flury, A First Course in Multivariate Statistics, Springer-Verlag, Sections 5.6 and 6.5 (1997)). This is done by testing the hypothesis that the relevant discriminant function coefficients are zero. By inference on the discriminant function coefficients, it is possible to construct tests of sufficiency/redundancy for possible groups of features.
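In standard textbook notation (the symbols are introduced here for illustration and do not appear in the original text), with w the vector of discriminant coefficients, S_B the between-class scatter matrix, S_W the within-class scatter matrix, and μ₁, μ₂ the class mean vectors, the two-class criterion and its maximizer can be written as:

```latex
J(\mathbf{w}) = \frac{\mathbf{w}^{\top} \mathbf{S}_B \, \mathbf{w}}
                     {\mathbf{w}^{\top} \mathbf{S}_W \, \mathbf{w}},
\qquad
\hat{\mathbf{w}} \;\propto\; \mathbf{S}_W^{-1}
  \left( \boldsymbol{\mu}_1 - \boldsymbol{\mu}_2 \right)
```

Testing whether a feature is redundant then amounts to testing whether the corresponding component of w is zero.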
A large number of other algorithms are available for diagnostic classification, including neural networks, support vector machines, CART (classification and regression trees), unsupervised clustering (k-means, Gaussian mixtures), k-nearest neighbors, and many others. For many of these algorithms, algorithm-specific techniques are available for evaluating and selecting features. In addition, some techniques focus on feature extraction (choosing a smaller number of features which may be linear or nonlinear combinations of the available features). These techniques include principal component analysis, independent component analysis, factor analysis, and other variations (Duda et al., Pattern Classification, 2nd ed., Wiley-Interscience, p. 568 (2001)).
This example illustrates techniques for use of a questionnaire to improve accuracy of an IBS diagnostic prediction algorithm.
In certain instances, identifying patients with IBS is more accurately predicted with the use of one or more questions as predictors to create an alternative algorithm or further input to provide added sensitivity and specificity.
In certain instances, questions were generated such as “Are you currently experiencing any symptoms?,” while others were extracted from known questionnaires such as Rome II, Rome III, the Pain Catastrophizing Scale (Sullivan et al., The Pain Catastrophizing Scale: Development and Validation, Psychol. Assess., 7:524-532 (1995)), and the like. Some questions had nominal answers (rating the degree of some occurrence), while others were categorical (binary). In the Rome III questions, the nominal values of all answers from a patient were added to create a single score that was considered a simplified “disease severity” score. In certain embodiments, inclusion of this score together with the biomarker levels improved both the sensitivity and specificity of an algorithm.
In one embodiment, the score of each question (e.g., 0-4) was used as input (predictor) together with all biomarkers. Models were then created using Random Forests and Neural Networks. Both Random Forests and Neural Networks have the capability to determine the most significant questions that improve the accuracy of algorithm prediction. After having selected the best questions, one score was used to predict “disease severity,” or level of Catastrophizing, by summing the values of each question for a particular patient. The data that included the questionnaire scores were used to train algorithms using Random Forests, Neural Networks and other statistical classifiers. The questions from Rome II, Rome III, and the Pain Catastrophizing Scale improved the accuracy of prediction when used in combination with multiple biomarkers to identify patients with IBS. In addition, a single question, “Are you currently experiencing any symptoms?” (yes or no), was in some instances as important as the score sum of the answers to the questions in the questionnaire.
Table 3 shows that a symptom profile can improve the accuracy of IBS prediction. With the inclusion of various data from questionnaires as input predictors, specificity and sensitivity can both be improved.
As the data in Table 3 show, the specificity is increased with the use of questionnaire data and, on average, sensitivity is also increased. Sensitivity is the probability of a positive test among patients with IBS, whereas specificity is the probability of a negative test among patients without IBS.
This example illustrates the development of a novel diagnostic test that applies a single learning statistical classifier (i.e., random forests) to predict IBS based upon the levels and/or presence of a panel of 10 serological markers.
The development cohort for the IBS diagnostic test was composed of a total of 1721 serum samples, which were selected among adults (women, 70%) ranging from ≥18 to ≤70 years of age. All IBS samples (n=876) were collected from recognized GI experts in academic centers (60%) and from community GI clinics (40%) across the United States, thus ensuring optimal heterogeneity across GI practices. All IBS patients met Rome II or Rome III criteria and were diagnosed by a gastroenterologist at least 1 year prior to study enrollment. The training cohort, which was used to train the algorithm, consisted of 1205 unique samples. Test performance was validated using 516 unique samples with an overall IBS prevalence of 50%. Table 4 shows the composition of the cohort of samples used to create the IBS diagnostic test.
The ratios of samples from patients with IBS, IBD, Celiac disease, and functional GI disorders, as well as those from healthy individuals, were similar across cohorts. The assay values for the selected 10 IBS biomarkers were collected for both the training and validation cohort samples. The cohort sizes selected for the training and validation of the algorithm were based on standard statistical practices to ensure that the study cohort was large enough to adequately represent the IBS population. The training cohort (1205 samples) is large enough to state with 99% confidence that any IBS subpopulation with 10% or greater prevalence is well represented with this sample set. The required validation cohort size was calculated to be 499 samples using standard methodology that uses a specified confidence interval. The requirement of 499 samples was calculated by estimating an overall test accuracy of 75%±5% with 99% confidence. The final accuracy calculated on the validation cohort of 516 samples was 70%.
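The stated validation cohort size can be approximately reproduced with the usual normal-approximation formula for estimating a proportion, n = z²·p(1−p)/E². This is an assumption about the "standard methodology" referred to above, which the text does not spell out; the function name is hypothetical, and z = 2.58 is the commonly tabulated rounded critical value for 99% confidence.

```python
def sample_size_for_proportion(p, margin, z):
    """Normal-approximation sample size for estimating a proportion p
    to within +/- margin at the confidence level implied by z."""
    return z ** 2 * p * (1 - p) / margin ** 2


# Expected accuracy 75%, margin of error 5%, z = 2.58 for 99% confidence.
n = sample_size_for_proportion(p=0.75, margin=0.05, z=2.58)
print(round(n, 1))
```

Under these assumptions the formula gives roughly 499.2, which is in line with the 499 samples stated; the validation cohort of 516 samples exceeds that figure.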
The following 10 biomarkers were assayed: (1) IL-1β; (2) NGAL; (3) anti-Cbir1 antibodies; (4) ANCA; (5) BDNF; (6) TWEAK; (7) anti-tTG antibodies; (8) GROα; (9) TIMP-1; and (10) ASCA. Serum levels of each biomarker were determined using an ELISA as described above.
In this study, a novel approach was developed that applies a single statistical algorithm to predict IBS based upon the levels and/or presence of a panel of 10 serological markers. Sophisticated pattern recognition software called random forests (RF) was trained to differentiate the two populations of IBS and non-IBS and optimized for specificity. This resulted in an IBS diagnostic test with a low false positive rate (i.e., a specificity of 88%), a sensitivity of 50%, and an overall accuracy of 70%. Table 5 shows the clinical performance of the IBS diagnostic test in terms of its sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and overall accuracy.
Receiver Operating Characteristic (ROC) curves can help visualize the performance of a statistical classifier because the true positive rate (i.e., sensitivity) and the true negative rate (i.e., specificity) can be observed directly. These curves provide information about the performance of the IBS diagnostic test across all possible combinations of sensitivities and specificities. In ROC analysis, the performance of a test can be quantified by the area under the ROC curve (AUC). An AUC of 1 represents a perfect test, whereas an AUC of 0.5 represents a non-discriminating test. Thus, the AUC is a measure of the power of the test to correctly differentiate those with the disease from those without it.
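The AUC has an equivalent pairwise interpretation: it is the probability that a randomly chosen diseased individual receives a higher classifier score than a randomly chosen non-diseased individual, with ties counted as one half. The sketch below computes it directly from that definition; the function name and the scores are hypothetical.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical classifier scores for diseased and non-diseased individuals.
example = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
perfect = auc([0.9, 0.8], [0.2, 0.1])
uninformative = auc([0.5, 0.5], [0.5, 0.5])
print(example, perfect, uninformative)
```

The pairwise loop is quadratic in the number of samples, which is adequate for illustration; rank-based formulas give the same value more efficiently.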
The RF model developed in this study predicted IBS with a high level of accuracy (70%) and was optimized for a high specificity (88%).
Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, one of skill in the art will appreciate that certain changes and modifications may be practiced within the scope of the appended claims. In addition, each reference provided herein is incorporated by reference in its entirety to the same extent as if each reference was individually incorporated by reference.
The present application is a continuation-in-part of U.S. application Ser. No. 11/838,810, filed Aug. 14, 2007, which claims priority to U.S. Provisional Application Nos. 60/822,488, filed Aug. 15, 2006, 60/884,397, filed Jan. 10, 2007, and 60/895,962, filed Mar. 20, 2007, the disclosures of which are hereby incorporated by reference in their entireties for all purposes.
| Number | Date | Country |
|---|---|---|
| 60895962 | Mar 2007 | US |
| 60884397 | Jan 2007 | US |
| 60822488 | Aug 2006 | US |

| | Number | Date | Country |
|---|---|---|---|
| Parent | 11838810 | Aug 2007 | US |
| Child | 12253177 | | US |