Embodiments of the present invention relate generally to methods for diagnosis and monitoring of asthma, including but not limited to mild to moderate asthma, and its differentiation from other respiratory disorders by determining the expression profiles of asthma-specific genes in nasal brushing samples.
Asthma is a chronic respiratory disease that affects 8.6% of children and 7.4% of adults in the United States [1]. The true prevalence of asthma may be higher than these estimates. In one study of US middle school children, 11% reported physician-diagnosed asthma with current symptoms, while an additional 17% reported active asthma-like symptoms without a diagnosis of asthma [2]. Undiagnosed asthma leads to missed school and work, restricted activity, emergency department visits, and hospitalizations [2, 3]. Mild to moderate asthma in particular can be difficult to diagnose because its symptoms and signs intrinsically fluctuate [4]. The airflow obstruction, bronchial hyper-responsiveness, and airway inflammation that characterize asthma are difficult to assess easily and routinely [4]. Given the high prevalence of asthma, improved diagnostic tools have substantial potential to reduce asthma morbidity and mortality. Biomarkers could improve the identification of mild/moderate asthma so that appropriate management can be pursued.
National and international guidelines recommend that the diagnosis of asthma be based on a history of typical symptoms and objective findings of variable expiratory airflow limitation [6, 7]. However, such objective findings are difficult to obtain with currently available tools. Pulmonary function tests (PFTs) require equipment, expertise, and experience to perform well [8, 9]. Many individuals have difficulty with PFTs (e.g., spirometry) because the tests require coordinated breathing into a device, and results are unreliable if the procedure is performed with poor technique [8]. Large epidemiologic studies of both children and adults substantiate that, despite guidelines recommending objective tests such as PFTs to assess possible asthma, PFTs are not performed in over half of patients suspected of having asthma [8]. Induced sputum and exhaled nitric oxide have been explored as asthma biomarkers, but their implementation requires technical expertise, and they do not yield better clinical results than physician-guided management alone [10]. Consequently, asthma in both children and adults is still mostly diagnosed and managed clinically on the basis of self-report [8, 9]. This is suboptimal for mild/moderate asthma, both because of its waxing and waning nature and because self-reported symptoms and medication use are biased [11]. There is a need to improve asthma diagnosis, and an accurate biomarker of mild/moderate asthma could help meet that need. The ideal biomarker of mild/moderate asthma would be (1) obtainable noninvasively, (2) obtainable quickly, and (3) interpretable without substantial expertise or infrastructure.
A nasal biomarker of asthma is of high interest given the accessibility of the nose and the shared airway biology of the upper and lower respiratory tracts [12, 13, 14, 15]. The easily accessible nasal passages are directly connected to the lungs and are exposed to common environmental and microbial factors. An accurate nasal biomarker of asthma that could be obtained quickly with a simple nasal brush could improve asthma diagnosis in adult and pediatric populations.
An asthma-specific gene panel has high potential as a noninvasive biomarker to aid in asthma diagnosis, as it can be obtained quickly by simple nasal brushing, does not require machinery for collection, and is easily interpreted. As discussed herein, objective findings of asthma are often not obtainable. Patients with mild/moderate asthma may be asymptomatic at the time of the clinical encounter, so they may have no detectable wheezing or cough on examination. In many cases, then, a clinician may diagnose asthma on the basis of history alone, which contributes to the under-diagnosis and misclassification of asthma. Studies have shown that patients with active asthma under-perceive their symptoms and do not report them to their primary care physician. An objective diagnostic tool that is easy and quick to obtain and interpret, with minimal effort required of the provider and patient, could improve asthma diagnosis so that appropriate management can be pursued. A nasal brush-based asthma gene panel meets these biomarker criteria and capitalizes on the common biology of the upper and lower airways, a concept supported by clinical practice and previous findings.
In finding nasal biomarkers of mild/moderate asthma (
What is needed, therefore, is a noninvasive, quick and simple method for reliably diagnosing and/or classifying asthma, including but not limited to mild to moderate asthma, as well as distinguishing asthma from other respiratory disorders, and subsequently treating the patient appropriately. It is to such a method that embodiments of the present invention are primarily directed.
As specified in the Background Section, there is a great need in the art to identify technologies for reliable, consistent, simple and non-invasive diagnosis of asthma, including but not limited to mild to moderate asthma, and to use this understanding to develop novel diagnostic methods. The present invention satisfies this and other needs. Embodiments of the present invention relate generally to methods for diagnosis, classification and monitoring of asthma, including but not limited to mild to moderate asthma, and its differentiation from other respiratory disorders by determining the expression profiles of asthma-specific genes in nasal swab/scraping/brushing/wash/sponge samples.
In one aspect, the present invention provides a method for diagnosing asthma in a subject, comprising the steps of:
a) measuring the gene expression profile(s) of at least one of the genes in the asthma gene panel in a nasal swab/scraping/brushing/wash/sponge collected from the subject;
b) performing classification analysis on the gene counts obtained from the gene expression profile(s);
c) comparing the probability output obtained from the classification analysis to the optimal classification threshold; and
d) identifying the subject as (i) having asthma when the probability output is greater than or equal to the optimal classification threshold or (ii) not having asthma when the probability output is less than the optimal classification threshold.
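Steps (b) through (d) can be sketched in Python, assuming a logistic-type classifier that turns gene counts into a probability of asthma. The gene names, coefficients, and intercept below are hypothetical placeholders, not parameters of any trained model described herein; only the decision rule (a probability greater than or equal to the threshold means asthma) comes from the text, and 0.76 is the approximate threshold recited for the LR-RFE & Logistic embodiment.

```python
import math


def predict_probability(counts, weights, intercept):
    """Step (b): logistic probability of asthma from gene counts.

    `weights` (gene -> coefficient) and `intercept` are hypothetical stand-ins
    for a trained classifier.
    """
    z = intercept + sum(w * counts[gene] for gene, w in weights.items())
    # Numerically stable logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def has_asthma(probability, threshold):
    """Steps (c)-(d): asthma if probability >= threshold, otherwise not."""
    return probability >= threshold


# Hypothetical example values, for illustration only.
weights = {"GENE_A": 0.8, "GENE_B": -0.5}
counts = {"GENE_A": 2.0, "GENE_B": 1.0}
p = predict_probability(counts, weights, intercept=0.1)
print(round(p, 3), has_asthma(p, threshold=0.76))  # prints: 0.769 True
```

Because the comparison is "greater than or equal to", a probability exactly equal to the threshold is classified as asthma.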
In another aspect, the present invention provides a method for detection of asthma in a subject, comprising the steps of:
a) measuring the gene expression profile(s) of at least one of the genes in the asthma gene panel in a nasal swab/scraping/brushing/wash/sponge collected from the subject;
b) performing classification analysis on the gene counts obtained from the gene expression profile(s);
c) comparing the probability output obtained from the classification analysis to the optimal classification threshold; and
d) identifying the subject as (i) having asthma when the probability output is greater than or equal to the optimal classification threshold or (ii) not having asthma when the probability output is less than the optimal classification threshold.
In one aspect, the present invention provides a method for differentially diagnosing asthma from other respiratory disorders in a subject, comprising the steps of:
a) measuring the gene expression profile(s) of at least one of the genes in the asthma gene panel in a nasal swab/scraping/brushing/wash/sponge collected from the subject;
b) performing classification analysis on the gene counts obtained from the gene expression profile(s);
c) comparing the probability output obtained from the classification analysis to the optimal classification threshold; and
d) identifying the subject as (i) having asthma when the probability output is greater than or equal to the optimal classification threshold or (ii) not having asthma when the probability output is less than the optimal classification threshold.
In one aspect, the present invention provides a method for classifying a subject as having asthma or not having asthma, comprising the steps of:
a) measuring the gene expression profile(s) of at least one of the genes in the asthma gene panel in a nasal swab/scraping/brushing/wash/sponge collected from the subject;
b) performing classification analysis on the gene counts obtained from the gene expression profile(s);
c) comparing the probability output obtained from the classification analysis to the optimal classification threshold; and
d) identifying the subject as (i) having asthma when the probability output is greater than or equal to the optimal classification threshold or (ii) not having asthma when the probability output is less than the optimal classification threshold.
In another aspect, the present invention provides a method for monitoring asthma in a subject, comprising the steps of:
a) measuring the gene expression profile(s) of at least one of the genes in the asthma gene panel in a nasal swab/scraping/brushing/wash/sponge collected from the subject;
b) performing classification analysis on the gene counts obtained from the gene expression profile(s);
c) comparing the probability output obtained from the classification analysis to the optimal classification threshold; and
d) identifying the subject as (i) having asthma when the probability output is greater than or equal to the optimal classification threshold or (ii) not having asthma when the probability output is less than the optimal classification threshold.
In one aspect, the present invention provides a method for selecting a subject for a clinical trial for asthma therapeutic compositions and/or methods, comprising the steps of:
a) measuring the gene expression profile(s) of at least one of the genes in the asthma gene panel in a nasal swab/scraping/brushing/wash/sponge collected from the subject;
b) performing classification analysis on the gene counts obtained from the gene expression profile(s);
c) comparing the probability output obtained from the classification analysis to the optimal classification threshold; and
d) identifying the subject as (i) having asthma when the probability output is greater than or equal to the optimal classification threshold or (ii) not having asthma when the probability output is less than the optimal classification threshold.
In one aspect, the present invention provides a method for treating asthma in a subject, comprising the steps of:
a) measuring the gene expression profile(s) of at least one of the genes in the asthma gene panel in a nasal swab/scraping/brushing/wash/sponge collected from the subject;
b) performing classification analysis on the gene counts obtained from the gene expression profile(s);
c) comparing the probability output obtained from the classification analysis to the optimal classification threshold;
d) identifying the subject as (i) having asthma when the probability output is greater than or equal to the optimal classification threshold or (ii) not having asthma when the probability output is less than the optimal classification threshold; and
e) treating the subject with appropriate therapeutic compositions and/or methods if the subject is identified as having asthma.
In one aspect, the present invention provides a kit for diagnosing and/or detecting asthma in a subject, said kit comprising probes directed towards one or more of the genes in the asthma gene panel, as described in more detail herein, wherein the probes can be used to determine the expression levels of one or more of the genes in the asthma gene panel. The kit can also comprise (i) a detection means and/or (ii) an amplification means. The kit may further optionally include control probe sets for detection of control RNA in order to provide a control level as described herein.
In another aspect, the present invention provides a kit for diagnosing and/or detecting asthma in a subject, said kit comprising pairs of oligonucleotides directed towards one or more of the genes in the asthma gene panel, as described in more detail herein, wherein the pairs of oligonucleotides can be used to determine the expression levels of one or more of the genes in the asthma gene panel. The kit can also comprise (i) a detection means and/or (ii) an amplification means. The kit may further optionally include control primer/oligonucleotide sets for detection of control RNA in order to provide a control level as described herein.
In any of the above embodiments, step (a) further comprises the steps of (i) brushing, swabbing, scraping, washing or sponging the patient's nose, (ii) obtaining and appropriately preserving the nasal brushing/swab/scraping/wash/sponge sample, and (iii) assaying the gene expression profile of the cells and tissue contained in the sample, whether by isolating RNA as described herein or by use of an RNA profiling system that does not require a separate isolation step (such as, for example and not limitation, nanoString).
In any of the above embodiments, steps (b) and/or (c) and/or (d) are performed by a computer.
In any of the above embodiments, the classification analysis can comprise the Logistic Regression-Recursive Feature Elimination (LR-RFE) algorithm in combination with the Logistic algorithm as described in more detail below, with the gene expression profiles analyzed by this LR-RFE & Logistic model being the expression profiles of the genes in the LR-RFE & Logistic asthma gene panel. In this embodiment, the optimal classification threshold is about 0.76.
In any of the above embodiments, the classification analysis can alternatively comprise the LR-RFE & SVM-Linear combination model as described in more detail below, with the gene expression profiles analyzed by this model being the expression profiles of the genes in the LR-RFE & SVM-Linear asthma gene panel. The optimal classification threshold for this model is about 0.52.
In any of the above embodiments, the classification analysis can alternatively comprise the SVM-RFE & SVM-Linear model as described in more detail below, the gene expression profiles analyzed by this model being the expression profiles of the genes in the SVM-RFE & SVM-Linear asthma gene panel, and the optimal classification threshold for this model is about 0.64.
In any of the above embodiments, the classification analysis can alternatively comprise the SVM-RFE & Logistic model as described in more detail below, the gene expression profiles analyzed by this model being the expression profiles of the genes in the SVM-RFE & Logistic asthma gene panel, and the optimal classification threshold for this model is about 0.69.
In any of the above embodiments, the classification analysis can alternatively comprise the LR-RFE & AdaBoost model as described in more detail below, the gene expression profiles analyzed by this model being the expression profiles of the genes in the LR-RFE & AdaBoost asthma gene panel, and the optimal classification threshold for this model is about 0.49.
In any of the above embodiments, the classification analysis can alternatively comprise the LR-RFE & RandomForest model as described in more detail below, the gene expression profiles analyzed by this model being the expression profiles of the genes in the LR-RFE & RandomForest asthma gene panel, and the optimal classification threshold for this model is about 0.60.
In any of the above embodiments, the classification analysis can alternatively comprise the SVM-RFE & RandomForest model as described in more detail below, the gene expression profiles analyzed by this model being the expression profiles of the genes in the SVM-RFE & RandomForest asthma gene panel, and the optimal classification threshold for this model is about 0.50.
In any of the above embodiments, the classification analysis can alternatively comprise the SVM-RFE & AdaBoost model as described in more detail below, the gene expression profiles analyzed by this model being the expression profiles of the genes in the SVM-RFE & AdaBoost asthma gene panel, and the optimal classification threshold for this model is about 0.55.
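The elimination loop behind LR-RFE can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the implementation used to derive the panels: the logistic regression is fit by unregularized batch gradient descent, the learning rate and epoch count are arbitrary, one feature is dropped per round, and the gene features are assumed to be on comparable scales (e.g., standardized) so that coefficient magnitudes can be compared. SVM-RFE follows the same loop but ranks features by the weights of a linear support vector machine. In practice a library routine such as scikit-learn's `RFE` wrapped around `LogisticRegression` or a linear-kernel `SVC` would typically be used instead.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit an unregularized logistic regression by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (p - y).
            err = _sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def lr_rfe(X, y, n_keep):
    """Return indices of the n_keep features surviving recursive elimination.

    Each round refits the model on the remaining features and drops the one
    with the smallest absolute coefficient (ties: the earliest such feature).
    """
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        sub = [[row[j] for j in remaining] for row in X]
        w, _ = fit_logistic(sub, y)
        weakest = min(range(len(remaining)), key=lambda k: abs(w[k]))
        del remaining[weakest]
    return remaining


# Toy data: column 0 separates the classes; columns 1 and 2 are balanced noise.
X = [[1, 1, 1], [1, -1, 1], [1, 1, -1], [1, -1, -1],
     [-1, 1, 1], [-1, -1, 1], [-1, 1, -1], [-1, -1, -1]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
print(lr_rfe(X, y, n_keep=1))  # prints: [0]
```

The surviving feature indices define a gene panel; in the combinations named above, the genes selected by the RFE step would then serve as the inputs to the second-named classifier (Logistic, SVM-Linear, AdaBoost, or RandomForest).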
In any of the above embodiments, the patient is a mammal. In any of the above embodiments, the patient is a human.
These and other objects, features and advantages of the present invention will become more apparent upon reading the following specification in conjunction with the accompanying description, claims and drawings.
The accompanying Figures, which are incorporated in and constitute a part of this specification, illustrate several aspects described below.
As specified in the Background Section, there is a great need in the art to identify technologies for reliable, consistent, simple and non-invasive diagnosis of asthma, including but not limited to mild to moderate asthma, and to use this understanding to develop novel diagnostic methods. The present invention satisfies this and other needs. Embodiments of the present invention relate generally to methods for diagnosis, classification and monitoring of asthma, including but not limited to mild to moderate asthma, and its differentiation from other respiratory disorders by determining the expression profiles of asthma-specific genes in nasal swab/scraping/brushing samples.
To facilitate an understanding of the principles and features of the various embodiments of the invention, various illustrative embodiments are explained below. Although exemplary embodiments of the invention are explained in detail, it is to be understood that other embodiments are contemplated. Accordingly, it is not intended that the invention is limited in its scope to the details of construction and arrangement of components set forth in the following description or examples. The invention is capable of other embodiments and of being practiced or carried out in various ways. Also, in describing the exemplary embodiments, specific terminology will be resorted to for the sake of clarity.
It must also be noted that, as used in the specification and the appended claims, the singular forms “a,” “an” and “the” include plural references unless the context clearly dictates otherwise. For example, reference to a component is intended also to include composition of a plurality of components. References to a composition containing “a” constituent are intended to include other constituents in addition to the one named. In other words, the terms “a,” “an,” and “the” do not denote a limitation of quantity, but rather denote the presence of “at least one” of the referenced item.
Also, in describing the exemplary embodiments, terminology will be resorted to for the sake of clarity. It is intended that each term contemplates its broadest meaning as understood by those skilled in the art and includes all technical equivalents which operate in a similar manner to accomplish a similar purpose.
Ranges may be expressed herein as from “about” or “approximately” or “substantially” one particular value and/or to “about” or “approximately” or “substantially” another particular value. When such a range is expressed, other exemplary embodiments include from the one particular value and/or to the other particular value. Further, the term “about” means within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is measured or determined, i.e., the limitations of the measurement system. For example, “about” can mean within an acceptable standard deviation, per the practice in the art. Alternatively, “about” can mean a range of up to ±20%, preferably up to ±10%, more preferably up to ±5%, and more preferably still up to ±1% of a given value. Alternatively, particularly with respect to biological systems or processes, the term can mean within an order of magnitude, preferably within 2-fold, of a value. Where particular values are described in the application and claims, unless otherwise stated, the term “about” is implicit and in this context means within an acceptable error range for the particular value.
By “comprising” or “containing” or “including” is meant that at least the named compound, element, particle, or method step is present in the composition or article or method, but does not exclude the presence of other compounds, materials, particles, or method steps, even if the other such compounds, materials, particles, or method steps have the same function as what is named.
Throughout this description, various components may be identified as having specific values or parameters; however, these items are provided as exemplary embodiments. Indeed, the exemplary embodiments do not limit the various aspects and concepts of the present invention, as many comparable parameters, sizes, ranges, and/or values may be implemented. The terms “first,” “second,” and the like, “primary,” “secondary,” and the like, do not denote any order, quantity, or importance, but rather are used to distinguish one element from another.
It is noted that terms like “specifically,” “preferably,” “typically,” “generally,” and “often” are not utilized herein to limit the scope of the claimed invention or to imply that certain features are critical, essential, or even important to the structure or function of the claimed invention. Rather, these terms are merely intended to highlight alternative or additional features that may or may not be utilized in a particular embodiment of the present invention. It is also noted that terms like “substantially” and “about” are utilized herein to represent the inherent degree of uncertainty that may be attributed to any quantitative comparison, value, measurement, or other representation.
The dimensions and values disclosed herein are not to be understood as being strictly limited to the exact numerical values recited. Instead, unless otherwise specified, each such dimension is intended to mean both the recited value and a functionally equivalent range surrounding that value. For example, a dimension disclosed as “50 mm” is intended to mean “about 50 mm.”
It is also to be understood that the mention of one or more method steps does not preclude the presence of additional method steps or intervening method steps between those steps expressly identified. Similarly, it is also to be understood that the mention of one or more components in a composition does not preclude the presence of components other than those expressly identified.
As used herein, the term “subject” or “patient” refers to mammals and includes, without limitation, human and veterinary animals. In a preferred embodiment, the subject is human.
In the context of the present invention insofar as it relates to asthma, the terms “treat”, “treatment”, and the like mean to relieve or alleviate at least one symptom associated with such condition, or to slow or reverse the progression of such condition. Within the meaning of the present invention, the term “treat” also denotes to arrest, delay the onset (i.e., the period prior to clinical manifestation of a disease) and/or reduce the risk of developing or worsening a disease. The terms “treat”, “treatment”, and the like regarding a state, disorder or condition may also include (1) preventing or delaying the appearance of at least one clinical or sub-clinical symptom of the state, disorder or condition developing in a subject that may be afflicted with or predisposed to the state, disorder or condition but does not yet experience or display clinical or subclinical symptoms of the state, disorder or condition; or (2) inhibiting the state, disorder or condition, i.e., arresting, reducing or delaying the development of the disease or a relapse thereof (in case of maintenance treatment) or at least one clinical or sub-clinical symptom thereof; or (3) relieving the disease, i.e., causing regression of the state, disorder or condition or at least one of its clinical or sub-clinical symptoms.
The term “a control level” as used herein encompasses predetermined standards (e.g., a published value in a reference) as well as levels determined experimentally in similarly processed samples from control subjects (e.g., BMI-, age-, and gender-matched subjects without asthma as determined by standard examination and diagnostic methods). The control level is included in the classification analyses as described herein.
RNA can be extracted from the collected tissue and/or cells (e.g., from nasal epithelial cells obtained from a nasal brushing, scraping, wash, sponge or swab) by any known method. For example, RNA may be purified from cells using a variety of standard procedures as described, for example, in RNA Methodologies, A Laboratory Guide for Isolation and Characterization, 2nd edition, 1998, Robert E. Farrell, Jr., Ed., Academic Press. In addition, various commercial products are available for RNA isolation. As would be understood by those skilled in the art, total RNA or poly(A)+ RNA may be used for preparing gene expression profiles.
The expression levels (or expression profile) can then be determined using any of various techniques known in the art and described in detail elsewhere. Such methods generally include, for example and not limitation, polymerase-based assays such as RT-PCR (e.g., TAQMAN), hybridization-based assays such as DNA microarray analysis, flap-endonuclease-based assays (e.g., INVADER), direct mRNA capture (QUANTIGENE or HYBRID CAPTURE (Digene)), RNA sequencing (e.g., Illumina RNA sequencing platforms), and the nanoString platform. See, for example, US 2010/0190173 for descriptions of representative methods that can be used to determine expression levels.
As used herein, the term “gene” refers to a DNA sequence expressed in a sample as an RNA transcript.
As used herein, “differentially expressed” or “differential expression” means that the level or abundance of an RNA transcript (or the abundance of an RNA population sharing a common target sequence, e.g., splice variant RNAs) is higher or lower by at least a certain value in a test sample as compared to a control level.
As used herein, the term “asthma gene panel” refers to the unique set of 275 genes identified across all of the models (i.e., the union of the model-specific gene panels), which is listed in Table 4 as the unique set of genes. Preferred subsets of the asthma gene panel that may be analyzed by different classifiers are also described in Table 4. Specifically, as used herein, the term “LR-RFE & Logistic asthma gene panel” refers to those 90 genes identified by the LR-RFE & Logistic models. The term “LR-RFE & SVM-Linear asthma gene panel” refers to those 90 genes identified by the LR-RFE & SVM-Linear models. The term “SVM-RFE & SVM-Linear asthma gene panel” refers to those 119 genes identified by the SVM-RFE & SVM-Linear models. The term “SVM-RFE & Logistic asthma gene panel” refers to those 119 genes identified by the SVM-RFE & Logistic models. The term “LR-RFE & AdaBoost asthma gene panel” refers to those 90 genes identified by the LR-RFE & AdaBoost models. The term “LR-RFE & RandomForest asthma gene panel” refers to those 90 genes identified by the LR-RFE & RandomForest models. The term “SVM-RFE & RandomForest asthma gene panel” refers to those 123 genes identified by the SVM-RFE & RandomForest models. The term “SVM-RFE & AdaBoost asthma gene panel” refers to those 212 genes identified by the SVM-RFE & AdaBoost models.
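Because the asthma gene panel is the union of the model-specific panels, it can be represented as a set union, and each classifier then reads only its own panel from a sample's expression profile. The panel sizes recited above sum to 933 (four LR-RFE panels of 90 genes plus SVM-RFE panels of 119, 119, 123, and 212 genes) against only 275 unique genes, so many genes are shared between panels. In the sketch below, the gene names and panel contents are hypothetical placeholders; the actual panels are listed in Table 4.

```python
# Hypothetical placeholder panels; the real gene lists are given in Table 4.
MODEL_PANELS = {
    "LR-RFE & Logistic": frozenset({"GENE_A", "GENE_B", "GENE_C"}),
    "SVM-RFE & AdaBoost": frozenset({"GENE_B", "GENE_C", "GENE_D", "GENE_E"}),
}


def unique_gene_panel(panels):
    """Union of all model-specific panels (the overall asthma gene panel)."""
    return frozenset().union(*panels.values())


def restrict_to_panel(profile, panel):
    """Keep only one model's panel genes from a gene -> count profile."""
    missing = panel.difference(profile)  # iterating a dict yields its keys
    if missing:
        raise KeyError(f"profile lacks panel genes: {sorted(missing)}")
    return {gene: profile[gene] for gene in sorted(panel)}


overall = unique_gene_panel(MODEL_PANELS)
print(len(overall), sum(len(p) for p in MODEL_PANELS.values()))  # prints: 5 7
```

Failing loudly on a missing panel gene, rather than silently dropping it, avoids feeding a classifier a profile that does not match the panel it was built on.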
In various embodiments disclosed herein, the expression levels of different combinations of genes can be used to glean different information. For example, increased expression levels of certain genes such as C3 in an individual as compared to a control are associated with a diagnosis of mild/moderate asthma. Decreased expression levels of other genes such as DEFB1 in an individual as compared to a control are associated with a diagnosis of mild/moderate asthma. Expression of ORMDL3 in an individual as compared to a control is associated with a differential diagnosis of mild/moderate asthma relative to other respiratory disorders such as, for example and not limitation, rhinitis, respiratory infection, and cystic fibrosis.
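Differential expression relative to a control level, and its direction (e.g., higher for C3 and lower for DEFB1 in mild/moderate asthma, as described above), can be sketched as a log2 fold-change comparison. The two-fold cutoff (|log2 fold change| of at least 1) is a hypothetical choice, since the text leaves the minimum difference unspecified, and both levels are assumed to be positive, comparably normalized values.

```python
import math


def differential_expression(test_level, control_level, min_abs_log2_fc=1.0):
    """Classify a test-sample level against a control level.

    Returns "up" or "down" when |log2(test / control)| meets the hypothetical
    cutoff `min_abs_log2_fc` (1.0 means at least two-fold), otherwise None.
    """
    if test_level <= 0 or control_level <= 0:
        raise ValueError("expression levels must be positive")
    log2_fc = math.log2(test_level / control_level)
    if abs(log2_fc) < min_abs_log2_fc:
        return None
    return "up" if log2_fc > 0 else "down"


print(differential_expression(8.0, 2.0), differential_expression(1.0, 2.0))  # prints: up down
```

Working on the log2 scale makes increases and decreases symmetric: a two-fold rise and a two-fold fall both have magnitude 1.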
In various embodiments, RNA expression profiling systems, such as, for example and not limitation, the nanoString profiling system, are utilized to quantify the gene expression profiles from the patient's nasal brushing/swab/scraping/washing/sponge sample. The output from such systems provides a count for each gene in the asthma gene panel, and this output is analyzed in an automated manner, such as by a computer, using the classifier and classification threshold as described herein. The results obtained from the classifier enable a clinician to diagnose the patient as having or not having asthma.
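Raw counts from a profiling system such as nanoString are commonly scaled to a set of control (reference) transcripts before classification, which is one way the control RNA mentioned for the kits could be used. The sketch below divides each panel gene's count by the geometric mean of the sample's control-gene counts; this normalization scheme and the control gene names are assumptions for illustration, not a procedure stated in this disclosure.

```python
from statistics import geometric_mean


def normalize_to_controls(raw_counts, control_genes):
    """Divide each non-control gene count by the geometric mean of control counts.

    `control_genes` names hypothetical reference transcripts measured in the
    same sample; their counts must all be present and positive.
    """
    controls = set(control_genes)
    control_counts = [raw_counts[g] for g in controls]
    if not control_counts or any(c <= 0 for c in control_counts):
        raise ValueError("control counts must be present and positive")
    scale = geometric_mean(control_counts)
    return {g: c / scale for g, c in raw_counts.items() if g not in controls}


raw = {"CTRL_1": 100, "CTRL_2": 400, "GENE_A": 1000, "GENE_B": 50}
normalized = normalize_to_controls(raw, ["CTRL_1", "CTRL_2"])
print(normalized)  # GENE_A ~5.0, GENE_B ~0.25 (up to floating-point rounding)
```

The geometric mean is used so that a single highly expressed control transcript does not dominate the scaling factor.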
After determining and analyzing the expression levels of the appropriate combination of genes in a patient's nasal brushing/swab/scraping/washing/sponge, the patient can be classified as having asthma or not having asthma. The classification may be determined computationally based upon known methods as described herein. Particularly preferred computational methods include the classifiers and optimal classification thresholds as described herein. The result of the computation may be displayed on a computer screen or presented in a tangible form, for example, as a probability (e.g., from 0 to 100%) of the patient having asthma and/or a certain severity of asthma. The report will aid a physician in diagnosis or treatment of the patient. For example, in certain embodiments, the patient's expression levels will be diagnostic of asthma or enable a differential diagnosis of asthma from other respiratory disorders such as rhinitis, irritation resulting from smoking, respiratory infection and cystic fibrosis, and the patient will subsequently be treated as appropriate. In other embodiments, the patient's expression levels of the appropriate combination of genes will not support a diagnosis of asthma, thereby allowing the physician to exclude asthma and/or mild to moderate asthma as a diagnosis. In some embodiments, the patient may be selected to participate in clinical trials involving treatment of asthma and/or related conditions based on the patient's gene expression profile.
In some embodiments, the classifier used is the LR-RFE & Logistic model, the gene expression profiles analyzed are the expression profiles of the genes in the LR-RFE & Logistic asthma gene panel, and the optimal classification threshold for this model is about 0.76.
In other embodiments, the classifier used is the LR-RFE & SVM-Linear model, the gene expression profiles analyzed are the expression profiles of the genes in the LR-RFE & SVM-Linear asthma gene panel, and the optimal classification threshold for this model is about 0.52.
In other embodiments, the classifier used is the SVM-RFE & SVM-Linear model, the gene expression profiles analyzed are the expression profiles of the genes in the SVM-RFE & SVM-Linear asthma gene panel, and the optimal classification threshold for this model is about 0.64.
In other embodiments, the classifier used is the SVM-RFE & Logistic model, the gene expression profiles analyzed are the expression profiles of the genes in the SVM-RFE & Logistic asthma gene panel, and the optimal classification threshold for this model is about 0.69.
In other embodiments, the classifier used is the LR-RFE & AdaBoost model, the gene expression profiles analyzed are the expression profiles of the genes in the LR-RFE & AdaBoost asthma gene panel, and the optimal classification threshold for this model is about 0.49.
In other embodiments, the classifier used is the LR-RFE & RandomForest model, the gene expression profiles analyzed are the expression profiles of the genes in the LR-RFE & RandomForest asthma gene panel, and the optimal classification threshold for this model is about 0.60.
In other embodiments, the classifier used is the SVM-RFE & RandomForest model, the gene expression profiles analyzed are the expression profiles of the genes in the SVM-RFE & RandomForest asthma gene panel, and the optimal classification threshold for this model is about 0.50.
In other embodiments, the classifier used is the SVM-RFE & AdaBoost model, the gene expression profiles analyzed are the expression profiles of the genes in the SVM-RFE & AdaBoost asthma gene panel, and the optimal classification threshold for this model is about 0.55.
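By way of illustration only, the model-specific thresholds recited above can be applied as in the following minimal Python sketch. It assumes that a trained classifier has already produced a probability of asthma for a sample. It also assumes that the "probability at or above the threshold means asthma" convention stated for the LR-RFE & Logistic model applies to all eight models. The names OPTIMAL_THRESHOLDS, classify and report are hypothetical.

```python
# Model-specific optimal classification thresholds ("about" values) recited above.
OPTIMAL_THRESHOLDS = {
    "LR-RFE & Logistic": 0.76,
    "LR-RFE & SVM-Linear": 0.52,
    "SVM-RFE & SVM-Linear": 0.64,
    "SVM-RFE & Logistic": 0.69,
    "LR-RFE & AdaBoost": 0.49,
    "LR-RFE & RandomForest": 0.60,
    "SVM-RFE & RandomForest": 0.50,
    "SVM-RFE & AdaBoost": 0.55,
}


def classify(model_name: str, probability: float) -> str:
    """Convert a model's asthma probability into a binary call.

    Assumption: a probability at or above the model's threshold is called
    "asthma"; anything below it is called "no asthma".
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    threshold = OPTIMAL_THRESHOLDS[model_name]
    return "asthma" if probability >= threshold else "no asthma"


def report(model_name: str, probability: float) -> str:
    """Format a one-line report with the probability shown as a percentage."""
    call = classify(model_name, probability)
    return f"{model_name}: P(asthma) = {probability:.0%} -> {call}"
```

For example, `report("LR-RFE & Logistic", 0.81)` returns `"LR-RFE & Logistic: P(asthma) = 81% -> asthma"`. The same probability would also be called asthma under every other model, because each of their thresholds is lower than 0.81.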
In some embodiments, RNAs are purified prior to gene expression profile analysis. RNAs can be isolated and purified from nasal brushing/swab/scraping/wash/sponge by various methods, including the use of commercial kits (e.g., Qiagen RNeasy Mini Kit as described in Example 1 below). In some embodiments, RNA degradation in brushing/swab/scraping/wash/sponge samples and/or during RNA purification is reduced or eliminated. Useful methods for storing nasal brushing/swab/scraping/wash/sponge samples include, without limitation, use of RNALater as described herein. Useful methods for reducing or eliminating RNA degradation include, without limitation, adding RNase inhibitors (e.g., RNasin Plus [Promega], SUPERase-In [ABI], etc.) and the use of guanidine chloride, guanidine isothiocyanate, N-lauroylsarcosine, sodium dodecylsulphate (SDS), or a combination thereof. Reducing RNA degradation in nasal brushing/swab/scraping/wash/sponge samples is particularly important when sample storage and transportation are required prior to RNA purification.
In other embodiments, RNA is not purified prior to gene expression profile analysis. In such embodiments, RNA expression profiling platforms that can directly assay tissue and cells without a separate RNA isolation step are utilized (for example and not limitation, the nanoString system).
Examples of useful methods for measuring RNA level in nasal epithelial cells contained in nasal brushing/swab/scraping/wash/sponge include hybridization with selective probes (e.g., using Northern blotting, bead-based flow-cytometry, oligonucleotide microchip [microarray], or solution hybridization assays), polymerase chain reaction (PCR)-based detection (e.g., stem-loop reverse transcription-polymerase chain reaction [RT-PCR], quantitative RT-PCR based array method [qPCR-array]), direct sequencing, such as for example and not limitation, by RNA sequencing technologies (e.g., Illumina HiSeq 2500 platform, Helicos small RNA sequencing, miRNA BeadArray (Illumina), Roche 454 (FLX-Titanium), and ABI SOLiD), and the nanoString system. For review of additional applicable techniques see, e.g., Chen et al., BMC Genomics, 2009, 10:407; Kong et al., J Cell Physiol. 2009; 218:22-25.
In conjunction with the above diagnostic and screening methods, the present invention provides various kits comprising one or more primer and/or probe sets specific for the detection of target RNA. Such kits can further include primer and/or probe sets specific for the detection of other RNA that can aid in diagnosing, differentiating, and/or classifying asthma. In some embodiments, such kits can contain nucleic acid oligonucleotides for determining the level of expression of a particular combination of genes in a patient's nasal brushing/swab/scraping/wash/sponge sample. The kit may include one or more oligonucleotides that are complementary to one or more transcripts identified herein as being associated with asthma, and also may include oligonucleotides related to necessary or meaningful assay controls. A kit for evaluating an individual for asthma may include pairs of oligonucleotides (e.g., 4, 6, 8, 10, 12, 14 or more oligonucleotides). The oligonucleotides may be designed to detect expression levels in accordance with any assay format, including but not limited to those described herein. The kit may further optionally include control primer and/or probe sets for detection of control RNA in order to provide a control level as described herein.
A kit of the invention can also provide reagents for primer extension and amplification reactions. For example, in some embodiments, the kit may further include one or more of the following components: a reverse transcriptase enzyme, a DNA polymerase enzyme (such as, e.g., a thermostable DNA polymerase), a polymerase chain reaction buffer, a reverse transcription buffer, and deoxynucleoside triphosphates (dNTPs). Alternatively (or in addition), a kit can include reagents for performing a hybridization assay. The detecting agents can include nucleotide analogs and/or a labeling moiety, e.g., directly detectable moiety such as a fluorophore (fluorochrome) or a radioactive isotope, or indirectly detectable moiety, such as a member of a binding pair, such as biotin, or an enzyme capable of catalyzing a non-soluble colorimetric or luminometric reaction. In addition, the kit may further include at least one container containing reagents for detection of electrophoresed nucleic acids. Such reagents include those which directly detect nucleic acids, such as fluorescent intercalating agent or silver staining reagents, or those reagents directed at detecting labeled nucleic acids, such as, but not limited to, ECL reagents. A kit can further include RNA isolation or purification means as well as positive and negative controls. A kit can also include a notice associated therewith in a form prescribed by a governmental agency regulating the manufacture, use or sale of diagnostic kits. Detailed instructions for use, storage and trouble-shooting may also be provided with the kit. A kit can also be optionally provided in a suitable housing that is preferably useful for robotic handling in a high throughput setting.
The components of the kit may be provided as dried powder(s). When reagents and/or components are provided as a dry powder, the powder can be reconstituted by the addition of a suitable solvent. It is envisioned that the solvent may also be provided in another container. The container will generally include at least one vial, test tube, flask, bottle, syringe, and/or other container means, into which the solvent is placed, optionally aliquoted. The kits may also comprise a second container means for containing a sterile, pharmaceutically acceptable buffer and/or other solvent.
Where there is more than one component in the kit, the kit also will generally contain a second, third, or other additional container into which the additional components may be separately placed. However, various combinations of components may be comprised in a container.
Such kits may also include components that preserve or maintain DNA or RNA, such as reagents that protect against nucleic acid degradation. Such components may be nuclease or RNase-free or protect against RNases, for example. Any of the compositions or reagents described herein may be components in a kit.
In accordance with the present invention there may be employed conventional molecular biology, microbiology, and recombinant DNA techniques within the skill of the art. Such techniques are explained fully in the literature. See, e.g., Sambrook, Fritsch & Maniatis, Molecular Cloning: A Laboratory Manual, Second Edition (1989) Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (herein “Sambrook et al., 1989”); DNA Cloning: A Practical Approach, Volumes I and II (D. N. Glover ed. 1985); Oligonucleotide Synthesis (M. J. Gait ed. 1984); Nucleic Acid Hybridization (B. D. Hames & S. J. Higgins eds. 1985); Transcription and Translation (B. D. Hames & S. J. Higgins eds. 1984); Animal Cell Culture (R. I. Freshney ed. 1986); Immobilized Cells and Enzymes (IRL Press, 1986); B. Perbal, A Practical Guide To Molecular Cloning (1984); F. M. Ausubel et al. (eds.), Current Protocols in Molecular Biology, John Wiley & Sons, Inc. (1994); among others.
The present invention is also described and demonstrated by way of the following examples. However, the use of these and other examples anywhere in the specification is illustrative only and in no way limits the scope and meaning of the invention or of any exemplified term. Likewise, the invention is not limited to any particular preferred embodiments described here. Indeed, many modifications and variations of the invention may be apparent to those skilled in the art upon reading this specification, and such variations can be made without departing from the invention in spirit or in scope. The invention is therefore to be limited only by the terms of the appended claims along with the full scope of equivalents to which those claims are entitled.
Experimental Design and Subjects
Subjects with mild/moderate asthma were a subset of participants of the Childhood Asthma Management Program (CAMP), a multicenter North American clinical trial of 1041 subjects that took place between 1991 and 201221,22. Findings from the CAMP cohort have defined current practice and guidelines for asthma care and research22. Participating subjects had asthma defined by symptoms greater than or equal to 2 times per week, use of an inhaled bronchodilator at least twice weekly or use of daily medication for asthma, and increased airway responsiveness to methacholine (PC20≤12.5 mg/ml). The subset of subjects included in this study were CAMP participants who presented for a visit between July 2011 and June 2012 at Brigham and Women's Hospital, one of eight study centers for this multicenter study.
Subjects without asthma or “no asthma” were recruited during the same time period (2011-2012) by advertisement at Brigham & Women's Hospital. Selection criteria were no personal history of asthma, no family history of asthma in first-degree relatives, and self-described non-Hispanic white ethnicity. The rationale for limiting participation to non-Hispanic white individuals was to allow for optimal comparison to 968 CAMP subjects of Caucasian background who participated in the CAMP Genetics Ancillary study, which was focused on this population.55 Subjects underwent pre- and post-bronchodilator spirometry according to ATS guidelines, and only those meeting selection criteria and without lung function abnormality or bronchodilator response were considered nonasthmatic or “no asthma”.
The institutional review boards of Brigham & Women's Hospital and the Icahn School of Medicine at Mount Sinai approved the study protocols.
Nasal Sample Collection and RNA Sequencing
A standard cytology brush was applied to the right naris of each subject and rotated three times with circumferential pressure for nasal epithelial cell collection. The brush was immediately placed in RNALater and then stored at 4° C. until RNA extraction. RNA extraction was performed with Qiagen RNeasy Mini Kit (Valencia, Calif.). Samples were assessed for yield and quality using the 2100 Bioanalyzer (Agilent Technologies, Santa Clara, Calif.) and Qubit (Thermo Fisher Scientific, Grand Island, N.Y.).
Of the 190 subjects who underwent nasal brushing (66 with mild/moderate asthma, 124 with no asthma), a random selection of 150 nasal brushes from subjects with asthma and nonasthmatic controls was a priori assigned as the development set, and the remaining 40 subjects were a priori assigned as the test set of independent subjects (for testing the classification model). To minimize potential bias due to batch effects, the inventors submitted all samples (development and test set samples) to the Mount Sinai Genomics Core for library preparation and RNA sequencing at the same time, so that all samples could be sequenced in a single run. Staff at the Mount Sinai Genomics Core were blinded to the assignment of samples as development or test set.
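A minimal sketch of an a priori, unstratified random assignment of the 190 subjects into a 150-subject development set and a 40-subject test set follows. The function name assign_sets, the seed, and the use of NumPy's default_rng are illustrative assumptions. They are not the procedure actually used.

```python
import numpy as np


def assign_sets(subject_ids, n_development=150, seed=0):
    """Randomly split subject IDs into development and test sets (unstratified)."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.asarray(subject_ids))
    # The first n_development shuffled subjects form the development set;
    # all remaining subjects are held out as the independent test set.
    return shuffled[:n_development], shuffled[n_development:]


# 190 hypothetical subject IDs (66 asthma + 124 no asthma in the study).
development, test = assign_sets(list(range(190)))
```

Because the split is made once, before any pre-processing, the test subjects cannot influence normalization, gene selection or model fitting.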
The sequencing library was prepared with the standard TruSeq RNA Sample Prep Kit v2 protocol (Illumina). The mRNA sequencing was performed on the Illumina HiSeq 2500 platform using 40-50 million 100 bp paired-end reads. The data were put through the inventors' standard mapping pipeline56 (using Bowtie57 and TopHat58, and assembled into gene- and transcript-level summaries using Cufflinks59). Mapped data were subjected to quality control with FastQC and RNA-SeQC.60 Data were normalized separately for the development and test sets. Genes with fewer than 100 counts in at least half the samples were dropped to reduce the potentially adverse effects of noise. DESeq225 was used to normalize the data sets using its variance stabilizing transformation method.
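The low-count filter described above can be sketched in NumPy as follows. This illustrates only the filtering rule. The variance stabilizing transformation itself is performed by the DESeq2 R package and is not reproduced here. The function name filter_low_count_genes is hypothetical.

```python
import numpy as np


def filter_low_count_genes(counts, gene_names, min_count=100):
    """Drop genes with fewer than `min_count` reads in at least half the samples.

    counts: genes x samples array of raw read counts.
    """
    counts = np.asarray(counts)
    n_samples = counts.shape[1]
    n_low = (counts < min_count).sum(axis=1)  # samples in which each gene is "low"
    keep = n_low < n_samples / 2              # drop if low in >= half the samples
    kept_names = [name for name, k in zip(gene_names, keep) if k]
    return counts[keep], kept_names
```

With four samples, a gene counted below 100 in two of them is dropped, while a gene below 100 in only one sample is kept.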
VariancePartition Analysis of Potential Confounders
Given differences in age, race, and sex distributions between the asthma and “no asthma” classes, the inventors used variancePartition24 to assess the degree to which these variables influenced gene expression. The total variance in gene expression was partitioned into the variance attributable to age, race, and sex using a linear mixed model implemented in variancePartition v1.0.024. Age (continuous variable) was modeled as a fixed effect while race and sex (categorical variables) were modeled as random effects. The results showed that age, race, and sex made minimal contributions to total gene expression variance. Downstream analyses were therefore performed with unadjusted gene expression data.
Differential Gene Expression and Pathway Enrichment Analysis
DESeq225 was used to identify differentially expressed genes in the development set. Genes with FDR≤0.05 were deemed differentially expressed, with fold change <1 implying under-expression and fold change >1 implying over-expression. Pathway enrichment analysis was performed using Gene Set Enrichment Analysis26.
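DESeq2 reports Benjamini-Hochberg adjusted p-values by default. The minimal NumPy sketch below reproduces only that adjustment and the FDR≤0.05 and fold-change direction rule stated above. It does not reproduce DESeq2's statistical model. Fold changes are assumed to be on the linear scale, where a DESeq2 log2 fold change below 0 corresponds to a fold change below 1. Both function names are hypothetical.

```python
import numpy as np


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up FDR control)."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working from the largest p-value downward.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted


def call_de_genes(fold_changes, pvalues, fdr_cutoff=0.05):
    """Flag over- and under-expressed genes at the given FDR cut-off."""
    fdr = benjamini_hochberg(pvalues)
    fc = np.asarray(fold_changes, dtype=float)
    significant = fdr <= fdr_cutoff
    over = significant & (fc > 1)
    under = significant & (fc < 1)
    return over, under, fdr
```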
Statistical and Machine Learning Analyses of RNAseq Data Sets
To discover gene expression biomarkers that are capable of predicting the asthma status of a patient, the inventors used a rigorous machine learning pipeline in Python using the scikit-learn package61. This pipeline combined feature (gene) selection18, (outer) classification19 and statistical analyses of classification performance20, and was applied to the development set.
Feature (Gene) Selection:
Given a training set, a 5×5 nested (outer and inner) cross-validation (CV) setup27 was used to select sets of predictive genes.
The Recursive Feature Elimination (RFE) algorithm62 was executed on the inner CV training split to determine the optimal number of features. The use of RFE within this setting enabled the inventors to identify groups of features that are collectively, but not necessarily individually, predictive. This reflects the systems biology-based expectation that many genes, even ones with marginal effects, can play a role in classifying diseases/phenotypes (here asthma) in combination with other more strongly predictive genes63. Specifically, the inventors used the L2-regularized Logistic Regression (LR or Logistic)64 and SVM-Linear (kernel)65 classification algorithms in conjunction with RFE (these conjunctions are henceforth referred to as LR-RFE and SVM-RFE, respectively). For a given inner CV training split, all the features (genes) were ranked by the absolute values of the weights assigned to them by an inner classification model trained on that split with the LR or SVM algorithm. Next, for each of the conjunctions, the set of top-k ranked features was considered, with k starting at 11,587 (all filtered genes) and being reduced by 10% in each iteration until k=1. The discriminative strength of each feature set consisting of the top k features in this ranking was assessed by evaluating the performance of the LR or SVM classifier built on those features over all the inner CV training-test splits. The optimal number of features to be selected was determined as the value of k that produced the best performance. Next, a ranking of features was derived from the outer CV training split using exactly the same procedure as applied to the inner CV training split. The optimal number of features determined above was selected from the top of this ranking to yield the optimal set of predictive features for this outer CV training split. Executing this process over all five outer CV training splits created from the development set identified five such sets.
Finally, the set of features (genes) that was common to all these sets (i.e., in their intersection/overlap) was selected as the predictive gene set for this training set. One such set was identified for each of LR-RFE and SVM-RFE.
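The nested feature selection described above can be approximated with scikit-learn's RFECV. The sketch below is a stand-in under stated assumptions, not the inventors' exact code. Two details differ from the text. First, RFECV refits the estimator after each elimination step rather than ranking features once. Second, a fractional `step=0.1` removes a fixed number of features equal to 10% of the initial feature count at each step, so the schedule may not match "reduced by 10% in each iteration" exactly. F1 scoring, the fold seeds and the name select_predictive_genes are also assumptions.

```python
import numpy as np
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC


def select_predictive_genes(X, y, estimator, n_outer=5, n_inner=5, seed=0):
    """Return indices of features selected in every outer CV training split."""
    outer = StratifiedKFold(n_splits=n_outer, shuffle=True, random_state=seed)
    selected_sets = []
    for train_idx, _ in outer.split(X, y):
        inner = StratifiedKFold(n_splits=n_inner, shuffle=True, random_state=seed)
        # RFECV picks the feature count with the best mean inner-CV score,
        # then refits RFE on the whole outer training split with that count.
        rfecv = RFECV(estimator, step=0.1, cv=inner, scoring="f1",
                      min_features_to_select=1)
        rfecv.fit(X[train_idx], y[train_idx])
        selected_sets.append(set(np.flatnonzero(rfecv.support_).tolist()))
    # Keep only the genes chosen in all outer splits (their intersection).
    return sorted(set.intersection(*selected_sets))


lr_rfe = LogisticRegression(max_iter=1000)  # default penalty is L2
svm_rfe = SVC(kernel="linear")              # linear kernel exposes coef_ for ranking
```

Taking the intersection across the five outer splits favors genes that are selected consistently, at the cost of possibly discarding genes chosen in only some splits.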
(Outer) Classification:
Once the respective predictive gene sets had been selected using LR-RFE and SVM-RFE, four outer classification algorithms, namely L2-regularized Logistic Regression (LR or Logistic)64, SVM-Linear66, AdaBoost66 and Random Forest (RF)67, were used to learn intermediate classification models over the training set. These intermediate models were applied to the corresponding holdout set to generate probabilistic asthma predictions for the constituent samples. An optimal threshold for converting these probabilistic predictions into binary ones was then computed from the holdout set. This optimization resulted in the proposed classification models.
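The criterion used to choose the "optimal threshold" is not stated here. The sketch below assumes the threshold is the one that maximizes the F-measure on the holdout set, since the F-measure is the performance measure the pipeline reports. It uses scikit-learn's `precision_recall_curve`, which treats scores at or above a threshold as positive. The function name optimal_threshold is hypothetical.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve


def optimal_threshold(y_holdout, p_holdout):
    """Return the probability cut-off that maximizes F1 on a holdout set."""
    precision, recall, thresholds = precision_recall_curve(y_holdout, p_holdout)
    # precision/recall have one more entry than thresholds; drop the final point.
    p, r = precision[:-1], recall[:-1]
    denom = p + r
    f1 = np.divide(2 * p * r, denom, out=np.zeros_like(denom), where=denom > 0)
    return float(thresholds[np.argmax(f1)])
```

For holdout labels `[0, 0, 1, 1]` with predicted probabilities `[0.1, 0.4, 0.35, 0.8]`, the F1-maximizing cut-off is 0.35.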
To obtain a comprehensive view of the performance of these proposed models, the above two components were executed on 100 random training-holdout splits of the development set. To determine the best performing combination of feature selection and outer classification algorithms, a statistical analysis of the classification performance of all the models resulting from all the considered combinations was conducted using the Friedman test followed by the Nemenyi test20,68. These tests, which account for multiple hypothesis testing, assessed the statistical significance of the relative difference in performance of the combinations in terms of their relative ranks across the 100 splits, and allowed the combinations to be ordered by overall performance according to the significance of their pairwise comparisons. This statistical comparison was a novel aspect of the present pipeline, as this task, generally referred to as “model selection,” is typically based on a single training-holdout split. Even if multiple such splits are employed, models are generally selected based on absolute performance scores, and not based on the statistical significance of performance comparisons, as was done in the present Examples.
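A minimal sketch of the rank-based comparison follows, assuming the Nemenyi critical difference CD = q_α·sqrt(k(k+1)/(6N)) as tabulated by Demšar (2006). SciPy's `friedmanchisquare` tests whether the k combinations differ at all across the N splits. Two combinations are then considered significantly different if their average ranks differ by more than CD. With k=8 combinations (2 feature selection × 4 classifiers) and N=100 splits, CD ≈ 1.05. The function name friedman_nemenyi is hypothetical.

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

# Two-tailed Nemenyi critical values q_alpha at alpha = 0.05 for k methods (Demsar, 2006).
Q_ALPHA_005 = {3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
               7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164}


def friedman_nemenyi(scores):
    """scores: array of shape (n_splits, k_methods); higher is better.

    Returns the Friedman p-value, each method's average rank (1 = best)
    and the Nemenyi critical difference at alpha = 0.05.
    """
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    _, p_value = friedmanchisquare(*scores.T)
    # Rank methods within each split; negate so the highest score gets rank 1.
    ranks = rankdata(-scores, axis=1)
    avg_ranks = ranks.mean(axis=0)
    cd = Q_ALPHA_005[k] * np.sqrt(k * (k + 1) / (6.0 * n))
    return p_value, avg_ranks, cd
```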
Optimization for parsimony: For biomarker optimization, it is essential to consider parsimony (i.e., to minimize the number of features or genes needed for accurate classification). In these models, an adapted performance measure, defined as the absolute performance measure for each model divided by the number of genes in that model, was used for this statistical comparison. In terms of this measure, a model that does not obtain the best absolute performance measure among all models, but uses far fewer genes than the others, may be judged to be the best model. The result of this statistical analysis, visualized as a Critical Difference plot28 (
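The adapted (parsimony) measure is a simple element-wise division, sketched below. The resulting adjusted scores can be ranked per split exactly like the raw scores in a Friedman/Nemenyi comparison. The numbers in the example are hypothetical, and parsimony_adjusted is an illustrative name.

```python
import numpy as np


def parsimony_adjusted(scores, n_genes):
    """Divide each model's performance by the number of genes it uses.

    scores, n_genes: arrays of shape (n_splits, k_models).
    """
    return np.asarray(scores, dtype=float) / np.asarray(n_genes, dtype=float)


# Hypothetical example: model 0 scores higher but uses far more genes.
adjusted = parsimony_adjusted([[0.95, 0.90]], [[200, 90]])
best_model = int(np.argmax(adjusted[0]))
```

Here the second model wins (0.90/90 = 0.01 versus 0.95/200 = 0.00475) despite its lower absolute score.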
Final Model Development and Evaluation:
The final step in the pipeline was to determine the representative model from the 100 iterations of the most statistically superior combination of feature selection and classification method identified from the above steps. In case of ties among the models of the best performing combination, the gene set that produced the best asthma classification F-measure (
Validation of the LR-RFE & Logistic Asthma Gene Panel in an RNAseq Test Set of Independent Subjects
The LR-RFE & Logistic asthma gene panel identified by the machine learning pipeline was then tested on the RNAseq test set (n=40) to assess its performance in independent subjects. F-measure was used to measure performance. For comparison, the same machine learning methodology was used to train and evaluate models from all combinations of feature selection and classification methods considered in the pipeline.
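The per-class F-measures used throughout these Examples can be computed with scikit-learn as sketched below. The sketch assumes labels coded 1 = asthma and 0 = no asthma. Under that coding, precision for the asthma class equals the positive predictive value (PPV), and precision for the no asthma class equals the negative predictive value (NPV). The function name per_class_report is hypothetical.

```python
from sklearn.metrics import precision_recall_fscore_support


def per_class_report(y_true, y_pred):
    """Precision, recall and F-measure for the asthma (1) and no asthma (0) classes."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, labels=[1, 0], zero_division=0
    )
    return {
        "asthma": {"precision (PPV)": precision[0], "recall": recall[0], "F": f1[0]},
        "no asthma": {"precision (NPV)": precision[1], "recall": recall[1], "F": f1[1]},
    }
```

For instance, with three asthma and four no asthma samples where one asthma sample is missed, the asthma class has precision 1.0, recall 2/3 and F = 0.8. The no asthma class has precision 0.8, recall 1.0 and F ≈ 0.889.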
LR-RFE & Logistic Performance Comparison to Alternative Classification Models
To evaluate the relative performance of the LR-RFE & Logistic asthma gene panel, the inventors also applied the machine learning pipeline with the feature (gene) selection step replaced by these pre-determined gene sets: (1) all filtered RNAseq genes, (2) all differentially expressed genes, and (3) known asthma genes from a recent review of asthma genetics29. These were each used as a predetermined gene set that was run through our machine learning pipeline.
Performance Comparison to Permutation-Based Random Models
To determine the extent to which the performance of all the above classification models could have been due to chance, the inventors compared their performance with that of random counterpart models.
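The construction of the random counterpart models is not detailed here. One common way to build such models is label permutation. The sketch below uses scikit-learn's `permutation_test_score` under that assumption. It repeatedly refits the classifier on shuffled class labels and compares the true cross-validated F1 with the resulting null distribution. The synthetic data and the function name compare_with_random_models are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, permutation_test_score


def compare_with_random_models(X, y, n_permutations=100, seed=0):
    """Compare a classifier's cross-validated F1 with label-permuted counterparts."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    score, permutation_scores, p_value = permutation_test_score(
        LogisticRegression(max_iter=1000), X, y,
        scoring="f1", cv=cv, n_permutations=n_permutations, random_state=seed,
    )
    # p_value = (number of permuted scores >= true score + 1) / (n_permutations + 1)
    return score, permutation_scores, p_value


# Synthetic stand-in data (hypothetical), not the asthma RNAseq data.
X, y = make_classification(n_samples=100, n_features=10, n_informative=5,
                           class_sep=2.0, random_state=0)
score, random_scores, p_value = compare_with_random_models(X, y, n_permutations=50)
```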
Validation of the Asthma Gene Panel in External Asthma Cohorts
To assess the generalizability of the asthma gene panel, microarray-profiled data sets of nasal gene expression from two external asthma cohorts—Asthma1 (GSE19187)30 and Asthma2 (GSE46171)31 (Table 5)—were obtained from NCBI Gene Expression Omnibus (GEO)70. The asthma gene panel was evaluated on these external asthma test sets with performance measured by F-measures for the asthma and no asthma classes.
Validation of the Asthma Gene Panel in External Cohorts with Other Respiratory Conditions
To assess the panel's ability to distinguish asthma from respiratory conditions that can have overlapping symptoms with asthma, microarray-profiled data sets of nasal gene expression were also obtained for five external cohorts with allergic rhinitis (GSE43523)36, upper respiratory infection (GSE46171)31, cystic fibrosis (GSE40445)37, and smoking (GSE8987)12 (Table 6). The asthma gene panel was evaluated on these external test sets of non-asthma respiratory conditions with performance measured by F-measures for the asthma and no asthma classes.
Study Population and Baseline Characteristics
A total of 190 subjects underwent nasal brushing for this study, including 66 subjects with well-defined mild-moderate asthma (based on symptoms, medication use, and demonstrated airway hyperresponsiveness by methacholine challenge response) and 124 subjects without asthma (based on no personal or family history of asthma, normal spirometry, and no bronchodilator response). The definitional criteria we used for mild-moderate asthma were consistent with US National Heart Lung Blood Institute guidelines for the diagnosis of asthma7, and are the same criteria used in the longest NIH-sponsored study of mild-moderate asthma21,22.
From these 190 subjects, a random selection of 150 subjects was a priori assigned as the development set (to be used for classification model development and biomarker identification), and the remaining 40 subjects were a priori assigned as the RNAseq test set (to be used as one of 8 validation test sets for testing the classification model and biomarker genes identified with the development set). Assignment of subjects to the development and test sets was done at this early juncture in the study to enable RNA sequencing of all subjects in a single run (to reduce potential bias from sequencing batch effects), followed by immediate allocation of the sequence data to the development or test sets prior to any pre-processing and analysis. The test set was then set aside to preserve its independence.
The baseline characteristics of the subjects in the development set (n=150) are shown in the left section of Table 1. The mean age of subjects with and without asthma was comparable, with slightly more male subjects with asthma and more female subjects without asthma. Caucasians were more prevalent in subjects without asthma, which was expected based on the inclusion criteria. Consistent with the reversible airway obstruction that characterizes asthma4, subjects with asthma had significantly greater bronchodilator response than control subjects (P=1.4×10−5). Allergic rhinitis was more prevalent in subjects with asthma (P=0.005), consistent with known comorbidity between allergic rhinitis and asthma23. Rates of smoking between subjects with and without asthma were not significantly different.
RNA isolated from nasal brushings from the subjects was of good quality, with mean RIN 7.8 (±1.1). The median number of paired-end reads per sample from RNA sequencing was 36.3 million. Following normalization and filtering, 11,587 genes were used for analysis. VariancePartition analysis24 showed that age, race, and sex minimally contributed to total gene expression variance.
(A) Pre-bronchodilator measures. FEV1 = forced expiratory volume in 1 second, FVC = forced vital capacity. Mean (SD) or number (%) provided.
(B) Fisher's exact test for categorical variables and t-test for continuous variables.
Differential gene expression analysis by DESeq225 showed that 1613 and 1259 genes were over- and under-expressed, respectively, in asthma cases versus controls (false discovery rate (FDR)≤0.05) (Table 2A-2B). These genes were enriched for disease-relevant pathways26 including immune system (fold change=3.6, FDR=1.07×10−22), adaptive immune system (fold change=3.91, FDR=1.46×10−15), and innate immune system (fold change=4.1, FDR=4.47×10−9) (Table 2A-2B).
Identification of the Asthma Gene Panel by Machine Learning Analyses of RNAseq Development Set
To identify gene expression biomarkers that accurately predict asthma status, the inventors developed a nested machine learning pipeline that combines feature (gene) selection18 and classification19 techniques.
Evaluation Measures for Predictive Models
The most commonly used evaluation measures for predictive models in medicine are the positive and negative predictive values (PPV and NPV respectively). As shown in
A combination with good precision and recall determined from this comparison was LR-RFE & Logistic.
Forty-six of the 90 genes included in the LR-RFE & Logistic model were differentially expressed genes, with 22 and 24 genes over- and under-expressed in asthma, respectively.
The LR-RFE & Logistic model of 90 genes is a subset of the 275 unique genes identified across all eight models; these 275 genes are defined as the “asthma gene panel”. Preferably, the 90 genes in this LR-RFE & Logistic asthma gene panel are used in combination with the LR-RFE & Logistic classifier and the model's optimal classification threshold (classify as asthma if probability output ≥about 0.76, else no asthma) for effective asthma classification, diagnosis or detection. Similarly, the genes in the model-specific asthma gene panels (Table 4) are used in combination with their model-specific classifiers and optimal classification thresholds to classify, diagnose or detect asthma effectively.
Validation of the Asthma Gene Panel in an RNAseq Test Set of Independent Subjects
The inventors tested the asthma gene panel identified from the above-described machine learning pipeline on an independent RNAseq test set. For this step, the inventors used the test set (n=40) of nasal RNAseq data from independent subjects that was set aside and remained untouched by the development set analysis. The baseline characteristics of the subjects in the test set (n=40) are shown in the right section of Table 1. The baseline characteristics were similar between the development and test sets, except for a lower prevalence of allergic rhinitis among those without asthma in the test set.
The LR-RFE & Logistic Model asthma gene panel performed with high accuracy in the RNAseq test set of independent subjects, achieving AUC=0.994.
As context for comparison to other models possible from the machine learning pipeline and other methods,
Similarly, the other seven classification models and corresponding asthma gene panels performed well in terms of precision and recall, and also beat random performance, such that these models also classify asthma accurately.
Validation of the LR-RFE & Logistic Model Asthma Gene Panel in External Asthma Cohorts
To test the generalizability of the LR-RFE & Logistic Model asthma gene panel for asthma classification, the inventors applied this model to gene expression array data sets generated by other investigators from two independent cohorts of subjects with and without asthma: Asthma1 (GEO GSE19187)30 and Asthma2 (GEO GSE46171)31. Table 5 summarizes the characteristics of these external independent test sets. These data sets were generated from nasal samples collected by independent investigators from subjects with and without asthma from distinct populations, and were then profiled on gene expression microarray platforms. In general, RNA-seq based predictive models are not expected to translate to microarray-profiled samples.32,33 Gene mappings do not perfectly correspond between RNAseq and microarray due to disparities between array annotations and RNAseq gene models33. The goal was to assess the performance of the LR-RFE & Logistic Model asthma gene panel despite the discordance of study designs, sample collections, and gene expression profiling platforms.
The inventors found that the LR-RFE & Logistic Model asthma gene panel performed relatively well given the above handicaps, and better than expected in classifying both asthma and no asthma.
The LR-RFE & Logistic Model Asthma Gene Panel is Specific to Asthma: Validation in External Cohorts with Non-Asthma Respiratory Conditions
Because symptoms of asthma often overlap with those of other respiratory diseases, the inventors next sought to test the specificity of the LR-RFE & Logistic Model gene panel for asthma classification. For this, the inventors evaluated the performance of this LR-RFE & Logistic Model panel on nasal gene expression data derived from case-control cohorts with allergic rhinitis (GSE43523)36, upper respiratory infection (GSE46171)31, cystic fibrosis (GSE40445)37, and smoking (GSE8987)12. Table 6 details the characteristics of these external cohorts with non-asthma respiratory conditions. In four of the five non-asthma data sets, the LR-RFE & Logistic Model asthma gene panel appropriately produced one-sided classifications, i.e., all samples were classified as “no asthma” or healthy, the term for the control class.
Examination of Genes in the LR-RFE & Logistic Model Asthma Gene Panel
Forty-six of the 90 genes included in the LR-RFE & Logistic Model panel were differentially expressed (FDR≤0.05), with 22 and 24 genes over- and under-expressed in asthma, respectively.
The inventors have identified a panel of genes expressed in nasal epithelium, as well as subsets of these genes for use with specific classifiers, that accurately classifies subjects with mild/moderate asthma versus healthy controls. This asthma gene panel, consisting of 275 unique genes interpreted via eight classification models, performed with good precision and sensitivity. Specifically, the LR-RFE & Logistic model and associated asthma gene panel performed with high precision (PPV=1.00 and NPV=0.96) and sensitivity (0.92 and 1.00 for asthma and no asthma, respectively) for classifying asthma. The performance of the LR-RFE & Logistic Model asthma gene panel across independent test sets supports the generalizability of this panel across different study populations and two major modalities of gene expression profiling (RNA sequencing and microarray), as well as its specificity as a diagnostic tool for asthma in particular. The gene panels identified by the other seven models, as discussed herein, likewise performed well.
The asthma gene panel has high potential to be used as a minimally invasive biomarker to aid in asthma diagnosis in children and adults, as it can be quickly obtained by simple nasal brush, does not require machinery for collection, and is easily interpreted. According to the Global Initiative for Asthma and US National Heart Lung Blood Institute, the diagnosis of asthma should be based on a history of typical symptoms and objective findings of variable expiratory airflow limitation by PFT6, 7. Practically, however, objective findings are often not obtainable. Patients with mild/moderate asthma are frequently asymptomatic at the time of the clinical encounter, so they may have no detectable wheezing or cough on exam. Pulmonary function testing (PFT) is often not done, as demonstrated by a study showing that over half of 465,866 patients aged 7 years and older newly diagnosed with asthma had no PFTs performed within a 3.5-year period surrounding the time of diagnosis.8 Clinicians may defer PFTs due to lack of equipment, time, and/or expertise to perform and interpret them8, 9. Diagnosing asthma based on history alone contributes to its under-diagnosis, as patients with asthma under-perceive and under-report their symptoms11. Misdiagnosis of asthma also occurs frequently given overlapping symptoms between asthma and other conditions39. Even if PFTs are obtained, spirometric abnormalities are not always present in mild/moderate asthmatics. An objective, accurate diagnostic tool that is easy and quick to obtain and interpret, with minimal effort required by the provider and patient, could improve asthma diagnosis so that appropriate management can be pursued. The nasal brush-based asthma gene panel meets these biomarker criteria.
Implementation of the asthma gene panel could involve clinicians brushing a patient's nose, placing the brush in a prepackaged tube, and submitting the sample for gene expression profiling targeted to the panel. Some platforms allow for direct transcriptional profiling of tissue without an RNA isolation step, avoiding inconveniences associated with direct RNA work40, 41 and yielding comparable results to RNAseq42. Bioinformatic interpretation of the output via the LR-RFE & Logistic model and classification threshold could be automated, resulting in a determination of asthma or no asthma for the clinician to consider. Biomarkers based on gene expression profiling are being successfully used in other disease areas (e.g., MammaPrint43 and Oncotype DX44 for diagnosing/predicting breast cancer phenotypes).
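The automated interpretation step described above amounts to scoring a sample's panel expression with a fitted logistic model and comparing the resulting probability with a classification threshold. The sketch below assumes the expression values have already been normalized the same way as the training data; the gene names, coefficients, intercept, and threshold are hypothetical placeholders, not the fitted LR-RFE & Logistic Model parameters, and the function names are illustrative.

```python
import math
from typing import Mapping


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def asthma_probability(expression: Mapping[str, float],
                       coefficients: Mapping[str, float],
                       intercept: float) -> float:
    """p(asthma) = sigmoid(intercept + sum of coefficient * expression over panel genes)."""
    missing = sorted(set(coefficients) - set(expression))
    if missing:
        raise ValueError(f"sample lacks panel genes: {missing}")
    z = intercept + sum(b * expression[g] for g, b in coefficients.items())
    return _sigmoid(z)


def classify(expression: Mapping[str, float],
             coefficients: Mapping[str, float],
             intercept: float,
             threshold: float = 0.5) -> str:
    """Return the determination that would be reported back to the clinician."""
    p = asthma_probability(expression, coefficients, intercept)
    return "asthma" if p >= threshold else "no asthma"


# Hypothetical three-gene model and one normalized sample.
coef = {"GENE_A": 1.2, "GENE_B": -0.8, "GENE_C": 0.5}
sample = {"GENE_A": 1.0, "GENE_B": 0.5, "GENE_C": 2.0}
decision = classify(sample, coef, intercept=-0.3, threshold=0.5)
```

Because the threshold is a parameter, a model-specific optimal threshold can be substituted without changing the scoring code, and rejecting samples that lack panel genes avoids silently scoring an incomplete profile.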
Because nasal brushing takes only seconds, the panel may be attractive to time-strapped clinicians, particularly primary care providers at the front lines of asthma diagnosis. Asthma is frequently diagnosed and treated in the primary care setting45, where PFTs are often not immediately available. Although PFTs yield results without specimen handling and at low cost46, these advantages do not seem to overcome their logistical limitations, as evidenced by their low rate of real-life implementation9. Gene expression profiling may currently cost more, but its costs are likely to decrease47, and implementation of the LR-RFE & Logistic Model asthma gene panel could result in cost savings if it reduces the under-diagnosis and misdiagnosis of asthma3. Undiagnosed asthma leads to costly healthcare utilization worldwide3, including in the United States, where asthma accounts for $56 billion in medical costs, lost school and work days, and early deaths48. Clinical implementation of the asthma gene panel could identify undiagnosed asthma, leading to its appropriate management before the high healthcare costs of unrecognized asthma are incurred. Given the LR-RFE & Logistic Model panel's demonstrated specificity, its use could also reduce asthma misdiagnosis by correctly returning a determination of “no asthma” in non-asthmatic subjects with conditions often confused with asthma. Clinical benefit from gene expression-based biomarkers has already been seen in the breast cancer field, where using the 70-gene panel test MammaPrint to guide chemotherapy decisions in a clinical trial allowed a subset of patients to forgo chemotherapy, with only a modestly lower 5-year rate of survival without distant metastasis than among patients who received chemotherapy43.
The nasal brush-based asthma gene panel capitalizes on the common biology of the upper and lower airway, a concept supported by clinical practice and previous findings.12-15 In practice, clinicians rely on this unified airway when they screen for lower airway infections (including, without limitation, influenza and methicillin-resistant Staphylococcus aureus) with nasal swabs.49 Sridhar et al. found that gene expression consequences of tobacco smoking in bronchial epithelial cells were reflected in nasal epithelium.12 Wagener et al. compared gene expression in nasal and bronchial epithelium from 17 subjects, finding that 99.9% of 33,000 genes tested exhibited no differential expression between nasal and bronchial epithelium in those with airway disease.13 In a study of 30 children, Guajardo et al. identified gene clusters with differential expression in exacerbated asthma vs. controls.14 These studies used small sample sizes and microarray technology. More recently, Poole et al. compared RNA-seq profiles of nasal brushings from 10 asthmatic and 10 control subjects with publicly available bronchial transcriptional data, finding strong correlation (ρ=0.87) between nasal and bronchial transcripts and strong correlation (ρ=0.77) between nasal differential expression and previously observed bronchial differential expression in asthmatics.15
Although based on only 90 genes, the LR-RFE & Logistic Model asthma gene panel classified asthma with greater accuracy than models using all differentially expressed genes in the sample (n=2187), all known asthma genes from genetic studies of asthma (n=70), or information from all sequenced genes (n=11587 after filtering).
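The abbreviation “LR-RFE” suggests recursive feature elimination (RFE) wrapped around logistic regression: fit the model, discard the feature(s) with the smallest absolute coefficient, and refit until the desired number of features (here, genes) remains. That is a reading of the abbreviation rather than something the text spells out, so the following is only a generic, minimal pure-Python sketch of the technique, not the inventors' pipeline. The learning rate, epoch count, L2 penalty, and step size are illustrative assumptions, features are assumed to be standardized so that coefficient magnitudes are comparable, and in practice an established implementation such as scikit-learn's `RFE` class would more likely be used.

```python
import math
from typing import List, Sequence


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 500, l2: float = 0.01) -> List[float]:
    """Batch gradient descent on L2-penalized log-loss; returns [intercept, w_1, ..., w_p]."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = _sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w[0] -= lr * grad[0] / n  # intercept is not penalized
        for j in range(1, p + 1):
            w[j] -= lr * (grad[j] / n + l2 * w[j])
    return w


def rfe(X: Sequence[Sequence[float]], y: Sequence[int],
        names: Sequence[str], n_keep: int, step: int = 1) -> List[str]:
    """Recursively drop the `step` weakest features until `n_keep` remain."""
    active = list(range(len(names)))
    while len(active) > n_keep:
        w = fit_logistic([[row[j] for j in active] for row in X], y)
        # Positions within `active`, ordered from smallest to largest |coefficient|.
        weakest = sorted(range(len(active)), key=lambda k: abs(w[k + 1]))
        drop = set(weakest[:min(step, len(active) - n_keep)])
        active = [f for k, f in enumerate(active) if k not in drop]
    return [names[j] for j in active]
```

Refitting after every elimination, rather than ranking all genes once, lets the remaining coefficients adjust when a correlated gene is removed, which is the main design motivation for the recursive variant.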
The asthma gene panel did not perform quite as well in the asthma microarray test sets, which was expected given differences in study design between the RNAseq and microarray test sets. First, the baseline characteristics and phenotyping of the subjects differed. Subjects in the RNAseq test set were adults classified as mild/moderate asthmatic or healthy using the same strict criteria as the development set (see Materials and Methods above), which required subjects with asthma to have an objective measure of obstructive airway disease (i.e., a positive methacholine challenge response). In contrast, subjects in the Asthma1 microarray test set were all children with underlying allergic rhinitis and dust mite allergen sensitivity, whose asthma status was then determined clinically30 (Table 5). Subjects from the Asthma2 cohort were adults classified as having asthma or as healthy based on history. As mentioned, diagnosing asthma based on history alone, without objective lung function testing, can be inaccurate52. The phenotypic differences between these test sets alone could explain the differences in performance of the LR-RFE & Logistic Model asthma gene panel in the microarray test sets. Second, the differential performance may be due to the difference in gene expression profiling approach. Gene mappings do not correspond perfectly between RNAseq and microarray data because of disparities between array annotations and RNAseq gene models.33 Compared with microarrays, RNAseq quantifies more RNA species and captures a wider range of signal.50 Prior studies have shown that microarray-derived models can reliably predict phenotypes from samples' RNAseq profiles, but the converse often does not hold.33 Despite these limitations, the asthma gene panel (identified using the RNAseq-derived development set) classified asthma with reasonable accuracy in the independent microarray test sets.
These results support the generalizability of the asthma gene panel to asthma populations that may be phenotyped or profiled differently.
An effective biomarker for clinical use should have good positive and negative predictive values.53 In the present method, if an individual has asthma, the ideal biomarker would confirm this most of the time so that an accurate diagnosis is made, and if an individual does not have asthma, the ideal biomarker would indicate “no asthma” so that misdiagnosis does not occur. This is indeed the case with the LR-RFE & Logistic Model asthma gene panel, which achieved high positive and negative predictive values of 1.00 and 0.96, respectively, on the RNAseq test set. The inventors also tested the LR-RFE & Logistic Model asthma gene panel on independent test sets of subjects with upper respiratory infection, cystic fibrosis, allergic rhinitis, and smoking, showing that the panel had a low to zero rate of misclassifying subjects with these other respiratory conditions as having asthma.
Even though the development set was from a single center and its baseline characteristics are not representative of all populations, variancePartition analysis demonstrated minimal contribution of age, race, and gender to gene expression variance in these data.
As with any disease, the first step is to accurately identify affected patients. The asthma gene panel described in this study provides an accurate path to this critical diagnostic step. With a correct diagnosis, an array of existing asthma treatment options can be considered6. A next phase of research will be to develop a nasal biomarker to predict endotypes and treatment response, so that asthma treatment can be targeted, and even personalized, with greater efficiency and effectiveness54.
In summary, the inventors applied a machine learning pipeline to identify a panel of genes expressed in nasal epithelium that accurately distinguishes subjects with mild/moderate asthma from healthy controls. This asthma gene panel, comprising 275 genes, and/or its subsets, used in combination with model-specific classifiers and model-specific optimal classification thresholds, performed accurately across eight independent test sets, demonstrating generalizability across study populations and gene expression profiling modalities, as well as specificity to asthma. The asthma gene panel has high potential as a minimally invasive biomarker to aid asthma diagnosis, because the sample can be quickly obtained by simple nasal brushing, collection requires no machinery, and the result is easily interpreted. Asthma diagnostics currently have many limitations. If applied to clinical practice, this asthma gene panel could improve asthma diagnosis and classification, reduce incorrect diagnoses, and prompt appropriate therapeutic management.
Table 2. Lists of over-expressed (A) and under-expressed (B) genes and pathways in asthma cases compared with controls. Differentially expressed genes were identified using DESeq225, and enriched pathways were identified from the Molecular Signatures Database26.
Positive and negative predictive values (PPV and NPV respectively) obtained when the LR-RFE & Logistic asthma gene panel was applied to classifying samples in various microarray-derived data sets of subjects with non-asthma respiratory conditions and controls. Also shown in parentheses are the corresponding PPVs and NPVs obtained when random counterpart models are applied to these datasets for the same classification tasks.
While several possible embodiments are disclosed above, embodiments of the present invention are not so limited. These exemplary embodiments are not intended to be exhaustive or to unnecessarily limit the scope of the invention, but instead were chosen and described in order to explain the principles of the present invention so that others skilled in the art may practice the invention. Indeed, various modifications of the invention in addition to those described herein will become apparent to those skilled in the art from the foregoing description. Such modifications are intended to fall within the scope of the appended claims.
Disclosed are methods and compositions that can be used for, can be used in conjunction with, can be used in preparation for, or are products of the disclosed methods and compositions. These and other materials are disclosed herein, and it is understood that combinations, subsets, interactions, groups, etc. of these methods and compositions are disclosed.
All patents, applications, publications, test methods, literature, and other materials cited herein are hereby incorporated by reference in their entirety as if physically present in this specification.
This application claims priority to U.S. Provisional Application Nos. 62/296,291, filed on 17 Feb. 2016, and 62/296,915, filed on 18 Feb. 2016, the disclosures of each of which are incorporated herein by reference in their entirety.
This invention was made with government support under Grant Nos. R01GM114434, K08AI093538 and R01AI118833, all awarded by the National Institutes of Health (NIH). The government has certain rights in the invention.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/US2017/018318 | 2/17/2017 | WO | 00

Number | Date | Country
---|---|---
62296291 | Feb 2016 | US
62296915 | Feb 2016 | US