The contents of the electronic sequence listing (Tamimi_19.31_SequenceListing.txt; Size: 1,564 bytes; Date of Creation: May 13, 2020) are herein incorporated by reference in their entirety.
The present invention provides an identification system for ascertaining who is at risk of having asthma or who currently has asthma or severe asthma. The system is based on transcriptomic expression data from messenger RNA (mRNA) extracted from saliva, blood, sputum, bronchial brush, and biopsy samples, as well as patients' demographic data such as age, sex, ethnicity, and asthma risk factors, and the clinical examinations and laboratory tests routinely performed for asthma diagnosis. Also provided are related materials and methods, including primers and kits.
Asthma is one of the most common chronic diseases; it is usually a lifelong condition and carries a high disease burden. Bronchial hyperresponsiveness, inflammation, and episodes of airway obstruction are the main characteristic features of the disease. The prevalence of asthma has been shown to have increased in recent years, reaching alarming levels. Many theories about the factors that put people at risk of developing asthma have been proposed, without conclusive results. It is not yet understood what converts fully controlled cases into severe or fatal ones. Asthma is unpredictable, and its severity can escalate into an asthma attack leading to sudden death. Hospital admissions and mortality among individuals with asthma have been rising over the past ten years; nevertheless, new drug discovery has progressed more slowly than in other specialties. A further significant emerging issue is that asthma is heterogeneous rather than a single disease. This heterogeneity can be attributed to the fact that the airways constrict differently in response to the same provoking stimuli, producing a heterogeneous clinical presentation that reflects heterogeneity in the underlying pathogenesis.
Omics technologies such as genomics, transcriptomics, proteomics, and metabolomics provide extremely detailed molecular-level information and have enriched our understanding of the molecular basis of asthma, but they fail to elucidate the big picture. Transcriptomic analysis of the airways has the potential to discover gene expression profiles characteristic of asthma and has shown promising power to identify the distinct molecular mechanisms that separate asthmatic phenotypes. Despite all the effort and cost spent on these studies, no conclusive results have been obtained, and results have at times contradicted each other. The biological processes underlying complex diseases like asthma do not operate in isolation but manifest collectively as interwoven molecular cascades and interactions. The biological complexity of asthma therefore cannot be captured by the isolated output of any single omics technology.
Biomarkers that aid in asthma phenotyping allow physicians to “personalize” treatment with targeted biological agents. Unfortunately, testing for these biomarkers is not routine in patients whose asthma is refractory to standard therapy. Scientific advances in the identification of sensitive and specific biomarkers are steadily outpacing the clinical availability of reliable, non-invasive assessment methods designed for the prompt and specific diagnosis, classification, treatment, and monitoring of patients with severe asthma. The current diagnosis of asthma, through a combination of clinical history with pulmonary function testing and a methacholine or exercise challenge test, does not explicitly characterize or quantify airway inflammation. Currently, there is a limited set of relevant biomarkers, such as eosinophil counts, fractional exhaled nitric oxide (FeNO) values, and periostin and IgE levels. However, these markers have limitations because they identify patients with a Th2-high pattern, while a significant proportion of asthmatic patients do not exhibit a Th2 pattern; as a result, such testing does not tell the full story. Therefore, a novel integrated approach that can unravel the complexity of the molecular basis of asthma would be a contribution to the art.
In accordance with the present disclosure, there is provided a technique for assessing salivary and blood biomarkers. Thus, in one aspect there is provided a method for assessing the asthma status of a subject, the method comprising: (a) providing a saliva and blood sample from the subject; (b) assessing the mRNA in the sample to determine the level of transcription of at least one gene selected from Table 1 or Table 2 across different asthma endotypes; and (c) utilizing the results of (b) to characterize the subject according to their probable asthma status. In an embodiment, the method further comprises (d) utilizing the characterization of the subject according to their probable asthma status to establish a diagnostic and prognostic intervention plan and/or treatment for the subject.
As well as providing diagnostic information about the subject per se, in another embodiment the method may be used to stratify a population according to relative risk. This allows health care providers to prioritize further actions. It should be understood that “risk” here is used in a relative sense, with respect to a population of which the subject is a member. Thus, in an additional embodiment, the method may be used for assessing the asthma risk status of a subject.
In accordance with a second aspect of the present disclosure, there is provided a method that may be used within a population to establish the risk of developing asthma among members of the population and/or to determine who is at higher risk of developing asthma or severe asthma.
In accordance with a third aspect of the present disclosure, there is provided a method that may be used to classify healthy subjects into category 1 (low risk), category 2 (medium risk), or category 3 (high risk) of developing asthma. In a preferred embodiment, the method may be used to diagnostically characterize the subject according to the following asthma severity levels: (1) healthy; (2) mild asthma; (3) moderate asthma; and (4) severe asthma.
In some embodiments, the method is used for assessing the risk of developing severe asthma in an asthmatic subject.
Thus, in accordance with a fourth aspect of the present disclosure and without limitation, the method may be used for one or more of
In preferred embodiments, the genes of Table 1, identified by the inventors as significant differentially expressed genes (DEGs) between healthy and asthmatic patients, are used to determine the asthma status of a subject.
Transcriptome Biomarkers
As used herein, “transcriptomic based biomarker” refers to any one of the genes listed in Table 1, Table 2, and Appendix Table 6.
The present invention provides an identification system for ascertaining who is at risk of having asthma or who currently has asthma or severe asthma. The system is based on transcriptomic expression data from mRNA extracted from bronchial biopsies and brushes and from liquid biopsies such as saliva and blood samples. Optionally, these data can be combined with patients' demographic data such as age, sex, ethnicity, and asthma risk factors, which can be collected using a simple questionnaire, together with the clinical examinations and laboratory tests routinely carried out for asthmatic patients. The identified biomarkers can differentiate mild, moderate, and severe asthma, a capability not yet available in clinical practice. Because the inventors used publicly available data, they were able to validate the identified genes in more than 7,000 available samples, providing robust support for the results obtained with this approach.
Briefly, the inventors used publicly available bronchial epithelium transcriptomic data together with demographic, symptom, and risk factor data from 1,000 patients. These data were analyzed using a novel artificial intelligence technique. From this analysis, the inventors identified a list of 76 genes differentially expressed between asthmatic subjects (mild, moderate, and severe) and healthy controls. Nine genes were specific to mild asthma compared with healthy controls, and 16 genes were specific to moderate asthma compared with healthy controls. A further set of 225 RNA markers differentiated severe asthmatic subjects from healthy subjects and from other asthmatic subjects (mild and moderate).
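The idea of genes that are "specific" to one severity level can be illustrated with set operations on DEG lists. The sketch below assumes, as an interpretation not stated in the text, that a gene is specific to a comparison when it is differentially expressed in that comparison and in no other; the function names and gene symbols are hypothetical placeholders, not genes from Table 1.

```python
from typing import Dict, Set


def specific_genes(deg_by_comparison: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each comparison, return the DEGs that appear in no other comparison.

    Assumption: "specific" means unique to one comparison's DEG list.
    """
    result = {}
    for name, genes in deg_by_comparison.items():
        # Union of the DEG lists of every other comparison.
        others = set().union(*(g for n, g in deg_by_comparison.items() if n != name))
        result[name] = genes - others
    return result


def shared_genes(deg_by_comparison: Dict[str, Set[str]]) -> Set[str]:
    """Genes differentially expressed in every comparison."""
    sets = list(deg_by_comparison.values())
    return set.intersection(*sets) if sets else set()


degs = {
    "mild_vs_healthy": {"GENE_A", "GENE_B", "GENE_C"},
    "moderate_vs_healthy": {"GENE_A", "GENE_D"},
    "severe_vs_healthy": {"GENE_A", "GENE_E", "GENE_F"},
}
print(sorted(specific_genes(degs)["mild_vs_healthy"]))  # ['GENE_B', 'GENE_C']
```

Under a different definition of specificity (for example, a fold-change or p-value margin), only the set logic in `specific_genes` would need to change.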
The inventors further used the gene signature identified in the gene expression analysis system to profile 60 distinct human RNA targets, using a highly multiplexed amplification method, in a further analysis of RNA extracted from saliva, sputum, blood, bronchial brush, and biopsy samples, optionally together with demographic, risk factor, and symptom data that can be collected using a simple questionnaire. The validation cohort was recruited locally and comprised 60 subjects: 20 healthy subjects, 20 with non-severe (mild to moderate) asthma, and 20 with severe asthma. The analysis showed that combining a small selection of RNA amplicons with simple demographic data and routine tests, such as peak flow and a blood differential count, can accurately characterize (or classify; the terms are used interchangeably) the asthmatic status of a subject, and can identify those with, or at risk of progressing to, severe asthma as well as those who will not respond to asthma-specific treatments.
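As an illustration of how RNA amplicon levels might be combined with demographic and routine laboratory values in one classifier, the sketch below uses a nearest-centroid rule on standardized features. This is a deliberately simple, swapped-in technique: the text does not disclose the inventors' artificial intelligence method, and the feature layout in `build_features` (age, sex flag, percent-predicted peak flow, eosinophil count) is a hypothetical example.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple

Model = Tuple[List[float], List[float], Dict[str, List[float]]]


def build_features(expression: Sequence[float], age: float, is_female: bool,
                   peak_flow_pct: float, eosinophils: float) -> List[float]:
    """Concatenate RNA amplicon levels with hypothetical demographic and lab fields."""
    return list(expression) + [age, 1.0 if is_female else 0.0, peak_flow_pct, eosinophils]


def fit_centroids(X: List[List[float]], y: List[str]) -> Model:
    """Standardize each feature, then average the standardized rows of each class."""
    n_feat = len(X[0])
    means = [statistics.fmean([row[j] for row in X]) for j in range(n_feat)]
    # A constant feature (SD of 0) gets SD 1.0 to avoid division by zero.
    sds = [statistics.pstdev([row[j] for row in X]) or 1.0 for j in range(n_feat)]
    groups: Dict[str, List[List[float]]] = {}
    for row, label in zip(X, y):
        z = [(v - m) / s for v, m, s in zip(row, means, sds)]
        groups.setdefault(label, []).append(z)
    centroids = {label: [statistics.fmean(list(col)) for col in zip(*rows)]
                 for label, rows in groups.items()}
    return means, sds, centroids


def predict(model: Model, row: Sequence[float]) -> str:
    """Assign the class whose centroid is nearest in standardized Euclidean space."""
    means, sds, centroids = model
    z = [(v - m) / s for v, m, s in zip(row, means, sds)]
    return min(centroids, key=lambda label: math.dist(z, centroids[label]))
```

Standardizing first matters here: without it, a feature measured on a large scale (such as age) would dominate the distance over amplicon levels.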
Saliva collection is easy, innocuous, and acceptable to patients, and saliva therefore represents a potential tool for measuring disease biomarkers, with the potential to reduce the need for costly and invasive bronchoscopic investigations in subjects at risk of developing asthma, specifically severe asthma. Circulating gene expression levels can serve as reliable, non-invasive, and cost-effective biomarkers that provide additional discriminating power to the available clinical and laboratory tests for severe asthma.
Dataset Selection.
The methods described can provide information distinguishing healthy from asthmatic samples. In one embodiment, the method may provide information to identify DEGs in subjects during an acute asthma attack versus the convalescent phase, DEGs that depend on whether the sample is a bronchial biopsy, a brush, or a liquid biopsy, and DEGs in peripheral blood mononuclear cells (PBMCs) in response to the most common asthma allergen. Thus, the methods of the present invention can provide useful information for identifying DEGs in bronchial biopsy, brush, PBMC, and saliva samples from patients with severe and moderate asthma.
In some embodiments, the subject may be identified as being at risk of developing asthma or severe asthma. The methods disclosed herein are useful because they enable health care providers to determine appropriate diagnostic intervention and/or treatment plans. In one embodiment, characterization of the subject as being at risk of developing (2) mild asthma may be used to decide that a non-urgent further diagnostic intervention is required. In another embodiment, characterization of the subject as being at risk of developing (3) moderate or (4) severe asthma may be used to decide that an urgent further diagnostic intervention is required.
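The decision rule described above maps the four-level severity characterization onto follow-up urgency. A minimal sketch, with hypothetical names (`AsthmaStatus`, `follow_up`); the text does not say what follows a "healthy" characterization, so that branch is an assumption.

```python
from enum import IntEnum


class AsthmaStatus(IntEnum):
    """The four severity levels used in the characterization (1 to 4)."""
    HEALTHY = 1
    MILD = 2
    MODERATE = 3
    SEVERE = 4


def follow_up(status: AsthmaStatus) -> str:
    """Map a predicted status to the follow-up urgency described in the text."""
    if status in (AsthmaStatus.MODERATE, AsthmaStatus.SEVERE):
        return "urgent further diagnostic intervention"
    if status == AsthmaStatus.MILD:
        return "non-urgent further diagnostic intervention"
    # Assumption: the source does not specify the action for a healthy result.
    return "no further intervention indicated"
```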
As shown in the examples, in preferred embodiments and based on the dataset herein, the system can provide clinically relevant levels of sensitivity and specificity. As described in more detail hereinafter, the choice of markers and of expression thresholds can be adjusted to provide the desired combination of sensitivity and specificity.
“Diagnostic” in this context means identifying the presence or nature of a pathological condition such as asthma. Diagnostic methods differ in their sensitivity and specificity. The “sensitivity” of a diagnostic test is the percentage of diseased subjects who test positive (the percentage of “true positives”). Subjects who are not diseased and who test negative in the assay are termed “true negatives.” Diseased individuals not detected by the assay are “false negatives.” The specificity of a diagnostic assay is 1 minus the false positive rate, where the “false positive” rate is defined as the proportion of those without the disease who test positive. While a particular diagnostic method may not provide a definitive diagnosis of a condition, it suffices if the method provides a positive indication that aids in diagnosis.
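The definitions above, and the threshold trade-off mentioned earlier, can be made concrete: sensitivity = TP / (TP + FN), and specificity = 1 − FP / (FP + TN). The sketch below assumes a single numeric score per subject (for example, a combined expression score, which is hypothetical here) and calls a subject positive when the score is at or above a cut-off; sweeping the cut-off shows how raising it trades sensitivity for specificity.

```python
import math
from typing import List, Sequence, Tuple


def sensitivity_specificity(scores: Sequence[float], diseased: Sequence[bool],
                            threshold: float) -> Tuple[float, float]:
    """Call a sample positive when score >= threshold; return (sensitivity, specificity)."""
    tp = fn = fp = tn = 0
    for score, is_diseased in zip(scores, diseased):
        positive = score >= threshold
        if is_diseased and positive:
            tp += 1  # true positive
        elif is_diseased:
            fn += 1  # false negative: diseased but missed
        elif positive:
            fp += 1  # false positive: not diseased but flagged
        else:
            tn += 1  # true negative
    sensitivity = tp / (tp + fn) if tp + fn else math.nan
    false_positive_rate = fp / (fp + tn) if fp + tn else math.nan
    return sensitivity, 1.0 - false_positive_rate  # specificity = 1 - FPR


def threshold_sweep(scores: Sequence[float],
                    diseased: Sequence[bool]) -> List[Tuple[float, float, float]]:
    """Evaluate every observed score as a cut-off: (threshold, sensitivity, specificity)."""
    return [(t, *sensitivity_specificity(scores, diseased, t)) for t in sorted(set(scores))]
```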
Choice of the Subject.
The term “subject,” as used herein, generally refers to a mammal. Typically, the subject is a human. Where the present invention relates to the analysis of nucleic acid of a subject, the individual may be entirely symptomless or may be one who has asthma. A subject under the care of a physician or other health care provider may be referred to as a “patient.”
The method may be used to assess risk within a population by screening individual members of that population.
The Use of Saliva is Preferred in the Present Invention.
Saliva for use in the invention can be collected simply, in a manner acceptable to subjects, and easily stored using conventional techniques. A growing body of work uses saliva and has detected useful biomarkers in asthma studies.
Methods of the present invention may include obtaining a saliva sample comprising nucleic acid from an individual. Alternatively, the assessment of the biomarkers herein may be performed or based on a historical sample, or information already obtained therefrom.
In some embodiments, a saliva sample was collected from subjects who had been asked to refrain from eating or drinking for 1 hour before saliva collection and to gargle and rinse the mouth with water 5 minutes before collection. Then 1 mL of unstimulated whole saliva was collected by passive drool into a pre-prepared 50 mL tube containing 1 mL of RNAlater (Invitrogen).
Specifically, the methods may optionally involve obtaining a saliva sample from a subject. As used herein, the term “obtaining a saliva sample” refers to any process for directly or indirectly acquiring the saliva sample from a subject. For example, a saliva sample may be obtained (e.g., at a laboratory facility) from one or more persons who procured the sample directly from the subject.
In some embodiments, blood, sputum, or bronchial biopsy samples from a subject may be used in the present invention.
Dataset Selection.
Datasets to Identify Common DEG in Bronchial Epithelium of Asthmatic Patients.
The methods of the present invention used publicly available transcriptomic datasets of asthmatic patients obtained from the Gene Expression Omnibus (GEO). The inclusion criteria for selecting datasets were as follows: human samples only; experiments that included matched healthy controls; defined clinical classification of participants; and bronchial epithelium gene expression measured by microarray. The inventors used dataset GSE64913 as the training dataset because it provides a complete characterization of patients and inclusion criteria. Of the 70 samples included, 33 from asthmatic patients were compared with 37 from healthy controls, as shown in Table 1.
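The four inclusion criteria can be expressed as a single predicate over dataset metadata. A minimal sketch: the `DatasetMeta` record, its field names and string values, and the placeholder accessions are hypothetical and do not reflect GEO's actual metadata schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetMeta:
    """Hypothetical metadata record for a candidate GEO dataset."""
    accession: str
    organism: str
    has_matched_controls: bool
    clinical_classification_defined: bool
    tissue: str
    platform_type: str


def meets_inclusion(d: DatasetMeta) -> bool:
    """Apply the four inclusion criteria stated in the text."""
    return (d.organism == "Homo sapiens"              # human samples only
            and d.has_matched_controls                # matched healthy controls
            and d.clinical_classification_defined     # defined clinical classes
            and d.tissue == "bronchial epithelium"    # bronchial epithelium only
            and d.platform_type == "microarray")      # microarray platform only


def select_datasets(candidates: List[DatasetMeta]) -> List[str]:
    """Return the accessions of the datasets that pass every criterion."""
    return [d.accession for d in candidates if meets_inclusion(d)]
```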
Dataset to Identify DEG in Severe and Moderate Asthmatic Bronchial Biopsy Versus Brush.
Another dataset (GSE76227), with 190 samples (121 from severe and 69 from moderate asthmatic patients) in which bronchial epithelial brush transcriptomics was compared with bronchial biopsy, was used to determine whether the identified genes can detect the disease even when the sample contains cells other than epithelial cells.
Dataset to Identify DEG in Asthmatic Lung Fibroblasts Compared to Healthy Fibroblasts in Different Locations of the Lung.
The GSE27335 dataset was used in the present invention to determine the DEGs between bronchial and parenchymal fibroblasts in healthy and asthmatic patients.
Dataset to Identify DEG in Severe Versus Non-Severe Asthmatic PBMC Compared to Healthy.
Three datasets were used in the present invention: one to identify DEGs in PBMCs from healthy versus asthmatic subjects in each cell type (CD4 vs. CD8 T lymphocytes) (GSE73482), and one to identify DEGs in atopic asthmatic patients during an acute asthma attack versus the convalescent phase (GSE16032). A third dataset (GSE73482) was used to identify DEGs in PBMCs in response to the most common asthma allergen (house dust mite).
Validation of Common DEG in Bronchial Epithelium of Asthmatic Patients and Deciphering Their Profiles in Conditions Other Than Asthma.
The inventors extracted seven datasets from the same microarray platform, representing sample types other than bronchial epithelium (nasal scraping, sputum, and blood) as well as different variables such as smoking, inhaled steroid treatment, acute versus convalescent conditions, rhinovirus infection, and exercise-induced bronchospasm, to identify the variables that may affect expression of the identified genes. The inventors compared different locations of airway epithelium (nasal, central airway, and peripheral), examined gender-specific variation, and looked at other respiratory diseases that share common features with asthma, such as COPD and IPF. The total number of explored samples was 615 (263 asthmatic, 36 COPD, 23 IPF, 60 smokers, and 184 healthy controls), as shown in Table 2:
Raw CEL Files Processing, Normalization, and Filtration.
An example method of the present invention processed raw Affymetrix Human Genome U133 Plus 2.0 Array CEL files. Each dataset underwent preprocessing and normalization separately using an in-house R script on R statistical software version 3.02 that uses the affy packages GCRMA, MASS, and RMA, as shown in the flowchart of
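RMA-style preprocessing consists of background correction, quantile normalization across arrays, and median-polish summarization of probes. The sketch below illustrates only the quantile-normalization step, in plain Python. It is not the inventors' in-house R script; in R this step is handled by the Bioconductor affy/gcrma tooling. Ties are broken by original order, a simplification.

```python
from typing import List


def quantile_normalize(arrays: List[List[float]]) -> List[List[float]]:
    """Give every array the same intensity distribution (one list of probe values per array).

    The k-th smallest value in each array is replaced by the mean of the
    k-th smallest values across all arrays.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of each rank across arrays.
    reference = [sum(col[k] for col in sorted_arrays) / len(arrays) for k in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # probe indices, ascending value
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized
```

After normalization, every array has an identical sorted distribution, while each probe keeps its rank within its own array.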
Measuring Levels of RNA.
Methods of the invention will generally employ protocols that examine the presence and/or expression of mRNAs in saliva, sputum, blood, or bronchial biopsy samples. Methods for the evaluation of mRNAs in cells are well known.
Probes Collapsing to Their Corresponding Genes.
Probes for the present invention were selected from among matching probes and then collapsed to their corresponding genes using GSEA software. The resulting gene list was used for AbsGSEA to identify significantly enriched pathways. A total of 100,000 gene sets were analyzed, and results were ranked according to nominal P-value (<0.05) and false discovery rate (≤0.25), as previously described. The gene sets that passed AbsGSEA filtering were explored using classical GSEA to identify genes that were positively or negatively enriched in each pathway. Genes enriched in more than 100 pathways were selected. For the gene sets differentially regulated between healthy subjects and severe asthma, leading-edge analysis was performed to identify the biologically important gene subset. A shortlist of 35 genes was identified. The flowchart of the approach is shown in
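Two bookkeeping steps of this pipeline can be sketched outside GSEA: collapsing probes to genes, and keeping genes that recur across pathways that pass the P-value and FDR filters. Assumptions: the max-of-probes rule is one of the collapsing modes GSEA offers, but the text does not say which mode was used; the tuple layout `(name, nominal_p, fdr, member_genes)` and the function names are hypothetical. The enrichment statistics themselves are computed by GSEA and are not reproduced here.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

GeneSetResult = Tuple[str, float, float, List[str]]  # (name, nominal_p, fdr, genes)


def collapse_max_probe(probe_values: Dict[str, float],
                       probe_to_gene: Dict[str, str]) -> Dict[str, float]:
    """Keep, for each gene, the highest value among its probes."""
    genes: Dict[str, float] = {}
    for probe, value in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # probes without a gene annotation are dropped
        if gene not in genes or value > genes[gene]:
            genes[gene] = value
    return genes


def recurrent_genes(results: Iterable[GeneSetResult], min_pathways: int = 100,
                    p_max: float = 0.05, fdr_max: float = 0.25) -> Set[str]:
    """Genes present in more than min_pathways gene sets passing p < p_max and FDR <= fdr_max."""
    passing = [r for r in results if r[1] < p_max and r[2] <= fdr_max]
    counts = Counter(g for _, _, _, members in passing for g in set(members))
    return {g for g, c in counts.items() if c > min_pathways}
```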
Identifying the Confounding Factors.
From the healthy-group transcriptomic data, the inventors identified genes that are differentially expressed between males and females and between large and small airways, so that differences caused by these factors could be separated from differences caused by the disease (severe asthma versus healthy). Another dataset was used to distinguish genes enriched in the epithelial brush from those that may also be enriched in the bronchial biopsy. The effects of the cell type examined (CD4 or CD8 lymphocytes) and of the disease phase (acute attack or convalescent phase) were also identified. Raw mRNA expression values of the identified genes in individual samples were extracted from each dataset to confirm the findings and identify the factors that affect their levels. For statistical purposes, the inventors used Student's t-test to compare gene expression between asthma and control groups, with p < 0.05 considered significant.
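Student's t-test, as used above, compares two group means under an equal-variance assumption. The statistic is t = (mean1 − mean2) / sqrt(s_p² (1/n1 + 1/n2)), where s_p² is the pooled variance, with n1 + n2 − 2 degrees of freedom; the two-sided p-value equals the regularized incomplete beta function I_x(df/2, 1/2) at x = df / (df + t²). The sketch below is a stdlib-only illustration of that calculation; in practice, `scipy.stats.ttest_ind` or R's `t.test(..., var.equal = TRUE)` would normally be used.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def _betacf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued fraction for the regularized incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step, then odd step, of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a: float, b: float, x: float) -> float:
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where convergence is faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def student_t_test(group1: Sequence[float],
                   group2: Sequence[float]) -> Tuple[float, int, float]:
    """Two-sample Student's t-test (pooled variance); returns (t, df, two-sided p)."""
    n1, n2 = len(group1), len(group2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two values")
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    t = (mean(group1) - mean(group2)) / math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    p = regularized_incomplete_beta(df / 2.0, 0.5, df / (df + t * t))
    return t, df, p
```

A gene would then be flagged as significant when the returned p is below 0.05.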
The methods of the present disclosure may utilize one or more non-genetic factors or markers to assess the status of the individual. “Non-genetic” in this context means not based directly on a nucleic acid-based assessment, but rather relating to other demographic, risk factor or symptom characteristics demonstrated by the subject. These criteria can typically be self-reported using a simple questionnaire.
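Questionnaire answers must be turned into numbers before they can be combined with expression data. A minimal one-hot encoding sketch follows; the field names (`age`, `sex`, `smoking`, `family_history_asthma`) and their category lists are hypothetical, because the text does not specify the questionnaire's contents.

```python
from typing import Dict, List

# Hypothetical category levels; the real questionnaire is not specified in the text.
SEX_LEVELS = ["female", "male"]
SMOKING_LEVELS = ["never", "former", "current"]


def one_hot(value: str, levels: List[str]) -> List[float]:
    """Encode a categorical answer as a 0/1 indicator per level."""
    if value not in levels:
        raise ValueError(f"unexpected category {value!r}; expected one of {levels}")
    return [1.0 if value == level else 0.0 for level in levels]


def encode_questionnaire(answers: Dict[str, object]) -> List[float]:
    """Flatten self-reported, non-genetic answers into a numeric feature vector."""
    features = [float(answers["age"])]
    features += one_hot(str(answers["sex"]), SEX_LEVELS)
    features += one_hot(str(answers["smoking"]), SMOKING_LEVELS)
    features.append(1.0 if answers["family_history_asthma"] else 0.0)
    return features
```

Rejecting unknown categories, rather than silently encoding them as all zeros, makes data-entry errors visible early.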
Patient Selection.
The methods of the present disclosure used blood samples collected from healthy individuals and from asthmatic patients. Blood was collected from 10 non-severe asthmatic patients (mild to moderate asthma) and from 10 severe asthmatic patients (fulfilling the American Thoracic Society criteria for asthma). These patients were compared with 10 non-asthmatic volunteers who had no recent respiratory tract infection and no history of allergy or asthma. Both patients and healthy controls were recruited at the Asthma Clinic of the Pulmonary Medicine Department, Rashid Hospital. The study was approved by the Ethics Committee of the Dubai Health Authority and the University of Sharjah, and each subject gave written informed consent after a thorough explanation by the treating physicians and the researchers.
Blood samples (12 mL) used in the methods of the present invention were collected from each individual in EDTA-containing blood collection tubes (3 mL each) and transferred within two hours to the Sharjah Institute for Medical Research (SIMR), where PBMCs were isolated using techniques well known in the art. Twelve milliliters of Histopaque-1077 (Sigma, #10771, Germany) were added to a 50 mL centrifuge tube and brought to room temperature; then 12 mL of whole blood were carefully layered on top of the Histopaque and centrifuged at 400×g for exactly 30 minutes at room temperature. After centrifugation, the buffy coat at the interface was carefully collected with a Pasteur pipette and transferred to a clean 15 mL conical centrifuge tube. The cells were then washed twice with isotonic phosphate-buffered saline and centrifuged at 250×g for minutes, after which the cell pellets were stored at −80 °C until protein, DNA, and RNA extraction.
Any RNA isolation technique that does not select against the isolation of mRNA can be used to purify RNA from saliva. RNA was extracted using the RNeasy Mini Kit (Qiagen, #74106, Germany) according to the manufacturer's instructions. PBMC pellets were first lysed and then passed through QIAshredder columns (Qiagen, #79656, Germany) to homogenize the lysates. RNA purity (OD260/280) and quantity were measured using a NanoDrop 2000 (Thermo Scientific™, USA). The purified RNA was then reverse transcribed into cDNA using the High-Capacity cDNA Reverse Transcription Kit (Applied Biosystems, #4375222, USA) according to the manufacturer's instructions. 5× Hot FIREPol EvaGreen qPCR Supermix (Solis BioDyne, Estonia) was used to quantify the mRNA of the selected genes on a QuantStudio 3 system (Applied Biosystems, USA).
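The text names the purity measurement and the qPCR chemistry but not the acceptance window or the quantification model. The sketch below assumes a 260/280 window of 1.8 to 2.2 (a ratio near 2.0 is commonly taken to indicate pure RNA) and uses the widely used comparative Ct (2^−ΔΔCt, Livak) method with a reference gene. Both choices are assumptions, and the reference gene itself is not named in the source.

```python
def purity_ok(a260_280_ratio: float, low: float = 1.8, high: float = 2.2) -> bool:
    """Accept RNA whose OD260/280 ratio lies within an assumed window."""
    return low <= a260_280_ratio <= high


def relative_expression_ddct(ct_target_case: float, ct_ref_case: float,
                             ct_target_control: float, ct_ref_control: float) -> float:
    """Fold change of a target gene (case vs. control) by the 2^-ddCt method.

    Assumes ~100% amplification efficiency for both target and reference genes.
    """
    d_ct_case = ct_target_case - ct_ref_case            # normalize to reference gene
    d_ct_control = ct_target_control - ct_ref_control
    dd_ct = d_ct_case - d_ct_control
    return 2.0 ** (-dd_ct)
```

For example, a target that crosses threshold two cycles earlier (relative to the reference gene) in a case than in a control yields a fold change of 4.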
Primer Design
The target amplicon was chosen based on a protein domain unique to the gene of interest, identified using SMART (Simple Modular Architecture Research Tool). The sequence of this domain was back-translated to its corresponding nucleotide sequence using the Consensus CDS (CCDS) database. The identified nucleotide sequence was then used for primer design on the Primer3 platform. The primers were subjected to an in silico QC check for possible dimers. The thermodynamics of the amplicons' RNA secondary structure were evaluated using the MFOLD tool. Primer performance was examined using PCR, agarose gel electrophoresis, and melt-curve parameters in qRT-PCR.
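Primer3 and MFOLD perform full thermodynamic modeling. The sketch below illustrates only crude, rule-of-thumb screens of the kind an in silico QC step might start with: GC fraction, the Wallace rule Tm = 2(A+T) + 4(G+C) (meaningful only for short oligonucleotides), and a check of whether the last k bases at a primer's 3' end are complementary to any stretch of itself or its partner. The function names and the choice of k = 4 are hypothetical, and these screens do not reproduce Primer3's algorithms.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def wallace_tm(seq: str) -> int:
    """Wallace rule melting temperature (degrees C); a rough estimate for short oligos."""
    s = seq.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


def three_prime_pairs(primer: str, other: str, k: int = 4) -> bool:
    """True if the last k bases of `primer` can base-pair with some k-mer of `other`."""
    return reverse_complement(primer[-k:]) in other.upper()


def has_dimer_risk(fwd: str, rev: str, k: int = 4) -> bool:
    """Screen for cross-dimers and self-dimers through 3'-end complementarity."""
    pairs = [(fwd, rev), (rev, fwd), (fwd, fwd), (rev, rev)]
    return any(three_prime_pairs(a, b, k) for a, b in pairs)
```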
Transcriptomic Analysis.
The targeted RNA-seq library preparation for the present invention was carried out using AmpliSeq (Thermo Fisher Scientific), which covers over 21,000 distinct human RNA targets using a highly multiplexed amplification method. Each amplicon represents a unique gene. The average size of each amplicon is ~15 bp. For library preparation, a barcoded cDNA library was first generated with the SuperScript VILO cDNA Synthesis Kit from 20 ng of total RNA treated with Turbo DNase (Thermo Fisher Scientific). The cDNA was then amplified using Ion AmpliSeq technology to accurately maintain the expression levels of all targeted genes. Amplified cDNA libraries were evaluated for quality and quantified using an Agilent Bioanalyzer high-sensitivity chip. Libraries were then diluted to 100 pM and pooled in equal amounts, with two individual samples per pool. Pooled libraries were amplified by emulsion PCR on Ion Torrent OneTouch 2 (OT2) instruments and enriched following the manufacturer's instructions. Templated libraries were then sequenced on an Ion Torrent Proton sequencing system using the Ion PI kit and chip version 2.
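Diluting each library to 100 pM is a C1·V1 = C2·V2 calculation. If a concentration is reported in ng/µL rather than molar units, it can first be converted using an average mass of about 660 g/mol per base pair of double-stranded DNA. In the sketch below, the 20 µL final volume and the 150 bp library length used in the checks are hypothetical example values, not figures from the text.

```python
from typing import Tuple


def ng_per_ul_to_pm(conc_ng_per_ul: float, mean_length_bp: float) -> float:
    """Convert a dsDNA library concentration (ng/uL) to picomolar (~660 g/mol per bp)."""
    return conc_ng_per_ul * 1e9 / (660.0 * mean_length_bp)


def dilution_volumes(stock_pm: float, target_pm: float = 100.0,
                     final_volume_ul: float = 20.0) -> Tuple[float, float]:
    """Return (library uL, diluent uL) so that C1*V1 = C2*V2 gives target_pm."""
    if stock_pm < target_pm:
        raise ValueError("library is already below the target concentration")
    library_ul = target_pm * final_volume_ul / stock_pm
    return library_ul, final_volume_ul - library_ul
```

Because every library is brought to the same molarity before pooling, equal volumes then contribute equal numbers of molecules to each pool.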
The resulting 2,953 genes were used as a gene set to perform gene set enrichment analysis on a different dataset, to identify genes differentially expressed in mild, moderate, and severe asthma compared with healthy epithelium (
Targeted RNA-seq (Ion AmpliSeq) sequencing analysis of all samples was performed using the Ion Torrent Software Suite version 5.4. Alignment was carried out using the Torrent Mapping Alignment Program (TMAP), which is optimized for Ion Torrent sequencing data, to align the raw sequencing reads against a reference sequence derived from the hg19 (GRCh37) assembly. To maintain specificity and sensitivity, TMAP implements a two-stage mapping approach. First, four alignment algorithms (BWA-short, BWA-long, SSAHA, and Super-maximal Exact Matching) are employed to identify a list of candidate mapping locations. A further alignment is then performed using the Smith-Waterman algorithm to find the best final mapping. Raw read counts of the targeted genes were obtained using samtools (samtools view -c -F 4 -L bed_file bam_file). Quality control, including the number of expressed transcripts, was checked after Fragments Per Kilobase of transcript per Million mapped reads (FPKM) normalization. Differentially expressed gene (DEG) analysis was performed using the R/Bioconductor package DESeq2 with raw read counts from RNA-seq and AmpliSeq. Read count normalization was performed using the regularized logarithm (rlog) method provided in DESeq2. Genes with fewer than ten normalized read counts were excluded from further analysis. DEG determination was carried out using the LIMMA package.
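The FPKM normalization and the low-count filter above reduce to simple arithmetic: FPKM = count × 10^9 / (feature length in bp × total mapped reads). The text does not say whether the ten-count cut-off applies to a per-sample value, a mean, or a total, so the sketch below assumes a sum across samples (similar to the common DESeq2 pre-filtering idiom). Function names are hypothetical.

```python
from typing import Dict, List


def fpkm(count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Fragments Per Kilobase of transcript per Million mapped reads."""
    return count * 1e9 / (gene_length_bp * total_mapped_reads)


def filter_low_counts(norm_counts: Dict[str, List[float]],
                      min_total: float = 10.0) -> Dict[str, List[float]]:
    """Drop genes whose normalized counts sum to fewer than min_total (assumed rule)."""
    return {gene: counts for gene, counts in norm_counts.items()
            if sum(counts) >= min_total}
```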