This invention generally relates to the chemical analysis of biological material and, more specifically, to nucleic acid products used in the analysis of nucleic acids, e.g., primers or probes for diseases caused by alterations of genetic material.
Sepsis is a life-threatening organ dysfunction due to a dysregulated host response to infection. Despite declining age-standardized incidence and mortality, sepsis remains a significant cause of health loss worldwide. Rudd et al., The Lancet, 395(10219), 200-211 (Jan. 18, 2020). Sepsis is treatable, and timely implementation of targeted interventions improves outcomes.
Sepsis is diagnosed clinically by the presence of acute infection and new organ dysfunction. Singer et al., JAMA, 315, 801-810 (February 2016). Unlike the earlier concepts of septicemia or blood poisoning, the definition of sepsis extends across bacterial, fungal, viral, and parasitic pathogens. The definition focuses on the host response as the major source of morbidity and mortality. Bone et al., Chest, 101, 1644-1655 (1992). Globally, there were about 48.9 million cases of sepsis in 2017 and about 11.0 million sepsis-related deaths, representing 19.7% (95% uncertainty interval 18.2-21.4%) of all deaths worldwide. These numbers may be a substantial undercount. Rudd et al., The Lancet, 395(10219), 200-211 (Jan. 18, 2020). Because sepsis results from an underlying infection, it is an intermediate cause of health loss. Under the principles of the International Classification of Diseases (ICD), causes of death are assigned to the underlying disorder that triggers the chain of events leading to death rather than to intermediate causes, so sepsis reported as the cause of death is considered miscoded.
Thus, the global burden of sepsis is greater than previously appreciated. Sepsis incidence and mortality vary substantially with the Healthcare Access and Quality Index (HAQ Index), Lancet, 390, 231-266 (2017), with the highest burden in places that cannot prevent, identify, or treat sepsis. Further research is needed to understand these disparities and to develop policies and practices that reduce them. More robust infection-prevention measures should be assessed and implemented in areas with the highest incidence of sepsis and among the populations on which sepsis has the greatest impact. The impact of sepsis is especially severe among children: more than half of all sepsis cases worldwide in 2017 occurred among children, many of them neonates.
Physicians diagnose sepsis by clinical judgment guided by one or more clinical scores. The systemic inflammatory response syndrome (SIRS) approach assesses an inflammatory state affecting the whole body, which is the body's response to an infectious or non-infectious challenge. Jui et al. (American College of Emergency Physicians), Ch. 146: Septic Shock, in Tintinalli's Emergency Medicine: A Comprehensive Study Guide, 7th edition (New York: McGraw-Hill, 2011), pp. 1003-14. Sepsis has both pro-inflammatory and anti-inflammatory components. The quick SOFA (qSOFA) approach simplifies the Sequential Organ Failure Assessment (SOFA) score to three clinical criteria: an elevated respiratory rate, low systolic blood pressure, and any altered mentation. Singer et al., JAMA, 315, 801-810 (February 2016). qSOFA can be repeated serially on patients, easily and quickly.
A positive culture confirms the bacterial infection underlying sepsis. However, a culture diagnosis can be delayed by forty-eight hours, and a culture sometimes cannot be performed successfully. Clinical judgment alone sometimes misses sepsis.
Biomarkers for sepsis are being developed, but no reliable biomarker yet exists. A 2013 review concluded that moderate-quality evidence supports using the procalcitonin level to distinguish sepsis from non-infectious causes of SIRS, but the level alone could not definitively establish the diagnosis. Wacker et al., The Lancet Infectious Diseases, 13(5), 426-35 (May 2013). A 2012 systematic review found that soluble urokinase-type plasminogen activator receptor (suPAR) is a nonspecific marker of inflammation and does not accurately diagnose sepsis. Backes et al., Intensive Care Medicine, 38(9), 1418-28 (September 2012).
There remains a need in the medical art for a better diagnosis of sepsis.
Conventional diagnostics are analogous to using a fishing lure to catch a single protein, gene, or RNA sequence. The invention provides an improved approach, analogous to using a fishing net: it captures all the RNA data in a sample and uses computational biology to sort through all the data (fish) to identify patients with sepsis and the bacteria causing the immune response. The invention provides an initial diagnostic for sepsis that can also monitor indicators of treatment and recovery (bacterial counts decrease and physiology returns to a steady state). The invention can be used for many other hospital conditions, particularly those requiring an intensive care unit stay with its attendant risk of bacterial infection, such as trauma, stroke, myocardial infarction, or major surgery.
In the first embodiment, the invention uses unmapped bacterial RNA reads to identify bacteria that cause sepsis. In the second embodiment, the invention uses unmapped viral reads to identify sepsis or viral reactivation. In the third embodiment, the invention uses unmapped B/T cell V(D)J recombination reads to identify sepsis. In the fourth embodiment, the invention uses Principal Component Analysis (PCA) of RNA splicing entropy to identify sepsis. In the fifth embodiment, the invention uses RNA lariats to identify sepsis. In the sixth embodiment, the invention uses PCA of gene expression, alternative RNA splicing, or alternative transcription start and end to identify sepsis.
To produce the listed embodiments, one of ordinary skill in the molecular biological art uses one or more of the following steps.
The first step is for one of ordinary skill in the molecular biological art to obtain RNA sequencing data from a body sample. In the seventh embodiment, the body sample is a bodily fluid sample. In the eighth embodiment, the bodily fluid sample is blood. In the ninth embodiment, the target sequencing depth is 100,000,000 reads per sample.
The second step is for one to align the RNA sequencing data (reads) to the genome of interest. In the tenth embodiment, the reads from a human sample are aligned to a human genome. In the eleventh embodiment, the reads from a mouse sample are aligned to a mouse genome.
The third step is to select the un-mapped reads and analyze the reads using a Read Origin Protocol (ROP).
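The third step can be sketched as follows. This is a minimal illustration, assuming the aligner's output is available as SAM text with unmapped reads retained (for example, with STAR's --outSAMunmapped Within option); the function names split_unmapped and unmapped_fraction are hypothetical. In the SAM format, bit 0x4 of the FLAG field marks an unmapped segment, and bits 0x100 and 0x800 mark secondary and supplementary alignments, which are skipped here so that each record is counted once. The sketch does not run ROP itself; it only collects the unmapped reads that would be passed to ROP and the counts needed for the unmapped-read percentage.

```python
from typing import Iterable, List, Tuple

UNMAPPED = 0x4                      # SAM FLAG bit: segment unmapped
SECONDARY_OR_SUPPLEMENTARY = 0x900  # 0x100 (secondary) | 0x800 (supplementary)


def split_unmapped(sam_lines: Iterable[str]) -> Tuple[List[Tuple[str, str]], int]:
    """Return the (read name, sequence) pairs of unmapped records and the total record count."""
    unmapped: List[Tuple[str, str]] = []
    total = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blank and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag, sequence = fields[0], int(fields[1]), fields[9]
        if flag & SECONDARY_OR_SUPPLEMENTARY:  # count each read only once
            continue
        total += 1
        if flag & UNMAPPED:
            unmapped.append((name, sequence))
    return unmapped, total


def unmapped_fraction(sam_lines: Iterable[str]) -> float:
    """Fraction of primary records that did not map to the reference genome."""
    unmapped, total = split_unmapped(sam_lines)
    return len(unmapped) / total if total else 0.0


if __name__ == "__main__":
    example = [
        "@HD\tVN:1.6",
        "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII",
        "r3\t256\tchr2\t50\t0\t4M\t*\t0\t0\tACGT\tIIII",
    ]
    print(split_unmapped(example))     # ([('r2', 'TTGA')], 2)
    print(unmapped_fraction(example))  # 0.5
```

For paired-end data, each mate is a separate SAM record, so these counts are per mate rather than per fragment.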
In the first embodiment, the next step is for one of ordinary skill in the molecular biological art to identify, from the ROP output, the bacteria present in the sample. In the twelfth embodiment, one of ordinary skill in the molecular biological or medical art uses the identified bacteria to list the potential causative organisms of sepsis (the product).
In the second embodiment, the next step is to identify, from the ROP output, the viruses present in the sample. In the thirteenth embodiment, one uses the identified viruses with PCA to identify likely sepsis samples.
In the third embodiment, the next step is to identify, from the ROP output, the T/B cell epitopes present in the samples. In the fourteenth embodiment, one uses the identified T/B cell epitopes with PCA to identify likely sepsis samples.
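The ranking in the twelfth embodiment can be sketched as below. This assumes that ROP (or a similar classifier) has already produced a count of unmapped reads per bacterial taxon; the input format, the function name rank_candidate_bacteria, and the reads-per-million cutoff are hypothetical and are not validated clinical thresholds. Normalizing to reads per million (RPM) of all sequenced reads makes samples with different sequencing depths comparable.

```python
from typing import Dict, List, Tuple


def rank_candidate_bacteria(
    taxon_counts: Dict[str, int],
    total_reads: int,
    min_rpm: float = 1.0,  # hypothetical cutoff, not a validated clinical threshold
) -> List[Tuple[str, float]]:
    """Rank bacterial taxa by reads per million (RPM) of all sequenced reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    rpm = {taxon: count * 1_000_000 / total_reads for taxon, count in taxon_counts.items()}
    ranked = [(taxon, value) for taxon, value in rpm.items() if value >= min_rpm]
    # Most abundant first; ties broken by name so the report is reproducible.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


if __name__ == "__main__":
    counts = {"taxon_A": 450, "taxon_B": 30, "taxon_C": 5}  # hypothetical counts
    print(rank_candidate_bacteria(counts, total_reads=100_000_000, min_rpm=0.1))
    # [('taxon_A', 4.5), ('taxon_B', 0.3)]
```

In practice, taxa that also appear in negative controls (for example, reagent contaminants) would need to be filtered out before such a list is reported.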
Alternatively, or in combination, in the third step, one selects the mapped reads and uses a program that detects and quantifies alternative RNA splicing events to identify gene expression, RNA splicing events, alternative transcription start/end, or RNA splicing entropy. In the fifteenth embodiment, the program that detects and quantifies alternative RNA splicing events is Whippet. In the sixteenth embodiment, one uses the gene expression changes, RNA splicing events, and alternative transcription start/end with PCA to identify likely sepsis samples. In the seventeenth embodiment, one uses the identified RNA splicing entropy with PCA to identify likely sepsis samples.
In the fifth embodiment, after the gene expression, RNA splicing event, alternative transcription start/end, or RNA splicing entropy analysis, the next step is to identify RNA lariats from the mapped reads. In the eighteenth embodiment, one uses the RNA lariats with PCA to identify likely sepsis samples.
In the nineteenth embodiment, the invention provides an output product comprising five plots, one for each of the bacterial RNA read, viral read, B/T V(D)J epitope, RNA splicing entropy, and RNA lariat embodiments described above, and a list of the bacteria likely causing the infection.
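The RNA splicing entropy used in the seventeenth embodiment can be illustrated with Shannon entropy. Whippet reports an entropy value for each splicing event; the sketch below assumes, for illustration, that the entropy is computed over the relative abundance of the alternative paths (isoforms) through one event, using log base 2. It is a generic sketch, not Whippet's source code, and the function name splicing_entropy is hypothetical. An event used in only one way has entropy 0, and an event split evenly across k paths has entropy log2(k).

```python
import math
from typing import Sequence


def splicing_entropy(path_abundances: Sequence[float]) -> float:
    """Shannon entropy (bits) of the relative use of alternative paths in one splicing event."""
    total = sum(path_abundances)
    if total <= 0:
        return 0.0  # no reads support the event
    entropy = 0.0
    for abundance in path_abundances:
        if abundance > 0:  # 0 * log(0) is taken as 0
            p = abundance / total
            entropy -= p * math.log2(p)
    return entropy


if __name__ == "__main__":
    print(splicing_entropy([100, 0]))          # 0.0: one path used exclusively
    print(splicing_entropy([50, 50]))          # 1.0: two paths used equally
    print(splicing_entropy([25, 25, 25, 25]))  # 2.0: four paths used equally
```

A per-sample vector of such values, one per splicing event, is the kind of input a PCA of splicing entropy would take.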
RNA sequencing data can be used in several ways. (1) Identification of biomarkers. Rather than requiring a subset of genes to be picked for testing, RNA sequencing data can identify genes with increased expression that correlate with biomarkers of interest. (2) Identification of new biomarkers. RNA sequencing data allow analysis of processes such as RNA splicing. RNA splicing entropy can be quantified, and samples can be grouped by Principal Component Analysis into sick or not sick. RNA lariats can also be identified in sequencing data and used as a potential biomarker. All biomarkers can be followed over time to assess resolution of the sepsis. (3) Use of un-mapped reads in sepsis. RNA sequencing reads are typically aligned to a reference genome (for human samples, the human genome), and reads that do not align are discarded. These un-mapped reads are of potential interest in two main ways: the percentage of un-mapped reads could itself be a biomarker, and the reads can identify the microbe causing the infection. (4) Identification of the microbe causing the infection. The un-mapped reads can be compared with the genomes of disease-causing microbes (bacteria, viruses, fungi, etc.) to identify the causative organism and start treatment earlier. Serial measurements can also assess the effectiveness of treatment.
The results presented show that mice exposed to trauma separated from controls on PCA. Similarly, mice that did not survive fourteen days after exposure clustered closely together on PCA. These results show a substantial difference in global pre-mRNA processing entropy between mice exposed to trauma and controls, and indicate that pre-mRNA processing entropy is useful in predicting mortality.
Although sepsis accounts for about one in five deaths worldwide, there is no single standard test to diagnose it. Rudd et al., The Lancet, 395(10219), 200-211 (Jan. 18, 2020). Sepsis patients share the physiology common to patients in the intensive care unit: hypotension, tachycardia, hyperthermia, and hypoxia.
Delays in treatment for sepsis affect mortality. Early identification of differences between clinically similar patients would allow earlier interventions (surgery, antibiotics). RNA sequencing combined with computational biology techniques for analyzing RNA biology could identify the differences between two such patients. Earlier prediction of complications would also allow triage of patients to facilities equipped to manage them and better discussions regarding expected mortality and morbidity.
It takes days to obtain a final diagnosis of a bacterial pathogen, because the bacteria must be cultured. Bacteremia is confirmed by microbial blood culture, but the turnaround time can delay diagnosis. Biron et al., Biomarker Insights, 10(Suppl 4), 7-17 (Sep. 15, 2015). Procalcitonin (PCT) has been shown to correlate more closely than C-reactive protein (CRP) with the onset and treatment of sepsis. Vijayan et al., J. Intensive Care (Aug. 3, 2017). Much work has been done on PCT as a predictor of sepsis before symptom onset. Dolin et al., Shock, 49(4), 364-70 (April 2018). However, PCT has low specificity for sepsis and is elevated in cancers, autoimmune diseases, and other physiological stressors. Bloos & Reinhart, Virulence, 5(1), 154-60 (Jan. 1, 2014).
RNA sequencing data can identify bacteria more quickly than culture. The falling cost of sequencing has shifted the focus of genetic analyses from DNA to RNA sequencing, and methods to analyze these data have improved. Stark et al., Nature Reviews Genetics (2019). Compared with DNA, RNA undergoes dynamic changes through transcription and post-transcriptional processing, providing unique insight into cellular activity. RNA also captures a broader range of infectious etiologies, because both DNA and RNA viruses produce RNA, either as their genome or by transcribing mRNA. Patients with trauma who die or have complications are expected to show different changes in gene expression, alternative RNA splicing, and alternative transcription start/end compared with patients who survive without complications. The differences seen in RNA biology may correlate with injury severity or predict outcomes. This invention should help direct care in trauma patients once RNA sequencing becomes fast enough to return results when they are needed for patients in the ICU (within one hour).
Signals in RNA sequencing data from other processes (RNA splicing entropy, gene expression, viral counts, lariat counts, etc.) provide a signature that can identify patients with sepsis. A better understanding of RNA biology in critically ill sepsis patients can have a broad impact on biomedical science. If RNA sequencing data can identify patients whose immune response to the initial sepsis has not resolved, outcomes can improve.
The number of unmapped reads aligning to pathogenic viral genomes can be a biomarker of critical illness. Patients with late death should have different gene expression, alternative RNA splicing (including RNA splicing entropy), and alternative transcription start/end compared with patients with early death, and the genes showing these changes are expected to differ between the two groups. These genes may point to proteins not previously considered in trauma patients as biomarkers or targets of therapeutic intervention, and to pathological mechanisms that are unappreciated or unclear.
RNA biology before the trauma should be able to predict survival. Mice that survive to fourteen days should show fewer changes in RNA biology than mice at the early time point. These analyses are done in mice of three distinct genetic backgrounds to account for human heterogeneity and to allow comparison between the two most common immunological/genetic mouse model strains. Because gene expression, RNA splicing, and alternative transcription start/end are all basic molecular functions, the results are expected to remain similar across strains.
Identification of B and T cell epitopes from the unmapped reads could provide a biomarker for sepsis. Critical illness decreases the diversity of these epitopes, so recovery of that diversity could signal an improvement in clinical status, and the loss of some epitopes could indicate the immune suppression seen in critical illness.
Alternative transcription start and end is another biological process potentially influenced by sepsis. Current technology can identify changes in transcription start and end from RNA sequencing data. Hardwick et al., Frontiers in Genetics, 10, 709 (2019); Cass & Xiao, Cell Systems, 9(4), 393-400.e6 (Oct. 23, 2019). Genes with increased alternative transcription start/end could be disease treatment targets. A change to the start or end of an RNA is likely to change the ultimate fate of that transcript. Understanding these changes would better describe the ultimate protein products: a transcript thought to be transcribed and translated normally could instead have been transcribed with an altered start or end, leading to nonsense-mediated decay or to translation of an alternative isoform.
Genes with significant alternative splicing and high entropy in mice after trauma may be targets for intervention. This invention can better diagnose sepsis and identify the microbe causing the disease. Emergency room and critical care physicians can use the invention.
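The decrease in epitope diversity can be quantified in several ways. The sketch below assumes that the unmapped B/T cell receptor reads have already been assigned clonotype labels (for example, by V gene, J gene, and CDR3 sequence) by an immune-repertoire tool; the labels and the function name repertoire_diversity are hypothetical. It reports richness (the number of distinct clonotypes) and the Gini-Simpson index, which is the probability that two reads drawn at random with replacement come from different clonotypes.

```python
from collections import Counter
from typing import Iterable, Tuple


def repertoire_diversity(clonotypes: Iterable[str]) -> Tuple[int, float]:
    """Return (richness, Gini-Simpson index) for clonotype labels, one label per read."""
    counts = Counter(clonotypes)
    n = sum(counts.values())
    if n == 0:
        return 0, 0.0
    gini_simpson = 1.0 - sum((c / n) ** 2 for c in counts.values())
    return len(counts), gini_simpson


if __name__ == "__main__":
    diverse = ["c1", "c2", "c3", "c4"]     # four clonotypes, one read each
    contracted = ["c1", "c1", "c1", "c1"]  # one expanded clonotype
    print(repertoire_diversity(diverse))     # (4, 0.75)
    print(repertoire_diversity(contracted))  # (1, 0.0)
```

Tracked over serial samples, a rising index would be consistent with the recovery of diversity described above; raw read-level counts are, however, sensitive to sequencing depth.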
While proteins have traditionally been used to reflect inflammatory load, RNAs are more specific to certain etiologies and clinical outcomes.
High-throughput sequencing technologies allow coding and non-coding RNAs (ncRNAs) to be used as markers of disease risk and progression. Next-generation sequencing (NGS) quantifies RNAs by sequencing complementary DNA (cDNA), allowing transcriptomic analysis of mRNAs, ribosomal RNAs (rRNAs), and ncRNAs. Kukurba & Montgomery, Cold Spring Harb. Protoc., 2015(11), 951-69 (Apr. 13, 2015).
Coding and non-coding RNAs have been studied as biomarkers. Less attention has been paid to the portion of RNA sequencing data (9-20%) that is routinely discarded because it cannot be mapped to a reference genome. Mangul et al., ROP: Dumpster diving in RNA-sequencing to find the source of 1 trillion reads across diverse adult human tissues. Genome Biol., 19 (Feb. 15, 2018).
The discovery of serum-stable circulating miRNAs allows cell-free miRNAs to be used as biomarkers of disease. Benz et al., Int. J. Mol. Sci., 17(1) (Jan. 9, 2016); Wang et al., J. Cell Physiol., 231(1), 25-30 (2016). Elevated serum miR-133a levels correlate with poorer prognosis in ICU patients. Tacke et al., Crit. Care Med., 42(5), 1096-104 (May 2014). Groups of miRNAs distinguish between different infectious etiologies, such as S. aureus and E. coli. Wu et al., PLoS One, 8(10) (2013). The lack of standardization in measuring circulating miRNA expression affects reproducibility between analyses and has limited its clinical applicability. Lee et al., Mol. Diagn. Ther., 21(3), 259-68 (June 2017).
Physiologic stress induces viral reactivation by impairing the immune response and upregulating cell cycle progression pathways such as MAPK and NF-κB. Walton et al., PLoS One, 9(6), e98819 (Jun. 11, 2014); Traylen et al., Future Virol., 6(4), 451-63 (April 2011). Secretion of pro-inflammatory cytokines, such as TNF-α, reactivates latent cytomegalovirus (CMV) in patients who have undergone recent stress, even in the absence of systemic inflammation. Prösch et al., Virology, 272(2), 357-65 (Jul. 5, 2000). A combination of inflammatory challenges and immune cell dysregulation has been shown to contribute to an environment that both promotes viral reactivation and maintains viremia. Walton et al., PLoS One, 9(6), e98819 (Jun. 11, 2014).
In a traumatic shock EXAMPLE, C57BL/6 mice were subjected to hemorrhagic shock followed by cecal ligation and puncture, which induces sepsis. RNA was extracted from the cellular components of lung and from immune cells in blood after plasma and serum were discarded. Samples from both healthy and critically ill mice were sequenced via NGS at Genewiz in South Plainfield, N.J., USA. Reads were aligned to the mm9 genome using STAR, and the unmapped reads were then mapped to viral genomes via ROP. Dobin et al., Bioinformatics, 29(1), 15-21 (January 2013); Mangul et al., Genome Biol., 19 (Feb. 15, 2018). Two-sample t tests were conducted to compare the number of viral reads in healthy versus critically ill mouse lung and blood.
For convenience, the meanings of some terms and phrases used in the specification, examples, and appended claims are listed below. Unless stated otherwise or implicit from context, these terms and phrases have the meanings below. These definitions are to aid in describing particular embodiments and are not intended to limit the claimed invention. Unless otherwise defined, all technical and scientific terms have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. For any apparent discrepancy between the meaning of a term in the art and a definition provided in this specification, the meaning provided in this specification shall prevail.
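The comparison of viral read counts can be sketched as a Student's two-sample t test with pooled variance. This is a minimal standard-library version, with the hypothetical function name pooled_t_test, that returns the t statistic and degrees of freedom; a p-value additionally requires the t distribution (for example, scipy.stats.ttest_ind in SciPy). Because read counts are often skewed, normalized or log-transformed counts, or the Mann-Whitney U test defined below, may be preferable; that choice is an assumption, not part of the EXAMPLE.

```python
import math
import statistics
from typing import Sequence, Tuple


def pooled_t_test(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Student's two-sample t statistic (equal variances assumed) and degrees of freedom."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two observations")
    df = na + nb - 2
    # Pooled variance from the two sample variances (denominator n - 1 each).
    pooled = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / df
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df


if __name__ == "__main__":
    healthy = [1.0, 2.0, 3.0]  # hypothetical normalized viral read counts
    critically_ill = [4.0, 5.0, 6.0]
    t, df = pooled_t_test(healthy, critically_ill)
    print(round(t, 3), df)  # -3.674 4
```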
Acute respiratory distress syndrome (ARDS) has the medical art-defined meaning. ARDS is a type of respiratory failure characterized by rapid onset of widespread inflammation in the lungs. Symptoms include shortness of breath, rapid breathing, and bluish skin coloration. Causes may include sepsis, pancreatitis, trauma, pneumonia, and aspiration.
Alternative splicing (AS) has the molecular biological art-defined meaning. RNA splicing is a basic molecular function that occurs in all cells directly after RNA transcription, but before protein translation, in which introns are removed and exons are joined. Alternative splicing or alternative RNA splicing, or differential splicing, is a regulated process during gene expression that results in a single gene coding for multiple proteins. Exons of a gene can be included within or excluded from the final, processed messenger RNA (mRNA) produced from that gene. The proteins translated from alternatively spliced mRNAs can contain differences in their amino acid sequence and, often, in their biological functions.
Aldo/keto reductase gene has the molecular biological art-defined meaning.
Base R refers to the core R statistical programming language and the packages distributed with it, without add-on packages.
The Mann-Whitney U test has the statistical art-defined meaning. The Mann-Whitney U test (also called the Mann-Whitney-Wilcoxon (MWW) test, Wilcoxon rank-sum test, or Wilcoxon-Mann-Whitney test) is a nonparametric test of the null hypothesis that a randomly selected value from one population is equally likely to be less than or greater than a randomly selected value from a second population. This test can investigate whether two independent samples were selected from populations having the same distribution.
mountainClimber is a cumulative-sum-based approach to identify alternative transcription start (ATS) and alternative polyadenylation (APA) sites as change points. Unlike many existing methods, mountainClimber runs on a single sample and identifies multiple ATS or APA sites anywhere in the transcript. Cass & Xiao, Cell Systems, 9(4), 393-400.e6 (Oct. 23, 2019).
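The U statistic can be computed from ranks as sketched below, a minimal standard-library version with midranks for ties (the function name mann_whitney_u is hypothetical). U_x counts the pairs in which the value from the first sample exceeds the value from the second, with ties counted as one half; U_x + U_y always equals the product of the two sample sizes. A p-value requires the exact or normal-approximation null distribution, which SciPy provides through scipy.stats.mannwhitneyu.

```python
from typing import List, Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U_x, U_y); tied values receive the average (mid) rank."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y], key=lambda item: item[0])
    ranks: List[float] = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):  # positions i..j share one tied value
            ranks[k] = (i + j) / 2 + 1  # 1-based midrank
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n_x, n_y = len(x), len(y)
    u_x = rank_sum_x - n_x * (n_x + 1) / 2
    return u_x, n_x * n_y - u_x


if __name__ == "__main__":
    print(mann_whitney_u([1, 2, 3], [4, 5, 6]))  # (0.0, 9.0)
    print(mann_whitney_u([1, 2], [2, 3]))        # (0.5, 3.5)
```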
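The cumulative-sum idea behind change-point detection can be illustrated on a read-coverage profile. The sketch below is a generic single-change-point CUSUM estimator, not mountainClimber's algorithm, which handles multiple change points per transcript and applies additional filtering; the function name cusum_change_point and the coverage values are hypothetical. It returns the split index k that maximizes the absolute cumulative deviation from the mean coverage, for example where coverage drops at an alternative polyadenylation site.

```python
from typing import Sequence


def cusum_change_point(coverage: Sequence[float]) -> int:
    """Split index k (coverage[:k] vs coverage[k:]) maximizing |cumulative deviation from the mean|."""
    n = len(coverage)
    if n < 2:
        raise ValueError("need at least two positions")
    mean = sum(coverage) / n
    best_k, best_abs, running = 1, -1.0, 0.0
    for k in range(1, n):
        running += coverage[k - 1] - mean  # S_k = sum of (c_i - mean) for i < k
        if abs(running) > best_abs:
            best_k, best_abs = k, abs(running)
    return best_k


if __name__ == "__main__":
    # Hypothetical per-base coverage that drops after a proximal polyadenylation site.
    print(cusum_change_point([10, 10, 10, 2, 2, 2]))  # 3
    # Hypothetical coverage that rises at a downstream alternative transcription start.
    print(cusum_change_point([1, 1, 9, 9, 9, 9]))     # 2
```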
Next Generation Sequencing (NGS) has the molecular biological art-defined meaning. NGS technology is typically characterized by being highly scalable, allowing the entire genome to be sequenced at once. Usually, this is accomplished by fragmenting the genome into small pieces, randomly sampling for a fragment, and sequencing it using one of a variety of technologies.
Principal Component Analysis (PCA) has the computer-art and molecular biological art-defined meaning. Principal component analysis is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables (entities each of which takes on various numerical values) into a set of values of linearly uncorrelated variables called principal components.
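The procedure can be sketched with power iteration on the sample covariance matrix, using only the Python standard library. This is an illustrative sketch, assuming each row is one sample and each column one feature (for example, per-event splicing entropy); the function name pca_scores and the data are hypothetical. In practice, a library routine such as prcomp in base R or sklearn.decomposition.PCA in Python would be used, and features are often scaled to unit variance first. The sign of each principal component is arbitrary.

```python
import math
from typing import List, Sequence


def pca_scores(samples: Sequence[Sequence[float]], n_components: int = 2,
               iters: int = 500) -> List[List[float]]:
    """Project samples (rows) onto their top principal components."""
    n, m = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(m)]
    x = [[row[j] - means[j] for j in range(m)] for row in samples]  # center each feature
    # Sample covariance matrix of the features (m x m).
    cov = [[sum(x[i][a] * x[i][b] for i in range(n)) / (n - 1) for b in range(m)]
           for a in range(m)]
    components: List[List[float]] = []
    for _ in range(n_components):
        v = [1.0 + 0.1 * j for j in range(m)]  # arbitrary non-symmetric start vector
        s = math.sqrt(sum(c * c for c in v))
        v = [c / s for c in v]
        for _ in range(iters):  # power iteration converges to the top eigenvector
            w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
            norm = math.sqrt(sum(c * c for c in w))
            if norm == 0.0:
                break
            v = [c / norm for c in w]
        eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(m)) for a in range(m))
        components.append(v)
        # Deflate so the next pass finds the next-largest component.
        cov = [[cov[a][b] - eigenvalue * v[a] * v[b] for b in range(m)] for a in range(m)]
    return [[sum(row[j] * comp[j] for j in range(m)) for comp in components] for row in x]


if __name__ == "__main__":
    # Hypothetical feature vectors: three "control" and three "trauma" samples.
    samples = [
        [1.0, 1.2, 0.1], [0.8, 1.0, 0.0], [1.1, 0.9, 0.2],
        [5.0, 5.2, 0.1], [5.1, 4.9, 0.0], [4.8, 5.0, 0.2],
    ]
    for score in pca_scores(samples):
        print([round(v, 2) for v in score])
```

On data like these, the first component separates the two groups, which is the kind of separation the PCA plots in the description rely on.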
Read origin protocol (ROP) has the computer-art meaning of a computational protocol that aims to discover the source of all reads, including those originating from repeat sequences, recombinant B and T cell receptors, and microbial communities. The Read Origin Protocol was developed to determine what unmapped reads represent. Mangul et al., Genome Biology 19, 36 (2018). ROP has demonstrated that unmapped reads align to bacterial, viral, fungal, and B/T cell receptor rearrangement sequences.
Read has the molecular biological art-defined meaning of the sequence of nucleotide bases determined by sequencing a single DNA or cDNA fragment.
Sepsis has the medical art-defined meaning of a life-threatening condition that arises when the body's response to infection injures its tissues and organs. Bone et al., Chest, 101, 1644-1655 (1992); Singer et al., JAMA, 315, 801-810 (February 2016).
STAR aligner is the Spliced Transcripts Alignment to a Reference (STAR), a fast RNA-seq read mapper, with support for splice-junction and fusion read detection. STAR aligns reads by finding the Maximal Mappable Prefix (MMP) hits between reads (or read pairs) and the genome, using a Suffix Array index. Different parts of a read can be mapped to different genomic positions, corresponding to splicing or RNA-fusions. The genome index includes known splice-junctions from annotated gene models, allowing for sensitive detection of spliced reads. STAR performs local alignment, automatically soft clipping ends of reads with high mismatches. Dobin et al., STAR: Ultrafast universal RNA-seq aligner. Bioinformatics, 29(1), 15-21 (January 2013).
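The Maximal Mappable Prefix idea can be illustrated with a toy suffix-array search. The sketch below is a simplified illustration under stated assumptions, not STAR's implementation: it builds a naive suffix array, finds the longest read prefix that occurs in the genome, and repeats the search from the first unmatched base, so that a spliced read splits into one seed per exon. STAR additionally clusters, stitches, and scores seeds and tolerates mismatches. The sequences and function names are hypothetical.

```python
from bisect import bisect_left
from typing import List, Tuple


def build_suffix_array(genome: str) -> List[int]:
    """Naive suffix array: suffix start positions in lexicographic order (toy-sized only)."""
    return sorted(range(len(genome)), key=lambda i: genome[i:])


def common_prefix_length(a: str, b: str) -> int:
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def maximal_mappable_prefix(read: str, genome: str, sa: List[int]) -> Tuple[int, int]:
    """(length, genome position) of the longest prefix of read that occurs in genome."""
    suffixes = [genome[i:] for i in sa]  # materialized for clarity in this toy example
    idx = bisect_left(suffixes, read)
    best_len, best_pos = 0, -1
    # In a sorted suffix list, the longest match to read is adjacent to its insertion point.
    for j in (idx - 1, idx):
        if 0 <= j < len(sa):
            length = common_prefix_length(read, suffixes[j])
            if length > best_len:
                best_len, best_pos = length, sa[j]
    return best_len, best_pos


def seed_read(read: str, genome: str) -> List[Tuple[str, int]]:
    """Split a read into consecutive MMP seeds, restarting at the first unmatched base."""
    sa = build_suffix_array(genome)
    seeds: List[Tuple[str, int]] = []
    start = 0
    while start < len(read):
        length, pos = maximal_mappable_prefix(read[start:], genome, sa)
        if length == 0:  # base absent from the genome; a real aligner would handle this
            break
        seeds.append((read[start:start + length], pos))
        start += length
    return seeds


if __name__ == "__main__":
    exon1, intron, exon2 = "GATTACACCGGT", "GTAAGTC" + "T" * 9 + "AG", "CCATGGCTAAC"
    genome = exon1 + intron + exon2
    read = exon1[-6:] + exon2[:6]  # a spliced read spanning the exon-exon junction
    print(seed_read(read, genome))  # [('ACCGGT', 6), ('CCATGG', 30)]
```

The gap between the end of the first seed (position 12) and the start of the second (position 30) corresponds to the 18-base intron, which is how split seeds reveal a splice junction.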
Treatment for sepsis has the medical-art recognized meaning. Sepsis is treatable, and timely implementation of targeted interventions improves outcomes. The Mayo Clinic informs the public that several medications are used to treat sepsis and septic shock. These include antibiotics: broad-spectrum antibiotics, which are effective against a variety of bacteria, are usually used first, and after learning the results of blood tests, a doctor may switch to a different antibiotic targeted to the specific bacteria causing the infection. They also include intravenous fluids and vasopressors. Other medications include low doses of corticosteroids, insulin to help maintain stable blood sugar levels, drugs that modify immune system responses, and painkillers or sedatives.
Treatment for COVID-19 has the medical-art recognized meaning. Corticosteroids can be therapeutic. See Prescott & Rice, Corticosteroids in COVID-19 ARDS: Evidence and hope during the pandemic. JAMA, 324, 1292-1295 (2020). Other treatments are known by persons having ordinary skill in the medical art. See Waterer & Rello, Steroids and COVID-19: We need a precision approach, not one size fits all. Infectious Diseases and Therapy (2020). See also Beigel et al., Remdesivir for the treatment of Covid-19—Preliminary Report. New England Journal of Medicine (2020).
Treatment for Acute respiratory distress syndrome (ARDS) has the medical-art recognized meaning. Corticosteroids can be therapeutic. See Prescott & Rice, Corticosteroids in COVID-19 ARDS: Evidence and hope during the pandemic. JAMA, 324, 1292-1295 (2020). Other treatments are known by persons having ordinary skill in the medical art.
V(D)J recombination has the molecular biological art-defined meaning. V(D)J recombination occurs in developing lymphocytes during the early stages of T and B cell maturation, involves somatic recombination, and results in the highly diverse repertoire of antibodies/immunoglobulins and T cell receptors (TCRs) found in B cells and T cells, respectively.
Whippet (OMICS_29617) is a program that detects and quantifies alternative RNA splicing events of any complexity, with computational requirements compatible with a laptop computer. Whippet applies lightweight algorithms to event-level splicing quantification from RNA-seq data. The software can facilitate the analysis of simple to complex AS events that function in normal and disease physiology. Alternative splicing events with high entropy are identified using Whippet. Sterne-Weiler et al., Molecular Cell, 72, 187-200.e6 (2018).
Mouse strains. Mice are purchased from The Jackson Laboratory. C57BL/6J, the most widely used mouse model, exhibits a Th1/more pro-inflammatory phenotype and is the background of numerous knockout animals. BALB/cJ is another commonly used strain and can be the background of knockout animals, but it has a more Th2/anti-inflammatory-predominant response phenotype. The CAST mouse is derived from wild mice and is genetically different from common laboratory mice. Using these three strains adjusts for the heterogeneity seen in humans.
Mouse model of sepsis: cecal ligation and puncture (CLP). A mouse model of hemorrhagic shock followed by induction of sepsis by cecal ligation and puncture produces severe sepsis. Lomas-Neira et al., Shock, 45(2), 157-65 (2016); Monaghan et al., Mol Med., 24(1), 32 (Jun. 18, 2018); Wu et al., PLoS One, 8(10) (2013); Monaghan et al., Annals of Surgery, 255, 158-164 (2012). Mice are anesthetized and restrained in the supine position, and catheters are inserted into both femoral arteries. Mice are bled over a 5-10-minute period to a mean blood pressure of 30 mmHg (±5 mmHg), which is kept stable for 90 minutes. To achieve this level of hypotension, one mL of blood is withdrawn. One mL of blood is approximately 50% of a mouse's blood volume, so this corresponds to class 4 hemorrhagic shock in humans. Mice are resuscitated intravenously (IV) with Ringer's lactate at four times the drawn blood volume. Sham hemorrhages are performed as a control, in which the femoral arteries are ligated to mimic the tissue destruction but no blood is drawn. The following day, sepsis is induced as a secondary challenge by cecal ligation and puncture. The timing of this secondary challenge is based on previous findings that hemorrhagic shock followed twenty-four hours later by induction of sepsis produced results consistent with critical illness, such as altered PaO2 to FIO2 ratios. This double-hit model of hemorrhagic shock followed by cecal ligation and puncture corresponds to a missed bowel injury in humans after hemorrhagic shock and to an injury severity score (ISS) of twenty-five. The dual challenge of hemorrhagic shock followed by septic shock is in line with critically ill sepsis patients, who sometimes present with bleeding from wounds and a bowel injury missed at initial assessment.
Sample sizes for these assays are based on results from the inventors' previous work on the alternative splicing of sPD-1, from which an effect size of Cohen's d=2.85 (a difference of 2.85 standard deviations between groups) was calculated. With such a large effect size, power analysis poorly justifies sample size: if the effect size is tenable, it would be exceedingly rare for an assay of any sample size to fail to reach statistical significance. Small sample sizes, however, provide poor point estimates that may be very unstable. The inventors therefore chose a sample size of six mice per group based on feasibility and with the aim of providing a reasonable point estimate for each group.
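For two groups of sizes n_1 and n_2 with means \bar{x}_1 and \bar{x}_2 and sample variances s_1^2 and s_2^2, Cohen's d and its relation to the pooled-variance two-sample t statistic are shown below. Plugging in the reported d=2.85 with six mice per group is an illustrative calculation, assuming the pooled-variance t test; it gives a t statistic about twice the two-sided 5% critical value for 10 degrees of freedom, which illustrates why an effect of this size would rarely fail to reach significance.

```latex
\begin{aligned}
d &= \frac{\bar{x}_1 - \bar{x}_2}{s_p}, \qquad
s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}, \qquad
t = \frac{d}{\sqrt{1/n_1 + 1/n_2}} \\
t &= \frac{2.85}{\sqrt{1/6 + 1/6}} = 2.85\sqrt{3} \approx 4.94 \;>\; t_{0.975,\,10} \approx 2.23
\end{aligned}
```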
Mice of both sexes are used, because there are significant sex differences in the response to bleeding from trauma. Deitch et al., Annals of Surgery, 246(3), 447-53; discussion 53-5 (2007).
Human subjects. Patients are recruited from the Trauma Intensive Care Unit (TICU) at Rhode Island Hospital with Institutional Review Board approval and consent. The patient population at Rhode Island Hospital (a level 1 trauma center) is sufficient for this EXAMPLE. Over 3700 trauma patients were admitted to the hospital in 2018. The TICU admitted 765 patients in 2018. This would cause over 3000 patients admitted to the intensive care unit over the 4-year project. Using the advanced technology of the hospital's electronic health records (EPIC) combined with the mandated trauma registry there are streamlined efforts to recruit and retain patients. Since the mouse model correlates to an injury severity score (ISS) of twenty-five, the goal is to ensure that the average ISS for all the patients is twenty-five. Minimal risk to the patient is maintained since there is no direct benefit; the blood collected are less than 50 mL over an 8-week period and not collected more than twice a week. Blood samples from patients are taken on admission (25 mL) and during the TICU stay when a complication is developed (25 mL). This should cause the maximum for the initial 8-week period after the trauma. When the patient is recovered, at least 8 weeks after the last blood draw, a final blood draw 50 mL of are done in the outpatient setting. A power analysis was done based upon previous results from human patients. The effect size of Cohen's d=0.8 using a power of 80% and alpha of 0.05 the inventors calculated a sample size of twenty-six per group. The mortality of patients in the TICU is 5%. To enroll twenty-six patients who die after trauma, the inventors need 520 TICU patients (26/0.05=520). No enrollment is planned in the last six months to ensure adequate follow up, data collection and analysis. Fourteen % of patients in the TICU have complications after trauma. 
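The sample-size arithmetic can be reproduced approximately as sketched below. The sketch assumes a two-sided, two-sample t test with equal group sizes and uses the normal approximation with the small-sample correction term z^2/4 (Guenther's correction); for d=0.8, power 80%, and alpha 0.05 this gives twenty-six per group, although the software the inventors actually used is not stated. Dedicated tools, such as power.t.test in R, use the noncentral t distribution instead. The function names n_per_group and patients_to_screen are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group sample size for a two-sided, two-sample t test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    n = 2 * (z_alpha + z_beta) ** 2 / d ** 2 + z_alpha ** 2 / 4
    return math.ceil(n)


def patients_to_screen(target_events: int, event_rate: float) -> int:
    """Patients needed for the expected number of events to reach the target."""
    # Round before ceil so floating-point noise (e.g. 519.9999999) cannot shift the result.
    return math.ceil(round(target_events / event_rate, 9))


if __name__ == "__main__":
    print(n_per_group(0.8))              # 26
    print(patients_to_screen(26, 0.05))  # 520
    print(patients_to_screen(26, 0.14))  # 186
```

The second call reproduces the 520-patient figure for the 5% mortality rate; applied to the 14% complication rate, the same arithmetic gives 186 TICU patients to expect twenty-six patients with a complication.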
Because the mouse model corresponds to an ISS of twenty-five, the average ISS for the enrolled patients is targeted at twenty-five. This system recruits some patients whose samples are not used; these samples are banked and not sent for RNA sequencing. Recruitment concludes once twenty-six patients who die and twenty-six patients with a complication have been enrolled and the entire set of patients has an average ISS of twenty-five.
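The stopping rule above can be expressed as a simple check over the enrolled cohort. This is a hypothetical sketch: the record fields, the tolerance around an average ISS of twenty-five, and the choice to count deaths and complications independently (so one patient may count toward both) are assumptions not specified in the text.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class EnrolledPatient:
    """Hypothetical per-patient record; field names are illustrative only."""
    iss: int            # injury severity score
    died: bool
    complication: bool


def recruitment_complete(patients, target=26, target_iss=25.0, tolerance=0.5):
    """True once >= target deaths, >= target complications, and mean ISS ~ target_iss.

    Deaths and complications are counted independently, so one patient may
    count toward both; the ISS tolerance is an assumption.
    """
    if not patients:
        return False
    deaths = sum(p.died for p in patients)
    complications = sum(p.complication for p in patients)
    mean_iss = mean(p.iss for p in patients)
    return (deaths >= target
            and complications >= target
            and abs(mean_iss - target_iss) <= tolerance)


cohort = ([EnrolledPatient(25, True, False)] * 26
          + [EnrolledPatient(25, False, True)] * 26)
print(recruitment_complete(cohort))        # True
print(recruitment_complete(cohort[:40]))   # False: only 14 complications
```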
As patients are recruited, variables such as age, weight, and medical co-morbidities are collected and compared across groups. If these variables differ between groups (by t test or rank-sum test), they are adjusted for in the analysis by regression.
In the human studies, patients of both sexes are recruited, and both sexes are analyzed in the GTEx data set. Age, weight, and other health conditions are held constant in the mouse assays.
Sample collection and sequencing. Mouse blood and lung samples were obtained as described. Monaghan et al., Annals of Surgery, 255, 158-164 (2012). Human data were obtained from GTEx according to the consortium's protocols. RNA was extracted using the MasterPure Complete DNA/RNA Purification kit (Epicentre, Madison Wis., USA) followed by the Globin Clear Kit (ThermoScientific, Waltham, Mass., USA). The RNA (1400 ng in forty μL) was then sent to Genewiz (South Plainfield, N.J., USA) for sequencing.
The GTEx Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health, and by NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS and the data used for the analyses were obtained from the GTEx Portal and dbGaP accession number phs000424.v6.p1.
Cloud based computing. All computational biology work is performed in a Microsoft Azure cloud environment approved and supported by Lifespan-RI Hospital. This server manages all large data sets from RNA sequencing. Cloud-based computing was chosen deliberately for this project. The depth of sequencing needed for RNA splicing analysis (100 million reads vs. forty million) generates more data from both sequencing and analysis (a small study generated one terabyte of sequencing data and another terabyte from the alignment to the genome). Given the large amount of data anticipated for the EXAMPLE, the ability to expand and contract storage space and computing power in the cloud makes it the ideal choice. This server stores and analyzes data from both mouse and human samples. Because RNA sequencing data are inherently identifiable, the human data are treated as protected health information (PHI), even though none of the typical identifiers (such as name, date of birth, etc.) are associated with the data. The server was created in collaboration with the Information Technology department at Rhode Island Hospital to ensure data security. The cloud server is accessible only through a hospital virtual desktop, and data are saved only to the Azure server or a hospital computer. Data are encrypted at rest and in transit to or from the hospital. Any link to typical identifiers (name, date of birth, etc.) is kept separate from the sequencing data. The cloud-based server allows large data analysis, with computing and storage scaled on a per-use basis. The Azure server is Linux based and uses programming in R and Python. The typical analysis pipeline is as follows: differential gene expression and RNA splicing analysis are done with Whippet, which also produces an entropy measure, and genes of interest then undergo GO term analysis.
Genes with alternative transcription start and end sites identified through Whippet are correlated with findings from the mountainClimber analysis.
Computational analysis and statistics. RNA sequencing data from the mouse were first checked for quality using FASTQC. RNA-sequencing data from the GTEx consortium and from the mouse ARDS model were analyzed with the Whippet software for differential gene processing. Alternative transcription events are those events identified by Whippet as ‘tandem transcription start site,’ ‘tandem alternative polyadenylation site,’ ‘alternative first exon,’ and ‘alternative last exon.’ Alternative RNA splicing events are those events labeled ‘core exon,’ ‘alternative acceptor splice site,’ ‘alternative donor splice site,’ and ‘retained intron.’ Alternative mRNA processing events were defined by a log2 fold change greater than 1.5 ± 0.2. Statistical significance was assessed by the chi-square p-value of a contingency table, estimated from 1000 simulations.
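The event grouping and the simulated chi-square p-value described above can be sketched as follows. This is a minimal Python illustration, not the inventors' code: the two-letter Whippet event codes are assumed from Whippet's output "Type" column, and the Monte Carlo scheme (shuffling labels so both table margins stay fixed, with p = (1 + hits)/(1 + simulations)) is one standard way to simulate a chi-square p-value, analogous to R's `chisq.test(..., simulate.p.value = TRUE, B = 1000)`.

```python
import random
from collections import Counter

# Whippet event-type codes (assumed from Whippet's output "Type" column),
# grouped the way the text groups them.
TRANSCRIPTION_CODES = {"TS", "TE", "AF", "AL"}  # tandem TSS, tandem polyA, alt. first/last exon
SPLICING_CODES = {"CE", "AA", "AD", "RI"}       # core exon, alt. acceptor/donor, retained intron


def event_class(code):
    """Map a Whippet event code to 'transcription', 'splicing', or None."""
    if code in TRANSCRIPTION_CODES:
        return "transcription"
    if code in SPLICING_CODES:
        return "splicing"
    return None


def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            expected = r * c / grand
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    return stat


def simulated_p_value(groups, classes, n_sim=1000, seed=0):
    """Monte Carlo chi-square p-value with both table margins held fixed.

    Shuffling the class labels across events preserves row and column totals;
    the p-value is (1 + #simulated >= observed) / (1 + n_sim).
    """
    group_levels = sorted(set(groups))
    class_levels = sorted(set(classes))

    def tabulate(labels):
        counts = Counter(zip(groups, labels))
        return [[counts[(g, c)] for c in class_levels] for g in group_levels]

    observed = chi_square(tabulate(classes))
    rng = random.Random(seed)
    labels = list(classes)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(labels)
        if chi_square(tabulate(labels)) >= observed - 1e-9:
            hits += 1
    return (hits + 1) / (n_sim + 1)


groups = ["ARDS"] * 50 + ["sham"] * 50                   # hypothetical events per group
classes = (["splicing"] * 45 + ["transcription"] * 5
           + ["splicing"] * 5 + ["transcription"] * 45)
print(simulated_p_value(groups, classes))
```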
Gene ontology (GO) was assessed using The Gene Ontology Resource Knowledgebase. Ashburner et al., Nature Genetics, 25, 25-29 (2000); The Gene Ontology Resource: 20 years and still GOing strong. Nucleic Acids Research, 47, D330-D338 (2019). Genes from the analyses were entered and the outputs displayed. GO outputs do not indicate an actual increase or decrease in a gene's expression; they reflect how often annotations occur relative to what is expected based upon the set of genes entered.
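To make "relative to what is expected" concrete: over-representation tools compare how many entered genes carry a GO annotation with how many would be expected if the same number of genes were drawn at random from the reference list. PANTHER offers Fisher's exact test for this; the one-sided version reduces to the hypergeometric upper tail sketched below in Python. The function names are hypothetical, and the multiple-testing correction such tools apply is omitted.

```python
from math import comb


def overrepresentation_p(k, n, K, N):
    """One-sided hypergeometric (Fisher's exact) p-value for GO term over-representation.

    N: genes in the reference list       K: reference genes annotated to the term
    n: genes entered (the query list)    k: query genes annotated to the term
    Returns P(X >= k) when n genes are drawn at random from the reference list.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def fold_enrichment(k, n, K, N):
    """Observed annotated query genes divided by the number expected by chance (n*K/N)."""
    return k / (n * K / N)


print(overrepresentation_p(5, 5, 5, 10))  # 1/252, about 0.00397
print(fold_enrichment(5, 5, 5, 10))       # 2.0
```

Note that neither value says whether the annotated genes went up or down in expression; both describe only the composition of the entered gene list.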
Blood sample collection. Blood samples are collected on day 0 of ICU admission. Clinical data, including COVID-specific therapies, were collected prospectively from the electronic medical record, and participants were followed until hospital discharge or death. The ordinal scale score can be collected as described by Beigel et al., New England Journal of Medicine (2020), along with sepsis status, the associated SOFA score, and the diagnosis of ARDS. See Singer et al., The Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3). JAMA, 315: 801-810 (2016); Ferguson et al. The Berlin definition of ARDS. Intensive Care Medicine, 38: 1573-1582 (2012).
RNA extraction and sequencing. Whole blood can be collected in PAXgene tubes (Qiagen, Germantown, Md.) and sent to Genewiz (South Plainfield, N.J., USA) for RNA extraction, ribosomal RNA depletion, and sequencing. Sequencing can be done on Illumina HiSeq machines to provide 150 base pair, paired-end reads. Libraries were prepared with three samples per lane. Each lane provided 350 million reads, ensuring each sample had more than 100 million reads.
Computational Biology and Statistical Analysis. All computational analysis can be done blinded to the clinical data. The data can be assessed for quality control using FastQC. Andrews, FastQC: A quality control tool for high throughput sequence data (2014). RNA sequencing data can be aligned to the human genome using the STAR aligner. Dobin et al., Bioinformatics (Oxford, England), 29, 15-21 (2013). Reads that align to the human genome can be separated and called ‘mapped’ reads. Reads that do not align to the human genome, which are typically discarded during standard RNA sequencing analysis, were identified as ‘unmapped’ reads. The unmapped reads are then aligned to the relevant comparator genomes and counted per sample using Magic-BLAST. See Boratyn et al., BMC Bioinformatics, 20, 405 (2019). The unmapped reads were further analyzed with Kraken2. See Wood, Lu, & Langmead, Genome Biology, 20, 257 (2019). The analysis used the PlusPFP index to identify other bacterial, fungal, archaeal, and viral pathogens. See the Kraken 2/Bracken RefSeq indexes maintained by Ben Langmead.
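The mapped/unmapped split can be illustrated by the commands that would carry it out. The Python sketch below only builds and prints the commands (a dry run). It assumes STAR's `--outReadsUnmapped Fastx` option, which writes the reads that fail to align to `<prefix>Unmapped.out.mate1` and `<prefix>Unmapped.out.mate2`, and then passes those files to Kraken 2; all file names, index paths, and the thread count are hypothetical, and the Magic-BLAST step is not shown.

```python
import shlex


def star_command(read1, read2, genome_dir, prefix, threads=8):
    """STAR paired-end alignment that also writes reads failing to align."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", str(genome_dir),
        "--readFilesIn", str(read1), str(read2),
        "--readFilesCommand", "zcat",            # assumes gzipped FASTQ input
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outReadsUnmapped", "Fastx",           # keep 'unmapped' reads as FASTA/FASTQ
        "--outFileNamePrefix", str(prefix),
    ]


def kraken2_command(prefix, database, threads=8):
    """Kraken 2 classification of the unmapped mates written by STAR."""
    return [
        "kraken2",
        "--db", str(database),
        "--threads", str(threads),
        "--paired",
        "--report", f"{prefix}kraken2.report",
        "--output", f"{prefix}kraken2.out",
        f"{prefix}Unmapped.out.mate1",
        f"{prefix}Unmapped.out.mate2",
    ]


if __name__ == "__main__":
    prefix = "sample01_"  # hypothetical sample name
    steps = [
        star_command("sample01_R1.fastq.gz", "sample01_R2.fastq.gz", "GRCh38_index", prefix),
        kraken2_command(prefix, "k2_pluspfp_db"),  # hypothetical path to a PlusPFP database
    ]
    for cmd in steps:
        # Dry run: print each command; subprocess.run(cmd, check=True) would execute it.
        print(shlex.join(cmd))
```

Building commands as argument lists rather than shell strings avoids quoting problems with sample names and keeps each step easy to log alongside the data it produced.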
Reads that align to the human genome, the mapped reads, also can undergo analysis for gene expression, alternative RNA splicing, and alternative transcription start/end by Whippet. See Sterne-Weiler et al., Molecular Cell, 72, 187-200.e186 (2018). When comparisons are made between groups (died vs. survived), differential gene expression thresholds can be set at both p<0.05 and ±1.5 log2 fold change. Alternative splicing was defined as core exon, alternative acceptor splice site, alternative donor splice site, retained intron, alternative first exon, and alternative last exon. Alternative transcription start/end events can be defined as tandem transcription start site and tandem alternative polyadenylation site. Alternative RNA splicing and alternative transcription start/end events can be compared between groups. See Sterne-Weiler et al., Molecular Cell, 72, 187-200.e186 (2018). Significance was set at greater than 2 log2 fold change, as described by Fredericks et al., Intensive Care Medicine (2020). Genes identified from the analysis of mapped reads can be evaluated by GO enrichment analysis (PANTHER Overrepresentation Test, released 20200728). See Mi et al. Nature Protocols, 8, 1551-1566 (2013).
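The two kinds of cut-off described above (a p-value plus a fold change for genes, a fold change alone for splicing and transcription events) can share one filter. This Python sketch assumes that "±1.5 log2 fold change" means an absolute log2 fold change strictly greater than 1.5; the gene and event names are hypothetical.

```python
def filter_hits(rows, min_abs_log2_fc, p_max=None):
    """Names whose |log2 fold change| exceeds min_abs_log2_fc and, if p_max is
    given, whose p-value is below p_max.

    rows: iterable of (name, log2_fold_change, p_value); p_value may be None
    when only a fold-change cut-off applies.
    """
    hits = []
    for name, log2_fc, p_value in rows:
        if abs(log2_fc) <= min_abs_log2_fc:
            continue  # effect too small in either direction
        if p_max is not None and (p_value is None or p_value >= p_max):
            continue  # fails the significance cut-off
        hits.append(name)
    return hits


genes = [("GENE_A", 2.0, 0.01), ("GENE_B", -1.6, 0.04),
         ("GENE_C", 1.0, 0.001), ("GENE_D", 3.0, 0.20)]
print(filter_hits(genes, 1.5, p_max=0.05))   # ['GENE_A', 'GENE_B']

events = [("event_1", 2.5, None), ("event_2", -1.9, None), ("event_3", -2.1, None)]
print(filter_hits(events, 2.0))              # ['event_1', 'event_3']
```

Using the absolute value keeps genes and events that change in either direction, which matches the "±" in the gene-expression threshold.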
Whippet can generate an entropy value for every identified alternative splicing and transcription event of each gene. These entropy values are computed without reference to the groups used in the gene expression analysis. To visualize these data, a principal component analysis (PCA) can be conducted to reduce the dimensionality of the dataset and obtain an unsupervised overview of trends in entropy values among the samples. Raw entropy values from all samples can be concatenated into one matrix, with missing values replaced by column means. Mortality can be overlaid onto the PCA plot to assess how well these raw entropy values predict this outcome in this sample set. This analysis was done in R (version 3.6.3).
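The matrix-building, mean-imputation, and PCA steps above can be sketched in Python with NumPy (the text states the analysis was done in R; this is an equivalent illustration, not the inventors' script). It assumes a samples-by-events matrix in which NaN marks an event with no entropy value in a sample, drops events missing from every sample, and centers each event before a singular value decomposition, as R's `prcomp` does by default. The data and outcome labels are hypothetical.

```python
import numpy as np


def entropy_pca(matrix, n_components=2):
    """PCA of a samples x events matrix of Whippet entropy values.

    NaN marks an event without an entropy value in a sample. Events missing
    in every sample are dropped; remaining gaps get the column (event) mean.
    Returns (sample scores, fraction of variance explained per component).
    """
    X = np.asarray(matrix, dtype=float)
    X = X[:, ~np.all(np.isnan(X), axis=0)]        # drop all-missing events (makes a copy)
    col_means = np.nanmean(X, axis=0)
    missing_rows, missing_cols = np.nonzero(np.isnan(X))
    X[missing_rows, missing_cols] = col_means[missing_cols]  # mean imputation
    X = X - X.mean(axis=0)                        # center each event, as prcomp does
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S ** 2 / np.sum(S ** 2)
    return scores, explained[:n_components]


entropy = [[0.1, np.nan, 1.2],
           [0.3, 0.9, np.nan],
           [2.0, 1.5, 0.2],
           [2.2, np.nan, 0.1]]
died = [False, False, True, True]                 # hypothetical outcome to overlay
scores, explained = entropy_pca(entropy)
for (pc1, pc2), outcome in zip(scores, died):
    print(f"PC1={pc1:+.2f} PC2={pc2:+.2f} died={outcome}")
```

Because the outcome labels are used only when printing (or coloring a plot), the decomposition itself stays unsupervised, which is the point of overlaying mortality afterward.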
Kraken 2. The following tools are compatible with both Kraken 1 and Kraken 2 and assist users in analyzing and visualizing Kraken results. Bracken allows users to estimate relative abundances within a specific sample from Kraken 2 classification results. Bracken uses a Bayesian model to estimate abundance at any standard taxonomy level, including species- and genus-level abundance. Pavian is a comprehensive visualization program that can compare Kraken 2 classifications across multiple samples. KrakenTools is a suite of scripts to help analyze Kraken results. For more information, a person having ordinary skill in the biomedical art can refer to Wood, Lu, & Langmead, Improved metagenomic analysis with Kraken 2, Genome Biology (Nov. 28, 2019).
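Before tools such as Bracken or Pavian are applied, species-level counts can be read directly from a Kraken 2 report. The Python sketch below assumes the standard six-column, tab-separated report (clade percentage, clade read count, reads assigned directly, rank code, NCBI taxonomy ID, indented name) and keeps only rows with rank code "S". It is a simple tally, not Bracken: it does not redistribute reads assigned at higher taxonomic levels, and the example report lines are invented.

```python
def species_counts(report_lines, min_reads=1):
    """Clade read counts at species rank ('S') from a standard Kraken 2 report."""
    counts = {}
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 6:
            continue  # skip blank lines or reports with extra minimizer columns
        _percent, clade_reads, _direct_reads, rank, _taxid, name = fields
        if rank == "S" and int(clade_reads) >= min_reads:
            counts[name.strip()] = int(clade_reads)  # names are indented by depth
    return counts


def relative_abundance(counts):
    """Each species' share of the species-level reads in one sample."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}


report = [
    "  5.00\t500\t500\tU\t0\tunclassified\n",
    " 95.00\t9500\t10\tR\t1\troot\n",
    "  1.20\t120\t100\tS\t1280\t              Staphylococcus aureus\n",
    "  0.80\t80\t80\tS\t1314\t              Streptococcus pyogenes\n",
]
counts = species_counts(report)
print(counts)                      # {'Staphylococcus aureus': 120, 'Streptococcus pyogenes': 80}
print(relative_abundance(counts))  # {'Staphylococcus aureus': 0.6, 'Streptococcus pyogenes': 0.4}
```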
The following EXAMPLES are provided to illustrate the invention and should not be considered to limit its scope.
Because bacterial infections are a common cause of morbidity in trauma patients, unmapped reads that align with bacteria are useful for the diagnosis and treatment of trauma patients. Unmapped reads from RNA sequencing data thus provide a valuable tool for the care of trauma patients. The decrease in the number of bacterial reads in the blood after trauma may be due to an increased immune response. Some bacteria maintain constant read levels between groups, which suggests a virulent pathogen.
RNA sequencing generates massive amounts of data. The first step with public RNA sequencing data is usually to align the reads to the reference genome of interest. Reads that do not align to the reference genome (typically 10-30%) are usually discarded.
The inventors used a mouse model of hemorrhagic shock followed by cecal ligation and puncture. The inventors isolated RNA from blood and lung samples and had the RNA sequenced using standard techniques. They compared RNA from the test mice to sham controls and analyzed the RNA reads that did not map to the mouse genome. Unmapped reads aligned to common bacterial pathogens, including Acinetobacter baumannii, Escherichia coli, Klebsiella pneumoniae, Pseudomonas aeruginosa, Staphylococcus aureus, Streptococcus agalactiae, Streptococcus pneumoniae, and Streptococcus pyogenes. The inventors also identified specific bacterial genes with high read counts.
In one assay, the blood samples from the test mice exposed to trauma had fewer reads mapping to bacteria (365,974) than those from the control mice (902,063, p=0.02). In the lung, the bacterial counts were similar. Despite an overall decrease in bacterial RNA reads in the test mice, the three Streptococcus species and Staphylococcus aureus had similar numbers of reads in the test mice and the control mice. The most common bacterial RNA read mapped to the aldo/keto reductase gene from group B Streptococcus (82793634[uid]). This gene was more highly expressed in the blood of mice after trauma (15,096) than in controls (3671, p=0.006). This difference was not seen in the lung compartment (13,691 vs. 15,996, p=0.24). In the blood of the test mice, most of the identified bacterial sequences had lower counts than in the blood of the control mice (43 vs. 16).
Unmapped reads have also been aligned to regions in the genomes of viruses. In critical illness, not only the percentage of unmapped reads but also the alignment of unmapped reads to some viral genomes can serve as a biomarker. The percentage of unmapped reads in organs such as the lung and blood during critical illness can be a biomarker of severity and outcomes.
To assess the impact of critical illness on unmapped reads and their composition, the inventors expose mice (e.g., C57BL/6 mice) to sequential hemorrhagic shock followed by sepsis. This treatment produces indirect acute respiratory distress syndrome (ARDS). RNA is extracted from lung and blood samples and sequenced via next-generation RNA-sequencing. Reads are aligned to the mm9 reference genome. The sources of unmapped reads were identified with the Read Origin Protocol (ROP). The viral signature of the unmapped reads changes differently in blood than in lung.
In a second assay, the blood samples of critically ill mice averaged 31.9 million reads versus 32.1 million reads in healthy mice, and lung samples of critically ill mice averaged 33 million reads versus 33.7 million reads in healthy mice. The blood of critically ill mice had an average of 1.5 million unmapped reads (4.74%), compared with an average of 52,000 unmapped reads (0.16%) in the blood of healthy mice (p=0.000082). The lungs of critically ill mice had, on average, 194,331 unmapped reads (0.58%), compared with an average of 130,480 unmapped reads (0.39%) in the lungs of healthy mice (p=0.031665). In blood samples, unmapped reads from critically ill mice were less likely to be viral than those from healthy mice (average 3480 in critically ill vs. 4866 in healthy, p=0.025955). In lung samples, unmapped reads from critically ill mice were more likely to be viral than those from healthy mice (average 6959 in critically ill vs. 3877 in healthy, p=0.031959). The results were notable for higher viral loads in the lungs of critically ill mice, showing that viral RNA loads can be a biomarker of critical illness.
Human correlates of these findings can be translated into the clinical setting.
In the immune system, V(D)J recombination generates the diversity of antibodies in B cells and of T cell receptors in T cells. During critical illness, the diversity of these recombination events decreases but later recovers. RNA sequencing better characterizes V(D)J recombination events and shows more diversity in critical illness than was described previously. B and T cell composition could prove to be an important marker in critical illness and in predicting outcomes of sepsis.
The inventors subject mice (e.g., C57BL/6 mice) to sequential hemorrhagic shock followed by sepsis. This induces acute respiratory distress syndrome (ARDS). Lung and blood samples are collected. RNA from the samples is sequenced by next-generation sequencing. Reads from critically ill and healthy mice are aligned to the GRCm38 annotation, and the unmapped reads are then mapped to the V(D)J annotation by Read Origin Protocol (ROP).
In a third assay, the inventors recovered approximately thirty million reads from RNA-seq data generated from lung tissue of critically ill mice and healthy controls. Alignment with the STAR aligner showed an average of 7.77% unaligned reads in the healthy controls and 8.78% unaligned reads in the samples from critically ill mice. Unmapped reads then underwent a secondary alignment to assay for V(D)J recombinants. Healthy mice had an average of 629 recombinant epitopes, whereas critically ill mice had an average of only 208. Assays were done in triplicate with littermates.
Analysis of unmapped reads shows that critical illness inhibits the generation of B cell and T cell epitopes by the immune system. Although the difference in the percentage of unmapped reads between healthy and critically ill mice was not significant, the composition of B and T cell epitopes differs vastly in critically ill mice.
Next Generation Sequencing is useful for the diagnosis and treatment of diseases.
The effect of alternative RNA splicing before translation has been little studied, especially in the critically ill patient. Previous work showed an association between cancer and the level of global alternative splicing entropy. Elias & Dias, Cancer Microenvironment, 1(1), 131-9 (2008); Ritchie et al., PLoS Computational Biology, 4(3), e1000011 (2008). RNA splicing entropy is correlated with acute respiratory distress syndrome (ARDS) across multiple tissues. Evaluating splicing entropy can provide insights into biological processes and gene targets in the critical illness setting.
The inventors induce a mouse model of ARDS by subjecting mice to hemorrhagic shock, followed by cecal ligation and puncture. Blood and lung samples are collected from three mice undergoing ARDS and three sham controls. RNA is purified.
Next-generation RNA sequencing is performed. Alternative splicing (AS) entropy levels are determined using Whippet (v 0.11) on Julia (v 0.6.4). Principal Component Analysis (PCA) is conducted using base R (v 3.4.0). Alternative splicing events with percent-spliced-in (PSI) values between 0.05 and 0.95 are analyzed. An entropy threshold of 1.5 is applied to classify events as high entropy, and the percentage of high entropy events is determined. Proportions of high entropy events across tissues and experimental groups are compared using Mann-Whitney U tests.
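The per-sample summary and group comparison described above can be sketched in Python. Assumptions, all labeled here and in the comments: each sample's Whippet output is reduced to (PSI, entropy) pairs, the PSI bounds are inclusive, "high entropy" means strictly greater than 1.5, and the Mann-Whitney U test is computed exactly by enumerating every split of the pooled values, which is practical for groups as small as three animals. With three versus three there are only 20 possible splits, so the smallest attainable two-sided p-value is 2/20 = 0.1.

```python
from itertools import combinations


def high_entropy_fraction(events, threshold=1.5, psi_min=0.05, psi_max=0.95):
    """Fraction of a sample's events (PSI within [psi_min, psi_max]) with entropy > threshold.

    events: iterable of (psi, entropy) pairs for one sample.
    """
    kept = [entropy for psi, entropy in events if psi_min <= psi <= psi_max]
    if not kept:
        return float("nan")
    return sum(entropy > threshold for entropy in kept) / len(kept)


def u_statistic(a, b):
    """Mann-Whitney U for sample a: pairs where a beats b, ties counted as 1/2."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


def exact_mann_whitney_p(a, b):
    """Two-sided exact p-value by enumerating every split of the pooled values."""
    pooled = list(a) + list(b)
    n_a = len(a)
    center = n_a * len(b) / 2                     # E[U] under the null
    observed = abs(u_statistic(a, b) - center)
    extreme = total = 0
    for chosen in combinations(range(len(pooled)), n_a):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        total += 1
        if abs(u_statistic(group_a, group_b) - center) >= observed - 1e-9:
            extreme += 1
    return extreme / total


ards = [0.070, 0.075, 0.080]    # hypothetical per-animal high-entropy fractions
sham = [0.100, 0.105, 0.110]
print(exact_mann_whitney_p(ards, sham))   # 0.1
```

Even complete separation of three ARDS animals from three shams therefore yields p=0.1, which is worth keeping in mind when interpreting three-versus-three comparisons.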
In a fourth assay, Principal Component Analysis of the blood samples was performed. Samples clustered by tissue type and ARDS status on the Principal Component Analysis plot. This result suggests that splicing entropy can serve as a biomarker for ARDS status. The inventors observed differing levels of splicing entropy across tissue types, with the most entropy in the lung.
This EXAMPLE demonstrates the collection of RNA sequencing data from a complex tissue (blood), rather than a cell line, and the use of computational biology techniques to analyze the data.
RNA splicing occurs after DNA transcription (often co-transcriptionally) but before protein translation. RNA splicing proceeds by two sequential transesterification reactions, forming an intermediary lariat from the intron and joining the 5′ and 3′ splice sites. Excised intron lariats are typically debranched and degraded rapidly.
Lariat biology has recently been identified as important in viral infection. The DBR1 gene encodes the only known RNA lariat debranching enzyme. Mutations in DBR1 increase susceptibility to HSV-1 and to viral brainstem infections in humans. Assessing RNA lariat counts in critically ill trauma patients could predict poor outcomes or prolonged immune suppression. The inventors used the mouse model of critical illness (CLP). Whether lariat counts resolve, or return to a healthy level, could serve as a marker to identify immune suppression or patients at risk for a complication.
The identification of lariats from RNA sequencing data has been difficult. The William G. Fairbrother laboratory created a method to count lariats from RNA sequencing data. Taggart et al., Nature Structural & Molecular Biology, 19, 719-721 (2012).
In a fifth assay, preliminary data suggest that in the critically ill mouse, the typical metabolism of RNA lariats is changed, resulting in an accumulation of lariats in the blood. The inventors found that the blood of mice with critical illness has higher lariat counts than that of control mice.
Lungs from healthy mice had an average of 3877 viral reads. Lungs from critically ill mice had on average 6956 viral reads. Blood from healthy mice had 4866 viral reads. Blood from critically ill mice had 3480 viral reads. Lungs from critically ill mice were more likely to have unmapped reads originating from viral genomes when compared to lungs from healthy mice (0.36% in critically ill, 0.21% in healthy; p-value=0.032). This could be due to critical illness leading to a compromised immune response that allows for viral reactivation and a higher viral load in lungs of critically ill mice. Traylen et al., Future Virol., 6(4), 451-63 (April 2011).
Blood of healthy mice was more likely to have unmapped reads originating from viral genomes than blood of critically ill mice (0.05% in critically ill, 0.11% in healthy; p-value=0.026). There are several explanations for why healthy mice could have higher viral loads in the blood than critically ill mice. Mature lymphocytes constantly recirculate through blood and lymphatic organs. Charles et al., Immunobiol. Immune Syst. Health Dis. 5th Ed. (2001). In critical illness, the release of pro-inflammatory mediators may intensify immune surveillance, as documented in patients with systemic inflammatory response syndrome (SIRS). Duggal et al., Scientific Reports, 8(1), 1-11 (Jul. 5, 2018).
Changes in leukocyte populations in critically ill mice may lead to a higher number of RNA-producing polymorphonuclear leukocytes (PMNs) in blood, which dilutes the total viral RNA signal in critically ill mouse blood. Therefore, steps are taken to enrich for lymphocytes and monocytes and reduce RNA reads from PMNs.
This traumatic shock EXAMPLE demonstrated an association between critical illness and higher viral loads in mouse lung, lending promise to the clinical use of viral loads as a marker of critical illness.
More remains to be learned about RNA biology, specifically alternative RNA splicing, in the sepsis population.
Over 90% of human genes with multiple exons undergo alternative splicing. Pan et al., Nature Genetics 40, 1413-1415 (2008). RNA splicing is a large natural source of variation between the transcribed gene and the produced protein. Under normal conditions, RNA splicing is under exquisite control. Fever, hypothermia, and osmotic stress from fluid shifts can alter RNA splicing in vitro, changing protein expression. Gultyaev et al., TSitologiia i Genetika, 48, 40-44 (2014); Lemieux et al., PloS One 10, e0126654 (2015); Mahen et al., PLoS Biology 8, e1000307 (2010). Acidosis influences RNA splicing. Elias & Dias, Cancer Microenvironment, 1, 131-139 (2008). Hypoxia also influences RNA splicing. Romero-Garcia et al., Experimental Lung Research 40, 12-21 (2014); Kasim et al., The Journal of Biological Chemistry, 289, 26973-26988 (2014). The effects of physiologic stress on RNA splicing should be better known. The pathological significance of stress-induced changes in the RNA splicing process and the resulting proteins should be better understood.
This EXAMPLE describes the analysis of deep RNA sequencing data with computational biology methods (RNA splicing entropy, lariat counts, viral identification, and B and T cell epitope creation) and the application of these methods to three distinct data sets: mice of different strains undergoing sepsis, deceased sepsis patients who participated in the GTEx project, and human sepsis patients.
RNA splicing entropy after sepsis. RNA splicing is a basic molecular function in all cells. This EXAMPLE uses a global index of RNA splicing called ‘RNA splicing entropy,’ a measure of the precision with which RNA splicing typically occurs. Entropy, and thus disorder, is maximal when all possible outcomes x_i are equally likely (all P(x_i) equal) and the outcome is most uncertain. This calculation is done for each type of alternative splicing event: skipped exon, retained intron, alternative donor (5′ splice site), and alternative acceptor (3′ splice site). The alternative splicing events with high entropy are identified using Whippet.
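The property that entropy is maximal when all outcomes are equally likely follows from the standard Shannon form, H = −Σ_i P(x_i) log2 P(x_i), which reaches log2(n) when all n outcomes have probability 1/n and falls to 0 when one outcome is certain. The Python sketch below computes that quantity for a single event. How Whippet derives the outcome probabilities from its splice graph, and the logarithm base it uses, are assumptions here rather than details confirmed by the text.

```python
import math


def shannon_entropy(probabilities):
    """Shannon entropy in bits of outcome probabilities P(x_i) for one splicing event.

    Outcomes with P(x_i) = 0 contribute nothing (p * log p -> 0 as p -> 0).
    """
    if not math.isclose(sum(probabilities), 1.0, rel_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


print(shannon_entropy([1.0]))          # 0.0: a single certain outcome, no disorder
print(shannon_entropy([0.5, 0.5]))     # 1.0: the maximum for two outcomes
print(shannon_entropy([0.9, 0.1]))     # below 1.0: one outcome dominates
print(shannon_entropy([0.25] * 4))     # 2.0 = log2(4): the maximum for four outcomes
```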
A lower percentage of high-entropy RNA splicing events may predict increased mortality or more complications, particularly infections, in patients with sepsis. Previous work on cancer samples has shown that RNA splicing entropy is increased in tumors compared with healthy tissue in many cancer types. In preliminary data from mice with and without ARDS after sepsis, RNA splicing entropy is lower in the blood (7.7% vs. 10.7%, p=0.1). RNA splicing entropy was calculated for the total white blood cell component of mice with critical illness caused by hemorrhage and cecal ligation and puncture and compared with controls. RNA from the blood and lungs of the mice was extracted, processed, and then subjected to deep RNA sequencing.
These data demonstrate the ability to isolate RNA samples from the target organ tissues of interest in the mouse model system. This EXAMPLE also demonstrates the ability to process the complex data that result from RNA sequencing using computational biology and custom scripts. The preliminary data suggest that RNA splicing in critical illness differs from that in controls. Changes in RNA splicing entropy may be a reflection of, a response to, or a mechanism driving the pathological processes that cause mortality and morbidity in patients with sepsis. Genes with significant alternative splicing and high entropy in the mouse after sepsis may be targets for intervention. These genes of interest are identified using machine-learning techniques and compared across humans and mice.
Assessment of viral activity after sepsis. In the initial assessment of RNA sequencing data, the reads are aligned to the genome of the species from which the sample came. The unmapped reads can account for up to 20% of the data, and these data are typically discarded. Using the Read Origin Protocol to analyze multiple data sets (including GTEx data), the inventors found that the protocol accounted for 99.9% of all reads. The typically discarded data were then analyzed in a seven-step process. Two of those steps are of particular interest because of their relevance to critical care: viral reads and B and T cell receptor rearrangement.
Identification of viruses after sepsis can be a marker of immune suppression, since data suggest that sepsis re-activates herpes infections. Cook et al., Critical Care Medicine, 31, 1923-1929 (2003). Much current research focuses on these mechanisms and interventions. Because of the re-activation data, viral counts could correlate with immune suppression or complications. RNA sequencing data from the lungs of control mice showed fewer viral reads (3877) than those of mice after sepsis (6956, p=0.032). In the blood, the opposite was true: control mice had 4866 counts versus 3480 counts after sepsis (p=0.026). This difference between tissue types could have many causes, such as latent infections, like CMV, in the lung. Because blood is the most accessible tissue type, efforts with human samples should focus on the blood.
Assessment of immune cell epitopes after sepsis. During critical illness, the immune system is activated and likely creates new receptors to respond to challenges and pathogens. These epitopes come from lymphocytes, which are known to decrease in sepsis, with a return to normal levels linked to recovery. Heffernan et al., Critical Care, 16, R12 (2012). While lymphocyte counts themselves are useful, measuring the number and diversity of the epitopes could provide further insights into immune suppression after sepsis.
In the mouse model, preliminary data show fewer epitopes in the lungs of mice after sepsis than in controls. This demonstrates the ability to analyze data from a mouse model and characterize B and T cell epitopes via computational methods. As with lymphocytes, the production of epitopes may decrease, and recovery should correlate with a return to a normal immune state.
The above-described methods to assess for immune suppression in sepsis patients by analysis of RNA sequencing data to understand RNA biology are applied to these samples.
For analysis of RNA splicing entropy, lariat counts, viral identification, and B and T cell epitope creation in the mouse model, pilot data indicate that forty mice (twenty critically ill, twenty healthy controls) should provide 80% power to detect a difference at a two-tailed alpha of 0.05. This sample size is used for each of the three mouse variants.
At twenty-four hours and at fourteen days after cecal ligation and puncture (CLP), mice are sacrificed and organs procured. Organs to be collected are brain, lung, heart, kidney, liver, spleen, and blood. RNA from these samples is isolated as described below. The twenty-four-hour time point is selected because it is the time of most significant organ dysfunction. The fourteen-day time point is selected because a mouse alive at that point would be considered a survivor of this challenge.
RNA from mouse blood samples is processed using the MasterPure Complete RNA Purification (Epicentre, Madison Wis., USA) kit for mice. Because of the high concentration of globin RNA in blood samples, these samples can then be further processed with the GLOBINclear Kit (Epicentre, Madison Wis., USA). From the roughly one mL of blood isolated from a mouse, one of skill in the molecular biological art can obtain RNA at 30-50 nanograms per microliter. RNA from lung, heart, brain, kidney, liver, and spleen samples is extracted using the MasterPure Complete RNA Purification kit for mice. After the RNA samples are processed, the RNA is sequenced using standard techniques, for example by deep RNA sequencing with a goal of 100,000,000 reads per sample. Each sample requires at least 1400 nanograms of RNA for deep sequencing.
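Relating the expected RNA concentration to the 1400 ng minimum is a simple division. The Python sketch below computes the volume of eluate needed at several concentrations in the stated 30-50 ng per microliter range; the function name is hypothetical. At 35 ng per microliter, 1400 ng corresponds to 40 microliters, matching the 1400 ng in forty μL submitted for sequencing as described earlier in this document.

```python
def eluate_volume_ul(required_ng, concentration_ng_per_ul):
    """Volume of RNA eluate (uL) containing required_ng at the measured concentration."""
    if concentration_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return required_ng / concentration_ng_per_ul


for concentration in (30, 35, 50):  # ng/uL, spanning the expected 30-50 ng/uL yield
    volume = eluate_volume_ul(1400, concentration)
    print(f"{concentration} ng/uL -> {volume:.1f} uL for 1400 ng")
# 30 ng/uL -> 46.7 uL, 35 ng/uL -> 40.0 uL, 50 ng/uL -> 28.0 uL
```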
Human samples. Patients are recruited under Institutional Review Board approval after consent is obtained. Blood samples are obtained from pre-existing catheters to minimize risk. Blood samples are collected on admission and serially while the patient is in the intensive care unit. Samples are collected in PAXgene tubes and stored in a −80 °C freezer until RNA is isolated for sequencing. RNA sequencing is done in batches to minimize cost. For this experiment, 300 sepsis patients are expected to be recruited (an average of 100 per year over the first three years, to allow analysis over the final two years of the project).
Control samples are obtained from healthy patients undergoing routine laboratory analysis at outpatient facilities. Blood from these patients is collected in PAXgene tubes and stored in a −80 °C freezer until RNA is isolated for sequencing. RNA sequencing is done in batches to minimize cost. Healthy controls are matched to sepsis patients based upon demographic and clinical data. Recruitment aims for 300 patients total (an average of 100 per year over the first three years). Sample size calculations for the recruitment of humans were based upon initial results from the mouse assays. Preliminary data from humans with sepsis show more variation than the mouse data. This greater variation in humans can be attributed to several factors, such as age, sex, medical co-morbidities, and variation in the timing of collection relative to the onset of sepsis.
RNA from human blood samples is processed using the MasterPure Complete RNA Purification (Epicentre, Madison Wis., USA) kit for humans. Because of the high concentration of globin RNA in blood samples, these samples can then be further processed with the GLOBINclear Kit (Epicentre, Madison Wis., USA). Each sample requires at least 1400 nanograms of RNA for deep sequencing, e.g., by deep RNA sequencing with a goal of 100,000,000 reads per sample.
Genotype-Tissue Expression (GTEx). The GTEx data set includes over 500 patients with at least one sample that has undergone RNA sequencing. Extensive clinical data are available on these participants. The data allow the patients to be stratified into early deaths (<36 hours) and late deaths (>36 hours). This classification and comparison between the groups was done because it highlights a population who could be intervened upon. Patients who die later die because of immune suppression leading to complications from sepsis, so earlier identification of immune suppression could change outcomes. The GTEx samples have been collected and have undergone RNA sequencing. These sequencing data are analyzed as described above.
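The early/late stratification can be written as a small classification rule. In this Python sketch the input (hours from the qualifying event to death) and its values are hypothetical, since the text does not name the GTEx variable used; because the text defines only <36 hours and >36 hours, a death at exactly 36 hours is left unassigned.

```python
from collections import Counter


def death_group(hours_to_death, cutoff_hours=36.0):
    """'early' if death occurred before cutoff_hours, 'late' if after.

    The text defines <36 h and >36 h only, so exactly 36 h returns None.
    """
    if hours_to_death < cutoff_hours:
        return "early"
    if hours_to_death > cutoff_hours:
        return "late"
    return None


hours = [12, 30, 36, 48, 120]   # hypothetical donor values
print(Counter(death_group(h) for h in hours))
```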
Innovativeness. RNA sequencing technology affords an avenue to bring precision medicine to sepsis patients. The inventors have processed blood samples from sepsis patients and obtained RNA sequencing data of quality similar to that from cell lines or solid tissue samples. Monaghan et al., Shock, 47, 100 (2017). RNA sequencing allows for understanding not only gene expression but also RNA biology. RNA is unstable compared to DNA. Kara & Zacharias, Biopolymers, 101, 418-427 (2014). RNA is also influenced by the specific cellular environment, which is altered in sepsis.
Conceptual Innovation. Past work on sepsis and molecular mechanisms has focused on gene transcription and protein expression. Alternative RNA splicing can also influence the expression of a protein independent of gene expression. Chang et al., Combinatorial Chemistry & High Throughput Screening, 13, 242-252 (2010); Fredericks et al., Biomolecules, 5, 893-909 (2015).
By comparing findings in mice to humans using the publicly available RNA sequencing data from GTEx and human samples from the Intensive Care Unit, the inventors can establish the nature/type of RNA splicing common across species.
By determining the temporal relationship of changes in RNA splicing entropy, RNA lariats, viral identification, and B and T cell epitope creation with developing complications/mortality, the inventors can establish whether RNA biology can provide insight to immune suppression after sepsis.
Assessing the unmapped reads (viral and B/T cell epitopes) for clinical significance makes use of data that are typically discarded. This is analogous to the use of lymphocyte counts to predict sepsis outcomes. Heffernan et al., Critical Care, 16, R12 (2012).
Technical innovation. RNA is isolated from complex tissues from both mice and humans. The isolated RNA is of high enough quality to allow for deep RNA sequencing. This analysis has previously been done only on cell line or cancer samples.
The inventors can use a series of analytical tools: first the STAR aligner, then Whippet, to assess and characterize splicing events and splicing entropy. This analysis is done across GTEx data, mice with sepsis, and humans with sepsis.
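As an illustration of the two-stage workflow, the sketch below only builds command lines; it does not run either tool. It assumes paired-end gzipped FASTQ input, a prebuilt STAR genome directory, and a prebuilt Whippet index. All file names are hypothetical, and the `whippet-quant.jl` script path depends on how Whippet is installed. The flags shown are standard STAR and Whippet options as we understand them and should be checked against the installed versions; thread count and output type are assumptions, not the inventors' settings.

```python
from __future__ import annotations

import shlex


def star_command(genome_dir: str, fastq_1: str, fastq_2: str,
                 out_prefix: str, threads: int = 8) -> list[str]:
    """Build a STAR command that aligns gzipped paired-end reads to a genome."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", fastq_1, fastq_2,
        "--readFilesCommand", "zcat",  # decompress .fastq.gz input on the fly
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", out_prefix,
    ]


def whippet_quant_command(fastq_1: str, fastq_2: str, index: str,
                          out_prefix: str,
                          script: str = "whippet-quant.jl") -> list[str]:
    """Build a Whippet quantification command (event PSI and entropy output)."""
    return ["julia", script, fastq_1, fastq_2, "-x", index, "-o", out_prefix]


if __name__ == "__main__":
    # Hypothetical sample and reference names, for illustration only.
    r1, r2 = "sample01_R1.fastq.gz", "sample01_R2.fastq.gz"
    print(shlex.join(star_command("star_index/", r1, r2, "sample01_")))
    print(shlex.join(whippet_quant_command(r1, r2, "whippet_index.jls", "sample01")))
```

Returning argument lists rather than shell strings lets each command be passed directly to `subprocess.run` without shell-quoting problems.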
The inventors can use the Read Origin Protocol as a basis, modifying it as appropriate to assess viral content and B/T cell epitopes in data obtained from mouse models of sepsis, GTEx, and humans with sepsis.
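The first step in reusing "typically discarded" data is simply to pull out the reads the aligner could not place. The sketch below does this for SAM text using the FLAG bit 0x4 (segment unmapped) defined by the SAM format; on BAM files the same selection is commonly done with `samtools view -f 4`. It does not reproduce the Read Origin Protocol or any viral or epitope classification, and the example records are hypothetical.

```python
from __future__ import annotations

from typing import Iterable, Iterator

SAM_FLAG_UNMAPPED = 0x4  # FLAG bit set when the read itself did not align


def unmapped_reads(sam_lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (read name, sequence) for every unmapped record in SAM text."""
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        # SAM columns: QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
        qname, flag, seq = fields[0], int(fields[1]), fields[9]
        if flag & SAM_FLAG_UNMAPPED:
            yield qname, seq


if __name__ == "__main__":
    example = [
        "@HD\tVN:1.6",
        "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",  # mapped
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII",          # unmapped
    ]
    print(list(unmapped_reads(example)))
```

The unmapped sequences collected this way would then be passed to whatever downstream classifier is used for viral content or epitope analysis.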
The inventors can apply the scripts used previously to calculate lariat counts from RNA sequencing data. Taggart et al., Nature Structural & Molecular Biology, 19, 719-721 (2012). The RNA sequencing data is obtained from mouse models of sepsis, GTEx, and humans with sepsis.
Analyses of the large data sets produced by RNA sequencing are commonly unsuccessful for several reasons. The analyses have biases for which no controls are in place, and large data sets readily produce statistically significant results that may not be biologically or clinically significant. Using multiple biologic outputs (RNA splicing entropy, lariat counts, viral identification, and B and T cell epitope creation) across three sample sources (GTEx, mouse model, and humans) mitigates these problems.
By assaying RNA splicing entropy, lariat counts, viral identification, and B and T cell epitope creation, one of ordinary skill in the molecular biological art can identify patients with this prolonged immune suppression.
Analyzing data already collected, such as the GTEx data, and data such as the unmapped reads from RNA sequencing is a creative use of existing resources. These data would typically be ignored, but when reanalyzed in a clinically relevant context they can yield new biomarkers. For example, the lymphocyte count on a complete blood count with differential is a potential biomarker in the sepsis population. Heffernan et al., Critical Care, 16, R12 (2012).
Analysis of RNA sequencing data can provide one marker of the severity of the critical illness.
Evaluating RNA biology and outcomes after sepsis. Next generation RNA sequencing allows analysis of RNA and assessment not only of gene expression but also of other biological processes (alternative splicing and changes in transcription start and end). Correlating genomic information from high throughput sequencing obtained when a patient arrives at the hospital with outcomes such as death and complications like infection should improve care. Because RNA is less stable than DNA, RNA-based assessments are more sensitive to the physiologic stress of sepsis. The inventors can assess how the physiologic stress of sepsis influences RNA biology and alters proteins. Findings on RNA biology in critically ill sepsis patients should translate to patients who are critically ill from other diseases.
By high throughput RNA sequencing, the inventors can assay gene expression and the RNA processing events of alternative transcription start/end and alternative RNA splicing in leukocytes in the blood. All three of these biological processes influence protein expression: by generating the RNA (gene expression), by changing the beginning and end of the RNA (alternative transcription start/end), and by changing the isoforms that are expressed (alternative RNA splicing). The combination of these three modalities creates a ‘transcriptomic phenotype’ that better identifies the proteins expressed in the sepsis population than the typical use of gene expression alone. Compared to DNA, RNA is more influenced by physiologic derangements seen in sepsis, such as hypoxia and acidosis, as shown in cell culture. Elias & Dias, Cancer Microenvironment, 1(1), 131-9 (2008); Kasim et al., The Journal of Biological Chemistry, 289(39), 26973-88 (2014).
In an intensive care unit, monitoring of physiology correlates with improved clinical outcome, yet clinicians do not monitor how this physiology affects RNA biology. Using high throughput sequencing, the inventors assay RNA biology in sepsis patients. Understanding RNA biology at the time of injury should help predict mortality, complications, and other outcomes in sepsis patients. Three aims are tested using a mouse model of sepsis, GTEx data from sepsis patients, and blood from sepsis patients with correlation to outcomes.
Aim 1: Identify changes in RNA biology (gene expression, alternative transcription start/end, and alternative RNA splicing) in the blood before and after a pre-clinical mouse model of sepsis and compare to controls.
Aim 2: Using the data available from the Genotype Tissue Expression (GTEx) project correlate findings in the mouse model to these sepsis patients (81 patients).
Aim 3: Enroll critically ill sepsis patients and identify aspects of RNA biology that identify and predict outcomes (mortality, infection).
These analyses use data from high throughput sequencing and cloud computing to establish findings of RNA biology that correlate and predict outcomes in sepsis patients. This data comes from an ancestrally diverse sepsis population and can be applied to sepsis patients across the country and to multiple critically ill patient populations.
New technology now allows analysis of all genes, not just those targeted by the technology of the time. Tompkins, The Journal of Trauma and Acute Care Surgery, 78(4), 671-86 (2015). With RNA sequencing technology, particularly at the depth needed for RNA biology assessment (80-100 million reads), the inventors can assess all transcribed genes, not just those identified as important with older technology. Analyzing all transcribed genes allows identification of genes that may be important in trauma but were overlooked in the past, likely because of low transcription levels. RNA sequencing technology also lets the inventors assay RNA biology (alternative transcription start/end and alternative RNA splicing) for a more complete understanding of which genes are ultimately translated into functional proteins. Hardwick et al., Frontiers in Genetics, 10, 709 (2019).
Over 90% of human genes with multiple exons undergo alternative splicing, creating a potentially large natural source of variation between the transcribed gene and the protein product. Pan et al., Nature Genetics, 40(12), 1413-5 (2008). Splicing is under exquisite control under normal conditions. Some conditions common in trauma, such as fever, hypothermia, and osmotic stress from fluid shifts, can alter RNA splicing in vitro and thereby alter protein expression. Gultyaev et al., Tsitologiia i Genetika, 48(6), 40-4 (2014); Lemieux et al., PloS One, 10(5), e0126654 (2015); Mahen et al., PLoS Biology, 8(2), e1000307 (2010).
Using a mouse model of trauma caused by hemorrhage followed by cecal ligation and puncture, the inventors reported that alternative RNA splicing results in expression of varied isoforms of an immune modulating protein (programmed cell death receptor-1, PD-1). Preliminary data on RNA splicing entropy indicate that global RNA splicing is modified in the mouse model of trauma. Ritchie et al., PLoS Computational Biology, 4(3), e1000011 (2008). Increased RNA splicing entropy is also present in other pathologic conditions, such as cancers, as compared to normal tissue. Ritchie et al., PLoS Computational Biology, 4(3), e1000011 (2008). Increased entropy is characteristic of disease states and could be a marker of critical illness after sepsis.
Sepsis patients are a good population in which to assay critical illness and generalize the findings to other patients. A population of sepsis patients is an ideal group to assay genomic factors as previous research has been hindered by lack of racial and ethnic diversity. Multiple factors cause minorities to avoid healthcare. Chikani et al., Public Health Reports, 131(5), 704-10 (2016). By assaying sepsis patients, the inventors can collect data from a diverse population that is more in line with the general population and not the population that seeks healthcare. The findings are more generalizable, especially among an ancestrally diverse population.
Protocols for sepsis have improved outcomes. Rhodes et al., Intensive Care Medicine, 41(9), 1620-8 (2015). Sepsis can cause critical illness in a young population. The response to sepsis should not be influenced by co-morbidities associated with an increasingly aged population, but the inventors can collect co-morbidities to assess if there is an impact.
Genomic medicine is an ideal approach for sepsis patients but is limited by sequencing technologies. Although genomic medicine is typically defined as using genomic information about an individual patient as part of their clinical care, this definition cannot currently be applied to sepsis patients or any critically ill patients, because sequencing and analysis are not yet fast enough to inform their care.
Next generation RNA sequencing takes about 18 hours on an Illumina machine, not including time for data analysis. Because data analysis is deferred until the patient's outcome is known, it can be blinded, allowing more robust conclusions. Through this work, efficiencies in computational biology can be identified so that, as sequencing technology speeds up, the analysis is quick enough to fit a clinically relevant time frame (less than one hour) from sample acquisition to actionable result.
Thus, there is value in understanding how stressors associated with sepsis can affect RNA biology (RNA splicing, including entropy, and alternative transcription start/end) and how changes in RNA biology lead to altered protein product expression, contributing to potential dysfunction at the cell and tissue level.
Innovation. Past work on trauma and molecular mechanisms has focused on gene transcription and protein expression. Alternative RNA splicing and alternative transcription start/end can each influence the expression of a protein independent of gene expression. Chang et al., Combinatorial Chemistry & High Throughput Screening, 13(3), 242-52 (2010); Fredericks et al., Biomolecules, 5(2), 893-909 (2015). By comparing findings in mice with humans, using the publicly available RNA sequencing data from GTEx and human samples from the Trauma Intensive Care Unit, the inventors can establish the nature/type of RNA biology that is common across species.
In determining the temporal relationship of changes in RNA biology with developing complications/mortality, the inventors can establish whether RNA biology can provide insight to immune suppression after sepsis.
Knowledge of RNA biology in the critically ill is useful because previous work on this process has focused largely on chronic diseases and genetic diseases.
The combination of gene expression, RNA splicing, and transcription start/end creates a ‘transcriptomic phenotype’ that can be followed during the patient's hospital stay.
RNA is isolated from complex tissues from both mice and humans. The isolated RNA is of high enough quality to allow for deep RNA sequencing. This analysis has previously been done only on cell line or cancer samples.
The inventors can use a series of analytical tools, the STAR aligner and then Whippet, to assess and characterize RNA biology. Results from Whippet are compared with those from mountainClimber to check the accuracy of alternative transcription start and end calls. This analysis is done across GTEx data, mice with sepsis, and humans with sepsis.
Using multiple biologic outputs (alternative RNA splicing, including entropy, alternative transcription start/end) across three different samples (GTEx, mouse model, and humans in the trauma intensive care unit) should mitigate some of the potential flaws.
Preliminary data regarding trauma. In a small cohort of trauma patients from GTEx, three patients from the early death cohort (<48 hours) were compared to six patients from the late death cohort (≥48 hours). In this comparison, 524 genes are significantly increased in the late deaths versus the early deaths, and 2331 genes are decreased. The GO terms associated with the genes whose expression decreased in the late group compared to the early group are consistent with previous research. These GO terms reference mitochondrial biology, which likely reflects increased expression of these genes at the early death time point: mitochondrial molecular patterns are a component of the early response to trauma, so those genes would be increased in the early group (37, 38). Anemia occurs during trauma, and in the late group, genes associated with erythrocyte development are over-represented, suggesting increased expression in the late death group compared to the early death group. These few GO terms and their correlation with phenotypes of trauma suggest that early versus late death is a valid clinical grouping. This preliminary data shows the ability to access, manage, and analyze GTEx data in clinically significant groups using modern computational biology techniques, and using GO terms helps establish clinical relevance. This project aims to obtain and analyze all the trauma samples from GTEx. The inventors can also use similar computational approaches with the prospectively collected data from trauma patients.
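The document does not say which enrichment statistic produced the GO results. A common choice for GO over- and under-representation is the one-sided hypergeometric test, sketched below with the standard library. The gene counts are inputs the reader supplies; term annotation and multiple-testing correction (for example Benjamini-Hochberg), which dedicated GO tools normally handle, are not shown.

```python
from math import comb


def hypergeom_tail(population: int, annotated: int, selected: int,
                   overlap: int, upper: bool = True) -> float:
    """Tail probability for `overlap` GO-annotated genes among `selected` genes.

    population: genes tested (the background set)
    annotated:  background genes carrying the GO term
    selected:   differentially expressed genes (e.g. decreased in late deaths)
    upper=True  -> P(X >= overlap), over-representation
    upper=False -> P(X <= overlap), under-representation
    """
    if not (0 <= annotated <= population and 0 <= selected <= population
            and 0 <= overlap <= selected):
        raise ValueError("inconsistent gene counts")
    if upper:
        ks = range(overlap, min(annotated, selected) + 1)
    else:
        ks = range(0, overlap + 1)
    total = comb(population, selected)
    # comb(n, k) is 0 when k > n, so impossible overlaps contribute nothing
    return sum(comb(annotated, k) * comb(population - annotated, selected - k)
               for k in ks) / total
```

For example, if all 4 selected genes out of a background of 20 carry a term annotated to 5 background genes, `hypergeom_tail(20, 5, 4, 4)` gives 5/4845, roughly 0.001.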
Multiple alternative RNA splicing and alternative transcription start and end events are detected, but fewer are significant. Using the same cohort as above, alternative splicing and alternative transcription events in this preliminary GTEx data are characterized using Whippet. Multiple alternative RNA splicing and alternative transcription start/end events were identified in the blood samples. When comparing the groups, significant differences were found only for alternative RNA splicing, not for alternative transcription start and end. These data indicate that alternative RNA splicing is an active process during trauma that could predict mortality and outcomes in trauma patients, and genes with changes in splicing, and potentially in transcription start/end, could identify useful targets. Considering gene expression, splicing, and transcription start/end together could also change conclusions about which proteins are increased: a gene with increased expression may undergo altered processing that yields new isoforms or altered transcripts. These findings highlight the ability to access GTEx data, categorize the samples in a clinically relevant manner, and process the RNA sequencing data with advanced computational methods, such as Whippet.
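The gap between "multiple events detected" and "fewer significant" can be made concrete. Each event has a percent spliced in (PSI), the fraction of informative reads supporting inclusion, and a differential step (in Whippet, its delta comparison, which as we understand it reports a change in PSI with a probability) scores each event. The filter below keeps only events passing both a ΔPSI and a confidence cutoff. The cutoffs (|ΔPSI| ≥ 0.1, probability ≥ 0.9) are assumed, commonly used values, not thresholds stated in this document, and the event records and gene names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


def percent_spliced_in(inclusion_reads: int, exclusion_reads: int) -> float:
    """PSI: fraction of informative reads that support inclusion (0 to 1)."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        raise ValueError("event has no informative reads")
    return inclusion_reads / total


@dataclass
class SplicingEvent:
    gene: str
    event_type: str
    psi_early: float    # mean PSI in the early-death group
    psi_late: float     # mean PSI in the late-death group
    probability: float  # confidence that the groups differ, from a delta tool

    @property
    def delta_psi(self) -> float:
        return self.psi_late - self.psi_early


def significant_events(events: list[SplicingEvent], min_abs_delta: float = 0.1,
                       min_probability: float = 0.9) -> list[SplicingEvent]:
    """Keep events whose PSI shift and confidence both pass the assumed cutoffs."""
    return [e for e in events
            if abs(e.delta_psi) >= min_abs_delta and e.probability >= min_probability]
```

Requiring both criteria guards against large but poorly supported PSI shifts as well as confident but negligibly small ones.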
RNA splicing, specifically RNA splicing entropy, shows differences after trauma. Preliminary data in mice with and without trauma show less RNA splicing entropy in the blood, 7.7% versus 10.7% (p=0.1). RNA splicing entropy was calculated using Whippet and is expressed as the percentage of splicing events of each type (Alternative Donor, Alternative Acceptor, Retained Intron, and Skipped Exon) with an entropy >1.5. Using the mouse model of trauma, RNA splicing entropy was calculated for the total white blood cell component of mice after trauma caused by hemorrhage with cecal ligation and puncture (n=3) and compared to controls (n=3). The RNA from blood was extracted, processed, and then subjected to deep RNA sequencing. This preliminary data suggests that RNA splicing in critical illness differs from that in controls. Changes in RNA splicing entropy may be a reflection of, a response to, or a mechanism driving the pathological processes that cause mortality and morbidity in patients with trauma. Obtaining this data demonstrates the ability to isolate RNA from the target organ tissues of interest in the mouse model system. This EXAMPLE also demonstrates the ability to process the complex data that result from RNA sequencing using computational biology and custom scripts.
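The summary statistic described above (percentage of events of each type with entropy >1.5) can be sketched as follows, assuming splicing entropy is the Shannon entropy H = −Σ pᵢ log₂ pᵢ over the relative usage pᵢ of the paths through an event, and assuming base-2 logarithms. The per-event path counts and record format are hypothetical inputs, not Whippet's output format. Because the maximum entropy of an event with n paths is log₂ n, an event with only two paths can never exceed 1 bit, so a 1.5 threshold selects complex events with at least three well-used paths.

```python
from __future__ import annotations

from collections import defaultdict
from math import log2
from typing import Iterable


def shannon_entropy(path_counts: Iterable[float]) -> float:
    """Shannon entropy (bits) of the relative usage of paths through one event."""
    counts = [c for c in path_counts if c > 0]
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * log2(c / total) for c in counts)


def percent_high_entropy(events: Iterable[tuple[str, list[float]]],
                         threshold: float = 1.5) -> dict[str, float]:
    """Percentage of events of each type whose entropy exceeds `threshold`."""
    n_events: dict[str, int] = defaultdict(int)
    n_high: dict[str, int] = defaultdict(int)
    for event_type, counts in events:
        n_events[event_type] += 1
        if shannon_entropy(counts) > threshold:
            n_high[event_type] += 1
    return {t: 100.0 * n_high[t] / n_events[t] for t in n_events}
```

Four equally used paths give exactly 2 bits and count as high entropy; a two-path event split 50/50 gives 1 bit and does not.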
The trauma patients in the intensive care unit provide an ancestrally diverse population and adequate numbers to correlate with mortality and other complications. The trauma intensive care unit admits over 750 patients a year, with 20% of those patients coming from an ancestrally diverse background. The enrollment is in line with the general population, even though underrepresented minorities seek medical care at a reduced rate. One aspect of this invention is the correlation of the RNA sequencing data with mortality and complications.
This EXAMPLE shows the importance of using RNA sequencing data to predict not only mortality but also complications, because patients with complications had a higher mortality (7.7%). Predicting complications could therefore allow mortality to be influenced. This data shows that the trauma center has the volume of patients in the intensive care unit for an appropriately powered study.
Based on sample size calculations, 520 patients are to be enrolled over four years. Because this is well below the roughly 3000 admissions expected over that period, enrollment is feasible.
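As a rough feasibility check using only figures stated in this document (over 750 trauma intensive care unit admissions per year, four years, 520 planned enrollees), the target corresponds to enrolling fewer than one in five admissions. Because 750 is a lower bound on annual admissions, the fraction below is an upper bound.

```latex
N_{\text{admissions}} \approx 750\ \tfrac{\text{admissions}}{\text{year}} \times 4\ \text{years} = 3000,
\qquad
\frac{N_{\text{enrolled}}}{N_{\text{admissions}}} = \frac{520}{3000} \approx 0.173 = 17.3\%
```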
This approach uses RNA sequencing data from a mouse model of trauma, re-analysis of existing genomic data in GTEx on early versus late trauma deaths, and samples from ancestrally diverse, critically ill trauma patients; together these are uniquely suited to provide clinical information applicable across many clinical scenarios, particularly critically ill patients with cancer, sepsis, stroke, or myocardial infarction. The analysis of RNA data from next generation sequencing technology creates a ‘transcriptomic phenotype’ for each trauma patient. Understanding RNA biology at the time of injury can predict outcomes (mortality and complications) in trauma patients. The method to test the three aims, the expected result, and the potential impact are summarized in TABLE 5.
Aim 1: Identify changes in RNA biology (gene expression, alternative transcription start/end, and alternative RNA splicing) in the blood before and after a pre-clinical mouse model of trauma and compare to controls.
Rationale: To determine whether altered RNA biology in its various forms can predict outcomes, RNA sequencing data must be collected at various time points after the traumatic injury, and the inventors can establish the equivalency of such a pre-clinical animal model to what is encountered clinically. The inventors previously used a mouse model of hemorrhagic shock followed by septic shock induced by cecal ligation and puncture (CLP). Monaghan et al., J. Transl. Med., 14(1), 312 (2016). This mouse model mimics a trauma patient with hemorrhagic shock from an extremity injury who then has a missed bowel injury resulting in severe critical illness. Using this mouse model, the inventors can obtain blood at the initial injury and assess whether changes in RNA biology predict mortality in the severe trauma model. A mouse model also allows acquisition of blood samples at multiple time points (twenty-four hours after injury and, in mice that survive, fourteen days after injury). The inventors can first assess whether RNA biology in the blood can predict mortality, whether changes in RNA biology are seen twenty-four hours after injury, and how these correlate with the RNA biology of survivors at fourteen days.
Test 1: Assess RNA sequencing data and identify genes with changes in expression, alternative RNA splicing, and alternative transcription start/end to develop the ‘transcriptomic phenotype’ from shed blood in the mouse model of trauma to predict outcomes. Mice (8-12 weeks old) undergo hemorrhagic shock followed by CLP to mimic the critical illness that a trauma patient would experience after hemorrhagic shock from an extremity injury complicated by a missed small bowel injury. Mice from the C57BL/6J, BALB/cJ, and CAST backgrounds are used to simulate the heterogeneity of humans. Based on statistical calculations, each group has twenty-four mice (twelve sham and twelve trauma) for each strain. C57BL/6J mice have 30% survival at fourteen days. The shed blood from the hemorrhage component is collected. Although this blood is collected before the effects of hemorrhage, this time point can mimic an early time point in trauma, since the mice have undergone anesthesia and isolation of, and catheter insertion into, the artery. RNA is isolated, sequenced, and analyzed as described. The mice that survive to fourteen days can also be sacrificed and used in Test 2.
Test 2: Assess RNA sequencing data and identify genes with changes in expression, alternative RNA splicing, and alternative transcription start/end to develop the ‘transcriptomic phenotype’ from the blood of mice at twenty-four hours and fourteen days after trauma. Mice (8-12 weeks old) undergo hemorrhagic shock followed by CLP to mimic a severe trauma. Mice from the C57BL/6J, BALB/cJ, and CAST backgrounds are used. Mice are sacrificed twenty-four hours after CLP. Mice that survive to fourteen days are also sacrificed to assess RNA biology among the survivors at that point. Appropriate controls for each mouse background undergo sham procedures. Based on previous work, six mice are needed for each group. After mice are sacrificed (CO2 overdose followed by direct cardiac puncture) at either twenty-four hours or fourteen days after CLP, blood is harvested. RNA from the mouse blood samples is then processed.
Human samples. Through collaboration with the military, soldiers in combat areas could be consented to donate blood before deployment. This blood would then undergo RNA sequencing and be compared with samples collected if a traumatic injury occurred. Many previous efforts using animal models to treat diseases such as sepsis failed to translate to humans. Fink & Warren, Nature Reviews Drug Discovery, 13(10), 741-58 (2014). The inventors previously studied conditions in mice with correlation to humans. Monaghan et al., J. Transl. Med., 14(1), 312 (2016); Monaghan et al., Molecular Medicine, 24(1), 32 (2018); Monaghan et al., Journal of the American College of Surgeons, 213(3), S54-S5 (2011); Monaghan et al., Annals of Surgery, 255(1), 158-64 (2012). Trauma research may have more translatable results because of the timing of the disease. In trauma, the time of the event is known, which corresponds to the time of the induced trauma in the mouse. In sepsis, the time point at which sepsis started is known in the mouse, but in humans the time at which sepsis starts is impossible to know, as exemplified by the inability to determine when an appendix may perforate. Iacobellis et al., Seminars in Ultrasound, CT, and MR, 37(1), 31-6 (2016). The mouse model is limited because it is a controlled traumatic challenge and should produce a very consistent response to trauma, whereas in humans no two traumas are the same. More humans are therefore needed to detect a difference. Part of the greater heterogeneity of humans is modeled by using multiple mouse strains, and the inventors can account for differences in trauma by using the Injury Severity Score (ISS). The ISS of this challenge in the mouse is twenty-five, and this is the target average ISS of patients enrolled.
Aim 2: Using the data available from the Genotype Tissue Expression (GTEx) project correlate findings in the mouse model to these trauma patients (eighty-one patients).
Rationale. Using the GTEx data, the inventors can assess RNA biology in the blood of trauma patients. The GTEx data include over 500 patients with at least one sample that has undergone RNA sequencing, and the patients in the GTEx data set have extensive clinical data available. However, all patients in this data set are deceased, which must be considered in interpreting the data. To adjust for this, the inventors use the time from the patient's death to procurement of the RNA as a variable to adjust for RNA degradation, along with other metrics suggested by the GTEx consortium (50). Trauma patients are selected (n=81) and classified as early (<48 hours) or late deaths (≥48 hours). The inventors can compare RNA biology between trauma patients who died early and those who died late, and compare this with findings in the mouse model between mice that died early (twenty-four hours) and survivors (fourteen days).
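The early/late split and the post-mortem covariate can be made explicit before any model is fit. In the sketch below the field names are hypothetical, not GTEx variable names; a real analysis would map them from the GTEx subject and sample attribute tables and would likely add the other consortium-recommended quality metrics as further covariates. Note the boundary: a death at exactly 48 hours is classified as late.

```python
from __future__ import annotations

from dataclasses import dataclass

EARLY_DEATH_CUTOFF_HOURS = 48.0  # early: < 48 h; late: >= 48 h


@dataclass
class TraumaDonor:
    donor_id: str                 # hypothetical identifier
    hours_injury_to_death: float  # hypothetical field
    ischemic_time_minutes: float  # hypothetical field: death to RNA procurement


def death_group(donor: TraumaDonor) -> str:
    """Assign 'early' (<48 h) or 'late' (>=48 h) death."""
    if donor.hours_injury_to_death < EARLY_DEATH_CUTOFF_HOURS:
        return "early"
    return "late"


def design_rows(donors: list[TraumaDonor]) -> list[tuple[str, int, float]]:
    """One row per donor: (id, 1 if late death else 0, ischemic-time covariate)."""
    return [(d.donor_id, int(death_group(d) == "late"), d.ischemic_time_minutes)
            for d in donors]
```

Keeping the ischemic time alongside the group indicator lets a downstream model separate RNA changes due to death timing from those due to post-mortem degradation.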
Test 1: Assess RNA sequencing data and identify genes with changes in expression, alternative RNA splicing, and alternative transcription start/end to develop the ‘transcriptomic phenotype’ from the blood of deceased trauma patients, and compare early and late deaths. There are 81 unique trauma patients in the data set with blood samples. These patients are aged 20-68, in line with the age of typical trauma patients. The GTEx samples have already been collected and have undergone RNA sequencing. RNA sequencing data are aligned to the human genome with STAR. RNA splicing events are assessed using Whippet and characterized as one of five alternative splicing events: skipped exon, retained intron, mutually exclusive exon, alternative 3′ splice site, and alternative 5′ splice site. Entropy calculations are completed using Whippet. Alternative transcription events from Whippet are compared to outputs from mountainClimber.
Test 2: Correlation of changes in expression, alternative RNA splicing, and alternative transcription start/end (the ‘transcriptomic phenotype’) in the blood of humans with the mouse samples. Changes in expression, alternative RNA splicing, and alternative transcription identified in the mouse model (Aim 1) are compared with findings in the human GTEx data (Aim 2, Test 1). The mouse model data are taken from mice at twenty-four hours and at fourteen days after CLP, and are compared with the human data on early (<48 hours) and late (≥48 hours) deaths. The uniform genetic background of laboratory mice (within each of the three strains) allows assumptions about the significance of changes to be made at higher resolution, because the genetic model is certain. At the same time, it creates uncertainty about the validity of findings, because laboratory mice are not directly comparable to humans who experience conditions outside the laboratory. Human data have the opposite limitation: the homogeneity of the mouse model is replaced by heterogeneity due to factors such as age, sex, co-morbidities, and differences in the trauma. By coupling the certainty provided by the homogeneity of the mouse model with the uncertainty provided by the heterogeneity of the human data, the inventors create a powerful tool with the potential to validate results from mouse analyses in humans. Comparing events across species can identify RNA biology events and genes that are important at both the early and late time points. These findings are compared to those found in the prospectively collected data from trauma patients.
Human samples. In this sample set, all the patients are dead. Because RNA is unstable compared to DNA, comparisons between groups must be adjusted during analysis for the time it took for samples to be collected and RNA isolated. In contrast, the mouse samples come from mice that were alive until they were sacrificed. The GTEx consortium has described multiple methods to adjust for problems associated with deceased donors. Carithers et al., Biopreservation and Biobanking, 13(5), 311-9 (2015).
Aim 3: Enroll critically ill trauma patients and identify aspects of RNA biology that identify and predict outcomes (mortality, infection).
Rationale: A current challenge with data from animal models is ensuring translation to humans. This aim allows for complete translation of the mouse data to humans. The human population of interest is patients admitted to the Trauma Intensive Care Unit (TICU).
Test 1: Assess RNA sequencing data to determine whether genes with changes in expression, alternative RNA splicing, and alternative transcription start/end in the blood can be prospectively detected in trauma patients on arrival, and correlate this ‘transcriptomic phenotype’ with mortality. Trauma patients are recruited from the trauma intensive care unit, which has admitted an average of over 750 patients each year over the last three years, with an average injury severity score (ISS) of 13; the goal is to enroll patients with an average ISS of 25 to mimic the mouse model. After informed consent is obtained, blood is collected in PAXgene tubes and stored at −80 °C. Samples are collected serially while the patient is in the ICU: on admission (25 mL) and whenever a complication develops during the TICU stay (25 mL). This defines the maximum collection for the initial 8-week period after the trauma. When the patient has recovered, at least eight weeks after the last blood draw, a final 50 mL blood draw is done, potentially in the outpatient setting. Patients who survive the trauma are compared to patients who died. Clinical information for the trauma patients is collected from the trauma registry, a database required by the American College of Surgeons for verification as a trauma center. The data are standardized across the entire recruitment period. RNA is isolated using the PAXgene RNA Kit and sequenced (goal 80 to 100 million reads). RNA sequencing data are aligned to the human genome using the STAR aligner. Changes in expression, alternative RNA splicing, alternative transcription start/end, and RNA splicing entropy are identified with Whippet, and alternative transcription findings are cross-checked with mountainClimber.
Test 2: Assess RNA sequencing data to determine whether genes with changes in expression, alternative RNA splicing, and alternative transcription start and end in the blood can be prospectively detected in trauma patients on arrival, and use the ‘transcriptomic phenotype’ to correlate with outcomes and complications. Samples from patients in the trauma intensive care unit are used to determine whether differences in RNA biology between healthy controls and trauma patients predict outcomes and complications. Outcomes and complications are recorded from the medical record and are defined in the trauma registry (and adjudicated by trained coders). The trauma registry also provides data such as the injury severity score to better quantify and adjust for the severity of the trauma across patients. Outcomes to follow and use as potential prediction targets include mortality, hospital length of stay, intensive care unit length of stay, ventilator-free days, and discharge disposition. Complications, also taken from the trauma registry, include infections (pneumonia, surgical site infections, urinary tract infection, bacteremia, sepsis), unplanned return to the operating room, unplanned return to the intensive care unit, tracheostomy, and feeding tube placement.
Human samples: In this sample set, all the patients are critically ill. Consenting critically ill patients requires a proxy, which can sometimes be difficult given the unexpected nature of trauma. The inventors have had past success in consenting these patients. Human heterogeneity may make finding a significant difference between two groups difficult. Drastic differences (trauma patients in the intensive care unit who survive versus those who die, and those with complications) should allow for the identification of differences in RNA biology (‘transcriptomic phenotype’). All samples for this assay come from living patients.
All the test mice undergo the traumatic injury and are maintained for fourteen days, at which point all mice are sacrificed. The survival rate at fourteen days for the double-hit model is 30%. The rate goes up to 70%. Monaghan et al. Annals of Surgery 255(1), 158-64 (2012). These estimates result in an effect size of h=0.823. A sample size of twenty-four per group would exceed 80% power at a two-tailed alpha of 0.05 by a chi-square test of independent proportions; therefore, for survival analyses the inventors use twenty-four mice per group. This ensures enough power to detect whether RNA splicing at the initial challenge can predict survivors. Sham mice (8 from each mouse background strain) are operated on at this time to procure samples at the 14-day time point.
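The effect size and sample size quoted above can be checked with Cohen's h and the usual normal approximation for comparing two proportions. This is a minimal sketch under that assumption: the text names a chi-square test, which for equal groups gives a similar but not identical answer, and only the 30% and 70% survival rates come from the text.

```python
import math
from statistics import NormalDist


def cohens_h(p1, p2):
    """Cohen's h: difference of arcsine-square-root transformed proportions."""
    return abs(2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2)))


def power_two_proportions(h, n_per_group, alpha=0.05):
    """Approximate two-tailed power for two equal groups (normal approximation)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(h * math.sqrt(n_per_group / 2) - z_crit)


def n_per_group_for_power(h, power=0.80, alpha=0.05):
    """Smallest equal group size that reaches the target power."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_crit + z_power) / h) ** 2)


h = cohens_h(0.30, 0.70)
print(round(h, 3))                             # 0.823
print(round(power_two_proportions(h, 24), 2))  # roughly 0.81
print(n_per_group_for_power(h))                # 24
```

Under this approximation, twenty-three mice per group falls just short of 80% power, which is consistent with the choice of twenty-four.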
RNA isolation and sequencing. RNA data from GTEx were extracted and sequenced per the GTEx protocols. RNA from mouse blood samples is processed using the MasterPure Complete RNA Purification Kit (Epicentre, Madison, Wis., USA). Due to the high concentration of globin RNA in blood samples, these samples are then further processed with the GLOBINclear Kit (Epicentre, Madison, Wis., USA). From blood, the inventors can obtain approximately 30-50 nanograms per microliter, with a total blood volume isolated from the mouse of about one mL. After RNA samples are processed, they are sequenced. All samples require at least 1400 nanograms of RNA for deep sequencing. Each sample is sent out for deep RNA sequencing with a goal of 80 million to 100 million reads per sample (because sequencing costs change frequently with advancing technology, the outside facility is chosen based on cost at the time of send-out).
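The 1400 ng input requirement can be related to the stated yield of 30-50 ng per microliter. This sketch assumes that the concentration refers to the purified RNA eluate, which the text does not state; the function names are hypothetical.

```python
MIN_RNA_NG = 1400.0  # minimum RNA input for deep sequencing (from the text)


def total_rna_ng(concentration_ng_per_ul, eluate_volume_ul):
    """Total RNA mass (ng) = concentration (ng/uL) x eluate volume (uL)."""
    return concentration_ng_per_ul * eluate_volume_ul


def min_eluate_volume_ul(concentration_ng_per_ul, required_ng=MIN_RNA_NG):
    """Eluate volume (uL) needed to reach the required RNA mass."""
    return required_ng / concentration_ng_per_ul


def is_sufficient(concentration_ng_per_ul, eluate_volume_ul, required_ng=MIN_RNA_NG):
    """True if the sample carries at least the required RNA mass."""
    return total_rna_ng(concentration_ng_per_ul, eluate_volume_ul) >= required_ng


for conc in (30.0, 50.0):  # the 30-50 ng/uL range stated in the text
    print(conc, round(min_eluate_volume_ul(conc), 1))
# 30.0 46.7
# 50.0 28.0
```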
Blood from trauma patients and healthy human controls is collected in PAXgene tubes (PreAnalytiX, Switzerland), and RNA is isolated using the PAXgene RNA kit (PreAnalytiX, Switzerland). Because it is impossible to predict on ICU admission which patients will die or develop a complication, and performing RNA sequencing on the blood of all TICU patients at Rhode Island Hospital is cost-prohibitive, banked samples are used.
Assessment of clinical information. Clinical data relevant to the patient samples are collected from the trauma registry and the electronic medical record. This allows for collection of endpoints such as mortality, ICU length of stay, hospital length of stay, ventilator days, renal failure, ARDS, pneumonia, and other infectious complications. Besides data in the chart, the inventors also perform functional assessments at follow-up after discharge. These are based on previous work in critical illness and use the 36-item Short Form (SF-36). The assessment is done at the 8+ week follow-up.
The objective of this EXAMPLE is to use RNA sequencing data and analysis to identify useful gene targets in sepsis.
Alternatively spliced RNA arises from co- and post-transcriptional events facilitated by the spliceosome, in which introns are removed to form the mature RNA from which protein isoforms are translated. Alternatively transcribed genes are the product of changes in promoter usage, polyadenylation signals, and RNA polymerase II interactions with DNA, which, like alternative splicing events, can lead to changes in isoform usage. These events are identified from the analysis of RNA sequencing data. Significantly differentially alternatively transcribed genes and alternatively spliced genes were identified and overlapped with genes reported as ARDS-related. See Reilly et al., American Journal of Respiratory and Critical Care Medicine (2017). Of 89 reported ARDS-related genes, 38 were confirmed in at least one differential category, supporting the use of humans and mice with DAD/ARDS as appropriate and robust (p=1.25e−14). Eleven previously reported genes were present in all categories. These eleven genes were evaluated for changes in alternative splicing and alternative transcription, and GO term enrichment analysis performed on them revealed twenty significant biological processes, including ontologies related to aging and response to abiotic/environmental stimuli. See
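The text reports an overlap p-value for 38 of 89 reported ARDS-related genes but does not name the test or the gene universe used. A one-sided hypergeometric test is a common choice for this kind of gene-set overlap; the sketch below assumes that test, and any universe size or selected-gene count passed to it is an assumption, not a value from the study.

```python
from math import comb


def hypergeom_upper_tail(k, universe, n_reported, n_selected):
    """P(X >= k): chance that n_selected genes drawn at random from `universe`
    genes include at least k of the n_reported previously reported genes."""
    total = comb(universe, n_selected)
    hits = sum(
        comb(n_reported, i) * comb(universe - n_reported, n_selected - i)
        for i in range(k, min(n_reported, n_selected) + 1)
    )
    return hits / total


# Hypothetical universe and selection sizes; the study's values are not given.
p = hypergeom_upper_tail(38, universe=20000, n_reported=89, n_selected=2000)
```

Python's arbitrary-precision integers keep the binomial coefficients exact, so the tail probability stays accurate even when it is very small.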
Assaying the underlying changes in RNA processing (alternative splicing and alternative transcription start/end) not only expands basic knowledge of pathogenicity but also provides additional targets for therapeutics. The most enriched GO term from the alternative splicing set, carboxy-terminal domain protein kinase complex (GO:0032806), refers to phosphorylation of the CTD of RNA polymerase II, which is vital in regulating transcription and RNA processing. RNA polymerase complex binding (GO:0000993) and transport of the SLBP-independent/dependent mature mRNA (R-HSA-159227; R-HSA-159230) are also among the most enriched. Alternative pre-mRNA splicing may have the dominant role in isoform usage in genes whose expression levels do not change, whereas alternative transcription may regulate isoform usage in genes that are more dynamically expressed during critical illness. Alternative splicing and alternative transcription may have separate roles in DAD/ARDS, regulating different genes to perform distinct functions.
In this analysis of RNA sequencing data from deceased patients with ARDS identified by DAD and from a clinically relevant mouse model of ARDS, useful genes were identified.
Overview. The inventors used RNA sequencing to identify changes in mRNA processing events (RNA splicing and transcription start/end sites). The inventors' strategy was to contrast how the processing of mRNA changes in the lung and blood of patients with ARDS with that in the lung and blood of a mouse model of ARDS.
Data. For this EXAMPLE, two main approaches were taken to obtain samples. The first was to use a validated mouse model of ARDS. Ayala et al., The American Journal of Pathology, 161, 2283-2294 (2002); Monaghan et al., Molecular Medicine (Cambridge, Mass., USA), 24, 32 (2018). All experiments were done according to guidelines from the National Institutes of Health (Bethesda, Md.). For the mouse model of ARDS, C57BL/6 male mice (The Jackson Laboratory, Bar Harbor, Me., USA) between 10 and 12 weeks of age were used. ARDS was induced in the mice by hemorrhage (non-lethal shock) followed by cecal ligation and puncture (CLP). The control group underwent sham hemorrhage followed by sham CLP.
The second approach was to identify patients in the GTEx Project with ARDS. All GTEx patients used in this EXAMPLE are deceased. A pathologist, blinded to the specimen ID and history, identified diffuse alveolar damage in lung samples from patients in GTEx. Most cases of clinical ARDS have diffuse alveolar damage (DAD) morphologically. Zander & Farver, Pulmonary pathology e-book: A volume in foundations in diagnostic pathology series. (Elsevier Health Sciences, 2016). Classic DAD was identified based on histologic features (for a full description, see the supplement). Patients with evidence of diffuse alveolar damage in the lung and a corresponding blood and lung sample that had undergone RNA sequencing were placed in the ARDS group. Patients with no evidence of diffuse alveolar damage in the pathology sample and a blood and lung sample with RNA sequencing were placed in the control group. DAD, which is also seen in most cases of clinical acute lung injury (ALI), is divided into two phases: the acute/exudative phase and the organizing/proliferative phase. Other histologic patterns encountered in a clinical setting of ALI/ARDS include diffuse alveolar hemorrhage, acute eosinophilic pneumonia (AEP), and acute fibrinous and organizing pneumonia (AFOP). Eight patterns of acute lung injury are evaluated in this EXAMPLE. Zander & Farver, Pulmonary pathology e-book: A volume in foundations in diagnostic pathology series. (Elsevier Health Sciences, 2016). Classic DAD was graded 1-4 based on the histologic features. Other patterns of injury were scored using a semiquantitative system for extent and histologic characteristics. For extent, grades were assigned as follows: grade 1 (1 point), up to 10% of tissue involved; grade 2 (2 points), 11-30% of tissue involved; grade 3 (3 points), 31-50% of tissue involved; and grade 4 (4 points), >50% of tissue involved.
Histologic characteristics were scored as intra-alveolar fibrin (1 point), cellular alveolar debris (1 point), type II pneumocyte hyperplasia (1 point), and capillaritis/vasculitis. A total of 6 or more points was considered DAD. Despite this detailed categorization method, using DAD to diagnose ARDS is a major limitation, because DAD can be present in other pulmonary diseases. Even so, RNA sequencing data from the lungs and blood of patients can provide valuable biologic insights.
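The semiquantitative scoring above can be written as a small function. This is a sketch under two stated assumptions: capillaritis/vasculitis is given 1 point (the text does not state its value), and 0% involvement scores 0 extent points (the text starts at grade 1).

```python
def extent_points(percent_involved):
    """Extent grade (1-4 points) from the percentage of tissue involved."""
    if percent_involved <= 0:
        return 0  # assumption: no involvement scores zero
    if percent_involved <= 10:
        return 1  # grade 1: up to 10%
    if percent_involved <= 30:
        return 2  # grade 2: 11-30%
    if percent_involved <= 50:
        return 3  # grade 3: 31-50%
    return 4      # grade 4: >50%


CAPILLARITIS_POINTS = 1  # assumption: the text does not state this value


def dad_score(percent_involved, fibrin, debris, type2_hyperplasia, capillaritis):
    """Extent points plus points for each histologic characteristic present."""
    points = extent_points(percent_involved)
    points += int(fibrin) + int(debris) + int(type2_hyperplasia)
    points += CAPILLARITIS_POINTS if capillaritis else 0
    return points


def is_dad(score, threshold=6):
    """A total of 6 or more points was considered DAD."""
    return score >= threshold


score = dad_score(60, fibrin=True, debris=True, type2_hyperplasia=False, capillaritis=False)
print(score, is_dad(score))  # 6 True
```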
Results. Alternative splicing events were observed at 2-fold higher abundance than alternative transcription events, yet significant alternative transcription events between groups were observed at a 6-fold higher prevalence (p=2.2e−16). Eighty-two alternative transcription events were common across all ARDS tissues (human and mouse, blood and lung; p=2.72e−16). No significant alternative splicing events were detected across all four tissues. Because alternative splicing is species- and tissue-specific, an event occurring in both lung and blood in both human and mouse is unlikely. GO term analysis was also performed on the significant differential processing events.
The full lists are provided in TABLE 3 and TABLE 4 of International patent application PCT/US2021/018218, which are incorporated by reference.
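Finding events shared by all four ARDS tissues amounts to a set intersection once human and mouse events are expressed in a common identifier space. The sketch assumes that ortholog mapping has already been done; the event identifiers are hypothetical.

```python
def common_events(event_sets):
    """Return the events present in every set (e.g., human/mouse x blood/lung)."""
    sets = list(event_sets.values())
    return set.intersection(*sets) if sets else set()


# Hypothetical event identifiers, assumed already mapped to shared orthologs.
significant = {
    "human_blood": {"GENE1:TSS", "GENE2:APA", "GENE3:TSS"},
    "human_lung": {"GENE1:TSS", "GENE3:TSS"},
    "mouse_blood": {"GENE1:TSS", "GENE3:TSS", "GENE4:APA"},
    "mouse_lung": {"GENE1:TSS", "GENE5:APA"},
}
print(sorted(common_events(significant)))  # ['GENE1:TSS']
```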
This invention disclosure describes a specific diagnostic. Determining which COVID-19 patients are at risk for severe disease and developing better anti-coronavirus therapies are clinical priorities. The invention leverages new information from genomic studies demonstrating that very limited, highly specific regions of the SARS-CoV-2 genome are transcriptionally active in the bloodstream of COVID-19 patients. The invention solves the problem of measuring viral load in the blood of infected patients to assess prognosis. Viral load measurements are central to the management and prognosis of patients with HIV infection. The inventors found that SARS-CoV-2 viral load can be used to identify patients with more severe COVID-19, track the disease's course, and follow the response to treatment. The same sequences are used to create antiviral interfering RNAs that block viral gene expression specific to viremia. This therapy directly blocks genes involved in COVID-19 pathogenesis. The invention is superior to potential competitors because the genomic data inform the design of relevant oligonucleotides, which increases the assay's sensitivity.
The invention measures the amount of SARS-CoV-2 circulating in patients' blood. This amount is elevated in patients critically ill with COVID-19, so the results provide information on the prognosis of individual patients. Interfering RNA that specifically targets these sequences reduces these genes' expression, interrupting viral replication and the downstream consequences of COVID-19. The open reading frame encoding the N protein also includes ORF9b, which has been shown to antagonize the action of antiviral type I interferon. Blocking this region of the virus can enhance the host's endogenous antiviral response.
The invention can be used by all clinicians caring for COVID-19 patients. High or increasing SARS-CoV-2 viral loads identify those at greatest risk for severe disease. The therapy is unique compared with those approved or in clinical trials.
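One simple way to express circulating SARS-CoV-2 load from sequencing data is to scale viral read counts by sequencing depth, as reads per million (RPM). The text does not specify a normalization, so RPM is an assumption here, and the sample IDs are hypothetical; the 18- and 98-read counts and the 125,687,784-read median depth come from the results reported later in this description and are paired arbitrarily for illustration.

```python
def reads_per_million(viral_reads, total_reads):
    """Viral reads per million sequenced reads (an assumed normalization)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return viral_reads * 1_000_000 / total_reads


def rank_by_viral_load(samples):
    """samples: {sample_id: (viral_reads, total_reads)}; highest load first."""
    loads = {sid: reads_per_million(v, t) for sid, (v, t) in samples.items()}
    return sorted(loads.items(), key=lambda item: item[1], reverse=True)


samples = {
    "patient_A": (18, 125_687_784),  # hypothetical pairing of reported values
    "patient_B": (98, 125_687_784),
}
print(rank_by_viral_load(samples)[0][0])  # patient_B
```

Normalizing by depth matters because samples sequenced more deeply would otherwise appear to carry more virus.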
This invention disclosure describes a more general approach. Diagnostic testing for specific infections depends on a substantial understanding of the underlying pathogen. Nucleic acid amplification tests (NAT) are increasingly used in clinical medicine, but developing a test is time-consuming and may miss the optimal target sequences for NAT. In the setting of a new pandemic like COVID-19, delays can be fatal. This invention leverages further information from deep sequencing of RNA from patients with specific infections to develop diagnostic targets and therapeutic strategies. Deep sequencing of RNA identifies the pathogen's most abundant RNA transcripts, which establish a foundation for future work. The approach is especially valuable for new, poorly characterized pathogens because RNA sequencing is unbiased and analyzes both known and unknown sequences.
The findings with SARS-CoV-2 illustrate the invention's potential. The sequencing studies demonstrate that very limited, highly specific regions of the SARS-CoV-2 genome are transcriptionally active in the bloodstream of COVID-19 patients. The invention solves the problem of measuring viral load in the blood of infected patients to assess prognosis. Viral load measurements are central to the management and prognosis of patients with HIV infection. The inventors found that SARS-CoV-2 viral load can be used to identify patients with more severe COVID-19, track the disease's course, and follow the response to treatment. The same sequences can be used to create antiviral interfering RNAs that block viral gene expression specific to viremia. This therapy directly blocks genes involved in COVID-19 pathogenesis. The invention is superior to potential competitors that focus on DNA sequencing, which misses RNA viruses such as SARS-CoV-2 and influenza. Testing the most abundant RNA sequences enhances diagnostic assays' sensitivity because there can be more target sequences to measure.
The invention is a technical approach to measuring pathogen RNA expression circulating in patients' blood. Computational analysis identifies sequences, and these sequences can be used to develop diagnostic tests and therapeutics, such as the small interfering RNAs mentioned above.
The invention can be used by academic and industry researchers who study infectious disease pathogenesis, diagnostics, and treatment.
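Identifying a pathogen's most abundant transcripts can start from per-gene read counts pooled across patients. This sketch uses Python's `collections.Counter`; the gene names and counts are hypothetical, and pooling by simple summation is an assumption.

```python
from collections import Counter


def aggregate_counts(per_sample_counts):
    """Sum per-gene read counts across samples."""
    total = Counter()
    for counts in per_sample_counts:
        total.update(counts)  # adds counts for matching gene names
    return total


def top_targets(per_sample_counts, n=5):
    """Most abundant genes across samples, as (gene, total_count) pairs."""
    return aggregate_counts(per_sample_counts).most_common(n)


counts = [{"gene_1": 120, "gene_2": 15}, {"gene_1": 30, "gene_3": 200}]
print(top_targets(counts, n=2))  # [('gene_3', 200), ('gene_1', 150)]
```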
Attached are the genes and counts from Acinetobacter in a patient with COVID-19. The genes with the most counts are listed. Bact Gene Exp.
RNA Sequencing to Identify Bacterial Resistance in Humans with Sepsis
Attached are the bacterial resistance genes identified in one patient with COVID-19. Resistance Gene.
Acinetobacter
Cutibacterium
Thermus thermophilus
Streptococcus
Escherichia
Salmonella
Neisseria
Streptococcus
Pasteurella
Staphylococcus
Escherichia
Escherichia
Escherichia
Escherichia
Escherichia
Neisseria
Cutibacterium
Moraxella
Propionibacteria
Mycobacterium
Mycobacterium
Mycobacterium
Mycobacterium
Mycobacterium
Streptomyces
Mycolicibacterium
Neisseria
Mycobacteroides
Mycobacteroides
Mycobacteroides
Mycobacteroides
Mycobacteroides
Mycolicibacterium
Mycolicibacterium
Mycolicibacterium
Mycolicibacterium
Mycolicibacterium
Mycolicibacterium
Mycolicibacterium
Mycolicibacterium
Mycobacterium avium
Mycobacterium intracellulare
Mycobacterium intracellulare
Mycobacteroides
Mycobacteroides
Mycobacteroides
Mycobacteroides
Mycobacteroides
Mycobacterium
Mycobacterium
Mycobacteroides chelonae
Mycolicibacterium
Mycobacteroides abscessus
Deep RNA Sequencing of Intensive Care Unit Patients with COVID-19
Purpose: COVID-19 has affected millions of patients across the world. Current molecular testing identifies the presence of the virus at the sampling site: nasopharynx, nares, or oral cavity. RNA sequencing has the potential to both establish the presence of the virus and define the host's response in COVID-19.
Methods: Single-center, prospective study of patients with COVID-19 admitted to the intensive care unit, in which deep RNA sequencing (>100 million reads) of peripheral blood with computational biology analysis was performed. All patients had a positive SARS-CoV-2 PCR. Clinical data were prospectively collected.
Results: The inventors enrolled fifteen patients at a single hospital. Patients were critically ill, with a mortality of 47%, and 67% were on a ventilator. All the patients had SARS-CoV-2 RNA identified in the blood, in addition to RNA from other viruses, bacteria, and archaea. The expression of many immune-modulating genes, including PD-L1 and PD-L2, was significantly different in patients who died from COVID-19. Some genes were affected by alternative transcription and splicing events, as seen in HLA-C, HLA-E, NRP1, and NRP2. Entropy calculated from alternative RNA splicing and transcription start/end predicted mortality in these patients.
Conclusions: Current upper respiratory tract testing for COVID-19 only determines if the virus is present. Deep RNA sequencing with appropriate computational biology may provide important prognostic information and point to therapeutic foci to be precisely targeted in future studies.
Introduction: Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which causes coronavirus disease 2019 (COVID-19), has led to millions of cases worldwide. Dong, Du, & Gardner, The Lancet Infectious Diseases (2020). Current testing uses polymerase chain reaction to detect viral RNA in the nares. Sethuraman, Jeremiah, & Ryo, JAMA (2020). This provides no insight into the host response. Patients with COVID-19 who require intensive care unit (ICU) care are critically ill and difficult to manage; thus, there is a need for additional diagnostic tests during the hospital stay to assist clinicians.
Deep RNA sequencing refers to a sequencing process in which at least 100 million reads are generated per sample. Deep sequencing allows for the study of low-abundance RNA and of biologic processes beyond gene expression. Typically, RNA sequencing data are aligned to the genome of interest, such as the human genome when the sample comes from a human, and reads that do not align to the genome of interest are usually discarded. With this large number of reads, however, RNA sequencing could also identify specific pathogens in the blood by aligning the otherwise discarded reads to other genomes of interest. In COVID-19, sequencing reads of SARS-CoV-2 may provide insight into the biology of the virus during active illness. In addition, secondary infections could be identified, potentially allowing for better, pathogen-directed antibiotic treatment.
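Separating reads that align to the human genome from those that do not can use bit 0x4 ("segment unmapped") of the SAM FLAG field. The sketch below parses plain SAM text in pure Python for illustration; in practice, `samtools view -f 4` selects the same records. The function name is hypothetical.

```python
UNMAPPED_FLAG = 0x4  # SAM FLAG bit: segment unmapped


def partition_sam(lines):
    """Split SAM alignment records into mapped and unmapped read names."""
    mapped, unmapped = [], []
    for line in lines:
        if not line or line.startswith("@"):  # skip blank and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        (unmapped if flag & UNMAPPED_FLAG else mapped).append(qname)
    return mapped, unmapped


sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
print(partition_sam(sam))  # (['r1'], ['r2'])
```

The unmapped names (or the corresponding reads) are what would then be passed on to alignment against pathogen genomes.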
The host response to the virus is responsible for some of the observed morbidity and mortality. Bouadma et al., Journal of Clinical Immunology, 1-11 (2020). Acute respiratory distress syndrome (ARDS) is the most common complication encountered with COVID-19. Bouadma et al., Journal of Clinical Immunology, 1-11 (2020). The inventors' laboratory has shown that there are significant changes in alternative RNA splicing and transcription start and end in ARDS, as assessed by deep RNA sequencing. Fredericks et al., Intensive Care Medicine (2020). These changes are thought to be due to the physiology of ARDS, e.g., hypoxia and acidosis, which are known to influence splicing. Whether this occurs in patients infected with SARS-CoV-2 is not known.
While RNA sequencing can be used to measure immune modulating gene expression, an alternative approach is the evaluation of global entropy, or disorder in the processing of RNA. Sterne-Weiler et al., Molecular Cell 72, 187-200.e186 (2018). The inventors found that this entropy metric combined with Principal Component Analysis (PCA) can be leveraged to distinguish COVID-19 patients that develop life-threatening illness from those likely to recover.
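Entropy here measures how evenly RNA processing is spread across the alternative outcomes of an event: one dominant isoform gives low entropy, and many equally used isoforms give high entropy. The sketch below computes Shannon entropy from relative abundances of alternative paths or isoforms. It illustrates the general quantity and is not Whippet's code; the log base 2 is an assumption.

```python
import math


def shannon_entropy(abundances, base=2.0):
    """Shannon entropy of the relative abundances of an event's alternatives."""
    total = sum(abundances)
    if total <= 0:
        return 0.0
    entropy = 0.0
    for a in abundances:
        if a > 0:  # zero-abundance alternatives contribute nothing
            p = a / total
            entropy -= p * math.log(p, base)
    return entropy


print(shannon_entropy([10, 10]))  # 1.0: two equally used isoforms
print(shannon_entropy([20]))      # 0.0: a single isoform
```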
Here the inventors examine deep RNA sequencing data from patients in the ICU with COVID-19 to characterize both pathogens and host responses. The inventors evaluate the sequences for the presence of the SARS-CoV-2 virus and other potential infectious agents. The host response to COVID-19 is also characterized. The long-term goal is to combine these measurements to better assist clinical decision-making.
Study Design, Population, and Setting: The study enrolled ICU participants at a single tertiary care hospital with evidence of SARS-CoV-2 infection based on a positive nasopharyngeal PCR documented during admission. All participants, or their appropriate surrogates, provided informed consent as approved by the Institutional Review Board (Approval #: 411616). Blood samples were collected on day 0 of ICU admission. Clinical data, including COVID-specific therapies, were collected prospectively from the electronic medical record, and participants were followed until hospital discharge or death. The ordinal scale was recorded as previously described by Beigel et al., New England Journal of Medicine (2020). Sepsis and the associated SOFA score were also recorded, per Singer et al., The third international consensus definitions for sepsis and septic shock (sepsis-3), JAMA, 315, 801-810 (2016), as was the diagnosis of ARDS, per Ferguson et al., The Berlin definition of ARDS: An expanded rationale, justification, and supplementary material, Intensive Care Medicine, 38, 1573-1582 (2012).
RNA extraction and sequencing. Whole blood was collected in PAXgene tubes (Qiagen, Germantown, Md., USA) and sent to Genewiz (South Plainfield, N.J., USA) for RNA extraction, ribosomal RNA depletion, and sequencing. Sequencing was done on Illumina HiSeq machines to provide 150 base pair, paired-end reads. Libraries were prepared with three samples per lane. Each lane provided 350 million reads, ensuring each sample had >100 million reads. Raw data were returned on password-protected external hard drives to ensure the security of the genomic data.
Computational Biology and Statistical Analysis. All computational analysis was done blinded to the clinical data. The data were assessed for quality control using FastQC. See Andrews, FastQC: A quality control tool for high throughput sequence data (2014). RNA sequencing data were aligned to the human genome using the STAR aligner. Dobin et al., Bioinformatics (Oxford, England) 29: 15-21 (2013). Reads that aligned to the human genome were separated and are hereafter referred to as ‘mapped’ reads. Reads that did not align to the human genome, which are typically discarded during standard RNA sequencing analysis, were identified as ‘unmapped’ reads. The unmapped reads were then aligned to the SARS-CoV-2 genome (NC_045512) and counted per sample using Magic-BLAST. Boratyn et al., BMC Bioinformatics, 20, 405 (2019). A coverage map of the SARS-CoV-2 genome was generated using all the subjects to identify the gene expression patterns of the virus in critically ill COVID-19 patients. The unmapped reads were further analyzed with Kraken2 using the PlusPFP index to identify other bacterial, fungal, archaeal, and viral pathogens. See Wood, Lu, & Langmead, Genome Biology, 20, 257 (2019).
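A pooled coverage map of the SARS-CoV-2 genome can be built from read alignment intervals with a difference array, which handles each read in constant time. This sketch assumes alignments are given as 0-based, half-open (start, end) coordinates pooled across subjects; the 29,903-nt default is the length of the NC_045512.2 reference, and the function name is hypothetical.

```python
SARS_COV_2_LENGTH = 29903  # NC_045512.2 genome length in nucleotides


def coverage_map(alignments, genome_length=SARS_COV_2_LENGTH):
    """Per-base read depth from (start, end) half-open alignment intervals."""
    diff = [0] * (genome_length + 1)
    for start, end in alignments:
        start, end = max(0, start), min(genome_length, end)  # clip to genome
        if start < end:
            diff[start] += 1  # coverage rises where a read begins
            diff[end] -= 1    # and falls just past where it ends
    coverage, running = [], 0
    for i in range(genome_length):
        running += diff[i]
        coverage.append(running)
    return coverage


print(coverage_map([(0, 3), (2, 5)], genome_length=6))  # [1, 1, 2, 1, 1, 0]
```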
Reads that aligned to the human genome (the mapped reads) also underwent analysis for gene expression, alternative RNA splicing, and alternative transcription start/end via Whippet. Sterne-Weiler et al., Molecular Cell 72, 187-200.e186 (2018). For comparisons between groups (died vs. survived), differential gene expression thresholds were set at both p<0.05 and ±1.5 log2 fold change. Alternative splicing events were defined as core exon, alternative acceptor splice site, alternative donor splice site, retained intron, alternative first exon, and alternative last exon. Alternative transcription start/end events were defined as tandem transcription start site and tandem alternative polyadenylation site. Alternative RNA splicing and alternative transcription start/end events were also compared between groups. Sterne-Weiler et al., Molecular Cell 72, 187-200.e186 (2018). Significance was set at greater than 2 log2 fold change, as previously described by Fredericks et al., Intensive Care Medicine (2020). Genes identified from the analysis of mapped reads were then evaluated by GO enrichment analysis (PANTHER Overrepresentation, released 20200728). See Mi, Muruganujan, Casagrande, & Thomas, Nature Protocols, 8, 1551-1566 (2013).
Whippet was also used to generate an entropy value for every identified alternative splicing and transcription event of each gene. These entropy values are computed without the group assignments used in the gene expression analysis. To visualize these data, a principal component analysis (PCA) was conducted to reduce the dimensionality of the dataset and obtain an unsupervised overview of trends in entropy values among the samples. Raw entropy values from all samples were concatenated into one matrix, and missing values were replaced with column means. Mortality was then overlaid onto the PCA plot to assess the ability of these raw entropy values to predict this outcome in this sample set. This analysis was done in R (version 3.6.3).
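The significance thresholds above can be applied as simple filters. This sketch assumes results are available as tuples; the text does not say whether the ±1.5 log2 fold-change cut-off is inclusive, so an inclusive bound is assumed, while the splicing threshold uses the strict "greater than" from the text. Gene and event IDs are hypothetical.

```python
def significant_genes(results, p_max=0.05, min_abs_log2fc=1.5):
    """results: iterable of (gene, log2_fold_change, p_value) tuples."""
    return [gene for gene, log2fc, p in results
            if p < p_max and abs(log2fc) >= min_abs_log2fc]


def significant_events(results, min_abs_log2fc=2.0):
    """results: iterable of (event_id, log2_fold_change); strict '>' per the text."""
    return [event for event, log2fc in results if abs(log2fc) > min_abs_log2fc]


genes = [("GENE_A", 1.8, 0.01), ("GENE_B", 0.9, 0.001),
         ("GENE_C", -2.1, 0.20), ("GENE_D", -1.6, 0.03)]
print(significant_genes(genes))  # ['GENE_A', 'GENE_D']
events = [("E1", 2.5), ("E2", 2.0), ("E3", -3.0)]
print(significant_events(events))  # ['E1', 'E3']
```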
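The entropy PCA step (concatenate, mean-impute, reduce) can be sketched without external libraries. Because there are far fewer samples than events, this version works on the samples-by-samples Gram matrix and extracts components by power iteration with deflation. This is a swapped-in technique for illustration, not the R code used in the study; a production analysis would normally use an optimized linear-algebra library, and the example entropy values are hypothetical.

```python
import math
import random


def impute_column_means(matrix):
    """Replace None entries with the mean of the observed values in that column."""
    n_cols = len(matrix[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in matrix if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if row[j] is None else row[j] for j in range(n_cols)]
            for row in matrix]


def pca_scores(matrix, n_components=2, iters=1000, seed=0):
    """Per-sample PC scores from the samples x samples Gram matrix (dual PCA).

    Rows are samples; columns are splicing/transcription events.
    """
    data = impute_column_means(matrix)
    n, p = len(data), len(data[0])
    col_means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - col_means[j] for j in range(p)] for row in data]
    # G = Xc Xc^T is n x n, so its size depends on samples, not on events.
    gram = [[sum(a * b for a, b in zip(centered[i], centered[k]))
             for k in range(n)] for i in range(n)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        vec = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        eigval = 0.0
        for _ in range(iters):  # power iteration for the leading eigenvector
            nxt = [sum(gram[i][k] * vec[k] for k in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm < 1e-12:  # no variance left to explain
                vec, eigval = [0.0] * n, 0.0
                break
            vec = [x / norm for x in nxt]
            eigval = norm
        # Scores along this component are sqrt(eigenvalue) * eigenvector.
        components.append([math.sqrt(eigval) * v for v in vec])
        # Deflate so the next pass finds the next component.
        for i in range(n):
            for k in range(n):
                gram[i][k] -= eigval * vec[i] * vec[k]
    return [[comp[i] for comp in components] for i in range(n)]


entropy_matrix = [
    [0.2, None, 1.1],  # hypothetical entropy values, one row per sample
    [0.4, 0.9, None],
    [1.5, 1.2, 0.3],
]
for sample_scores in pca_scores(entropy_matrix):
    print([round(x, 3) for x in sample_scores])
```

Outcome labels such as mortality would then be overlaid when plotting the first two columns against each other; component signs are arbitrary, as in any PCA.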
Study Population, Participant Characteristics, and RNA Sequencing: Fifteen participants were enrolled and had blood samples drawn on the first day of their ICU stay. Clinical and demographic data are reported in TABLE 1. Most participants were male (73%). There was a diverse distribution in terms of race (60% not white) and ethnicity (60% Hispanic). The most common co-morbidity was hypertension, and the median BMI was almost 30. Forty percent of participants had ARDS at the time the samples were drawn, and the patients were distributed across the top of the ordinal scale, with a score of 5 being the most common (53% of patients). Most participants required a ventilator (67%), 20% progressed to extracorporeal membrane oxygenation (ECMO), and 27% required renal replacement therapy. The median length of hospital stay was 22 days, with a mortality rate of 47%.
All samples had sufficient RNA, and RNA integrity numbers (RIN) were adequate. See Fleige & Pfaffl, Molecular Aspects of Medicine, 27, 126-139 (2006). The median sequencing depth was 125,687,784 reads (95% CI 122,164,763 to 135,800,242), and more than 90% of those reads were longer than thirty bases. After FastQC, all samples had mean quality scores over 30. Between 62% and 66% of reads mapped to the human genome.
Identification of SARS-CoV-2 and other pathogens: SARS-CoV-2 RNA was detected in all fifteen participant samples. A total of 676 reads aligned to the SARS-CoV-2 genome, with each patient having between 18 and 98 reads. See
Genomic differences between participants who lived and those who died. Among participants who died, 86 genes increased and 207 decreased in expression (top results in TABLE 3). There were 88 significant alternative splicing events occurring in 84 unique genes (top results in TABLE 3) and 2093 alternative transcription events occurring in 1769 unique genes (top results in TABLE 3). ABCA13 was the only gene with both significant expression and alternative splicing events. Twenty-seven genes had significant differences in both expression and alternative transcription start/end (TABLE 3). Eighteen genes had significantly different alternative splicing and alternative transcription start/end (TABLE 3).
The genes that were significant between groups then underwent GO term analysis to assess significant enrichment for a biological process. The top GO terms for gene expression and alternative transcription are listed in TABLE 3. There were no significant GO terms for the genes impacted by alternative splicing.
RNA entropy as a diagnostic tool. From the over 100 million RNA sequencing reads for each participant, computational analysis via Whippet assigns an entropy value to over 380,000 RNA splicing and alternative transcription start/end events. Principal component analysis was then applied to these >380,000 entropy scores for each of the fifteen participants, and the first two principal components were plotted against each other (
Discussion. This project used deep RNA sequencing of whole blood from participants in the ICU with COVID-19 as a diagnostic tool. The protocol extracted RNA from whole blood, as opposed to fractionating the whole blood specimen. Analysis of whole blood increases the breadth of RNA sequenced, both cell-associated and cell-free, and its simplicity suits clinical practice. Alternatively, more complicated techniques, such as single-cell sequencing, may speak more to pathogenesis but add to the complexity of the protocol and analysis. Despite its isolation from whole blood, the RNA was of high quality. One finding using RNA from whole blood of critically ill participants is that only 62-67% of the reads mapped to the human genome, less than the 85-97% of reads that typically map to the reference genome. See Sequencing Quality Control Consortium, Nature Biotechnology, 32, 903-914 (2014). One major drawback is the time needed for RNA sequencing and analysis. Sequencing machines take ~18 hours to generate data. The analysis can take additional time and is not yet clinically standardized. As technology advances and speed improves, these data can become increasingly accessible in the care of ICU patients.
SARS-CoV-2 RNA was identified in the unmapped reads in all patients (
Other authors have called for robust testing for potential co-infections with SARS-CoV-2. Lai. Wang. & Hsueh, Journal of Microbiology, Immunology, and Infection, 53, 505-512 (2020). With deep sequencing and computational analysis, the inventors have identified the RNA from multiple bacteria, viruses, and archaea in all of the specimens, as well as fungal RNA in two participants. This suggests deep RNA sequencing with computational analysis may be a tool for the identification of co-infections. More data is required with comparison to gold standards such as blood culture and pathogen-specific PCR. RNA sequencing has the benefit of being able to identify all pathogens with known genomes, including both RNA and DNA based organisms. Unclassified reads that do not align to any known organism (TABLE 2) or the other sequences that have cellular organism elements (TABLE 2) could provide evidence of pathogens before a genome is sequenced or the pathogen is cultured.
Critically ill COVID-19 patients provide a difficult clinical dilemma as it pertains to antibiotics. In severely ill patients, clinicians are more likely to prescribe antibiotics despite there not being an identified pathogen. Feng et al., American Journal of Respiratory and Critical Care Medicine (2020). With identification of bacteria known to cause human disease from the RNA sequencing data, appropriate antibiotics could be prescribed to these patients. In this data set, the inventors show that there were significantly more counts of Acinetobacter baumannii in a portion of patients. This bacterium has been associated with COVID-19. Sharifipour et al., BMC Infectious Diseases, 20, 646 (2020). Using a precision medicine approach with these data, patients with significantly elevated levels may potentially be treated with directed antibiotics, in the absence of more time-consuming positive culture data. While there was no difference in survival in participants with versus without identified bacteria in this study, antibiotic use was not standardized or prescribed prospectively based upon the results. Analysis of the unmapped reads aligning to Acinetobacter baumannii (averaging over 50.000 among the six with increased reads) could provide insights into genes that are expressed in critical illness and provide useful diagnostic and therapeutic targets.
The immune response to SARS-CoV-2 has been the focus of much research since the pandemic started. Poland, Ovsyannikova, & Kennedy, Lancet (London, England) (2020). The successful use of corticosteroids in critically ill patients with COVID-19 emphasizes the importance of the immune system in this disease. Dexamethasone in Hospitalized Patients with Covid-19—Preliminary Report, New England Journal of Medicine (2020); Prescott & Rice, JAMA, 324, 1292-1295 (2020). Because a significant proportion of COVID-19 patients do not respond to corticosteroids, there are still calls for a more precise approach. Waterer & Rello, Infectious Diseases and Therapy (2020). PD-1 expression is increased in certain cell populations in patients with COVID-19. Bellesi et al., British Journal of Haematology (2020). However, the use of immune checkpoint inhibitors in cancer patients has been associated with more severe COVID-19. Robilotti et al., Nature Medicine 26: 1218-1223 (2020). Other authors suggest that immune checkpoint inhibitors may be useful in COVID-19. Vivarelli et al., Cancers 12 (2020). The data show that patients who died had increased expression of PD-L1 and PD-L2 (FIG. E1, CD274 and PDCD1Lg1, TABLE 3). This suggests that immune checkpoint inhibitors targeting the PD-1 pathway might be considered in patients identified as having increased expression of PD-L1 and PD-L2, because of their higher risk of death after ICU admission.
Numerous other immune targets were identified from these genomic changes. N4BP1 is induced by interferon, and the interferon response has been implicated in COVID-19. Hadjadj et al., Science (New York, N.Y.), 369, 718-724 (2020); Lei et al., Nature Commun., 11, 3810 (2020). The data support a role for interferons in COVID-19, as patients who died had a 2.5-fold increase in expression of interferon alpha 1 (IFNA1). Clinical features of COVID-19 also correlate with some of the genes identified. OR6C4 is an olfactory gene that the inventors identified as exhibiting a 5-fold increase in expression in patients who died (TABLE 3). This finding suggests that loss of smell may signify milder disease among patients in the ICU. Thrombotic complications are common in COVID-19 patients (9.5%), and patients admitted to the ICU have a higher incidence of venous thromboembolism. Al-Samkari et al., Blood (2020). Patients who died had a significant decrease in gene expression, and multiple changes in alternative transcription end (TABLE 3), of both NRP1 and NRP2. Both genes are associated with coagulation. See Rossignol, Gagnon, & Klagsbrun, Genomics 70: 211-222 (2000). The SARS-CoV-2 spike protein binds both of these receptors. Daly et al., Science (New York, N.Y.) (2020). Previous work has shown increased expression of both genes in the lungs of patients with COVID-19 compared with controls. See Ackermann et al., The New England Journal of Medicine, 383, 120-128 (2020). Here, by contrast, decreased NRP1 and NRP2 expression was seen in ICU patients who died compared with ICU patients who survived.
Many studies have attempted to use clinical data to predict mortality in COVID-19. See Tian et al., Journal of Medical Virology (2020); Zhang et al., Journal of Thrombosis and Haemostasis (JTH), 18, 1324-1329 (2020). Some focus on cytokines. McElvaney et al., EBioMedicine 61, 103026 (2020). For simplicity, all of these attempt to identify a few variables to predict mortality. Here, the inventors use over 380,000 variables with PCA to create a figure that improves mortality prediction based upon where the patient falls on the graph (
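The specification does not describe its PCA implementation. The pure-Python sketch below shows one standard way to compute principal-component scores when there are far more variables than patients, as with the >380,000 variables described here: eigen-decompose the small patient-by-patient Gram matrix of the mean-centered data instead of the huge variable-by-variable covariance matrix. If the centered matrix is X = U S V^T, then X X^T = U S^2 U^T and the scores X V equal U S, so each score is sqrt(eigenvalue) times the corresponding eigenvector entry. The helper names are hypothetical, and power iteration with deflation assumes well-separated leading eigenvalues; a real analysis would more likely use an optimized linear-algebra library.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def center_columns(x: Matrix) -> Matrix:
    """Subtract each variable's mean across samples (rows are samples)."""
    n = len(x)
    means = [sum(col) / n for col in zip(*x)]
    return [[v - m for v, m in zip(row, means)] for row in x]


def gram_matrix(x: Matrix) -> Matrix:
    """Sample-by-sample inner products (n x n); small when variables >> samples."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in x] for ri in x]


def top_eigenpairs(g: Matrix, k: int, iters: int = 500,
                   seed: int = 0) -> List[Tuple[float, List[float]]]:
    """Leading eigenpairs of a symmetric PSD matrix via power iteration with deflation."""
    rng = random.Random(seed)
    n = len(g)
    g = [row[:] for row in g]  # work on a copy
    pairs = []
    for _ in range(k):
        v = [rng.random() for _ in range(n)]
        for _ in range(iters):
            w = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(t * t for t in w))
            if norm == 0.0:
                break  # remaining matrix is zero; eigenvalue is 0
            v = [t / norm for t in w]
        gv = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(a * b for a, b in zip(v, gv))  # Rayleigh quotient
        pairs.append((lam, v))
        # Deflate so the next pass converges to the next eigenvector.
        g = [[g[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def pca_scores(x: Matrix, k: int = 2) -> Matrix:
    """Principal-component scores (one row per sample) for the top k components."""
    pairs = top_eigenpairs(gram_matrix(center_columns(x)), k)
    return [[math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs]
            for i in range(len(x))]
```

Each row of the result is one patient's position on the PC1/PC2 plot; grouping patients by that position is the kind of use the paragraph above describes. The sign of each component is arbitrary, as in any PCA.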
Despite the limitations of this single-center study with a small number of patients, the inventors were still able to document that deep RNA sequencing and appropriate computational analysis yield valuable insight into the pathogenesis and host response of COVID-19 in critically ill patients. Useful drug targets were identified from SARS-CoV-2 RNA and the host response, including the RNA-dependent RNA polymerase, the N protein, and the PD-1 immune checkpoint pathway. The presence of pathogen RNA in the blood suggests that co-infection should be reconsidered. Most importantly, PCA of the entropy of >380,000 events allowed the inventors to group patients into those likely to die and those likely to live, which may be helpful in discussions with the families of critically ill patients. Translating these results into clinical practice could improve the diagnosis, prognostic assessment, and therapy of COVID-19.
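The specification does not define how the entropy of an event is computed. A common choice, assumed here for illustration only, is the Shannon entropy (in bits) of read support across the alternative outcomes of a single splicing or transcription start/end event: an event with one dominant outcome has entropy near 0, and an event split evenly between outcomes has higher entropy. The function names and the per-event count layout are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def splicing_entropy(outcome_counts: Sequence[int]) -> float:
    """Shannon entropy (bits) of read support across one event's outcomes."""
    total = sum(outcome_counts)
    if total == 0:
        return 0.0  # no reads: treat as uninformative
    h = 0.0
    for c in outcome_counts:
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h


def entropy_matrix(events_by_patient: Dict[str, List[Sequence[int]]]) -> Dict[str, List[float]]:
    """Per-patient vector of event entropies, in a fixed event order."""
    return {
        patient: [splicing_entropy(ev) for ev in events]
        for patient, events in events_by_patient.items()
    }
```

Stacking these per-patient vectors row by row yields a patients-by-events matrix of the kind that could then be passed to a PCA.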
Critically ill patients can develop acute respiratory distress syndrome (ARDS), and despite studies of the genomics of ARDS, there has been little progress. The falling cost of sequencing has shifted genetic studies from DNA to RNA sequencing, and methods to analyze these data have improved. The objective of this investigation is to use RNA sequencing data and analysis to identify useful gene targets in ARDS.
The human cohort, generated from the GTEx consortium, consisted of 25 deceased patients with ARDS identified by the presence of diffuse alveolar damage (DAD) and 74 deceased patients evaluated as not having DAD. The mouse ARDS cohort included C57BL/6 mice aged 10-12 in a previously described model, compared with controls.
Alternatively spliced RNA arises from co- or post-transcriptional events facilitated by the spliceosome, in which introns are removed to form the mature RNA from which protein isoforms are translated. Alternatively transcribed genes are the product of changes in promoter usage, polyadenylation signals, and RNA polymerase II interactions with DNA, which can lead to changes in isoform usage similar to those produced by alternative splicing events. Both types of events are identified from the analysis of RNA sequencing data. Genes with significant differential alternative transcription and genes with significant differential alternative splicing were identified, suggesting that alternative splicing and alternative transcription may have separate roles in DAD/ARDS by regulating different genes to perform distinctive functions.
In this analysis of RNA sequencing data from deceased patients with ARDS identified by the presence of DAD and a clinically relevant mouse model of ARDS, useful genes were identified. The identified genes were overlapped with genes previously reported as ARDS-related. Of 89 reported ARDS-related genes, 38 were confirmed in at least one differential category, confirming that the use of humans and mice with DAD/ARDS is appropriate and robust (p=1.25e−14). Eleven previously reported genes were present in all categories, and these eleven genes were evaluated for changes in alternative splicing and alternative transcription. GO term enrichment analysis of the 11 overlapping genes revealed 20 significant biological processes, including ontologies related to aging and to response to abiotic/environmental stimuli. There are 1639 genes, not previously reported in the literature, that show overlap between alternative splicing and alternative transcription. These genes were assessed for the directionality of alternative splicing and alternative transcription and for GO terms, and should provide the foundation for future work in ARDS.
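The overlap above (38 of 89 reported ARDS-related genes, p=1.25e−14) is reported without stating the test or the gene universe. One common choice for gene-set overlap is a one-sided hypergeometric test, sketched below with exact integer binomial coefficients. The universe and category sizes in the usage lines are placeholders, not the study's values, so the sketch is not expected to reproduce the reported p-value; the function name is hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k: int, universe: int, reported: int, selected: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(universe, reported, selected).

    universe: genes that could have been tested
    reported: previously reported genes within that universe
    selected: genes called in the differential category
    k:        observed overlap between reported and selected
    """
    denom = comb(universe, selected)
    upper = min(reported, selected)
    hits = sum(
        comb(reported, i) * comb(universe - reported, selected - i)
        for i in range(k, upper + 1)
    )
    return hits / denom  # exact big-int ratio, rounded once to float


if __name__ == "__main__":
    # Placeholder sizes for illustration only; not the study's values.
    print(hypergeom_upper_tail(k=38, universe=20000, reported=89, selected=3000))
```

Using integer binomial coefficients avoids underflow in the tail sum, which matters when p-values are as small as the one reported above.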
Studying the underlying changes in RNA processing (alternative splicing and alternative transcription start/end) not only expands basic knowledge of pathogenicity but also provides additional targets for therapeutics. The most enriched GO term from the alternative splicing set, carboxy-terminal domain protein kinase complex (GO:0032806), refers to the complex that phosphorylates the carboxy-terminal domain (CTD) of RNA polymerase II, which is vital in the regulation of transcription and RNA processing. In addition, RNA polymerase complex binding (GO:0000993) and transport of the SLBP-independent and SLBP-dependent mature mRNA (R-HSA-159227; R-HSA-159230) are among the most enriched terms. This suggests that alternative pre-mRNA splicing plays the dominant role in isoform usage in genes whose expression levels do not change, whereas alternative transcription may regulate isoform usage in genes that are more dynamically expressed during critical illness. Although it is possible that the enrichment reflects down-regulation through inhibitory genes, these data support the hypothesis that alternative splicing and alternative transcription may have separate roles in DAD/ARDS by regulating different genes to perform distinctive functions.
In this analysis of RNA sequencing data from deceased patients with ARDS identified by the presence of DAD and a clinically relevant mouse model of ARDS, useful genes were identified. Future research is needed on the mechanisms of alternative RNA splicing and alternative transcription start/end seen in ARDS.
To extend the work described above, in which SARS-CoV-2 was identified in the blood of patients using this methodology, the inventors showed that the same approach can detect other infections, specifically infection with the bacterium Escherichia coli.
Blood from a patient with a known Escherichia coli infection was sequenced to a depth of >100 million reads. Sequencing data were aligned to the human genome using the STAR aligner with standard parameters. Unmapped reads were extracted and then aligned to the Escherichia coli genome (Escherichia coli O25b:H4-ST131).
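The two-stage alignment described above can be sketched as a pair of STAR command lines. This is a hypothetical reconstruction: the specification says only that STAR was used with standard parameters for the human alignment, so the index paths, the thread count, the assumption of paired-end gzipped FASTQ input, and the use of STAR again for the Escherichia coli alignment are illustrative. `--outReadsUnmapped Fastx` is the STAR option that writes unaligned reads to `Unmapped.out.mate1` (and `Unmapped.out.mate2` for paired-end data). The functions only build the commands; they do not run STAR.

```python
from typing import List


def star_command(genome_dir: str, reads: List[str], prefix: str,
                 threads: int = 8, keep_unmapped: bool = False) -> List[str]:
    """Build a STAR alignment command with default ("standard") parameters."""
    cmd = [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", *reads,
        "--outFileNamePrefix", prefix,
    ]
    if any(r.endswith(".gz") for r in reads):
        cmd += ["--readFilesCommand", "zcat"]  # decompress gzipped FASTQ on the fly
    if keep_unmapped:
        # Write reads that fail to align to <prefix>Unmapped.out.mate1/mate2.
        cmd += ["--outReadsUnmapped", "Fastx"]
    return cmd


def two_stage_commands(human_index: str, pathogen_index: str,
                       r1: str, r2: str, outdir: str) -> List[List[str]]:
    """Stage 1: align to human and keep unmapped reads. Stage 2: align those to the pathogen."""
    human_prefix = f"{outdir}/human_"
    step1 = star_command(human_index, [r1, r2], human_prefix, keep_unmapped=True)
    unmapped = [f"{human_prefix}Unmapped.out.mate1",
                f"{human_prefix}Unmapped.out.mate2"]
    step2 = star_command(pathogen_index, unmapped, f"{outdir}/pathogen_")
    return [step1, step2]


if __name__ == "__main__":
    # Hypothetical file and index paths.
    for cmd in two_stage_commands("idx/human", "idx/ecoli_O25b_H4_ST131",
                                  "sample_R1.fastq.gz", "sample_R2.fastq.gz", "out"):
        print(" ".join(cmd))
```

Each command list could be passed to `subprocess.run(cmd, check=True)`. When generating a STAR index for a small bacterial genome, the STAR manual recommends lowering `--genomeSAindexNbases` to about min(14, log2(genome length)/2 - 1).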
Reads from this patient aligned to the Escherichia coli genome. See TABLE 7.
Next, target genes were identified for designing custom PCR primers to detect the pathogen. See TABLE 8.
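The specification does not describe how the primers in TABLE 8 were designed. As a hedged illustration of a first-pass screen that is often applied to candidate primers, the sketch below computes GC content, a rough melting temperature by the Wallace rule (Tm = 2 °C per A/T plus 4 °C per G/C, a crude estimate intended for short oligonucleotides), and the reverse complement. The length and GC thresholds are hypothetical defaults, and the example sequence is a toy, not a primer from TABLE 8.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (e.g., to derive a reverse primer)."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases."""
    if not seq:
        raise ValueError("empty sequence")
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def wallace_tm(seq: str) -> int:
    """Approximate melting temperature in degrees C by the Wallace rule."""
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


def passes_basic_filters(seq: str, min_len: int = 18, max_len: int = 25,
                         gc_min: float = 0.40, gc_max: float = 0.60) -> bool:
    """Hypothetical length and GC-content screen for a candidate primer."""
    s = seq.upper()
    if set(s) - set("ACGT"):
        return False  # reject ambiguous or non-DNA characters
    return min_len <= len(s) <= max_len and gc_min <= gc_fraction(s) <= gc_max


if __name__ == "__main__":
    candidate = "ATGCATGCATGCATGCATGC"  # toy sequence, not from TABLE 8
    print(gc_fraction(candidate), wallace_tm(candidate), passes_basic_filters(candidate))
```

Dedicated primer-design tools also check secondary structure, primer dimers, and specificity against host and other microbial genomes; specificity is especially relevant here because the target reads come from a background dominated by human RNA.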
A person of ordinary skill in the biomedical art can use these patents, patent applications, and scientific references as guidance to predictable results when making and using the invention:
Specific compositions and methods of RNA sequencing to diagnose sepsis have been described. The detailed description in this specification is illustrative and not restrictive or exhaustive. The detailed description is not intended to limit the disclosure to the precise form disclosed. Other equivalents and modifications besides those already described are possible without departing from the inventive concepts described in this specification, as those skilled in the art will recognize. When the specification or claims recite method steps or functions in order, alternative embodiments may perform the tasks in a different order or substantially concurrently. The inventive subject matter is not to be restricted except in the spirit of the disclosure.
When interpreting the disclosure, all terms should be interpreted in the broadest possible manner consistent with the context. Unless otherwise defined, all technical and scientific terms used in this specification have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. This invention is not limited to the particular methodology, protocols, reagents, and the like described in this specification and, as such, can vary in practice. The terminology used in this specification is not intended to limit the scope of the invention, which is defined solely by the claims.
All patents and publications cited throughout this specification are expressly incorporated by reference to disclose and describe the materials and methods that might be used with the technologies described in this specification. The publications discussed are provided solely for their disclosure before the filing date. They should not be construed as an admission that the inventors may not antedate such disclosure under prior invention or for any other reason. If there is an apparent discrepancy between a previous patent or publication and the description provided in this specification, the present specification (including any definitions) and claims shall control. All statements as to the date or representation as to the contents of these documents are based on the information available to the applicants and constitute no admission as to the correctness of the dates or contents of these documents. The dates of publication provided in this specification may differ from the actual publication dates. If there is an apparent discrepancy between a publication date provided in this specification and the actual publication date supplied by the publisher, the actual publication date shall control.
The terms "comprise" and "comprising" should be interpreted as referring to elements, components, or steps in a non-exclusive manner, indicating that the referenced elements, components, or steps may be present, used, or combined with other elements, components, or steps. The singular terms "a," "an," and "the" include plural referents unless context indicates otherwise. Similarly, the word "or" should cover "and" unless the context indicates otherwise. The abbreviation "e.g." is used to indicate a non-limiting example and is synonymous with the term "for example."
When a range of values is provided, each intervening value, to the tenth of the unit of the lower limit unless the context dictates otherwise, between the upper and lower limits of that range, and any other stated or intervening value in that range, is encompassed within the invention.
Some embodiments of the technology described can be defined according to the following numbered paragraphs:
1. A method of using unmapped bacterial RNA reads to identify bacteria causing sepsis.
2. A method of using unmapped viral reads to identify sepsis or viral reactivation.
3. A method of using unmapped B-cell and T-cell receptor V(D)J reads to identify sepsis.
4. A method of using a Principal Component Analysis of RNA splicing entropy to identify sepsis.
5. A method of using RNA lariats to identify sepsis.
6. A method of using a Principal Component Analysis of gene expression, alternative RNA splicing, or alternative transcription start and end to identify sepsis.
This application claims priority under 35 U.S.C. § 119(e) to U.S. Ser. Nos. 63/176,531, filed Apr. 19, 2021, and 63/184,583, filed May 5, 2021, the contents of both of which are incorporated by reference.
This invention was made with government support under P20 GM103652, T32 HL134625, R35 GM142638, and P20 GM121344 awarded by the National Institutes of Health. The government has certain rights in the invention.
Number | Date | Country
---|---|---
63176531 | Apr 2021 | US
63184583 | May 2021 | US