Early and accurate diagnosis of infection is key to improving patient outcomes and reducing antibiotic resistance. The mortality rate of bacterial sepsis increases 8% for each hour by which antibiotics are delayed; however, giving antibiotics to patients without bacterial infections increases rates of morbidity and antimicrobial resistance. The rate of inappropriate antibiotic prescriptions in the hospital setting is estimated at 30-50%, a figure that improved diagnostics could help reduce. Strikingly, close to 95% of patients given antibiotics for suspected enteric fever have negative cultures. There is currently no gold-standard point-of-care diagnostic that can broadly determine the presence and type of infection. Thus, the White House has established a National Action Plan for Combating Antibiotic-Resistant Bacteria, which called for “point-of-need diagnostic tests to distinguish rapidly between bacterial and viral infections”.
While some PCR-based molecular diagnostics can profile pathogens directly from a blood culture, such methods rely on the presence of adequate numbers of pathogens in the blood. Moreover, they are limited to detecting a discrete range of pathogens. As a result, there is growing interest in molecular diagnostics that profile the host gene response. These include diagnostics that can distinguish infected patients from inflamed but non-infected patients. Overall, while great promise has been shown in this field, no host gene expression infection diagnostic has yet made it into clinical practice.
There remains a need for sensitive and specific diagnostic tests that can distinguish between bacterial and viral infections.
Patients can be classified as having a viral infection or bacterial infection based on the expression of eight genes: JUP, SUCLG2, IFI27, FCER1A, HESX1, SMARCD3, ICAM1, and EBI3. Increased JUP, SUCLG2, IFI27, FCER1A, and HESX1 expression indicates that the subject has a viral infection and increased SMARCD3, ICAM1, and EBI3 expression indicates that the subject has a bacterial infection.
In some embodiments, a method of analyzing a sample is provided. This method may comprise: (a) obtaining a sample of RNA from a subject; and (b) measuring the amount of RNA transcripts encoded by JUP, SUCLG2, IFI27, FCER1A, HESX1, SMARCD3, ICAM1, and EBI3 in the sample, to produce gene expression data. This method may further comprise, based on the gene expression data, providing a report indicating whether the subject has a viral infection or a bacterial infection, wherein: (i) increased JUP, SUCLG2, IFI27, FCER1A, and HESX1 expression indicates that the subject has a viral infection; and (ii) increased SMARCD3, ICAM1, and EBI3 expression indicates that the subject has a bacterial infection.
In some embodiments, a method of treatment is provided. In these embodiments, the method may comprise (a) receiving a report indicating whether the subject has a viral infection or a bacterial infection, wherein the report is based on the gene expression data obtained by measuring the amount of RNA transcripts encoded by JUP, SUCLG2, IFI27, FCER1A, HESX1, SMARCD3, ICAM1, and EBI3, and (b) identifying the patient as having increased JUP, SUCLG2, IFI27, FCER1A, and HESX1 expression, and treating the subject with an anti-viral therapy; or (c) identifying the patient as having increased SMARCD3, ICAM1, and EBI3 expression, and treating the subject with an anti-bacterial therapy.
Kits for performing the method are also provided.
The invention is best understood from the following detailed description when read in conjunction with the accompanying drawings. It is emphasized that, according to common practice, the various features of the drawings are not to-scale. On the contrary, the dimensions of the various features are arbitrarily expanded or reduced for clarity. Included in the drawings are the following figures:
The practice of the present invention will employ, unless otherwise indicated, conventional methods of pharmacology, chemistry, biochemistry, recombinant DNA techniques and immunology, within the skill of the art. Such techniques are explained fully in the literature. See, e.g., Handbook of Experimental Immunology, Vols. I-IV (D. M. Weir and C. C. Blackwell eds., Blackwell Scientific Publications); A. L. Lehninger, Biochemistry (Worth Publishers, Inc., current edition); Sambrook, et al., Molecular Cloning: A Laboratory Manual (2nd Edition, 1989); Methods In Enzymology (S. Colowick and N. Kaplan eds., Academic Press, Inc.).
All publications, patents and patent applications cited herein, whether supra or infra, are hereby incorporated by reference in their entireties.
Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limits of that range is also specifically disclosed. Each smaller range between any stated value or intervening value in a stated range and any other stated or intervening value in that stated range is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included or excluded in the range, and each range where either, neither or both limits are included in the smaller ranges is also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, some potential and preferred methods and materials are now described. All publications mentioned herein are incorporated herein by reference to disclose and describe the methods and/or materials in connection with which the publications are cited. It is understood that the present disclosure supersedes any disclosure of an incorporated publication to the extent there is a contradiction.
As will be apparent to those of skill in the art upon reading this disclosure, each of the individual embodiments described and illustrated herein has discrete components and features which may be readily separated from or combined with the features of any of the other several embodiments without departing from the scope or spirit of the present invention. Any recited method can be carried out in the order of events recited or in any other order which is logically possible.
It must be noted that, as used in this specification and the appended claims, the singular forms “a”, “an” and “the” include plural referents unless the content clearly dictates otherwise. Thus, for example, reference to “an agonist” includes a mixture of two or more such agonists, and the like.
The publications discussed herein are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that the present invention is not entitled to antedate such publication by virtue of prior invention. Further, the dates of publication provided may be different from the actual publication dates which may need to be independently confirmed.
As noted above, a method of analyzing a sample is provided. In some embodiments the method comprises (a) obtaining a sample of RNA from a subject; and (b) measuring the amount of RNA transcripts encoded by JUP, SUCLG2, IFI27, FCER1A, HESX1, SMARCD3, ICAM1, and EBI3 in the sample, to produce gene expression data. The method may be used in a variety of diagnostic and therapeutic methods, as described below.
Diagnostic Methods
As noted above, the method may be used to determine if a subject has a viral infection or bacterial infection. In some embodiments, the method may comprise: (a) obtaining a sample of RNA from a subject; (b) measuring the amount of RNA transcripts encoded by JUP, SUCLG2, IFI27, FCER1A, HESX1, SMARCD3, ICAM1, and EBI3 in the sample, to produce gene expression data; and (c) providing a report indicating whether the subject has a viral infection or a bacterial infection, wherein: (i) increased JUP, SUCLG2, IFI27, FCER1A, and HESX1 expression indicates that the subject has a viral infection; and (ii) increased SMARCD3, ICAM1, and EBI3 expression indicates that the subject has a bacterial infection.
The measuring step can be done using any suitable method. For example, the amount of the RNA transcripts in the sample may be measured by RNA-seq (see, e.g., Morin et al BioTechniques 2008 45: 81-94; Wang et al 2009 Nature Reviews Genetics 10: 57-63), RT-PCR (Freeman et al BioTechniques 1999 26: 112-22, 124-5), or by labeling the RNA or cDNA made from the same and hybridizing the labeled RNA or cDNA to an array. An array may contain spatially-addressable or optically-addressable sequence-specific oligonucleotide probes that specifically hybridize to transcripts being measured, or cDNA made from the same. Spatially-addressable arrays (which are commonly referred to as “microarrays” in the art) are described in, e.g., Sealfon et al (see, e.g., Methods Mol Biol. 2011; 671:3-34). Optically-addressable arrays (which are commonly referred to as “bead arrays” in the art) use beads that are internally dyed with fluorophores of differing colors, intensities and/or ratios such that the beads can be distinguished from each other, where the beads are also attached to an oligonucleotide probe. Exemplary bead-based assays are described in Dupont et al (J. Reprod Immunol. 2005 66:175-91) and Khalifian et al (J Invest Dermatol. 2015 135: 1-5). The abundance of transcripts in a sample can also be analyzed by quantitative RT-PCR or isothermal amplification methods such as those described in Gao et al (J. Virol Methods. 2018 255: 71-75), Pease et al (Biomed Microdevices (2018) 20: 56) or Nixon et al (Biomol. Det. and Quant 2014 2: 4-10), for example. Many other methods for measuring the amount of an RNA transcript in a sample are known in the art.
The sample of RNA obtained from the subject may comprise RNA isolated from whole blood, white blood cells, peripheral blood mononuclear cells (PBMC), neutrophils or buffy coat, for example. Methods for making total RNA, polyA+ RNA, RNA that has been depleted for abundant transcripts, and RNA that has been enriched for the transcripts being measured are well known (see, e.g., Hitchen et al J Biomol Tech. 2013 24: S43-S44). If the method involves making cDNA from the RNA, then the cDNA may be made using an oligo(dT) primer, a random primer or a population of gene-specific primers that hybridize to the transcripts being analyzed.
In measuring the transcript, the absolute amount of each transcript may be determined, or the amount of each transcript relative to one or more control transcripts may be determined. Whether the amount of a transcript is increased or decreased may be in relation to the amount of the transcript (e.g., the average amount of the transcript) in control samples (e.g., in blood samples collected from a population of at least 100, at least 200, or at least 500 subjects that are known or not known to have viral and/or bacterial infections).
In some embodiments, the method may comprise providing a report indicating whether the subject has a viral or bacterial infection based on the measurements of the amounts of the transcripts. In some embodiments, this step may involve calculating a score based on the weighted amounts of each of the transcripts, where the score correlates with the phenotype and can be a number such as a probability, likelihood or score out of 10, for example. In these embodiments, the method may comprise inputting the amounts of each of the transcripts into one or more algorithms, executing the algorithms, and receiving a score for each phenotype based on the calculations. In these embodiments, other measurements from the subject, e.g., whether the subject is male, the age of the subject, white blood cell count, neutrophil count, band count, lymphocyte count, monocyte count, whether the subject is immunosuppressed, and/or whether there are Gram-negative bacteria present, etc., may be input into the algorithm.
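By way of non-limiting illustration, the calculation of a score from weighted transcript amounts may be sketched as follows. The weights, the zero intercept, and the logistic transform used to express the score as a probability are hypothetical choices made only for this sketch and are not taken from this disclosure; the input is assumed to be, for each gene, a log2 expression value relative to the average of control samples.

```python
import math

VIRAL_GENES = ["JUP", "SUCLG2", "IFI27", "FCER1A", "HESX1"]
BACTERIAL_GENES = ["SMARCD3", "ICAM1", "EBI3"]

# Hypothetical weights (assumption): positive for bacterial-associated genes,
# negative for viral-associated genes, scaled so each group sums to +1 or -1.
WEIGHTS = {g: 1.0 / len(BACTERIAL_GENES) for g in BACTERIAL_GENES}
WEIGHTS.update({g: -1.0 / len(VIRAL_GENES) for g in VIRAL_GENES})


def weighted_score(relative_expr, weights=WEIGHTS, intercept=0.0):
    """Return a value in (0, 1); higher values point toward bacterial infection.

    relative_expr maps gene name -> log2 expression relative to controls.
    """
    linear = intercept + sum(w * relative_expr[g] for g, w in weights.items())
    # Logistic transform (assumption) so the score reads like a probability.
    return 1.0 / (1.0 + math.exp(-linear))
```

In such a sketch, a sample with elevated SMARCD3, ICAM1, and EBI3 yields a score above 0.5, and a sample with elevated JUP, SUCLG2, IFI27, FCER1A, and HESX1 yields a score below 0.5. Additional inputs such as age or white blood cell count could enter as further weighted terms.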
In some embodiments, the method may involve creating the report, e.g., in an electronic form, and forwarding the report to a doctor or other medical professional to help identify a suitable course of action, e.g., to identify a suitable therapy for the subject. The report may be used along with other metrics as a diagnostic to determine whether the subject has a viral or bacterial infection.
In any embodiment, the report can be forwarded to a “remote location”, where “remote location” means a location other than the location at which the image is examined. For example, a remote location could be another location (e.g., office, lab, etc.) in the same city, another location in a different city, another location in a different state, another location in a different country, etc. As such, when one item is indicated as being “remote” from another, what is meant is that the two items can be in the same room but separated, or at least in different rooms or different buildings, and can be at least one mile, ten miles, or at least one hundred miles apart. “Communicating” information references transmitting the data representing that information as electrical signals over a suitable communication channel (e.g., a private or public network). “Forwarding” an item refers to any means of getting that item from one location to the next, whether by physically transporting that item or otherwise (where that is possible) and includes, at least in the case of data, physically transporting a medium carrying the data or communicating the data. Examples of communicating media include radio or infra-red transmission channels as well as a network connection to another computer or networked device, and the internet, including email transmissions and information recorded on websites and the like. In certain embodiments, the report may be analyzed by an MD or other qualified medical professional, and a report based on the results of the analysis of the image may be forwarded to the subject from which the sample was obtained.
In computer-related embodiments, a system may include a computer containing a processor, a storage component (i.e., memory), a display component, and other components typically present in general purpose computers. The storage component stores information accessible by the processor, including instructions that may be executed by the processor and data that may be retrieved, manipulated or stored by the processor.
The storage component includes instructions for determining whether the subject has a viral or bacterial infection using the measurements described above as inputs. The computer processor is coupled to the storage component and configured to execute the instructions stored in the storage component in order to receive patient data and analyze patient data according to one or more algorithms. The display component may display information regarding the diagnosis of the patient.
The storage component may be of any type capable of storing information accessible by the processor, such as a hard-drive, memory card, ROM, RAM, DVD, CD-ROM, USB Flash drive, write-capable, and read-only memories. The processor may be any well-known processor, such as processors from Intel Corporation. Alternatively, the processor may be a dedicated controller such as an ASIC.
The instructions may be any set of instructions to be executed directly (such as machine code) or indirectly (such as scripts) by the processor. In that regard, the terms “instructions,” “steps” and “programs” may be used interchangeably herein. The instructions may be stored in object code form for direct processing by the processor, or in any other computer language including scripts or collections of independent source code modules that are interpreted on demand or compiled in advance.
Data may be retrieved, stored or modified by the processor in accordance with the instructions. For instance, although the diagnostic system is not limited by any particular data structure, the data may be stored in computer registers, in a relational database as a table having a plurality of different fields and records, XML documents, or flat files. The data may also be formatted in any computer-readable format such as, but not limited to, binary values, ASCII or Unicode. Moreover, the data may comprise any information sufficient to identify the relevant information, such as numbers, descriptive text, proprietary codes, pointers, references to data stored in other memories (including other network locations) or information which is used by a function to calculate the relevant data.
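As a non-limiting illustration of how report data might be stored and communicated, the following sketch holds a report as a simple record and serializes it to a text format. The field names, the 0.5 threshold, and the use of JSON are assumptions made for this sketch only; the diagnostic system is not limited to any particular data structure.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class InfectionReport:
    """Hypothetical report record; field names are illustrative only."""
    subject_id: str
    expression: dict   # gene name -> measured amount
    score: float       # e.g., a probability-like score
    call: str          # "bacterial" or "viral"


def build_report(subject_id, expression, score, threshold=0.5):
    # Assumption: scores at or above the threshold indicate bacterial infection.
    call = "bacterial" if score >= threshold else "viral"
    return InfectionReport(subject_id, expression, score, call)


def to_json(report):
    """Serialize the report for storage or forwarding to a remote location."""
    return json.dumps(asdict(report), sort_keys=True)
```

A record of this kind could equally be written to a relational database table, an XML document, or a flat file.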
Therapeutic Methods
Therapeutic methods are also provided. In some embodiments, these methods may comprise identifying a subject as having a viral infection or a bacterial infection using the methods described above, and treating a subject based on whether the subject is indicated as having a viral infection or bacterial infection. In some embodiments, this method may comprise receiving a report indicating whether the subject has a viral infection or a bacterial infection, wherein the report is based on the gene expression data obtained by measuring the amount of RNA transcripts encoded by JUP, SUCLG2, IFI27, FCER1A, HESX1, SMARCD3, ICAM1, and EBI3, and treating a subject based on whether the subject is indicated as having a viral infection or bacterial infection. In some embodiments the method may comprise: (a) identifying the patient as having increased JUP, SUCLG2, IFI27, FCER1A, and HESX1 expression, and treating the subject with an anti-viral therapy; or (b) identifying the patient as having increased SMARCD3, ICAM1, and EBI3 expression, and treating the subject with an anti-bacterial therapy.
A subject indicated as having a viral infection may be treated by administering a therapeutically effective dose of an antiviral agent, such as a broad-spectrum antiviral agent, an antiviral vaccine, a neuraminidase inhibitor (e.g., zanamivir (Relenza) and oseltamivir (Tamiflu)), a nucleoside analogue (e.g., acyclovir, zidovudine (AZT), and lamivudine), an antisense antiviral agent (e.g., phosphorothioate antisense antiviral agents (e.g., Fomivirsen (Vitravene) for cytomegalovirus retinitis), morpholino antisense antiviral agents), an inhibitor of viral uncoating (e.g., Amantadine and rimantadine for influenza, Pleconaril for rhinoviruses), an inhibitor of viral entry (e.g., Fuzeon for HIV), an inhibitor of viral assembly (e.g., Rifampicin), or an antiviral agent that stimulates the immune system (e.g., interferons). Exemplary antiviral agents include Abacavir, Aciclovir, Acyclovir, Adefovir, Amantadine, Amprenavir, Ampligen, Arbidol, Atazanavir, Atripla (fixed dose drug), Balavir, Cidofovir, Combivir (fixed dose drug), Dolutegravir, Darunavir, Delavirdine, Didanosine, Docosanol, Edoxudine, Efavirenz, Emtricitabine, Enfuvirtide, Entecavir, Ecoliever, Famciclovir, Fixed dose combination (antiretroviral), Fomivirsen, Fosamprenavir, Foscarnet, Fosfonet, Fusion inhibitor, Ganciclovir, Ibacitabine, Imunovir, Idoxuridine, Imiquimod, Indinavir, Inosine, Integrase inhibitor, Interferon type III, Interferon type II, Interferon type I, Interferon, Lamivudine, Lopinavir, Loviride, Maraviroc, Moroxydine, Methisazone, Nelfinavir, Nevirapine, Nexavir, Nitazoxanide, Nucleoside analogues, Novir, Oseltamivir (Tamiflu), Peginterferon alfa-2a, Penciclovir, Peramivir, Pleconaril, Podophyllotoxin, Protease inhibitor, Raltegravir, Reverse transcriptase inhibitor, Ribavirin, Rimantadine, Ritonavir, Pyramidine, Saquinavir, Sofosbuvir, Stavudine, Synergistic enhancer (antiretroviral), Telaprevir, Tenofovir, Tenofovir disoproxil, Tipranavir, Trifluridine, Trizivir, Tromantadine, Truvada, Valaciclovir (Valtrex), Valganciclovir, Vicriviroc, Vidarabine, Viramidine, Zalcitabine, Zanamivir (Relenza), and Zidovudine.
A subject indicated as having a bacterial infection may be treated by administering a therapeutically effective dose of an antibiotic. Antibiotics may include broad spectrum, bactericidal, or bacteriostatic antibiotics. Exemplary antibiotics include aminoglycosides such as Amikacin, Amikin, Gentamicin, Garamycin, Kanamycin, Kantrex, Neomycin, Neo-Fradin, Netilmicin, Netromycin, Tobramycin, Nebcin, Paromomycin, Humatin, Streptomycin, Spectinomycin (Bs), and Trobicin; ansamycins such as Geldanamycin, Herbimycin, Rifaximin, and Xifaxan; carbacephems such as Loracarbef and Lorabid; carbapenems such as Ertapenem, Invanz, Doripenem, Doribax, Imipenem/Cilastatin, Primaxin, Meropenem, and Merrem; cephalosporins such as Cefadroxil, Duricef, Cefazolin, Ancef, Cefalotin or Cefalothin, Keflin, Cefalexin, Keflex, Cefaclor, Distaclor, Cefamandole, Mandol, Cefoxitin, Mefoxin, Cefprozil, Cefzil, Cefuroxime, Ceftin, Zinnat, Cefixime, Cefdinir, Cefditoren, Cefoperazone, Cefotaxime, Cefpodoxime, Ceftazidime, Ceftibuten, Ceftizoxime, Ceftriaxone, Cefepime, Maxipime, Ceftaroline fosamil, Teflaro, Ceftobiprole, and Zeftera; glycopeptides such as Teicoplanin, Targocid, Vancomycin, Vancocin, Telavancin, Vibativ, Dalbavancin, Dalvance, Oritavancin, and Orbactiv; lincosamides such as Clindamycin, Cleocin, Lincomycin, and Lincocin; lipopeptides such as Daptomycin and Cubicin; macrolides such as Azithromycin, Zithromax, Sumamed, Xithrone, Clarithromycin, Biaxin, Dirithromycin, Dynabac, Erythromycin, Erythocin, Erythroped, Roxithromycin, Troleandomycin, Tao, Telithromycin, Ketek, Spiramycin, and Rovamycine; monobactams such as Aztreonam and Azactam; nitrofurans such as Furazolidone, Furoxone, Nitrofurantoin, Macrodantin, and Macrobid; oxazolidinones such as Linezolid, Zyvox, VRSA, Posizolid, Radezolid, and Torezolid; penicillins such as Penicillin V, Veetids (Pen-Vee-K), Piperacillin, Pipracil, Penicillin G, Pfizerpen, Temocillin, Negaban, Ticarcillin, and Ticar; penicillin combinations such as Amoxicillin/clavulanate, Augmentin, Ampicillin/sulbactam, Unasyn, Piperacillin/tazobactam, Zosyn, Ticarcillin/clavulanate, and Timentin; polypeptides such as Bacitracin, Colistin, Coly-Mycin-S, and Polymyxin B; quinolones/fluoroquinolones such as Ciprofloxacin, Cipro, Ciproxin, Ciprobay, Enoxacin, Penetrex, Gatifloxacin, Tequin, Gemifloxacin, Factive, Levofloxacin, Levaquin, Lomefloxacin, Maxaquin, Moxifloxacin, Avelox, Nalidixic acid, NegGram, Norfloxacin, Noroxin, Ofloxacin, Floxin, Ocuflox, Trovafloxacin, Trovan, Grepafloxacin, Raxar, Sparfloxacin, Zagam, Temafloxacin, and Omniflox; sulfonamides such as Amoxicillin, Novamox, Amoxil, Ampicillin, Principen, Azlocillin, Carbenicillin, Geocillin, Cloxacillin, Tegopen, Dicloxacillin, Dynapen, Flucloxacillin, Floxapen, Mezlocillin, Mezlin, Methicillin, Staphcillin, Nafcillin, Unipen, Oxacillin, Prostaphlin, Penicillin G, Pentids, Mafenide, Sulfamylon, Sulfacetamide, Sulamyd, Bleph-10, Sulfadiazine, Micro-Sulfon, Silver sulfadiazine, Silvadene, Sulfadimethoxine, Di-Methox, Albon, Sulfamethizole, Thiosulfil Forte, Sulfamethoxazole, Gantanol, Sulfanilimide, Sulfasalazine, Azulfidine, Sulfisoxazole, Gantrisin, Trimethoprim-Sulfamethoxazole (Co-trimoxazole) (TMP-SMX), Bactrim, Septra, Sulfonamidochrysoidine, and Prontosil; tetracyclines such as Demeclocycline, Declomycin, Doxycycline, Vibramycin, Minocycline, Minocin, Oxytetracycline, Terramycin, Tetracycline, Sumycin, Achromycin V, and Steclin; drugs against mycobacteria such as Clofazimine, Lamprene, Dapsone, Avlosulfon, Capreomycin, Capastat, Cycloserine, Seromycin, Ethambutol, Myambutol, Ethionamide, Trecator, Isoniazid, I.N.H., Pyrazinamide, Aldinamide, Rifampicin, Rifadin, Rimactane, Rifabutin, Mycobutin, Rifapentine, Priftin, and Streptomycin; other antibiotics such as Arsphenamine, Salvarsan, Chloramphenicol, Chloromycetin, Fosfomycin, Monurol, Monuril, Fusidic acid, Fucidin, Metronidazole, Flagyl, Mupirocin, Bactroban, Platensimycin, Quinupristin/Dalfopristin, Synercid, Thiamphenicol, Tigecycline, Tigacyl, Tinidazole, Tindamax, Fasigyn, Trimethoprim, Proloprim, and Trimpex.
Methods for administering and dosages for administering the therapeutics listed above are known in the art or can be derived from the art.
Kits
Also provided by this disclosure are kits for practicing the subject methods, as described above. In some embodiments, the kit may contain reagents for measuring the amount of RNA transcripts encoded by JUP, SUCLG2, IFI27, FCER1A, HESX1, SMARCD3, ICAM1, and EBI3. In some embodiments, the kit may comprise, for each RNA transcript, a sequence-specific oligonucleotide that hybridizes to the transcript. In some embodiments, the sequence-specific oligonucleotide may be biotinylated and/or labeled with an optically-detectable moiety. In some embodiments, the kit may comprise, for each RNA transcript, a pair of PCR primers that amplify a sequence from the RNA transcript, or cDNA made from the same. In some embodiments, the kit may comprise an array of oligonucleotide probes, wherein the array comprises, for each RNA transcript, at least one sequence-specific oligonucleotide that hybridizes to the transcript. The oligonucleotide probes may be spatially addressable on the surface of a planar support, or tethered to optically addressable beads, for example.
In embodiments in which a quantitative isothermal amplification method is used, the kit may comprise multiple reaction vessels, each vessel comprising at least one (e.g., 2, 3, 4, 5, or 6) sequence-specific isothermal amplification primer that hybridizes to a single transcript, e.g., a transcript from a single gene selected from JUP, SUCLG2, IFI27, FCER1A, HESX1, SMARCD3, ICAM1, and EBI3, or cDNA made from the same. As such, in some embodiments, the kit may contain at least 8 reaction vessels, where each reaction vessel contains one or more primers for detection of an RNA transcript encoded by a single gene. In some embodiments, the kit may contain reagents for measuring the amount of up to a total of 30 or 50 RNA transcripts.
In some embodiments, the kit may contain reagents for measuring the amount of RNA transcripts of a set of any number of genes (e.g., at least 2, at least 3, at least 4, at least 5, at least 6, or at least 7 genes, up to 30 or 50 genes), where the set of genes includes any pair of genes listed in Table 2 as well as optionally other genes (e.g., at least 2, at least 3, at least 4, at least 5, at least 6, or at least 7 other genes) that independently are or are not listed in Table 1. For example, the kit may comprise, for each RNA transcript, a pair of PCR primers that amplify a sequence from the RNA transcript, or cDNA made from the same.
The various components of the kit may be present in separate containers or certain compatible components may be precombined into a single container, as desired.
In addition to the above-mentioned components, the subject kit may further include instructions for using the components of the kit to practice the subject method.
In any embodiment, the method can be practiced by measuring the amount of RNA transcripts encoded by fewer than the eight listed genes, e.g., by measuring the amount of RNA transcripts encoded by 2, 3, 4, 5, 6, or 7 of the listed genes. In some embodiments, the total number of RNA transcripts measured may be up to 30 or 50.
In addition, other genes can be analyzed in addition to the eight listed genes or subset thereof. For example, in any embodiment, the method may further comprise measuring the amount of RNA transcripts of other genes listed in Table 1 below.
In some embodiments, the method may be practiced by measuring the amount of RNA transcripts of a set of any number of genes (e.g., at least 2, at least 3, at least 4, at least 5, at least 6, or at least 7 genes, up to 30 or 50 genes), where the set of genes includes any pair of genes listed in Table 2 as well as optionally other genes (e.g., at least 2, at least 3, at least 4, at least 5, at least 6, or at least 7 other genes) that are independently listed or not listed in Table 1.
In some embodiments, the method may further comprise measuring the amount of RNA transcripts encoded by CEACAM1, ZDHHC19, C9orf95, GNA15, BATF, C3AR1, KIAA1370, TGFBI, MTCH1, RPGRIP1, and HLA-DPB1 in addition to the listed genes. In these embodiments, increased expression of the CEACAM1, ZDHHC19, C9orf95, GNA15, BATF, and C3AR1 biomarkers and decreased expression of the KIAA1370, TGFBI, MTCH1, RPGRIP1, and HLA-DPB1 indicate that the subject has sepsis as described in WO2016145426. Thus, the present method can be used as an integrated decision model for the treatment of both bacterial and viral infections.
The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how to make and use the present invention and are not intended to limit the scope of what the inventors regard as their invention nor are they intended to represent that the experiments below are all or the only experiments performed. Efforts have been made to ensure accuracy with respect to numbers used (e.g. amounts, temperature, etc.) but some experimental errors and deviations should be accounted for. Unless indicated otherwise, parts are parts by weight, molecular weight is weight average molecular weight, temperature is in degrees Celsius, and pressure is at or near atmospheric. Standard abbreviations may be used, e.g., room temperature (RT); base pairs (bp); kilobases (kb); picoliters (pl); seconds (s or sec); minutes (m or min); hours (h or hr); days (d); weeks (wk or wks); nanoliters (nl); microliters (ul); milliliters (ml); liters (L); nanograms (ng); micrograms (ug); milligrams (mg); grams ((g), in the context of mass); kilograms (kg); equivalents of the force of gravity ((g), in the context of centrifugation); nanomolar (nM); micromolar (uM), millimolar (mM); molar (M); amino acids (aa); kilobases (kb); base pairs (bp); nucleotides (nt); intramuscular (i.m.); intraperitoneal (i.p.); subcutaneous (s.c.); and the like.
Materials and Methods
Systematic Datasets Search
A systematic search was performed in NIH Gene Expression Omnibus (GEO) and European Bioinformatics Institute (EBI) ArrayExpress for public human microarray genome-wide expression studies of pneumonia or other respiratory infections. Datasets were excluded if they (i) were nonclinical, (ii) were performed using tissues other than whole blood or PBMCs, (iii) did not have at least 4 healthy samples, or (iv) did not have sufficient pathogen labels to identify whether the causal agent was bacterial or viral.
All microarray data were renormalized from raw data (when available) using standard methods. Affymetrix arrays were normalized using GC robust multiarray average (gcRMA) (on arrays with mismatch probes) or RMA. Illumina, Agilent, GE, and other commercial arrays were normalized via normal-exponential background correction followed by quantile normalization. Custom arrays were not renormalized and were used as is. Data were log2-transformed, and a fixed-effect model was used to summarize probes to genes within each study. Within each study, cohorts assayed with different microarray types were treated as independent.
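For illustration only, the quantile normalization and log2 transformation steps mentioned above may be sketched as follows. This is a simplified sketch and not the normalization code used in the examples: it omits gcRMA, RMA, and normal-exponential background correction, breaks ties by row order rather than averaging, and the offset of 1 in the log2 transform is an assumption.

```python
import math


def log2_transform(matrix, offset=1.0):
    """Log2-transform intensities; the offset (assumption) avoids log2(0)."""
    return [[math.log2(v + offset) for v in row] for row in matrix]


def quantile_normalize(matrix):
    """Quantile-normalize a matrix with rows = probes and columns = samples.

    Every sample is forced onto the same distribution: the k-th smallest
    value in each sample is replaced by the mean of the k-th smallest
    values across all samples.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    columns = [[matrix[r][c] for r in range(n_rows)] for c in range(n_cols)]
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(sc[k] for sc in sorted_cols) / n_cols
                  for k in range(n_rows)]
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for c, col in enumerate(columns):
        # Row indices of this sample ordered from lowest to highest value.
        order = sorted(range(n_rows), key=lambda r: col[r])
        for rank, r in enumerate(order):
            out[r][c] = rank_means[rank]
    return out
```

After this step, every sample (column) contains the same set of values, while the rank order of probes within each sample is preserved.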
COCONUT Conormalization
Out of 43 datasets that matched inclusion criteria and profiled respiratory infections, only 12 contained both bacterial and viral infections, and only a single one contained intracellular bacterial, extracellular bacterial, and viral infections. Because of the difference in background measurements for these different arrays (owing to the use of different platforms), it is difficult to conduct analyses across all 43 datasets without getting significantly skewed results due to batch effects. In order to make use of these data, Combat CO-Normalization Using conTrols (COCONUT) (ref. 2), which allows for co-normalization of expression data without changing the distribution of genes between studies and without any bias towards sample diagnosis, was used. It applies a modified version of the ComBat empirical Bayes normalization method (ref. 26) that only assumes an equal distribution between control samples. Briefly, the healthy controls from each cohort undergo ComBat conormalization without covariates, and the ComBat estimated parameters are acquired for each dataset's healthy samples. These parameters are then applied to the diseased samples in each dataset, which causes all samples to assume the same background distribution while still retaining the relative distance between healthy and diseased samples in each dataset.
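The control-anchored idea behind this conormalization can be illustrated with a deliberately simplified sketch. The code below is not COCONUT and is not ComBat: it replaces the empirical Bayes estimation with a plain per-dataset location and scale adjustment, handles a single gene at a time, and uses hypothetical names throughout. It shows only the principle that correction parameters are estimated from healthy controls and then applied to all samples of the same dataset.

```python
from statistics import mean, pstdev


def conormalize_with_controls(datasets):
    """Simplified control-based conormalization for ONE gene (illustrative).

    datasets maps a dataset name to {"control": [...], "disease": [...]}.
    Location and scale are estimated from controls only, then applied to
    both controls and diseased samples of the same dataset.
    """
    stats = {name: (mean(d["control"]), pstdev(d["control"]))
             for name, d in datasets.items()}
    target_mu = mean(mu for mu, _ in stats.values())
    target_sd = mean(sd for _, sd in stats.values())
    out = {}
    for name, d in datasets.items():
        mu, sd = stats[name]
        scale = target_sd / sd if sd > 0 else 1.0
        out[name] = {
            group: [(x - mu) * scale + target_mu for x in d[group]]
            for group in ("control", "disease")
        }
    return out
```

In this sketch the controls of every dataset end up with the same mean and spread, and the diseased samples keep their distance from the controls of their own dataset (up to the scale factor).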
Calculation of Signature Score
A previously described signature score1,2,5,24,25 was used to perform disease classification. The signature score (Si) is calculated as the geometric mean of the genes that are positively correlated with the response variable (in this case, bacterial infections) minus the geometric mean of the negatively correlated genes (Eq. 1).
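A minimal sketch of this score for the 8-gene signature follows. The gene directions are taken from the summary above (SMARCD3, ICAM1, and EBI3 higher in bacterial infection; JUP, SUCLG2, IFI27, FCER1A, and HESX1 higher in viral infection). The function name and the sample values are hypothetical, and the sketch assumes strictly positive expression values, which a geometric mean requires.

```python
from statistics import geometric_mean

# Directions as stated in the summary: bacterial is the response variable.
BACTERIAL_UP = ["SMARCD3", "ICAM1", "EBI3"]
VIRAL_UP = ["JUP", "SUCLG2", "IFI27", "FCER1A", "HESX1"]

def signature_score(expr, positive=BACTERIAL_UP, negative=VIRAL_UP):
    """Geometric mean of positive genes minus geometric mean of negative genes.

    expr: {gene symbol: expression value} for one sample (values must be > 0).
    """
    pos = geometric_mean([expr[g] for g in positive])
    neg = geometric_mean([expr[g] for g in negative])
    return pos - neg

# Hypothetical samples, for illustration only.
flat_sample = {g: 4.0 for g in BACTERIAL_UP + VIRAL_UP}
bacterial_like = {**{g: 8.0 for g in BACTERIAL_UP}, **{g: 2.0 for g in VIRAL_UP}}

flat_score = signature_score(flat_sample)
bacterial_score = signature_score(bacterial_like)
```

Under this convention a higher score points toward bacterial infection and a lower score toward viral infection.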
Abridged Best Subset Selection
This method combines a greedy backward search with an exhaustive search. Performing a greedy search alone would be computationally feasible, but because of the nature of the greedy algorithm it does not ensure that the best possible combination of genes for diagnostic purposes is found. On the other hand, because best subset selection is an exhaustive search, it will always select the optimal combination of genes; however, the computational cost of best subset selection increases exponentially, so running it on more than ~20 genes was infeasible. The Abridged Best Subset Selection (Abridged BSS) is a way to combine the strengths of both of these methods.
First, a greedy backward search was run on the initial gene list. Briefly, the search takes the starting gene set and calculates the AUROC after individually removing each gene. The gene whose removal leads to the largest increase in AUROC is identified and permanently removed from the set. The same strategy is then applied to the new gene set, once again removing the gene whose exclusion results in the largest increase in AUROC. In a typical greedy backward search, this step would be repeated until removing any gene reduced the AUROC by more than some pre-defined threshold. In this case, however, the greedy backward search is simply run until enough genes have been eliminated for best subset selection to be feasible (here, the cutoff was 20 genes).
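The greedy backward step can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the gene names and expression values are hypothetical, and the toy scoring function sums gene values as a stand-in for the signature score. The AUROC is computed as the fraction of positive/negative sample pairs in which the positive sample has the higher score, with ties counted as half.

```python
def auroc(scores, labels):
    """AUROC as the probability that a positive outranks a negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def greedy_backward(genes, auroc_of, target_size):
    """Repeatedly drop the gene whose removal yields the highest AUROC.

    auroc_of: callable mapping a list of genes to an AUROC.
    Stops once only target_size genes remain (20 in the text above).
    """
    current = list(genes)
    while len(current) > target_size:
        drop = max(current, key=lambda g: auroc_of([h for h in current if h != g]))
        current.remove(drop)
    return current

# Hypothetical toy data: 1 = bacterial, 0 = viral.
LABELS = [1, 1, 0, 0]
EXPR = {"g1": [5, 6, 2, 3], "g2": [4, 5, 1, 2], "noise": [0, 0, 9, 9]}

def toy_auroc_of(genes):
    # Stand-in score (assumption): sum of the selected genes per sample.
    scores = [sum(EXPR[g][i] for g in genes) for i in range(len(LABELS))]
    return auroc(scores, LABELS)

kept = greedy_backward(["g1", "g2", "noise"], toy_auroc_of, target_size=2)
```

In the toy data the misleading gene is removed first, because dropping it gives the largest AUROC gain.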
Best subset selection is then run on the abridged gene list. Briefly, the diagnostic power of every possible combination of the genes is assessed by calculating the signature score for each combination and reporting the corresponding AUROC. Next, for each signature size (total number of genes), the subset of genes that produces the best AUROC is reported. This results in a list of the best signatures for each number of total genes, from which the final gene signature can be selected.
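The exhaustive step can be sketched in the same spirit. With 20 genes there are 2^20 - 1 = 1,048,575 non-empty subsets, which is why the gene list is abridged first. The sketch below uses hypothetical gene names and values, and a summed score as a stand-in for the signature score; it reports the best subset for each signature size.

```python
from itertools import combinations

def auroc(scores, labels):
    """AUROC as the probability that a positive outranks a negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def best_subsets(genes, auroc_of):
    """Return {size: (best AUROC, best subset)} over every gene combination."""
    best = {}
    for size in range(1, len(genes) + 1):
        for subset in combinations(genes, size):
            value = auroc_of(subset)
            # Keep the first subset that reaches the highest AUROC per size.
            if size not in best or value > best[size][0]:
                best[size] = (value, subset)
    return best

# Hypothetical toy data: 1 = bacterial, 0 = viral.
LABELS = [1, 1, 0, 0]
EXPR = {"g1": [5, 6, 2, 3], "g2": [4, 5, 1, 2], "noise": [0, 0, 9, 9]}

def toy_auroc_of(genes):
    # Stand-in score (assumption): sum of the selected genes per sample.
    scores = [sum(EXPR[g][i] for g in genes) for i in range(len(LABELS))]
    return auroc(scores, LABELS)

table = best_subsets(["g1", "g2", "noise"], toy_auroc_of)
```

The resulting table has one entry per signature size, which is the list from which a final signature is chosen.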
Derivation of the 8 Gene Signature Using MANATEE
The Discovery respiratory infection cohorts were analyzed using Multicohort ANalysis of AggregaTed gEne Expression or MANATEE (
Next, the top 100 genes with the highest SAM score were selected. In order to select only those genes that were highly diagnostic, an Abridged BSS (described above) was performed on these genes. From the results of the Abridged BSS, a 15-gene signature (the signature with the max AUROC) and an 8-gene signature (the smallest signature that was within the 95% CI of the max AUROC signature) were selected to test in Hold-out Validation. Both signatures had equivalent AUROCs, so the 8-gene signature was chosen for next steps.
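The selection rule can be written out as a short sketch. It uses only the two Discovery AUROCs and the confidence interval reported in this document (15 genes, AUROC 0.951, 95% CI 0.939 to 0.964; 8 genes, AUROC 0.942). The function name is hypothetical, and "within the 95% CI" is interpreted here as an AUROC at or above the lower bound of the interval, which is an assumption.

```python
def smallest_within_ci(candidates, ci_lower):
    """Pick the smallest signature whose AUROC reaches the lower CI bound.

    candidates: {number of genes: AUROC} for the best signature of each size.
    ci_lower: lower 95% CI bound of the maximum-AUROC signature.
    """
    eligible = [n for n, value in candidates.items() if value >= ci_lower]
    return min(eligible)

# Discovery values as reported in this document.
discovery = {15: 0.951, 8: 0.942}
chosen = smallest_within_ci(discovery, ci_lower=0.939)
```

With these values the 8-gene signature is selected, since 0.942 lies above the lower bound of 0.939.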
The systematic search for gene expression microarray or RNA-seq cohorts that profiled patients with intracellular bacterial, extracellular bacterial, or viral infections resulting in febrile symptoms3,4 identified 43 whole blood (WB) cohorts and 9 peripheral blood mononuclear cell (PBMC) cohorts that met the inclusion criteria.5-22 The 43 independent WB cohorts comprised 1963 non-healthy patient samples (562 extracellular bacterial infections, 320 intracellular bacterial infections, and 1081 viral infections), whereas the 9 independent PBMC cohorts comprised 417 non-healthy patient samples (172 extracellular bacterial infections, 11 intracellular bacterial infections, and 234 viral infections). These data included both children and adults from a broad spectrum of geographic regions. Twenty-eight WB datasets consisting of 1419 infected samples (348 extracellular bacterial infections, 280 intracellular bacterial infections, and 791 viral infections) were used as discovery cohorts, and the remaining 15 WB datasets consisting of 544 non-healthy samples (214 extracellular bacterial infections, 40 intracellular bacterial infections, and 290 viral infections) were used as independent validation cohorts. Four datasets (3 WB and 1 PBMC) that had no healthy samples but included patients with bacterial or viral infections were also identified and used as independent validation cohorts.
Selecting Top Differentially Expressed Genes with MANATEE
In order to utilize all of the data that had been collected, a multicohort analysis framework called Multicohort ANalysis of AggregaTed gEne Expression (MANATEE) (
JUP
SUCLG2
SMARCD3
ICAM1
IFI27
FCER1A
HESX1
EBI3
Deriving the 8 Gene Signature with MANATEE
The next step involved running an Abridged Best Subset Selection (Abridged BSS) on the list of 100 genes, which consists of first running a greedy backward search to select the top 20 genes, and then running an exhaustive search on those 20 genes. Running the Abridged BSS on this gene list allowed identification of the most important genes for distinguishing bacterial and viral infections. From the results of the Abridged BSS, two signatures were selected for testing: the signature that had the maximum AUROC in Discovery [15 genes, AUROC=0.951 (95% CI 0.939 to 0.964)] and the smallest signature that was within the 95% confidence interval of the max AUROC signature [8 genes, AUROC=0.942 (95% CI 0.928 to 0.955);
Validating in Independent in Silico Cohorts
To verify that the results were broadly applicable and not simply overfit to the training data, the performance of the 8-gene signature was tested in a series of completely independent cohorts. The 15 WB datasets were normalized with healthy samples that had been left out of discovery and held-out validation using COCONUT. These data included 544 non-healthy samples (214 extracellular bacterial infections, 40 intracellular bacterial infections, and 290 viral infections). The 8-gene signature had an AUROC of 0.948 (95% CI 0.929 to 0.967), 0.943 (95% CI 0.921 to 0.966), and 0.978 (95% CI 0.945 to 1) for distinguishing all bacterial vs. viral infections, extracellular bacterial vs. viral infections, and intracellular bacterial vs. viral infections, respectively (
A similar validation was performed in the 9 PBMC cohorts, which included 417 non-healthy patient samples (172 extracellular bacterial infections, 11 intracellular bacterial infections, and 234 viral infections). After COCONUT normalization of these datasets, it was found that the signature had an AUROC of 0.92 (95% CI 0.891 to 0.949), 0.921 (95% CI 0.891 to 0.95), and 0.906 (95% CI 0.786 to 1) for distinguishing all bacterial vs. viral infections, extracellular bacterial vs. viral infections, and intracellular bacterial vs. viral infections, respectively (
Validating in Prospective Cohorts
Finally, both the 7-gene and 8-gene signatures were profiled in a prospective cohort of 111 whole blood samples from Nepal using Fluidigm RT-PCR. The cohort contained 25 viral infections, 15 extracellular bacterial infections, and 71 intracellular bacterial infections. Although the 7-gene signature distinguished extracellular bacterial infections from viral infections with high accuracy (AUROC=0.886, 95% CI: 0.78-0.99), it had substantially lower accuracy in distinguishing intracellular bacterial infections from viral infections (AUROC=0.78, 95% CI: 0.68-0.88). The 7-gene signature had overall low accuracy in distinguishing bacterial and viral infections (AUROC=0.8, 95% CI: 0.72-0.89) (
Two Gene Combinations
The area under the receiver operating characteristic curve (AUROC) was calculated for each pairwise combination of the genes listed in Table 1. Table 2 below shows the AUROC for all pairwise combinations of genes that have an AUROC ≥0.80:
This application claims the benefit of provisional application Ser. No. 62/823,460, filed on Mar. 25, 2019, which application is incorporated by reference herein in its entirety for all purposes.
This invention was made with Government support under contracts AI057229 and AI109662 awarded by the National Institutes of Health. The Government has certain rights in the invention.
| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/US2020/018414 | 2/14/2020 | WO | 00 |

| Number | Date | Country |
|---|---|---|
| 62823460 | Mar 2019 | US |