The present disclosure relates in general to the field of immunology. More specifically, the present disclosure describes quantitative analysis of stimulus-specific responses by immune sentinel cells such as macrophages or monocytes and uses thereof.
Immune sentinel cells can sense a wide variety of molecules that derive from viruses, bacteria, fungi, or parasites, termed pathogen-associated molecular patterns (PAMPs), or that are indicative of tissue damage, termed damage-associated molecular patterns (DAMPs). They are recognized by dozens of diverse transmembrane and cytosolic pattern recognition receptors (PRRs). Cytokines produced by first responders, such as TNF or interferons (IFNs), may be sensed through cytokine receptors. Both PRRs and cytokine receptors transmit information about the stimuli via signaling adaptors to a stimulus-responsive signaling network with overlapping downstream pathways consisting of signaling kinases and transcription factors to coordinate diverse immune sentinel functions.
The functions of immune sentinel cells provide for both local anti-microbial activity and systemic immune activation and they coordinate the resolution and tissue healing when the threat is eliminated. Upon exposure to an immune threat, immune sentinel cells may induce resistance factors that may directly limiting pathogen invasion, replication, or assembly. Through secretion of inflammatory cytokines, phospholipids, and second messengers, immune sentinel cells communicate and spread this anti-microbial state to bystander cells within the tissue.
To limit pathogen spread, phagocytic immune sentinel cells, such as macrophages, neutrophils, and dendritic cells, upregulate their ability to engulf pathogens through dramatic reorganization of their cytoskeletons. Production of nitric oxide and reactive oxygen species contributes to pathogen killing, as does the release of antimicrobial DNA-enzyme mixtures, termed NETs (neutrophil extracellular traps). In response to some intra-cellular pathogens, the induction of cell suicide may limit pathogen viability and may occur in the absence of inflammatory mediator release through apoptosis or through inflammation-inducing necroptosis.
Cell-intrinsic pathogen defenses are complemented by the induction of local inflammation and secretion of chemokines for the recruitment and activation of diverse immune effector cells such as neutrophils and NK cells. Secreted and membrane-bound proteases remodel the extracellular matrix and assist migration of immune sentinel cells to the infected site. Eventually, immune sentinel cells orchestrate the adaptive immune response through systemic inflammation and modulatory cytokines that increase antigen presentation, adaptive immune cell production, selection, maturation, and recruitment.
Macrophages are immune sentinel cells that are distributed in every tissue. They recognize a vast variety of immune threats, initiating immune responses that involve innate defenses, sequential recruitment of diverse immune cells to the site of infection, and activation of systemic and adaptive immune mechanisms.
Immune responses are powerful, and often detrimental for the host. Hence, they are deployed on an “only-as-needed” basis. Given that pathogens differ widely in their biology, immune mechanisms that effectively counteract them should differ also. The “only-as-needed” rationale argues that healthy immune sentinel cell responses should be highly specific to the immune threat. Indeed, pathology caused by ineffective immune responses or by immune hyper-activity may be traced to failure of immune sentinel cells to mount immune responses that are specific and appropriate to the particular immune threat. One patho-physiological significance of mis-regulated macrophage responses, is the “cytokine storm” associated with many infections, including COVID-19.
Macrophages are capable of responding to hundreds of distinct pathogens, danger-associated molecular patterns, and cytokines through dozens of distinct receptors, and then generate stimulus-appropriate functions. Indeed, recent studies have revealed that macrophages are capable of producing gene expression programs that are much more stimulus-specific than previously thought.
In view of the important roles played by immune sentinel cells such as macrophages in the innate immune system, there is a need to develop methodologies that would provide quantitative measures for stimulus-specific responses of macrophages in various physiological or pathological conditions.
In one embodiment, the present disclosure provides a method of determining innate immune system responsiveness of a subject, comprising (i) obtaining a blood sample from the subject; (ii) exposing monocytes or monocyte-derived cells (e.g., macrophages, dendritic cells) derived from the blood sample to one or a combination of stimuli such as cytokines, pathogen-associated molecular patterns, and damage-associated molecular patterns; (iii) determining using genome wide RNA profiling, single cell mRNA expression of one or more of stimuli-related genes, or protein expression, responses to the stimuli by the monocytes or monocyte-derived cells (e.g., macrophages, dendritic cells); and (iv) determining a response specificity score characterizing the innate immune system responsiveness of the subject, wherein the determination comprises use of information theoretic and machine learning methodologies.
In another embodiment, the present disclosure provides a method for treating a subject having a condition or disease, comprising the steps of: (a) determining the responsiveness of the subject's innate immune system by a method comprising: (i) obtaining a blood sample from the subject; (ii) exposing monocytes or monocyte-derived cells (e.g., macrophages, dendritic cells) derived from the blood samples to one or a combination of stimuli such as cytokines, pathogen-associated molecular patterns, and damage-associated molecular patterns; (iii) determining using genome wide RNA profiling, single cell mRNA expression of one or more of stimuli-related genes, or protein expression, responses to the stimuli in the monocytes or monocyte-derived cells (e.g., macrophages, dendritic cells); and (iv) determining a response specificity score characterizing the responsiveness of the subject's innate immune system, wherein the determination comprises use of information theoretic and machine learning methodologies; (b) using the subject's response specificity score to query an information repository comprising (i) innate immune system responsiveness of a plurality of healthy subjects and patients having the condition or disease, and (ii) a correlation between the innate immune system responsiveness and a therapeutic outcome of the patients to one or more therapeutic regimens, and identifying a correlation value for the subject; and (c) treating the subject with a therapeutic regimen when the correlation value indicates a positive therapeutic benefit for the subject; or avoiding treating the subject with a therapeutic regimen when the correlation value indicates an absence of or a negative therapeutic benefit for the subject.
These and other aspects of the invention will be appreciated from the ensuing descriptions of the figures and detailed description of the invention.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
Some embodiments of the invention are herein described, by way of example only, with reference to the accompanying drawings. With specific reference now to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of embodiments of the invention. In this regard, the description taken with the drawings makes apparent to those skilled in the art how embodiments of the invention may be practiced.
The present disclosure describes methods to measure Response Specificity (RS) (also referred to herein as Response Specificity Index (RSI) or Stimulus-Response Specificity) of monocytes or monocyte-derived cells (such as macrophages and dendritic cells) by assessing the response of such cells ex vivo or in vitro to various stimuli. By quantifying the Response Specificity, one can ascribe a Response Specificity score to monocytes-derived cells from different physiological or pathological conditions and identify its molecular drivers to address both fundamental biological questions and advance clinical studies, including the establishment of normal Response Specificity score ranges and detection of changes due to disease, and the effectiveness of or contraindications to treatments thereof. Response Specificity is indicative of a subject's innate immune health in guarding against many diseases and insults, and the assessment of Response Specificity score is disclosed herein as a useful clinical measurement, for example, to assess a patient with or at risk of inflammatory disease, or candidates for therapeutic interventions, to guide therapy for cancer or cancer immunotherapy, transplant rejection, and infectious diseases, by way of non-limiting examples.
The conceptual innovation is that hallmark of innate immune cells is Response Specificity. Response Specificity is provided herein as a clinical measure of innate immune health that will benefit people with pre-existing conditions that put them at risk for, e.g., infections or autoimmunity and in particular whether a particular treatment will be effective, be ineffective or even cause adverse consequences. Response Specificity can be measured by the experimental and analytical methods described herein.
In one embodiment, the Response Specificity is quantified by first tracking the activity and/or expression of one or more immune response genes at single cell resolution in a plurality of, e.g., hundreds of macrophages or monocytes, at a single time point or over time, after exposure to a panel of stimuli, individually or in combination. The immune response genes are selected to capture diverse responses to various immune stimuli described herein. In one embodiment, the present specification uses either the 10X Genomics systems for genome-wide profiling or the BD Rhapsody system for single cell profiling the expression of 500 immune response genes. In one embodiment, protein expression or upregulation is measured. In one embodiment, a combination of RNA and protein expression is assessed, which may be from different genes or in some embodiments from the gene. From analysis of multiple samples, an optimal time after stimulation for assessing effect on immune response genes may be selected to provide a standard method for obtaining the expression data. Such assessment may then be applied to each of the stimuli and each of the assessed expressions, to identify optimal time point after stimulation to measure the response. As a result, a single time point after exposure of macrophages to the panel of stimuli is selected such that a facile assessment routine is established for measuring Response Specificity.
Thus, in one embodiment, the present disclosure provides a method of determining innate immune system responsiveness of a subject, comprising (i) obtaining a blood sample from the subject; (ii) exposing monocytes or monocyte-derived cells (e.g., macrophages, dendritic cells) derived from the blood samples to one or a combination of stimuli such as cytokines, pathogen-associated molecular patterns, and damage-associated molecular patterns; (iii) determining responses using genome wide RNA profiling, single cell mRNA expression of one or more of stimuli-related genes, or protein expression, in response to the stimuli in the monocytes or monocyte-derived cells (e.g., macrophages, dendritic cells); and (iv) determining a response specificity score characterizing the innate immune system responsiveness of the subject, wherein the determination comprises use of information theoretic and machine learning methodologies.
In one embodiment, the blood samples comprise peripheral blood mononuclear cells. In one embodiment mononuclear cells from the blood sample are analyzed. In one embodiment mononuclear cells are differentiated into macrophages or dendritic cells. In one embodiment macrophages are analyzed. In one embodiment dendritic cells are analyzed.
In one embodiment, the exposing is to single cells.
In one embodiment, the determining responses is at a single time point after exposing, or at a plurality of time points after exposing. In one embodiment, the plurality of time points includes a zero time point. In one embodiment, the plurality of timepoints are within about 24 hours of exposing. In one embodiment, the plurality of timepoints are within about 12 hours of exposing. In one embodiment, the plurality of timepoints are within about 8 hours of exposing. In one embodiment, the plurality of timepoints are within about 3 hours of exposing. In one embodiment, the exposing after multiple timepoints comprises 0, 0.5, 1, 3, 5 and 8 hours after exposing. In one embodiment, the exposing after multiple timepoints consists of 0, 0.5, 1, 3, 5 and 8 hours after exposing. In one embodiment, the plurality of timepoints are within about 3 hours of exposing. In one embodiment, the determining the response specificity score is based on a plurality of time points after exposing. In one embodiment, the plurality of timepoints of exposing includes an exposing with no stimuli.
In one embodiment, the determining responses for a plurality of timepoints comprises integral, fold change, peak amplitude, speed, or any combination thereof.
Examples of cytokine stimuli include, but are not limited to, TNF, IFNγ, IL-6, IL-2 or any combination thereof. Examples of pathogen-associated molecular patterns stimuli include, but are not limited to, Pam3CSK4 (abbreviated P3CSK4), CpG, LPS, polyI:C, flagellin or any combination thereof. In one embodiment, the pathogen-associated molecular patterns stimuli are from fungi, protozoa, helminths, bacteria, viruses, or any combination thereof. Examples of damage-associated molecular patterns stimuli include, but are not limited to, products derived from lysed cells, necrotic cells, injured cells, or cells treated with heat (e.g., heat shock) or pressure. In one embodiment, the combinations of aforementioned stimuli may be applied to cells.
In one embodiment, the stimuli comprise LPS, polyI:C, IFNβ, P3CSK4, CpG and TNF. In one embodiment, the stimuli consist of LPS, poly I:C, INFβ, P3CSK4, CpG and TNF.
In one embodiment, a no stimulus control is included. In one embodiment, the stimuli-related genes can be IkBa, p-RelA, pSTAT1, pSTAT2, pSTAT3, pSTAT5, p-IRF3, p-p38, p-ERK, p-JNK, or any combination thereof. Single-cell RNA expression is determined. In one embodiment, antibodies to stimulus-responsive proteins such as IkBa, p-RelA, pSTAT1, pSTAT2, pSTAT3, pSTAT5, p-IRF3, p-p38, p-ERK, p-JNK, may be used, singly or any combination thereof.
In one embodiment, the stimuli-related genes comprise Tnf, 116, Cc15, Cc110, Nfkbia, or any combination thereof.
In one embodiment, the top 5 macrophage polarization state-related genes which are evaluated for each stimulus are:
In one embodiment, the polarization state of macrophages in the sample are determined. In one embodiment, the macrophages are M0, M1, M2 or any combination thereof. In one embodiment, macrophage types are identified using Irf1, Fg12 and Tgtp1 gene expression. In one embodiment, M0 macrophages are identified by Mx2, Nfkbiz and Egr3 gene expression. In one embodiment, M1 macrophages are identified by Mx2, Tnf and Gna15 gene expression. In one embodiment, M2 macrophages are identified by Cited, Ehd1 and Pou2f2 gene expression.
In one embodiment, the determination of the response specificity score involves an information theoretic algorithm to calculate mutual information or channel capacity. In one embodiment, the determination of the response specificity score involves machine learning algorithm such as ensemble of decision trees, nearest neighbors, or neural nets. In one embodiment, the determination of the response specificity score involves a pre-established relationship between responses of the stimuli-related genes to the stimuli, a pattern of expression among signaling systems, a pattern of expression of gene regulatory systems, a temporal progression of any of the foregoing, or combinations thereof.
In one embodiment, scREAL-TIME analysis provides the response specificity score.
In some embodiments, scREAL-TIME comprises the following components:
Calculation of trajectory features. For comparing stimulus-specificity, trajectories were first scaled 0-1 across all stimuli, and the trajectory features were calculated on the scaled trajectories. This was necessary as spline fitting across principal components generated occasional negative values for some cells, especially at the beginning of the trajectory. Scaling maintained the same relative dynamics and amplitudes, and increased the reliability of the trajectory feature calculation. Integral was calculated by integrating each single cell trajectory curve from 0 to 8 hours. Peak amplitude was calculated by taking the maximum value over 8 hours. Max log fold change was calculated for each gene by taking the fold change between the peak amplitude and the same cell's starting value at 0 hours. Activation speed was calculated by taking the derivative of the curve at 1 hour.
Calculation of mutual information. An information theoretic approach was used to identify either individual genes or combinations of genes providing the highest maximum mutual information between ligand identity and gene expression. Error bars on mutual information calculations were done using 10 bootstraps on 50% of the data. Estimation of maximum mutual information was implemented using the R package SLEMI (T. Jetka, K. Nienaltowski, T. Winarski, S. Błoński, M. Komorowski, Information-theoretic analysis of multivariate single-cell signaling responses. PLoS Comput Biol. 15, e1007132 (2019), which uses a statistical learning-based approach to more accurately and more efficiently calculate maximum mutual information for data types with higher dimensional outputs. The max MI was first calculated for each gene individually, using all stimuli (highest theoretical max MI=2.58 bits), as well as for all pairs of stimuli (highest theoretical max MI=1.0 bits).
As will be seen in the description below, the method for determining innate immune system responsiveness as generally described herein can be used to guide therapy for a patient by comparing a patient's RS from a single time point or during the course of monitoring (e.g., regular periodic physical examination) or followed after the diagnosis and/or during the course of treatment of a diseases, to guide selection of therapy (and/or contraindicate others) and monitor recovery. In some embodiments, the methods disclosed herein for obtaining RS, when combined with querying a data bank containing RS data of a plurality of healthy, ill, and ill patients undergoing therapy, the health status and/or effective therapeutic regimen can be identified.
In another embodiment, the present disclosure provides a method for treating a subject having a condition or disease, comprising the steps of: (a) determining the responsiveness of the subject's innate immune system by a method comprising: (i) obtaining a blood sample from the subject; (ii) exposing monocytes or monocyte-derived cells (e.g., macrophages, dendritic cells) derived from the blood samples to one or a combination of stimuli such as cytokines, pathogen-associated molecular patterns, and damage-associated molecular patterns; (iii) determining responses using genome wide RNA profiling, single cell mRNA expression of one or more of stimuli-related genes, or protein expression, in response to the stimuli in the monocytes or monocyte-derived cells (e.g., macrophages, dendritic cells); and (iv) determining a response specificity score characterizing the responsiveness of the subject's innate immune system, wherein the determination comprises use of information theoretic and machine learning methodologies; (b) using the subject's response specificity score to query an information repository comprising (i) innate immune system responsiveness of a plurality of healthy subjects and patients having the condition or disease, and (ii) a correlation between the innate immune system responsiveness and a therapeutic outcome of the patients to one or more therapeutic regimens, and identifying a correlation value for the subject; and (c) treating the subject with a therapeutic regimen when the correlation value indicates a positive therapeutic benefit for the subject; or avoiding treating the subject with a therapeutic regimen when the correlation value indicates an absence of or a negative therapeutic benefit for the subject.
In one embodiment, the blood samples comprise peripheral blood mononuclear cells. In one embodiment, the exposing is to single cells.
In one embodiment, the determining responses is at a single time point after exposing, or at a plurality of time points after exposing. In one embodiment, the plurality of time points includes a zero time point. In one embodiment, the plurality of timepoints are within about 24 hours of exposing. In one embodiment, the plurality of timepoints are within about 12 hours of exposing. In one embodiment, the plurality of timepoints are within about 8 hours of exposing. In one embodiment, the plurality of timepoints are within about 3 hours of exposing. In one embodiment, the exposing after multiple timepoints comprises 0, 0.5, 1, 3, 5 and 8 hours after exposing. In one embodiment, the exposing after multiple timepoints consists of 0, 0.5, 1, 3, 5 and 8 hours after exposing. In one embodiment, the determining the response specificity score is based on a plurality of time points after exposing. In one embodiment, the plurality of timepoints of exposing includes an exposing with no stimuli.
In one embodiment, the determining responses for a plurality of timepoints comprises integral, fold change, peak amplitude, speed, or any combination thereof.
Examples of cytokine stimuli include, but are not limited to, TNF, IFNγ, IL-6, IL-2 or any combination thereof. Examples of pathogen-associated molecular patterns stimuli include, but are not limited to, Pam3CSK, CpG, LPS, polyI:C, flagellin, or any combination thereof. In one embodiment, the pathogen-associated molecular patterns stimuli are from fungi, protozoa, helminths, bacteria, viruses, or any combination thereof. Examples of damage-associated molecular patterns stimuli include, but are not limited to, products derived from lysed cells, necrotic cells or injured cells. In one embodiment, the combinations of stimuli can be IFNγ and IL-6. In one embodiment, the stimuli-related genes can be IkBa, p-RelA, pSTAT1, pSTAT2, pSTAT3, pSTAT5, p-IRF3, p-p38, p-ERK, p-JNK, or any combination thereof.
In one embodiment, the stimuli comprise LPS, polyI:C, IFNβ, P3CSK4, CpG and TNF. In one embodiment, the stimuli consist of LPS, poly I:C, IFNβ, P3CSK4, CpG and TNF.
In one embodiment, a no stimulus control is included. In one embodiment, the stimuli-related genes can be IkBa, p-RelA, pSTAT1, pSTAT2, pSTAT3, pSTAT5, p-IRF3, p-p38, p-ERK, p-JNK, or any combination thereof. Single-cell RNA expression is determined. In one embodiment, antibodies to stimulus-responsive proteins such as Ma, p-RelA, pSTAT1, pSTAT2, pSTAT3, pSTAT5, p-IRF3, p-p38, p-ERK, p-JNK, may be used, singly or any combination thereof.
In one embodiment, the stimuli-related genes comprise Tnf, 116, Cc15, Cc110, Nfkbia, or any combination thereof.
In one embodiment, the top 5 macrophage polarization state related genes which are evaluated for each stimulus are:
In one embodiment, the polarization state of macrophages in the sample are determined. In one embodiment, the macrophages are M0, M1, M2 or any combination thereof. In one embodiment, macrophage types are identified using Irf1, Fg12 and Tgtp1 gene expression. In one embodiment, M0 macrophages are identified by Mx2, Nfkbiz and Egr3 gene expression. In one embodiment, M1 macrophages are identified by Mx2, Tnf and Gna15 gene expression. In one embodiment, M2 macrophages are identified by Cited, Ehd1 and Pou2f2 gene expression.
In one embodiment, the stimuli-related genes can be obtained by genome-wide RNA profiling. In one embodiment, the stimuli related protein expression can be assessed using antibodies, mass spectroscopy, or other methods. In one embodiment, the determination of the response specificity score involves a machine learning algorithm that uses t-distributed stochastic neighbor embedding (t-SNE). In one embodiment, the determination of the response specificity score involves a pre-established relationship between responses of the stimuli-related genes to the stimuli, a pattern of expression among signaling systems, a pattern of expression of gene regulatory systems, a temporal progression of any of the foregoing, or combinations thereof.
In one embodiment, scREAL-TIME analysis provides the response specificity score. In one embodiment, scREAL-TIME comprises dimensionality reduction, k-means clustering, weighted random walks, spline fitting and recovering gene trajectories. scREAL-TIME is described in detail hereinabove.
In one embodiment, the information repository or data bank used in the above method comprises innate immune system responsiveness of patients at a single or at one or more time points of the condition or disease, for example, at disease diagnosis or onset, at the time when symptoms appear, or during treatment. In one embodiment, the condition or disease can be cancer, inflammatory disease, autoimmunity, high BMI, or advanced age. Other conditions include transplant rejection, tumor progression, tumor immunotherapy, sepsis, or infection.
The terms “comprise”, “comprises”, “comprising”, “includes”, “including”, “having” and their conjugates mean “including but not limited to”.
As used herein, the singular form “a”, “an” and “the” include plural references unless the context clearly dictates otherwise. For example, the term “an enzyme” or “at least one enzyme” may include a plurality of enzymes, including mixtures thereof.
Throughout this application, various embodiments of the present disclosure may be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.
Whenever a numerical range is indicated herein, it is meant to include any cited numeral (fractional or integral) within the indicated range. The phrases “ranging/ranges between” a first indicate number and a second indicate number and “ranging/ranges from” a first indicate number “to” a second indicate number are used herein interchangeably and are meant to include the first and second indicated numbers and all the fractional and integral numerals therebetween.
Unless otherwise defined, all technical and/or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the invention pertains. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of embodiments of the invention, exemplary methods and/or materials are described below. In case of conflict, the patent specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and are not intended to be necessarily limiting. Each literature reference or other citation referred to herein is incorporated herein by reference in its entirety.
In the description presented herein, each of the steps of the invention and variations thereof are described. This description is not intended to be limiting and changes in the components, sequence of steps, and other variations would be understood to be within the scope of the present invention.
It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination or as suitable in any other described embodiment of the invention. Certain features described in the context of various embodiments are not to be considered essential features of those embodiments, unless the embodiment is inoperative without those elements.
Each step or component of the assessment disclosed herein is described in further detail below. Such descriptions are merely exemplary of how the step may be carried out; a skilled artisan will recognize variations that will achieve the same purposes. Such variations are fully embraced herein.
Subjects. The disclosure provides for the evaluation of the Responsive Specificity of a subject's or patient's macrophages as an indication of the status of the innate immune system, thus any healthy subject or one having a presumed or overt condition or disease or under a course of acute or chronic therapy may be so evaluated. Moreover, to establish correlations between Response Specificity of a particular subject and the potential benefit, lack of benefit or contraindication to a particular therapy and even duration or intensity of a regimen (e.g., dose level, dosing frequency) thereof (e.g., pharmacologic, life-style modification, etc.), an information repository comprising data on a plurality of subject data on which the correlation between Response Specificity and particular subject's therapeutic history is obtained.
Obtaining blood samples from the subject. The methods described herein are typically carried out using an aliquot of whole blood taken from an antecubital vein, thought the method is not so limiting and a blood sample from an indwelling catheter, sample of blood from a blood donation, or even a finger stick may provide the requisite sample. An anticoagulant such as heparin may be used to prevent clotting. PMBCs may be preserved for future testing by freezing, e.g., in 10% DMSO/90% FBS (fetal bovine serum).
Exposing monocytes, macrophages or other monocyte-derived from the blood samples. In one embodiment, peripheral blood mononuclear cells (PBMCs) are isolated from the blood sample. Methods well known in the art may be employed, such as, by density gradient centrifugation (e.g., using Ficoll), blood collection tube separation components, cell sorting techniques, etc.
Monocytes may be isolated or enriched from PBMCs by methods such as, but not limited to, pan-monocyte positive selection or CD14+ positive-selecting cell sorting (e.g., Miltenyi). Isolated or enriched monocytes may be incubated for a period to allow equilibration, e.g., in complete medium with 1% FBS for 2 hours.
PBMCs (monocytes) may be differentiated into macrophages by any one of a number of methods such as but not limited to using macrophage colony-stimulating factor (M-CSF) alone, or in combination with so-called polarizing cytokines such as IFNγ (M1), IL4 (M2), IL10, or IL13. In one embodiment, monocytes are differentiated into dendritic cells.
Exposure of monocytes or monocyte-derived cells to stimuli. Macrophages are immune sentinel cells that respond specifically to different stimuli. Innate immune macrophages have pattern-recognition receptors capable of sensing pathogens. Many host defense functions may also be detrimental. Macrophages have distinct expression programs activated in response to different immune threats. This disclosure provides for the assessment of the responsiveness of an individual's macrophages (or monocytes or other monocyte-derived cell types) to a panel of stimuli to generate a score (Response Specificity) that is useful for gauging the individual's potential response to innate immune system challenge by a treatment regimen for a particular condition or disease, among other purposes.
Monocytes, macrophages or other monocyte-derived cells obtained as described above are exposed in vitro to individual or combinations of stimuli such as those described further below. In one embodiment, a plurality of similar monocyte or monocyte-derived cell samples in medium are prepared in plates or tubes, and the stimulus/stimuli added (e.g., in a vehicle, carrier, buffer, culture medium, or other suitable solution or suspension). Cells may be incubated at 37° C. in 5% CO2 for a specified time before cells are harvested for subsequent steps. Concentrations of each or combinations of stimuli may be at 1, 5, 25 or 125 mcg/mL (such optimal concentrations for each stimulus or combination to be determined from optimization studies), or where the stimulus is based on a cell extract, for example, may be quantified by protein content, original cell count, or any other assay or quantification process (e.g., to be able to establish a particular stimulus level or amount for consistency across measurements). In one embodiment, an optimal duration of exposure is provided for each stimulus. In one embodiment, a single duration of exposure is provided for all stimuli. In one embodiment a time course of exposure is obtained for one or more stimulus conditions, from sampling at different time points or separate aliquots prepared for harvesting at each time point.
In some embodiments, a trajectory is obtained at a series of time points after exposure, such as over one hour, two hours, three hours, four hours, eight hours, 12 hours, 18 hours or 24 hours after exposure to stimuli. In some embodiments the trajectory is between 0 and 8 hours. In some embodiments the trajectory is between 0 and 3 hours. In some embodiments timepoints include 0, 0.5, 1, 3, 5 and 8 hours. In some embodiments, each stimulus is followed by the same timepoints. In some embodiments each stimulus is followed by a particular series of timepoints that may be the same or different from any other timepoint. In some embodiments, a time course with no stimulus is determined.
In some embodiments, the determining responses for a plurality of timepoints comprises integral, fold change, peak amplitude, speed, or any combination thereof. In some embodiments, integral is calculated by integrating each single cell trajectory curve from 0 to 8 hours. In some embodiments, peak amplitude is calculated by taking the maximum value over 8 hours. In some embodiments, max log fold change is calculated for each gene by taking the fold change between the peak amplitude and the same cell's starting value at 0 hours. In some embodiments, activation speed is calculated by taking the derivative of the curve at 1 hour. These particular assessments are merely exemplary of other selections of timepoints, durations, baseline point, and other factors in providing data for determination of the response specificity score.
As shown in the examples and figures herein, initial assessment of the Response Specificity was evaluated using a larger panel of stimuli, assessing gene expression among a large number of genes, and over a time course of assessment to track the temporal alterations in gene expression after stimulation. For a more readily deployable method to assess Response Specificity for processing patient samples, a subset of stimuli, measured genes, and single time point after stimulation (or specific time point for each stimulus) may be selected to streamline the process while capturing useful data from each individual. However, the disclosure herein is not so limited as to the selection of the stimuli, the genes assessed (as RNA and/or protein), or the use of a single vs. two or more time points, in providing a Response Specificity. The Response Specificity may therefore be determined for a particular combination of the aforementioned parameters, or it may be a composite of two or more assessments (e.g., a single time point assessment and a time-course assessment). In some embodiments, different measures of Response Specificity may be designed for different purposes, such as a static (single stimulus exposure time) or dynamic (two or more time points, such as 0, 0.5, 1, 3, 5, and 8 hours). Such variations in the type of Response Specificity are fully embraced by the disclosure herein.
Cytokines. Cytokines stimuli include but are not limited to TNF, IFNβ, IFNγ, IL-2 and IL6, and any combination thereof. Such combinations include IFNγ and IL6.
Pathogen-associated molecular patterns. Pathogen-associated molecular patterns stimuli (PAMPS) are derived from fungi, protozoa, helminths, bacteria, viruses, or any combination thereof. Non-limiting examples of PAMPS include Pam3CSK, CpG, LPS, polyI:C, flagellin, or any combination thereof.
Damage-associated molecular patterns. Damage-associated molecular patterns (DAMPS) may be derived from lysed cells, necrotic cells, or injured cells. Non-limiting examples of stimuli from such sources include preparations of lysed cells, preparations of or supernatants from necrotic cells, preparations or supernatants from injured cells.
In some embodiments, the combination of LPS, poly I:C, IFNβ, P3CSK4, CpG and TNF are the stimuli. In some embodiments, the stimuli comprise LPS, polyI:C, IFNβ, P3CSK4, CpG and TNF. In some embodiments, the stimuli consist of LPS, poly I:C, IFNβ, P3CSK4, CpG and TNF. In some embodiments, a no stimulus control is included. In some embodiments, no stimulus is evaluated.
Stimuli-related genes. The disclosure is not limited to any particular set of genes or proteins to be profiled. Genes to be profiled in the aforementioned samples including but are not limited to IkBa, p-RelA, pSTAT1, pSTAT2, pSTAT3, pSTAT5, p-IRF3, p-p38, p-ERK and p-JNK. In other embodiments, genes are selected from immune response genes, such as but not limited to Cc13, Cxcl10, Cxcl2, TNF, Irf7, Rgs, Adgre1, I11b, I11a, Lgals9, Cc15, Clec4e, Ddx58, Tnfrsf1b, Cd69, Cc12, Cc14, Cd274, 116, Bc12a1a, Cd14, 1115, Il15ra, Ier3, Ifitm3, Cc19, Lgals1, Ctsd, Hmox1, Btg1, Fth1 and H2.K1, and any combination thereof.
In some embodiments, the stimuli-related genes comprise Tnf, 116, Cc15, Cc110, Nfkbia, or any combination thereof. In some embodiments, the top 5 macrophage polarity stimuli-related genes which are evaluated for each stimulus are:
In some embodiments, the polarization state of macrophages in the sample are determined. In some embodiments, the macrophages are M0, M1, M2 or any combination thereof. In some embodiments, macrophage types are identified using Irf1, Fg12 and Tgtp1 gene expression. In some embodiments, M0 macrophages are identified by Mx2, Nfkbiz and Egr3 gene expression. In some embodiments, M1 macrophages are identified by Mx2, Tnf and Gna15 gene expression. In some embodiments, M2 macrophages are identified by Cited, Ehd1 and Pou2f2 gene expression.
In one embodiment, the genes are selected from those available on the Rhapsody platform. Selected immune response genes or proteins may be categorized based on their primary known biological function (e.g., gene ontology, such as pro-inflammatory, anti-inflammatory, anti-viral, anti-oxidant, cell death regulator, monocyte attractant, lymphocyte attractant) or on their responsiveness to specific signaling pathways and transcription factors (e.g., such as target genes of NFκB, IRF, AP1 transcription factors).
Determining single cell mRNA expression. After exposure of aliquots of monocytes or monocyte-derived cells (e.g., macrophages, dendritic cells) to the one or more stimuli for a particular time period (that may include zero time or without exposure or exposure to vehicle only, as a control), cells are processed for transcriptomic analysis by, in one embodiment, single cell mRNA expression. In one embodiment, selected immune response genes of single cells in microwells are analyzed by PCR amplification and detected using barcoding.
As described herein, such measurements may be made by, for example, the 10X Genomics platform that profiles expression of all genes in the genome, or the BD Rhapsody scRNAseq platform system, that can profile 500 immune response genes. See, for example, Gao et al., The Comparison of Two Single-cell Sequencing Platforms: BD Rhapsody and 10× Genomics Chromium, Curr Genomics, 2020 December; 21(8):602-609.
Protein expression profiles may also be used to assess Response Specificity. Proteins produced by monocytes or monocyte-derived cells upon stimulation as described herein may be identified by specific assays for such proteins, or methods including but not limited to mass spectroscopic protein identification.
In some embodiments, both RNA and protein expression data are used in the generation of the Response Specificity. In some embodiments the one or more RNA and one or more proteins that comprise the Response Specificity are different. In some embodiments, one or more gene expression measures by both RNA and protein level comprises the Response Specificity analysis.
For initial determination of the optimal selection of stimuli, combination of stimuli, duration of exposure, and immune response related genes to profile, channel capacity will be calculated which comprises evaluation of upstream multi-transcription factor signaling networks and the downstream gene regulatory networks. Channel capacity is calculated for all genes, for single genes, for sets of genes controlled by the same regulatory mechanism, or genes performing different functions (e.g., cytokines, tissue remodelers, feedback regulators).
Once an optimal protocol comprising stimuli and responsive genes, and duration of exposure for each (or a single duration that optimized available of data across the stimuli and genes), the protocol for assessing response specificity in subject and patient samples is established, and a single or set of numerical values can be ascribed to each monocyte/macrophage sample based on such analysis. An information repository comprising such data is prepared, together with each subject's or patient's health status, any treatments thereof and outcomes thereof.
Determining single cell protein expression. Labeled antibodies, antigen-binding fragments or receptor reagents to expressed proteins may be used to identify stimuli-induced responses in the monocytes or monocyte-derived cells. In another embodiment, mass spectroscopy is used to identify proteins expressed by stimulated cells. As with the RNA expression methods described above, the disclosure is not limited to any particular method of quantifying gene expression, and embraces any and all such methods. In some embodiments, gene expression may be measured by both RNA and protein expression for individual genes.
Determining a response specificity score. As described above, the response specificity score is a value or values characterizing the innate immune system responsiveness of the subject, based on the testing of monocytes or monocyte-derived cells such as macrophages and dendritic cells, as described herein. The determination comprises use of information theoretic and machine learning methodologies. In one embodiment, the determination of the response specificity score involves a machine learning algorithm that uses t-distributed stochastic neighbor embedding (t-SNE). In one embodiment, the determination of the response specificity score involves a pre-established relationship between responses of the stimuli-related genes (and/or proteins) to the stimuli, a pattern of expression among signaling systems, a pattern of expression of gene regulatory systems, a temporal progression of any of the foregoing, or combinations thereof. In one embodiment, the method comprises scREAL-TIME, as described herein.
In another embodiment, scREAL-TIME is used to analyze the stimuli data generated as described above. scREAL-TIME comprises the following components:
Calculation of trajectory features. For comparing stimulus-specificity, trajectories were first scaled 0-1 across all stimuli, and the trajectory features were calculated on the scaled trajectories. This was necessary as spline fitting across principal components generated occasional negative values for some cells, especially at the beginning of the trajectory. Scaling maintained the same relative dynamics and amplitudes, and increased the reliability of the trajectory feature calculation. Integral was calculated by integrating each single cell trajectory curve from 0 to 8 hours. Peak amplitude was calculated by taking the maximum value over 8 hours. Max log fold change was calculated for each gene by taking the fold change between the peak amplitude and the same cell's starting value at 0 hours. Activation speed was calculated by taking the derivative of the curve at 1 hour.
Calculation of mutual information. An information theoretic approach was used to identify either individual genes or combinations of genes providing the highest maximum mutual information between ligand identity and gene expression. Error bars on mutual information calculations were done using 10 bootstraps on 50% of the data. Estimation of maximum mutual information was implemented using the R package SLEMI (T. Jetka, K. Nienaltowski, T. Winarski, S. Błoński, M. Komorowski, Information-theoretic analysis of multivariate single-cell signaling responses. PLoS Comput Biol. 15, e1007132 (2019), which uses a statistical learning-based approach to more accurately and more efficiently calculate maximum mutual information for data types with higher dimensional outputs. The max MI was first calculated for each gene individually, using all stimuli (highest theoretical max MI=2.58 bits), as well as for all pairs of stimuli (highest theoretical max MI=1.0 bits).
Various embodiments and aspects of the present invention as delineated hereinabove and as claimed in the claims section below find experimental support in the following examples.
This and the ensuing examples focus on developing ways to measure Response Specificity of macrophages in response to various stimuli. By quantifying this property, one can ascribe a score to macrophages derived from different physiological or pathological conditions and identify its molecular drivers to address both fundamental biological questions and advance clinical studies, including the establishment of normal score ranges and detection of changes due to disease. The importance of innate immune health in guarding against many diseases suggests that Response Specificity could advance to be a useful clinical measurement, prescribed to people, for example, with or at risk of inflammatory disease.
As macrophages function as single cells, their Response Specificity must be measured at the single cell level. This is challenging because (i) biological responses show a high degree of cell-to-cell variability and dynamically evolve over hours, and (ii) single cell measurements of cellular transcriptomes (e.g., single cell RNA-seq) typically have high technical uncertainty and are snapshot time point assays because it is a destructive technology.
These challenges to quantify Response Specificity are addressed in a two-step approach. First, a new experimental workflow is used to track the activity of a key immune response related genes (e.g., transcription factor NFκB, used in this example) at single cell resolution in hundreds of macrophages over time. NFκB is dynamically regulated and its temporal trajectories show a high degree of stimulus-specificity. These high-quality trajectory datasets can be used to develop powerful information theoretic and machine learning approaches to quantify NFκB Response Specificity, and construct mathematical modeling to identify determinants of specificity and confusion. Expansion of the method to encompass a panel of stimuli and a larger set of responsive genes that together, distinguish among innate immune responsiveness across a population of healthy and non-healthy individuals (of e.g., different ages, body compositions, lifestyles, health status, acute disease, chronic disease, duration or stage of disease, etc.) to provide a composite value or score. Such value, as described herein, has valuable diagnostic, prognostic and therapeutic utility. It should be noted that although NFκB and in some cases bone-marrow derived macrophages are used in the initial design of the methodology for determining Response Selectivity, later examples herein provide the experimental basis for a method using peripheral blood derived mononuclear cells, a panel of stimuli to which samples of such cells are exposed (including after differentiation), and analysis of expression of genes by single cells to provide the data input to determine Response Selectivity.
The conceptual innovation is that hallmark of innate immune cells is Response Specificity. Response Specificity could be developed into clinical measure of innate immune health that could benefit people with pre-existing conditions that put them at risk for, e.g., infections or autoimmunity, as well as guide treatment and selection of the drug(s), dose, timing, and monitoring of effectiveness. Response Specificity can be measured by the following Experimental and Analytical Innovations.
A new knock-in mVenus-RelA reporter mouse has been developed that allows tracking of NFκB dynamics in primary cells without the artifact of ectopic expression or immortal cell lines. An optimized workflow is also developed for long term live cell microscopy, including automated image analysis.
A new experimental workflow has been developed (e.g., with the BD Rhapsody system) for single cell profiling the expression of 500 immune response genes selected to capture the diversity of responses in bulk RNA-seq experiments. The Rhapsody single cell platform produces more quantitatively reliable data and is substantially more cost-effective for 10,000's of cells than other approaches (Table 1). However, the disclosure herein is not so limited to any particular method for scRNA sequencing.
(1) Machine Learning classification to detect confusion. Machine learning approaches that provide measures of specificity and precision, and, importantly, detect conditions of confusion (off-diagonal) have been developed.
(2) Dynamic Mutual Information (dMI). A Markov model-enabled method is developed to quantify the information present in dynamic trajectories and information accumulation over time.
(3) Modeling to better characterize biological variability. Mechanistic modeling approaches that leverage knowledge of biochemical regulatory mechanisms to better characterize single cell responses are developed. A key limitation of single cell measurements is technical noise, which is difficult to separate from biological variability. While smoothing functions distort many of the informative features in gene expression trajectories, model simulations may preserve these features and examine whether they can account for biological heterogeneity. Furthermore, a key limitation of scRNA-seq as a snapshot assay can be overcome by leveraging knowledge of a gene regulatory mechanism in kinetic models that are fit to the data.
The downstream result of Response Specificity in signaling networks is the transcription of genes encoding potent inflammatory molecules, tissue remodelers, or cell death initiators. The production of these molecules can be harmful to the host. For example, the hyper-inflammatory cytokine storms recently seen in COVID-19 may be attributed to loss of specificity in the transcriptomes macrophage must produce to effectively control the infection without collateral damage. Environmental context and prior exposures, either pre-existing conditions in people susceptible to infectious disease or the effect of vaccination, may be an additional important factor in the development of severe or mild symptoms. Measuring and quantifying transcriptomic Response Specificity across different environmental contexts will identify genes and regulatory processes that control this property of macrophage function. Thus, the Response Specificity is affected by the health of the PBMC donor.
Macrophages each function independently as single cells, rather than as an organ, to respond to the environment. Thus, the accuracy of each macrophage's transcriptomic response to the danger signal encountered is vital to averting damaging run-away inflammation. Here, a single cell RNA-seq (scRNAseq) experimental workflow is established to measure Response Specificity.
By stimulating macrophages with diverse immune ligands and obtaining single cell transcriptomes using the BD Rhapsody single cell platform, one can measure the different transcriptomic responses of macrophage cells to various stimuli over a time-series. In preliminary results, 31 samples (5 time points×6 stimuli, plus unstimulated) on naïve M0 macrophages were collected and it was observed that transcriptomes were indeed stimulus-specific, with each ligand producing a population of cells transcriptomically distinct. To measure how prior inflammatory contexts perturb Response Specificity, one can polarize macrophages using IFNβ, IL-4, IL-13, IL-10, or IFNβ to produce macrophages of different epigenetic states, which are thought to be either pro- (M1 associated states) or anti-inflammatory (M2 associated states). One can also stimulate the macrophages with a set of diverse ligands targeting both pathogen receptors and cytokine receptors and collect single cell data over a time-series (
These experiments could deliver a well-controlled map of physiological context-dependent macrophage transcriptomes and their time evolution, onto which macrophage responses in disease contexts could be compared against.
There has yet to be a quantification of transcriptomic Response Specificity, a key property of macrophage function. scRNA-seq data enable this calculation, and also provides insight into the genes that contribute most to alterations in Response Specificity in physiological and pathological inflammatory contexts. Because use of machine learning models and information theoretic calculations have been shown to be amenable to analysis of high dimensional single cell data as discussed above, one can leverage these analyses to analyze the scRNA-seq data to provide quantitative values to assess Response Specificity. Classification and mutual information would enable quantitative comparison of Response Specificity in different inflammatory contexts and point to genes important for ligand discrimination.
Machine Learning Approach: To quantify the specificity of macrophage gene expression profiles at each of the 5 time points, one can train machine learning classification algorithms on the scRNA-seq data. One could use an ensemble learning approach, training models on a set of machine learning algorithms of different classes such as: decision trees, k-Nearest Neighbors, linear and radial kernel Support Vector Machines, and Naïve Bayes, to avoid computational biases caused any single machine learning algorithm. Machine learning models can be trained by 10-fold cross-validation on 70% of the single cell data to minimize overfitting. Classification accuracy, precision, and F1 scores can be calculated on the remaining 30% of the data, for the 8 stimuli measured in experiments.
Information Theoretic Approach: As a second metric for quantifying specificity, one can calculate the channel capacity using existing methods that describes the upper bound on the ligand discriminability measured in macrophage transcriptomes. The channel in this scenario includes both the multi-transcription factor signaling network, and the gene regulatory network induced by the immune ligand. Channel capacity can be calculated for all genes, single genes, sets of genes controlled by the same regulatory mechanism, and classes of genes performing different functions (i.e., cytokines, tissue remodelers, feedback regulators, etc.).
It is expected that later time points will contain more stimulus-specific information, but for certain pairs of stimuli, earlier time points will display greater ligand discriminability, pointing to the contribution of temporal regulation in defining stimulus-specificity. From the machine learning models one can extract genes that best classify particular ligands at each time point. Channel capacity calculations could allow insight into how different subsets of inflammatory gene programs contribute to ligand distinguishability. Comparing naïve M0, M1-IFNγ, M1-IFNβ, M2a-II4, M2a-II13, and M2c-II10 macrophages using classification and channel capacity metrics could also ultimately reveal the context-dependent importance of specific genes for distinguishing stimuli.
Macrophages activate stimulus-specific sets of genes that also need to be produced with proper timing. However, because scRNA-seq measurements are endpoint and high-throughput transcriptional data of endogenous genes is snapshot in nature, thus far in the field tracing the dynamics of gene expression for single cells has been an outstanding problem. Although single cell expression trajectories have been captured for a few genes at a time with reporter constructs, technological limitations prevent the same for hundreds of endogenous genes. Thus, an analytical framework is needed to model gene expression trajectories that faithfully capture observed distributions in scRNA-seq data. A more accurate analysis of the heterogeneity of induced inflammatory gene programs is described herein. Single cell expression trajectories would allow a better quantification of response specificity. Once developed, expression at single time points for each gene or, ideally for all genes measured, would simplify the workflow to obtain Response Specificity on PBMC samples. Such data collection at a single time point is embodied herein.
To better understand the temporal component of response specificity, a new computational framework is developed to analyze heterogeneous response trajectories using mathematical models, which can be used to investigate the sources and mechanisms behind gene expression variation. Starting from time-series, one could stimulate each gene's gene regulatory mechanism (GRM) with the ODE models. One can use stimulus-specific transcription factor activity curves as input into the model. For each gene, one can scan the stimulus-specific expression trajectories for all plausible gene regulatory mechanisms that involve the previously described combinatorial and temporal action of 4 effector proteins that control either transcription or mRNA half-life. One could first fit simulated trajectories to the data mean using Levenberg-Marquardt least-squares fitting or gradient descent. Then one could use Markov Chain Monte Carlo sampling to generate parameter distributions that result in trajectories fitted to the single cell data distribution. With single cell trajectories in hand, one could calculate dMI and classification metrics on the trajectories as was done for NFκB signaling data described above. Unlike the multiple algorithms already developed for signaling networks, to reconstruct the gene regulatory network of the single cells one could bin the trajectories for each gene and assume a number of cell archetypes to group trajectories of different genes into a single cell, constrained by data-derived gene-gene correlations.
In one embodiment, trajectory information could improve the quantification of the true Response Specificity by incorporating the temporal component. The mechanistic modeling approach described herein would allow a framework to learn how gene regulatory mechanisms control heterogeneity of inflammatory genes (e.g., genes controlled by NFκB “OR” IRF may have more variable expression across cells, while genes controlled by ‘AND” gate activity of multiple TFs may have less expression heterogeneity). Both dMI and classification would point to features of temporal control of gene expression, including speed of gene activation, or fold change of gene expression, essential features of appropriate deployment of innate immune gene programs. However, the invention is not so limiting as to a single time point of assessment or a plurality of timepoints for generating the Response Specificity.
The health of the innate immune system is a key determinant of an individual's susceptibility to infectious agents and auto-inflammatory disease. Response Specificity, as a fundamental functional property of innate immune cells, may serve as the thermometer to measure Innate Immune Health. Experimental and analytical workflows have been described above to measure the Response Specificity of macrophages via two single cell assays: either live-cell microscopy of gene expression activity, or scRNA-seq to characterize how Response Specificity may be affected by specific polarizing cytokines that, in vivo, define the tissue microenvironment. This example examines how macrophage Response Specificity may be affected by inflammatory disease. First, mouse models of inflammatory disease are examined, e.g., use the mVenus-RelA reporter to measure Response Specificity as well as transcriptomic Response Specificity. Then transcriptomic Response Specificity of monocyte-derived macrophages from human donors (e.g., healthy, with autoimmune disease, or at risk of autoimmune disease due to receiving checkpoint inhibitors for tumor-immunotherapy) are measured. The goal for these studies is to enable eventual definition of healthy vs. unhealthy ranges of Response Specificity scores (e.g., MI) for a given set of stimuli, with the vision that Response Specificity could be commonly measured in the clinic for people at risk for inflammatory disease conditions, similar to CBC panels, glucose-tolerance tests, or tests for liver and kidney enzymes. What score may be associated with auto-immune risk, and further determination of which genes, regulatory regions, or stimuli are most reliably informative would inform future translational studies in developing a cost-effective, targeted test of Response Specificity that can be offered in the clinic as a measure of Innate Immune Health.
This example describes applying the Response Specificity workflow to macrophages derived from mouse models of auto-immune or inflammatory disease to determine how these diseases may be reflected in Response Specificity. This is a basis for subsequent human donor studies.
In one embodiment, three genetic mouse models of auto-immune and inflammatory disease can be used. Macrophages responses may be altered in a cell-autonomous manner or through an altered cytokine milieu produced by other cells. Hence, not only bone-marrow-derived macrophages, but also peritoneal macrophages that have been conditioned in vivo are studied. The workflow described above can be applied:
Model 1: Sjögren's syndrome (SS) is a systemic autoimmune disorder involving progressive destruction of tissues exposed to the environment, such as eye, mouth and throat, and skin rashes. Interestingly, several genetic variants in regulators of the inflammatory transcription factor NFκB are associated with SS patients and a mouse strain containing similar variants recapitulates some of the SS pathognomonic characteristics. Preliminary results reveal a diminished NFκB Response Specificity in BMDMs sensing cytokine TNF, the bacterial PAMP LPS, and the viral PAMP poly(I:C), and that information accumulation was diminished during the 0.75-2 hour phase, which coincides with the function of the IκBα negative feedback loop that is disrupted in this mouse model.
Model 2: C57Bl/6 AireGW/+ mice develop Sjögren's syndrome and autoimmune retinopathy. This mouse is a model for human patients with the G228W dominant-negative mutations in AIRE, who are predisposed to endocrine autoimmunity. These mice are protected from autoimmunity with early-life injections with anti-TNF. AIRE normally promotes self-tolerance by participating in negative selection of self-reactive T-cells. In an ex vivo model of human macrophages, loss of AIRE increased susceptibility to Candida albicans infection, which was associated with failures in secretion of IL-1β, IL-6, or TNF. Thus, macrophage Response Specificity may be affected, but via cytokine conditioning rather than cell-autonomously.
Model 3: Absence of tonic TNF conditioning. Anti-TNF therapy is widely prescribed but is associated with paradoxical inflammatory symptoms, such as psoriasis. Preliminary studies with macrophages deficient in tonic TNF indicate that both NFκB and transcriptomic responses are less specific, as responses to TNF and Pam3CSK are more similar than in healthy controls.
With these studies, it is expected to find that macrophage Response Specificity is diminished in mouse models of auto-immune or auto-inflammatory disease and that specific gene sets and ligands would be most informative about such abnormal function.
Response Specificity of Macrophages from Humans at Risk for Inflammatory Disease
Following studies in the above mouse model systems, one can apply the optimized workflow described herein for quantifying transcriptomic Response Specificity in macrophages from human donors. One underexplored but increasingly prevalent cause of autoimmunity are patients who have been treated with anti-tumor checkpoint-blockade drugs, a fraction of whom are at risk for developing autoimmune syndromes. PD-1 and PD-L1, the receptors and ligand inhibited by checkpoint blockade, are essential for immune tolerance. PD-1 and PD-L1 are expressed on multiple cell types, including T-cells directly targeted in immunotherapy but also on macrophages. In mice, knockout of these genes lead to lupus-like syndromes, type 1 diabetes, or arthritis. In humans, the effect of PD-1/PD-L1 inhibition on innate immune function is not well understood. In both mice and humans, checkpoint inhibitor-induced autoimmune symptoms can be ameliorated with TNF blockade, suggesting that overproduction of key macrophage-response cytokines may be a contributor to loss of self-tolerance.
In one embodiment, one could measure Response Specificity of PBMC-derived macrophages in human cohorts that develop autoimmunity as a result of treatment with checkpoint inhibitors intended to block PD-L1/PD-1 on T-cells. To include appropriate controls, one could measure 4 donor groups: (1) without autoimmunity, not on checkpoint blockade (healthy); (2) with autoimmunity, not on checkpoint blockade therapy; (3) without autoimmunity, on checkpoint blockade; and (4) with autoimmunity and on checkpoint blockade therapy. One could isolate monocytes from PBMCs, culture macrophages ex vivo, stimulate them with the 8 immune ligands described above, and collect single cell transcriptomes using the Rhapsody scRNAseq workflow described above. The data can be analyzed using the classification and information theoretic algorithms described herein.
It is anticipated that compared to Group 1 healthy controls, both Group 2 and 4 patients would have decreased Response Specificity, and the ligands that are confused as well as the identified gene programs contributing to mis-regulation would guide refinements and downscaling of the assay. Comparison of Group 3 macrophages would control for the effect of tumor-conditioning on macrophage Response Specificity.
The optimal selection of genes to comprise a panel for routine determination of Response Specificity is identified by analysis of heatmaps of bulk RNA sequencing data, conducting principal component analysis (PCA), and identifying the target panel therefrom.
This example describes combining high-dimensional flow cytometry and single cell RNA sequencing (scRNAseq) with information theory (IT) to more completely characterize signaling networks and functionality in innate immune cells in breast cancer patients as compared to healthy controls.
To enhance our understanding of the immune system immediate response, innate immune cells from peripheral blood samples taken from breast cancer patients are studied. Flow cytometry-based profiling of the signaling network and scRNAseq profiling of the resulting transcriptome are used to identify biomarkers that are prognostic and useful in informing therapeutic options. Information theory algorithms are used to analyze high dimensional data that includes many nonlinear interactions in order to establish clinically informative scores for how the functional health of innate immune cells are altered in breast cancer patients, at the level of both cellular signaling and gene expression.
Workflow for Flow Cytometry. Blood samples from healthy donors are collected at 0, 1, and 6 months to provide three independent time points. Blood samples are processed into PBMCs and preserved. Banked frozen PBMCs are thawed and analyzed in batches to minimize variability. To measure the effects of cytokines (e.g., TNF, IFNγ, IL6) and PAMPs (e.g., CpG, LPS, P3CSK, and polyI:C) stimulation on the monocytes isolated from PBMCs, individual stimuli are used alone or in combination at 15, 30, and 60 min after stimulus application.
To investigate whether IFN-γ and IL-6 interact with each other in the same cells, one can simultaneously stimulate PBMCs with IFN-γ and IL-6 at varying combinations of concentrations (e.g., 1, 5, 25, and 125 ng/ml) and measured the effects of IL-6 on IFN-γ-induced pSTAT1 AMFI. Representative analytes include, but are not limited to, IkBa, p-RelA, pSTAT1, pSTAT2, pSTAT3, pSTAT5, p-IRF3, p-p38, p-ERK, and p-JNK. Possible study time points can be 10′, 30′, 60′, 120′, and 240′.
Workflow for scRNAseq (BD Rhapsody) Profiling of Monocyte Responses. To establish an efficient transcriptomic assay to probe the health of PBMC-isolated monocytes, one can freeze PBMCs from healthy donors in 90% FBS/10% DMSO. Monocytes can be selected using either pan-monocyte negative-selection or CD14+ positive-selection MACS sorting, which typically yields 15-20% monocyte cells from healthy PBMCs. After allowing the monocytes to equilibrate in complete media with 1%1-BS for 2 hrs, one can stimulate with 8 cytokine or PAMP ligands (e.g., TNF, IFNβ, IFNγ, IL6, CpG, LPS, P3CSK, polyI:C) at 2 doses, collecting at 4 and 10 hrs time points for scRNAseq. The BD Rhapsody scRNAseq platform can profile 500 immune response genes, which are selected to be genes with maximum response diversity from bulk data.
Channel capacity can be calculated to quantify single cell transcriptomic responses across stimuli at each time point and determine which stimuli are most informative. Using existing methods, channel capacity describes the upper bound on the ligand discriminability. With scRNAseq data, the channel includes both the upstream multi-transcription factor signaling network and the downstream gene regulatory network. Channel capacity can be calculated for all genes, single genes, sets of genes controlled by the same regulatory mechanism, or genes performing different functions (e.g., cytokines, tissue remodelers, feedback regulators, etc.). The overall information capacity for different sets of ligands and doses can be compared. Completion of these studies would allow one to select 4 ligands at the most informative doses, as well as 1 time point, for the response specificity assay to be conducted in breast cancer patients as described below.
Profile PBMCs from Breast Cancer Patients Vs Healthy Donors
Proper transcription of inflammatory response genes may be altered in circulating monocytes already exposed to the tumor microenvironment and may be prognostic for the clinical course of breast cancer patients. Previous studies have shown, for example, that IFNγ signaling is altered in monocytes from ER+ breast cancer patients. Conversely, detection of the type of misregulation of gene programs in monocytes of breast cancer patients would help inform therapeutic options. It is hypothesized that breast cancer patient monocytes would exhibit loss of transcriptomic response specificity due to dysregulation of the gene regulatory strategy of a subset of genes, which can be quantified by information theoretic analyses of stimulus-response scRNAseq data.
Flow cytometry workflow and scRNAseq workflow have been described above. To enable the eventual definition of healthy vs. unhealthy ranges of monocyte response scores for a given set of stimuli, one can apply mutual information calculations and machine learning classifiers to healthy vs cancer patients to identify most informative set of analytes and their interactions. One could compare analyte combinations from both flow and scRNAseq data to determine which might be more informative in reflecting innate immune function in breast cancer patients.
Completion of these studies would help determine which genes, single or combinations, show differential responses, or reduced stimulus-specificity. These genes can be used to establish ranges for response specificity score for healthy innate immune cells vs those from breast cancer patients. As an example from a previous study, this concept and experimental/analysis approaches were applied to innate immune cells from healthy vs Sjogren's Syndrome mice. Stimulus-specific gene sets were identified in that autoimmune disease and both machine learning classifiers and information capacity calculations were able to score the reduced functionality of innate immune cells from the disease model. Different sets of genes would be identified for scoring healthy vs breast cancer monocytes and may also be mechanistically instructive.
Results from these studies would validate new immune biomarkers related to the innate immune cells. It would build a comprehensive network of multiple signaling pathways extended to innate immune cell populations over time in the healthy and cancer immune system. By integrating concepts of information theory (mutual information and channel capacity) it would lead to a novel computational framework that allows us to gain valuable mechanistic insights into the information flow through the immune signaling networks. By the observation of dynamic responses to stimuli, one would be able to create functional scores of immune cell states with clinical implications.
Innate immune health is a critical determinant of susceptibility to infectious and inflammatory diseases. For instance, monocytes from severely ill COVID-19 patients exhibit defective IFN responses. However, there are currently no clinical measures to assess innate immune health and predict an individual's risk of poor outcomes in diseases such as COVID-19 and cancer.
This example examines how the specificity of monocyte responses is impacted by obesity, which is a major risk factor for severe COVID-19 and other infectious diseases, with the ultimate goal of developing a diagnostic test to predict broad susceptibility to infectious and inflammatory diseases. It is hypothesized that monocytes from obese individuals exhibit inappropriate, less-specific responses. One could identify key indicator genes that can be developed into a clinical diagnostic test to define innate immune health and predict risk for poor outcomes in COVID-19 and other infectious and inflammatory diseases.
In one embodiment, monocytes isolated from PBMCs obtained from lean (e.g. BMI 18.5-25) and obese (e.g. BMI >40) individuals could be stimulated with microbial and inflammatory ligands (e.g. LPS, Pam3CSK4, CpG, poly(I:C), flagellin, R848, TNF-α, IFN-b) for 8 hours, and expression of 500 genes (including cytokines, chemokines, other inflammatory mediators) selected for being highly stimulus-specific in healthy individuals can be examined as described above. The Rhapsody scRNAseq workflow and data analysis described above are used in the experiments. The primary metrics could be classification F1 score and channel capacity. Based on mouse studies, and assuming a common standard deviation as high as +/−20%, with n=10 samples, one can achieve 92% power using a two-sided two-sample t-test, at 0.05 significance level. Clinical records of relevant comorbidities would allow us to stratify the donor population further to identify relevant biomarkers.
Macrophages were obtained by differentiating immortalized myeloid progenitors (iMPs) in DMEM/10% FBS+30% L929 supernatant for a total of 10 days: iMPs were initially thawed into 10 cm non-adherent petri dishes for 3 days. On day 0 of differentiation, cells were then washed once and transferred into differentiation media (DMEM/10% FBS, 30% L929 supernatant, 1% PS, 1% L-Glut, b-Me (1:1000)). Cells were replated into 6 cm plates with new media on day 7, at a density of ˜20k cells/cm2. On day 10, the iMP-derived macrophages (iMPDMs) were stimulated with 100 ng/mL lipopolysaccharide (LPS, Sigma Aldrich), 10 ng/mL murine TNF, and 50 μg/mL low molecular weight polyinosine-polycytidylic acid (Poly(I:C)), 100 nM synthetic CpG ODN 1668 (CpG), 500 U/ml IFNβ, or media only Untreated control. For polarized macrophages, cells were incubated in 50 ng/ml IFNγ or 50 ng/ml IL4 for 24 hours prior to stimulation on day 10.
C57Bl/6 mice were obtained from Jackson labs. Two male mice were combined for each condition in order to obtain sufficient numbers of cells for the assay: 90 wks old (000664 C57BL/6J), 17 wks old (380050 C57BL/6J/DIO high fat diet (60% fat diet)), 17 wks old (380056 C57BL/6J/DIO controls (10% fat diet)). Peritoneal macrophages were extracted by injecting 10 mL PBS+1% FBS into the peritoneal space, shaking gently, and then pulling out as much fluid as possible, typically ˜8 ml. Macrophages were plated in DMEM +10% FBS and allowed to rest 24 hrs. Floating cells after that time were washed away, and remaining adherent macrophages were stimulated with the same ligand concentrations as for iMPDMs: 100 ng/mL lipopolysaccharide (LPS, Sigma Aldrich), 10 ng/mL murine TNF (R&D), 50 μg/mL low molecular weight polyinosine-polycytidylic acid (Poly(I:C)), 500 U/ml IFNβ, or media only Untreated control. Cells were washed once with cold PBS after 3 hrs of stimulation and lifted into suspension for the Rhapsody scRNAseq assay. The investigators were not blinded to the identity of the animal models during the experiments or outcome assessment.
To select genes for single-cell targeted gene profiling, existing bulk transcriptomic profiling of macrophage responses were analyzed. Bulk RNAseq data from Cheng et al. (Cell Systems, 4:330-343, 2017) was obtained from GEO GSE68318. Counts were converted to counts per million (cpm) using the package edgeR (Bioinformatics, 26:139-140, 2010), and genes with cpm >4 in at least three samples were retained. Induced genes were gathered by calculating fold changes at each of the 14 stimulus conditions available in the dataset, at each timepoint, against the unstimulated controls. Genes were retained as induced genes if they met the threshold of log 2 (fold change)>2 and p-value <10-5, which resulted in 1502 genes.
Because PCA identifies a new basis that maximizes variance within the rotated data, it was ideal for identifying genes that varied in expression level across different stimuli. PCA was performed centered and unscaled on the induced genes across all time points for the 14 stimuli in the dataset. The loadings matrix obtained from the PCA was used to calculate a rank score for each gene. The rank was computed as the radial distance of each gene j from the origin, over the top 20 PCs: scorej=Σx=120(PCxj)2, where PCxj is the component x loadings value for gene j. The top 480 ranked genes were included in the panel, and the remaining 20 genes were manually selected to add genes such as cell type markers, macrophage polarization markers, and transcription factors (Listing 1). As a visual confirmation of the approach, k-means clustering was performed on all induced genes and loadings were colored by cluster. As expected, genes with the highest absolute loadings values in each principal component tended to be spread across different clusters, and the top genes in each principal component exhibited distinct patterns across stimuli.
To identify an optimal set of the most distinct stimuli from the set of 14 used in the bulk transcriptomic data, tensor components analysis (TCA) was performed. TCA is a higher dimensional parallel of PCA— whereas PCA is performed on a genes×sample matrix, TCA is performed on a higher-order tensor by folding the gene expression matrix into a genes×stimuli×timepoint tensor. The bulk RNAseq data consisted of N genes over S stimuli with T timepoints per stimulus, which formed a third-order tensor X of dimensions N×S×T (a three-dimensional array). Tucker decomposition (Psychometrika, 31:279-311, 1966) was performed using the package rTensor. This decomposes the tensor into a core tensor G of dimensions R1×R2×R3, multiplied by a matrix U(i) along each mode,
X=G×
1
U
(1)×2U(2)×3U(3)
The first five components, which explained 92% of the variance in the data, were retained. The stimulus loadings matrix U(2) of dimensions S×R2 was hierarchically clustered on the first five components, and stimuli that each occupied separate branches of the hierarchical tree were selected. Stimuli that represented whole bacterial organisms or viruses were not selected, in favor of isolated bacterial or viral components.
Rhapsody scRNAseq
To collect the adherent macrophages for scRNAseq using the Rhapsody platform, macrophage cells were washed 1× with cold PBS, then lifted into suspension by incubating at 37 C for 5 minutes with Accutase, which resulted in cell viability typically >85%. Cells were centrifuged at 4 C, 400 g for 5 minutes, and resuspended in PBS+2% FBS. Cells were hash-tagged with anti-CD45-hashtags (BD Rhapsody #633793) and loaded onto the cartridge following manufacturer's instructions (BD Rhapsody #633771), with the following modifications, which helped ensure sufficient cell viability for the subsequent steps: Incubation with hashtags was performed for 30 mins on ice, instead of 20 mins at room temperature; only two washes were performed after hashtag incubation to minimize cell loss. Each cartridge was then loaded with a total of ˜36k cells across 12 hash-tagged samples (˜3k cells/sample). Libraries were prepared according to manufacturer's instructions (BD Rhapsody #633771) and sequenced 2×100 on Novaseq 6000.
Motif enrichment of induced genes and selected genes was performed using HOMER (Molecular Cell, 38:576-589, 2010), with a motif search range of −1000 to +100 of the TSS of each gene. Individual motif hits were placed into five categories: bZIP (AP1 family TF motifs), IRF (IRF and ISRE motifs), RHD (Rel Homology Domain NFκB family motifs), ETS (Erythroblast Transformation Specific family TF motifs), and Zf (Zinc finger motifs). To summarize the overall enrichment of particular transcription factor families within the gene sets, the average −ln(p value) of motifs in each category was calculated, and a second log transform was taken for plot visualization.
Gene ontology on selected genes versus unselected genes was performed using clusterProfiler (OMICS: A Journal of Integrative Biology, 16:284-287, 2012) against a background of all genes. Cutoff values of p-value <0.01, Benjamini-Hochberg q-value <0.05, and minimum gene set size >5 were used. Ontologies were grouped if they had a similarity proportion greater than 0.7. The top three Biological Processes ontology terms for each group were plotted.
scRNAseq Data Processing
Raw fastq files were processed using the BD Rhapsody™ Targeted Analysis Pipeline (version v1.0) hosted on Seven Bridges Genomics. Distribution-Based Error Correction (DBEC)-adjusted UMI counts (molecules per cell) were used in the downstream analysis. Multiplets, cells with undetermined barcodes, and cells with less than 80 features were removed from the analysis. Due to the selected 500 gene panel comprised of largely inducible genes, the assumption that the total number of RNAs per cell is constant does not hold. Counts were therefore normalized using the package ISnorm, rather than the more standard approach of dividing by total counts per cell. PCA was performed centered and unscaled using the R function prcomp, and UMAP and tSNE were performed on the top 20 PCs.
Machine learning classification models were trained using scRNAseq data from naïve macrophages. The data was split 70%/30% into a training group and a testing group. Using only the training data, a random forest model was trained using repeated 10-fold cross validation, with 3 repeats. The parameter mtry, which is the number of variables randomly selected as candidate features for each decision tree split, was set to √(total number of features). As alternative models, weighted k-nearest neighbors and neural network model were also trained on the same dataset. Random forest, weighted kNN, and neural network models were implemented using the R package caret (classification and regression training) (Journal of Statistical Software, 28:1-26, 2008). After the model was trained, the remaining held-out data was tested, with each cell assigned a soft probability prediction for each ligand. The highest probability ligand was the final prediction. Ensemble modeling was performed by majority voting, taking the predicted stimuli from each of the individual models and choosing the most common stimulus predicted for each cell among the different machine learning models. Macrophages of other polarization conditions, M1 (IFNγ) and M2 (IL4), were tested using the model trained on M0 naïve macrophages.
Feature importance was extracted from the trained random forest model, which is calculated by how much information is lost at each node/split of the decision trees. This value was calculated based on Gini impurity: Σi=1C(fregi×(1−freqi)), across all unique category labels C. The feature importance is then the product (decrease in node impurity)*(probability of reaching that node), scaled so the top feature has a value of 100.
An information theoretic approach was used to identify either individual genes or combinations of genes providing the highest maximum mutual information between ligand identity and gene expression. Error bars on mutual information calculations were done using 10 bootstraps on 50% of the data. Estimation of maximum mutual information was implemented using the R package SLEMI (PLoS Comput Biol., 15, e1007132, 2019), which uses a statistical learning-based approach to more accurately and more efficiently calculate maximum mutual information for data types with higher dimensional outputs. The max MI was first calculated for each gene individually, using all stimuli (highest theoretical max MI=2.58 bits), as well as for all pairs of stimuli (highest theoretical max MI=1.0 bits). To relate the max MI to the either the mean and variance of each gene, max MI was plotted against either the average Fano factor
across all stimuli S for each gene, or the mean squared deviation
To estimate the maximum mutual information of the best combination of 1, 2, 3, . . . , N genes, we first started from a list of the top 20 genes that individually had the best max MI value. For each of these single dimension channels, every combination of two genes was scanned, and again ranked the best combination of two genes and retained the top 20. This process was repeated for each additional gene until the gain in max MI for each additional gene leveled off. Retaining only the top 20 sets at each dimension made the calculation more computationally feasible, while still allowing the possibility for gene combinations that are not simply additive of the previous dimension's highest max MI combination.
For gene-specific pairwise calculation of MI, max MI between pairs of stimuli was calculated for each gene at 3 hrs using the single-cell RNAseq data. Max MI was plotted against gene regulatory groups between the two stimuli. Correlation coefficients were calculated using the max MI values for signaling pairs for every NFκB signaling codon, and the max MI values for the corresponding gene expression pairs for every NFκB target gene. Genes without any correlation p-value <0.25 across the six codons were removed from the display. The Pearson's coefficient was plotted on the heatmap, and genes were hierarchically clustered using complete linkage.
The Response Specificity Profile is a collection of values that collectively summarize the distinguishability of macrophage responses to different stimuli. The response landscape on which the Response Specificity Profile is calculated was obtained first by principal component analysis performed centered and unscaled on all stimuli across M0, M1 (IFNγ), and M2 (IL4) cells. This initial matrix can be written as =N×P, where N is the total number of measured genes and P is the stimulated single cells of different macrophage subtypes. This matrix rotation of M by PCA represents the landscape of physiological macrophage responses. Calculation of maximum mutual information using the PC scores was then performed for all possible pairs of stimuli. The advantage of using PC scores to perform mutual information analyses lies in the reduction of noise that would otherwise result in overfitting. Overfitting due to the large number of features was observed to result in saturation of the maximum mutual information values to the theoretical maximum for all pairs. Max MI was therefore calculated on the top three PCs (capturing 42% of the variance in the data), using a truncation based on the PCA scree plot, as each subsequent component after Component 5 added only ˜1% more to the variance explained. All error bars were generated by 50 iterations of 50% bootstrap resampling of the complete single cell dataset.
The Response Specificity Profile of new samples was evaluated by projection of the new gene expression data onto the dimensionality-reduced space. The initial PCA was defined as =WT×M, where S is the r×P scores matrix, and W is the N×r loadings matrix, the scores of the new projected data is then given by Snew=WT×Mnew.
Since cells from disease models are projected into the same basis, scores from any new projected data now sit in the same lower-dimensional space and can be compared to the Response Specificity of samples within initially defined response landscape. For new samples, the maximum mutual information between ligand and transcriptomic output was again calculated for all available pairs of ligands using the PC scores. The summary score delta Response Specificity Index (ΔRSI) was generated by the following: ΔRSI=√Σ(RSIp−RSIpM0)2 across all pairs of stimuli p.
Code for data processing and analysis, and processed single cell data, including raw counts and normalized data, are available on the Github repository.
To quantify the degree of Response Specificity in macrophages, an experimental workflow was developed using a targeted mRNA sequencing approach (Adv. Exp. Med. Biol., 1129:63-79, 2019) that was both cost-effective and reduced the technical noise of single-cell genome-wide RNA sequencing approaches (
To identify immune stimuli that would best represent Response Specificity, bulk RNAseq data from macrophages responding to 14 different pathogen or cytokine ligands was analyzed to determine the ligands that induce diverse macrophage responses (
Because gene programs are induced dynamically, the quantification of Response Specificity may depend on the time-point at which gene expression measurements are taken. To compare different timepoints of stimulation, pairwise stimuli distances were calculated at 1, 3, and 8 hrs. Ligand-pair distances were most distinct at 3 hrs (
A comparison of the single-cell RNAseq data from the targeted approach (Rhapsody) to the genome-wide approach (10× Genomics) showed that both had similar concordance in their means to bulk data (
To determine how much single-cell heterogeneity affected the stimulus-specificity of responses, the overlap in gene expression response distributions was assessed next. PCA revealed that IFNβ, LPS, and PIC response distributions were best distinguished, with minimal overlap of their 95% confidence regions on the first two components (44% variance explained), while TNF, CpG, and P3C appeared overlapping (
To identify genes that may be driving the observed stimulus-specificity, an information theoretic approach was employed. Ligand information was considered as transmitted through a channel comprised of cell signaling and gene regulatory networks, both affected by pre-existing biological heterogeneity and the stochasticity of biochemical reactions, to produce gene expression responses (
It was found that 95% of the measured genes on their own conveyed no more than 1 bit of information (
How then can macrophages achieve high Response Specificity? One possibility is that a combination of genes that show low cell-to-cell heterogeneity may specify stimulus-specific responses to diverse stimuli. To assess this possibility, the same information theoretic framework was employed to calculate the maximum mutual information provided by the best-performing combinations of genes. Indeed, the best combination of two genes (Cmpk2 and Wiz) already allowed for a maximum MI of almost 2 bits, with the gain in max MI plateauing to −2.25 bits as larger combinations were tested (
Gene combinations that worked well together did so by distinguishing complementary ligand pairs (
High Response Specificity requires not only that mean population gene expression is distinct, but also that the cell-to-cell heterogeneity is low such that distributions of single cell responses have limited overlap. Hence, whether Response Specificity was limited by small differences in means or wide distributions was investigated. It was found that mean squared deviation (MSD), which summarizes differences in means across ligands, correlated more strongly to Response Specificity (r=0.8) than average Fano factor, which summarizes average gene heterogeneity (r=−0.5) (
Interestingly, a pattern emerged where the Fano Factors of cytokine/chemokine genes were higher than expected, diminishing their Response Specificity despite distinct mean expression levels (
In macrophage responses, four signaling pathways and their downstream gene regulatory factors are combinatorially activated and are responsible for transmitting information about extracellular ligands to the nucleus (
In contrast, for the P3CSK vs. TNF stimulus pair, only NFκBlp38 target genes showed specificity because both stimuli activate AP1 and NFκB and fail to activate IRF, while differing only in p38 activation (
In addition to combinatorial pathway control, the dynamics of NFκB activation also specify gene expression. Stimulus-specific NFκB temporal dynamics involve six NFκB signaling codons that convey information about the stimulus to target genes: “Speed”, “Peak Amplitude”, “Oscillations”, “Duration”, “Total Activity”, and “Early vs. Late” activity (
Macrophages show remarkable functional pleiotropy that is dependent on microenvironmental context. Thus, polarization by prior cytokine exposure may alter their capacity for stimulus-specific responses. To test this hypothesis, macrophages were polarized into M1 (IFNγ) and M2 (IL4) states that represent opposing ends of the macrophage polarization spectrum (
It was found that Response Specificity was reduced in both polarization states for every set of the best gene combinations, as calculated by max MI, indicating that polarized macrophages may function more as specialized effectors and less as sentinels that serve a primary role of distinguishing immune threats (
While a global analysis indicated that polarization diminished macrophage Response Specificity, the genes most affected differed for each macrophage state. In addition, a smaller set of largely NFκB response genes also showed increased rather than diminished Response Specificities under each polarization condition (
The Response Specificity Profile of Stimulus Pairs Assesses the Functional State of Macrophages, and Readily Distinguishes M0 vs. M1 vs. M2 Macrophages
Because the Response Specificity of particular genes was distinctly altered by each cytokine context (
Calculating the difference in max MI from the M0 state, and thereby the Delta Response Specificity Profile, provided a characteristic signature of the functional state of the macrophage (
Peritoneal Macrophages from Old and Obese Mice Show Distinctive Alterations in their Response Specificity Profiles
Next, whether the Response Specificity Profile might reveal aberrations in macrophages derived from mice at risk for chronic inflammatory disease was tested. Peritoneal macrophages (PMs) were taken from healthy mice (17 weeks old), old mice (90 weeks old), and high-fat-diet obese mice (17 weeks old) and the Response Specificity workflow was performed using five ligands: LPS, TNF, PIC, P3C, and IFNβ (
To identify individual genes that showed particularly high losses of Response Specificity in these in-vivo-conditioned macrophages, max MI values for each gene were compared (
Taken together, quantifying the Response Specificity of macrophages revealed that this functional hallmark of immune sentinel cells is affected not only by polarizing cytokines used in pre-conditioning regimes in vitro, but also by the microenvironments in vivo that are evidently distinct in obese and old mice. The observed differences in Response Specificity suggest that quantifying post-stimulation single-cell response distributions could be valuable for assessing the health of macrophage function.
Mounting stimulus-appropriate immune responses is a key property of healthy macrophage function. Macrophage Response Specificity is a function of the stimulus-specific engagement of signaling pathways and may be diminished by molecular noise that results in cell-to-cell heterogeneity. While Response Specificity of one macrophage immune signaling pathway has been characterized, the Response Specificity of immune gene expression responses arising from all macrophage signaling pathways has remained unquantified. By developing the experimental and computational tools to do so, it was found that the high gene expression specificity observed was generated by sufficiently narrow response distributions in combinations of genes that respond with distinct patterns across stimuli, but that the contribution of often-measured cytokine genes was limited by high cell-to-cell heterogeneity of expression. Mechanistically, it was found that Response Specificity is generated by stimulus-specific activation of interferon or MAPKp38 signaling, or by differences in NFκB dynamics Loss of IRF gene specificity by microenvironmental polarization was the key driver in altering Response Specificity Profiles. Given that Response Specificity is context-dependent, peritoneal macrophages from old and obese mice were profiled, revealing highly specific changes in the Response Specificity Profile that reflected a different functional health status. These findings suggest that Response Specificity could be a means to characterize the innate immune health of human donors.
The ability to measure and subsequently quantify Response Specificity was enabled by a quantitative assay for cost-effective, reliable scRNAseq. The targeted sequencing approach pursued here resulted in less technical noise than genome-wide approaches, due to improved reverse transcriptase efficiency and increased sequencing depth per gene. However, even with technical improvements, the remaining measurement noise still may result in underestimates of the true Response Specificity. Future efforts on smaller gene lists may allow the use of even less noisy measurement approaches like single molecule fluorescence in situ hybridization (smFISH) that capitalize on the smaller sets of informative genes identified in this study.
The information theoretic approach used here to quantify Response Specificity, previously employed to quantify the information transmission capacity of signaling pathways, led to the finding that only a few of each pathway's target genes maintained the information available from signaling activity. Theoretically, information loss from signaling to gene expression is minimal without noise, while with noise, maintaining minimal information loss is only possible under select optimal promoter or chromatin conditions. A further consideration is that an individual gene can be regulated either by a single pathway or by multiple pathways, for example the NFκB target gene Tnf whose mRNA half-life is regulated by stimulus-induced MAPKp38. This combinatorial control explains why in principle some single genes like Tnf can hold more information than available from a single signaling pathway. But to achieve high gene expression Response Specificity in either scenario, signaling information must be interpreted by gene regulatory mechanisms without amplifying the cell-to-cell heterogeneity in signaling activity that already degrades stimulus-specificity, and also without introducing further heterogeneity through pre-existing chromatin heterogeneity or molecular noise. In this context, it is not surprising that the Response Specificity of most individual genes was low. However, as macrophages do not rely on only a single gene to mount a biological response to a specific immune threat, even with information loss at each particular promoter, the overall Response Specificity observed through combinations of a few genes from complementary pathways was still high.
Within the body, macrophages are exposed to polarizing microenvironments in physiological scenarios, as well as in pathological inflammatory contexts such as aging or obesity. As circulating cells, they are potentially reporters of even localized infections. Indeed, profiling the transcriptome or epigenome of circulating cells or macrophages has revealed molecular markers or signatures that are prognostic for therapeutic efficacy or alternative disease courses, such as in persistent infectious disease and cardiovascular and autoimmune diseases. However, as seen in human COVID-19 studies, it can be unclear which individuals have poor or vigorous immune health until they are challenged by infection. Here, it is considered that macrophage functions are deployed in response to immune threats and that stimulus responses are a function not only of the steady-state transcriptome or epigenome but also the dynamics of signaling complexes, membranes, and transport rates. It is reasoned that macrophage responses to different stimuli reveal a functional pleiotropy not evident at steady state, and that the stimulus-specific deployment of functions is key to healthy immunity. Indeed, it was found that macrophages conditioned in vitro by defined polarizing cytokines, as well as macrophages isolated from obese or old mice, showed distinctly altered Response Specificity profiles due to both cytokine genes and regulators within macrophage metabolic and signaling pathways.
In developing the Response Specificity Profile, it was found that analyzing single genes and pairs of stimuli provided more insight than aggregating the data together. For example, Response Specificity for each pair of stimuli differed for each polarization condition, but this information is lost when calculating Response Specificity for all stimuli at once. Instead, an aggregate score of alterations in the Response Specificity Profile (delta RSI) provide a first indication of differences, and the full Response Specificity Profile pinpointed aberrancies in select stimulus pairs that may be a diagnostic for a specific condition, such as macrophages from obese mice confusing TNF and PIC responses. For initial surveys of Response Specificity Profiles, measuring the expression of a large number of genes in response to multiple stimuli is important, as the most informative genes and stimulus comparisons are different across various macrophage states. A PCA approach was employed to characterize the response landscape and identify genes important across conditions, by using the gene weights in principal components. But the Response Specificity of individual genes differed greatly between the two disease models tested here, emphasizing the importance of gene-by-gene analysis in characterizing the Response Specificity Profile to make biologically meaningful predictions.
Characterizing the macrophage Response Specificity Profile may prove useful in clinical scenarios. Many steady-state metrics of immune health exist, such as the complete blood count that is a mainstay of clinical lab tests. However, tests for functional immune responses, already used in select clinical scenarios such as tuberculosis testing or allergy testing, are also rapidly emerging. Multiple such assays rely on ex vivo stimulation of extracted clinical samples to diagnose immunosuppression or phagocytic ability, and have included transcriptomic profiling studies of stimulus-responses on peripheral blood that identified inter-individual variation among healthy donors and the genes driving those differences. As seen with existing assays, to what extent stimulus-response measurements provide more reliably prognostic information than steady-state molecular profiling may depend on the health condition being studied.
The assay of single cell macrophage responses presented herein may identify outlier response cells within each donor sample through the Response Specificity Profile, that may be associated with risk for aberrant inflammatory responses or diminished innate immune defenses, or that may be reflective of an ongoing inflammatory or infectious condition that is not otherwise presented. The Response Specificity Profile allows new samples to be compared readily to a healthy range, a property which could be the basis for a clinically deployable measure of innate immune health. The stimulus-response data may also identify specific genes with aberrant response distributions in patient cohorts or in individual donors. Identifying such genes may provide cost-effective prognostic markers for specific cohorts or point to the underlying etiology of poor immune function. To realize this promise, large-scale clinical studies will be required to establish connections between Response Specificity Profiles and risk for disease. However, as a quantifiable property of macrophage function that changes with conditioning cytokines or states of health and disease, macrophage Response Specificity Profile may be a viable approach for measuring the health of innate immune function in the clinic.
Macrophages respond to immune threats via the dynamic induction of hundreds of immune response genes. However, technological limitations have precluded the measurement of temporal trajectories of gene expression at single cell resolution. The present example describes a method, scREAL-TIME, to model the ensemble of single-cell real-time expression trajectories that account for time-series scRNAseq from stimulated macrophages. It was found that dynamical information from single-cell macrophage expression trajectories held a much greater capacity for stimulus-response specificity than apparent from time-point scRNAseq measurements, and uncovered previously unmeasurable correlations in the expression of pro-inflammatory cytokines. The heterogeneity of dynamical features of gene expression was highly dependent on microenvironmental context. Consequently, the polarization state of single cells could be predicted more readily by response dynamics than by steady-state or time-point response measurements. The findings presented herein emphasize the importance of accounting for dynamical information about the cellular transcriptome in characterizing the functionalities and functional states of immune cells.
Macrophages were generated by differentiating immortalized myeloid progenitors (HoxB4 iMPs) as previously described. HoxB4-iMPs initially thawed and expanded in 10 cm non-adherent petri dishes for 3 days. They were then washed once and placed in differentiation media for a total of 10 days (DMEM/10% FBS, 30% L929 supernatant, 1% PS, 1% L-Glut, b-Me (1:1000)), with replating onto 6 cm plates with new media on day 7, at a density of ˜20k cells/cm2. For polarized macrophages, cells were incubated in 50 ng/ml IFNγ or 50 ng/ml IL4 for 24 hours prior to stimulation on day 10. On day 10, macrophages were stimulated with one of six stimuli: 100 ng/mL lipopolysaccharide (LPS, Sigma Aldrich), 10 ng/mL murine TNF, and 50 μg/mL low molecular weight polyinosine-polycytidylic acid (Poly(I:C)), 100 nM synthetic CpG ODN 1668 (CpG), 500 U/ml IFNβ, or media only Untreated control. Samples were collected at multiple timepoints post-stimulation, including 15 or 30 minutes, 1 hr, 3 hrs, 5 hrs, and 8 hrs.
Macrophage scRNAseq
To collect the adherent macrophages for scRNAseq using the BD Rhapsody platform, macrophage cells were washed 1× with cold PBS, then lifted into suspension by incubating at 37 C for 5 minutes with Accutase, which resulted in cell viability typically >85%. Cells were centrifuged at 4 C, 400 g for 5 minutes, and resuspended in PBS+2% FBS. Cells were hash-tagged with anti-CD45-hashtags (BD Rhapsody #633793) and loaded onto the cartridge following manufacturer's instructions (BD Rhapsody #633771), with the following modifications, which helped ensure sufficient cell viability for the subsequent steps: Incubation with hashtags was performed for 30 mins on ice, instead of 20 mins at room temperature; only two washes were performed after hashtag incubation to minimize cell loss. Each cartridge was then loaded with a total of ˜36k cells across 12 hash-tagged samples (˜3k cells/sample). Libraries were prepared according to manufacturer's instructions (BD Rhapsody #633771) and sequenced 2×100 on Novaseq 6000.
scRNAseq Processing
Raw fastq files were processed using the BD Rhapsody™ Targeted Analysis Pipeline (version v1.0) hosted on Seven Bridges Genomics. Distribution-Based Error Correction (DBEC)-adjusted UMI counts (molecules per cell) were used in the downstream analysis. Multiplets, cells with undetermined barcodes, and cells with less than 80 features were removed from the analysis. Due to the selected 500 gene panel comprised of largely inducible genes, the assumption that the total number of RNAs per cell is constant does not hold. Counts were therefore normalized using the package ISnorm, rather than the more standard approach of dividing by total counts per cell. PCA was performed centered and unscaled using the R function prcomp, and UMAP and tSNE were performed on the top 20 PCs.
scREAL-Time
Dimensionality Reduction: Principal Component Analysis (PCA) was performed on the time-series single-cell RNAseq data. PCA was performed centered and unscaled across all stimuli simultaneously, generating a lower dimensional data embedding that captured the gene expression information of individual cells.
k-means clustering: It is assumed in this method that many representative archetypes of cells can estimate the spectrum of behavior of individual cells. Conceptually then, with enough archetypes linked combinatorially across timepoints, the full range of heterogeneous behavior of individual cells can be estimated in a manner that minimizes the heterogeneity arising from technical noise. k-means clustering was utilized on the top 50 PC scores with k=20 to determine archetypes at each time point. At any given measured time-point, a cell may be classified as a member of one of k archetypes. For each archetype at a given time point, a consensus measurement was utilized—an average value for each PC score determined by the members of said archetype from the data.
Weighted Random Walks: The probability of archetype connections across timepoints was labeled by the density of cells belonging to that archetype and the Euclidean distance of the mean of each archetype. Upon assigning these probabilities, random walks were generated where each simulation belonged to a particular observed archetype at a given timepoint, adopting the archetype's consensus PC measurements.
Spline fitting: For each random walk, unmeasured timepoints were interpolated across by fitting polynomial spline interpolants with three degrees of freedom. This linked the lower dimensional cell-level data, resulting in a trajectory for each cell in PC space, with gene expression trajectories for each cell embedded and recoverable from the original PCA loadings.
Recovering gene trajectories: The approximately invertible property of PCA was utilized to invert dimensional reduction transformation using the top 50 PCs, requiring use of the loadings matrix to scale and rotate the lower dimensional cell-based data. This back projected the random walk trajectories from the lower dimensional space to reflect individual gene trajectories for every cell. Each random walk, i.e., each cell, was subsequently represented as a continuous curve in gene expression space.
For comparing stimulus-specificity, trajectories were first scaled 0-1 across all stimuli, and the trajectory features were calculated on the scaled trajectories. This was necessary as spline fitting across principal components generated occasional negative values for some cells, especially at the beginning of the trajectory. Scaling maintained the same relative dynamics and amplitudes, and increased the reliability of the trajectory feature calculation. Integral was calculated by integrating each single cell trajectory curve from 0 to 8 hours. Peak amplitude was calculated by taking the maximum value over 8 hours. Max log fold change was calculated for each gene by taking the fold change between the peak amplitude and the same cell's starting value at 0 hours. Activation speed was calculated by taking the derivative of the curve at 1 hour.
An information theoretic approach was used to identify either individual genes or combinations of genes providing the highest maximum mutual information between ligand identity and gene expression. Error bars on mutual information calculations were done using 10 bootstraps on 50% of the data. Estimation of maximum mutual information was implemented using the R package SLEMI, which uses a statistical learning-based approach to more accurately and more efficiently calculate maximum mutual information for data types with higher dimensional outputs. The max MI was first calculated for each gene individually, using all stimuli (highest theoretical max MI=2.58 bits), as well as for all pairs of stimuli (highest theoretical max MI=1.0 bits).
In response to inflammatory stimuli, macrophages dynamically transcribe hundreds of immune response genes. Cell-to-cell heterogeneity in the expression of these response genes arises at each regulatory step, including the kinetics of ligand-receptor binding, downstream adapter recruitment, kinase activation, and transcription factor localization, all of which may also be affected by context-dependent signaling crosstalk and chromatin remodeling.
To measure context-, stimulus-, and time-dependent single-cell macrophage responses, the BD Rhapsody platform was used to perform targeted scRNAseq on >100,000 individual macrophages (
Conceptually, polarizing macrophages with microenvironmental cytokines pushes stimulus-responses into different channels across the response landscape (
Inspecting specific genes revealed timepoint-, stimulus-, and polarization state-dependent changes in expression heterogeneity. For example, in response to LPS, the expression of the proinflammatory cytokine Tnf exhibited high single-cell heterogeneity from to 1 hour in both M0 and M2 (IL4) macrophages, but lower heterogeneity in M1 (IFNγ) macrophages (
Identifying the Ensemble of Single-Cell Expression Trajectories that Account for Time-Series scRNAseq Data
To examine the amount of stimulus information contained in immune response gene trajectories, a method was developed to identify the ensemble of single-cell real-time transcriptomic trajectories that would account for the measured time-series scRNAseq data: single-cell Real-time Ensemble Archetype Linkage of Time-series Measurements (scREAL-TIME) (
Applying scREAL-TIME to data from stimulated M0 macrophages, continuous single cell trajectories were recovered from 0 to 8 hours after macrophage stimulation, matching the range of the measured data. Taking Tnf for example, the method imputed the linkage of disconnected data points, providing the full time-course trajectory that recapitulated the magnitude and heterogeneity of the scRNAseq values at measured timepoints (
The trajectories imputed via scREAL-TIME allowed cells to be clustered using information from the entire time-course of gene expression, rather than solely at a particular timepoint. The dynamics of gene expression of a single cytokine, Tnf, was able to distinguish three stimulus groups, IFNβ, bacterial PAMPs, and TNF/PIC (
With the temporal expression trajectories of multiple genes within single cells available, it was asked whether and which dynamical features of these trajectories were important in enabling the distinction of stimuli. Each cell expresses hundreds of immune response genes simultaneously, which may provide information in combination, through the coordinated timing and amplitude of their expression. To first visualize this multidimensional array of data, tensor components analysis was performed, an unsupervised dimensionality reduction approach analogous to PCA for two-dimensions. Tensor decomposition of population-level time-series RNAseq data (genes×stimuli×time) (
To quantify the dynamical patterns of expression trajectories, the presence of four dynamical features important in gene expression was considered: peak amplitude, max log fold change (max LFC), integral, and activation speed (
To examine how much the features of expression trajectories enable macrophage stimulus-specificity, the information content of each dynamical feature was assessed next. Information content in any particular feature was different for each genes, but it was found that “integral” and “peak amplitude” were on average the most stimulus-specific, while “max LFC” and “speed” were less informative (
It was noticed that for subsets of genes, the dynamic features provided much more stimulus information than a single timepoint, while other genes were associated with smaller gains in information. To investigate this further, the immune response genes were categorized based on which signaling pathways activate them. Interestingly, genes that were regulated by single transcription factors such as AP1, IRF, or NFκB showed an information gain of ˜1 bit, but genes regulated by an NFκBIIRF “sequential OR” gate showed gains of only ˜0.5 bits (
In transcriptome measurements of a cell population, genes regulated by the same signaling pathway are induced with highly correlated patterns that can be clustered into regulatory modules. In transcriptome measurements of single cells, target genes of different signaling pathways may also be correlated due to a similar chromatin or signaling environment within each cell. To determine whether the extent of gene-gene correlations across single cells was dynamics-dependent, correlations of gene-pairs that are targets of the same or different transcription factors were compared, for either single-timepoint scRNAseq data or single-cell expression dynamical features.
Two prominent NFκB regulated genes, Tnf and Nfkbia, showed weak to no correlation at any single response timepoint of 1 hr, 3 hrs, or 8 hrs (
Two cytokine genes regulated by different signaling pathways were then examined, NFκB-regulated Tnf and NFκBlIRF-regulated Cxcl10 (
How do dynamical features of gene expression change when macrophages are polarized by microenvironmental cytokines? scREAL-TIME was next applied to time-series scRNAseq data from stimulated M1 (IFNγ) and M2 (IL4) macrophages and it was confirmed that distributions of imputed trajectories in polarized macrophages were also appropriate fits to measured data distributions at each timepoint (
To quantify polarization-dependent alterations to the stimulus-specificity of gene expression trajectories, the stimulus information contained within the expression of three critical immune response cytokines was examined Tnf, Ccl5, and Cxcl10. At any single timepoint, stimulus-information did not exceed 1.2 bits for any polarization state (
To uncover whether gene-specific changes to stimulus-specificity were due to changes to the mean or variance of dynamical features, the single-cell distributions of dynamical features for Tnf vs. Cxcl10 were plotted. The Integral of Tnf expression had similar averages in M1 (IFNγ) and M0 macrophages, but single-cell variability was decreased (
To investigate the coordination of multiple genes in specifying stimulus-identity, the most informative gene combination when considering dynamical features rather than single timepoints was next identified. Indeed, for the dynamical feature “Integral” in M0 macrophages, max MI approached ˜2.5 bits with just three genes (Mx2, Nfkbiz, Egr3), near the theoretical maximum of 2.58 bits (distinguishing 6 stimuli). A max MI near the theoretical ceiling could also be achieved with three genes for M1 (IFNγ) (Mx2, Tnf, Gna15) and M2 (IL4) macrophages (Cited, Ehd1, Pou2f2), in contrast to gene combinations a single timepoint of 3 hours which reached at most 2 bits of information (
Which dynamical characteristics of these gene combinations provide for the stimulus-specificity? Hierarchical clustering of cell trajectories indicated that the genes selected for M0 macrophages contained complementary dynamical features in their stimulus-response trajectories. For example, the interferon-regulated antiviral gene Mx2 was activated in response to both LPS and IFNβ, but was induced with slower speed in response to LPS (
To further identify which genes most strongly differed in dynamical features across polarization states, and which stimulus-pair distinctions were affected by these changes, the Stimulus Response Specificity Profile for each dynamical feature was calculated. The Response Specificity Profile is an unsupervised characterization of the distance between pairs of stimulus-response distributions. By applying the Response Specificity Profile to dynamical features, differences in stimulus-response specificity (SRS) across macrophage states can be attributed to dynamical features that particularly affect the distinction between certain pairs of stimuli. It was found that M1 (IFNγ) macrophage responses lost SRS most prominently in the Max LFC feature across most stimulus pairs, for example especially between CpG and LPS. In contrast, M2 (IL4) macrophage responses lost SRS most prominently in the Speed dynamical feature, such as between P3C and IFNβ (
Using the gene loadings underlying the Response Specificity Profile and ΔRSI, the genes contributing most to loss of SRS in these two particular features of Max LFC and Speed were next identified. For the Max LFC dynamical feature, a group of IRF and NFκBlIRF genes were weighted most strongly, among them Ifit3 and Yid (
The context-dependent cell state of macrophages is typically assessed by protein markers or single cell transcriptomics in their quasi-steady state. Yet, macrophage functions are deployed stimulus-specifically, and how macrophages respond to stimuli is a function of the microenvironmental context. Therefore, in addition to containing accurate stimulus information, gene expression dynamics may also allow more accurate identification of a cell's functional state. Assessing stimulation-response dynamics integrates steady-state epigenetic information and probes the functional outcome of microenvironmental perturbations for each cell.
To examine the difference between steady-state measurements and response-dynamical features in distinguishing polarization states, the top three genes that individually best distinguished unstimulated M0, M1, and M2 macrophages (Irf1, Fgl2, Tgtp1) were chosen and have their expression values plotted. It was found that steady-state expression values poorly distinguished M0 and M2 macrophages, while the LPS response integral of the same three genes perfectly distinguished the three polarization states (
The top five genes identified by max MI were used to build statistical learning classifiers that predicted cell polarization states, based on 1) steady-state (0 hours) expression values, 2) single timepoint response expression values, or 3) response dynamical feature values (
The top 5 genes used to predict macrophage polarization state using the Integral feature are:
To examine the number of genes needed in each scenario that minimizes the prediction error, a multinomial LASSO (Least Absolute Shrinkage and Selection Operator)-penalized regression model was built on all measured genes. Importantly, LASSO regularization shrinks the coefficients of some variable to zero, thus allowing elimination of many predictors from the model to avoid overfitting the data. It was found that performing LASSO-regularized regression on steady-state expression values resulted in a model containing ˜150 genes, while models built on single time-point response values contained ˜50-120 genes, depending on the timepoint and the stimulus used (
It was assessed whether dynamic features provided more information about polarization state even for canonical marker genes of M1 (IFNγ) or M2 (IL4) polarization. For example, the well-known M1 macrophage marker Nos2 had on average higher expression at early timepoints in M1 macrophages, but single cell expression distributions of Nos2 overlapped across polarization states such that not all single cells could be identified as M1 based on this marker alone (
The present example showed that temporal trajectories of gene expression within single cells contain rich information about external stimuli and the functional state of the cell. Single-cell expression trajectories of just a handful of genes provided enough information for near perfect stimulus-response specificity, or for high accuracy identification of cell functional states. Stimulus-response gene expression is dynamic and heterogeneous, but current single-cell RNA measurement technology has not been able to capture both these characteristics simultaneously, as scRNAseq is destructive to cells. The newly developed single-cell trajectory imputation method scREAL-TIME revealed the heterogeneity of dynamic responses, thus quantifying macrophage stimulus-response specificity more realistically than could be done from previous time-point scRNAseq. When considering dynamical information, stimulus-response specificity was much higher than previously estimated, and expression trajectories were found to be both dramatically and subtly altered by the microenvironmental context of polarizing cytokines. Indeed, it was found that the stimulus-response trajectories of macrophages responding from different cytokine contexts were so distinctive that response dynamics were much more informative of the functional cell state than steady-state mRNA abundance measurements.
In developing scREAL-TIME, it was aimed to reduce the effect of technical noise (e.g., drop-outs in scRNAseq) and provide the least complex ensemble of trajectories that accounted for the heterogeneous datasets of all measured time points. The method first relied on PCA to compress the information from 500 genes per cell into a cell score, with gene-gene correlations maintained in the PCA loadings. To reduce the impact of technical noise and to decrease computational time, cells at each timepoint were binned into cell archetypes. While specifying too few archetypes may miss outlier behavior of rare cells, it was determined that 20 clusters per timepoint appropriately approximated the distributions of the data, as 205 (5 timepoints) equated 3.2 million possible paths, far exceeding the number of cells measured. Thus, the idea of cell archetypes reduced the impact of technical noise at each timepoint, while still maintaining sufficient coverage of possible across-timepoint biological variability. Using weighted random walks to link cell archetypes over timepoints resulted in trajectories that were both data-driven and biophysically plausible, as cells transcriptomically more similar across timepoints were linked with greater probability and no trajectories were imputed with rapidly oscillatory characteristics that contradicted knowledge of known mRNA half-lives. This probabilistic approach to linking data across timepoints also ensured that all permutations of paths were possible, but some were far more unlikely given the data. Identifying the trajectory paths for the entire cell in the latent space and then recovering individual gene trajectories by multiplying the PCA loadings matrix also allowed cellular gene-gene correlations to be preserved, which would not be the case if trajectory paths were identified for each gene independently.
scREAL-TIME was developed to address a gap left by prior computational and experimental methods that have aimed to analyze single-cell response dynamics Single-cell real-time trajectories may be imputed by fitting mechanistic models to measured time-series single-cell data distributions. This approach has been successfully applied to data of signaling proteins using models of signaling networks, but models of gene expression remain coarse and insufficiently predictive. Further, the gene expression data consists of hundreds of data points (500 genes) per cell, i.e., 500 single-cell distributions per time-point, posing additional challenges in fitting mechanistic models. Experimentally, the primary methodology for measuring single-cell gene expression trajectories is the MS2 reporter which allows live cell imaging of fluorescently tagged mRNA transcripts. However, this technology only images one or two genes at a time, precluding analysis of correlations or biological importance of genes expressed in combination. In addition, it requires genetic engineering of the genes of interest, which requires immortalized cells (as opposed to primary macrophages) and may affect expression dynamics. While the MS2 reporter provides higher time-resolution (5 minute intervals) than scREAL-TIME (15 minute intervals at the smallest, with even smaller being likely cost-prohibitive), it was aimed to make more frequent measurements at early phases of the immune response when immune response transcriptomes are known to be changing rapidly. In applying scREAL-TIME to other datasets, interpolating over larger measurement intervals of multiple hours should assume that single-cell transcriptomes are stable on the order of minutes during that time frame.
Applying scREAL-TIME to macrophages stimulated with diverse immune stimuli revealed new information, as the expression dynamics of immune-response genes could be analyzed at the single cell level. Each gene differed in their dynamic patterns of induction as well as their level of single-cell heterogeneity, which could be quantified by decomposing the trajectories into dynamical features such as integral, fold change, peak amplitude, and speed, an approach that has proved informative in cell signaling studies. The stimulus-response specificity of single genes or combinations of genes was determined. For most genes, the trajectory Integral was found to correlate best with the stimulus. Unlike features like speed or max fold change, Integral combines information from the entire time-course and may thus be most robust to both technical noise and biological heterogeneity. As cells must adapt to changes in their environment, stimulus-response specificity is an important biological characteristic, particularly in immune sentinel cells such as macrophages. Several studies have investigated the stimulus-information transmitted in biochemical signaling networks through live cell imaging of kinase activities or transcription factor nuclear translocation, including NFκB in macrophages. The trajectory vector of NFκB signaling was shown to distinguish different doses of ligands much more accurately than single timepoints of NFκB activity, and dynamical features NFκB trajectories were identified to best distinguish different ligands, analogous to what was found with gene expression. However, estimations of the stimulus-information within the dynamics of a single transcription factor like NFκB was consistently lower than that calculated for the dynamics of the most informative gene, as many of the gene trajectories examined are the outcome of multiple non-redundant signaling pathways that operate on genes with different timescales, such as Tnf (NFκB&p38) or Cxcl10 (NFκB|IRF). In addition, the expression dynamics obtained via scREAL-TIME consist of hundreds of trajectories per cell, enabling identification of complementary stimulus-specific gene dynamics, while measuring the trajectories of more than a few signaling proteins simultaneously in a single cell has not been feasible.
The hallmarks of healthy macrophage function have been described as stimulus-response specificity, context-dependence, and memory of prior exposures. These properties either define or reveal the functional state of macrophages: defined by context and history, and revealed by measuring stimulus-responses. In macrophages, stimulus-response gene expression dynamics are determined by 1) the activities of upstream signaling effectors (e.g., transcription factors NFκB, IRF, AP-1), whose abundances and nuclear translocation dynamics are strongly affected by contextual cytokines, and 2) the mechanisms and molecules available to interpret information within transcription factor activation, including chromatin opening mechanisms, nucleosome dynamics, and histone modifications, which can also be significantly altered by contextual cytokines that either prime or repress gene regulatory elements. For example, an IFNγ polarization context not only increases the nuclear availability of NFκB but also alters the epigenomic landscape of open chromatin and histone modifications. Thus, the cell's response to stimuli not only encodes information about the stimulus, but also about the cell's functional state.
Why might the functional state of a cell be predicted more accurately when accounting for dynamical features of expression rather than steady-state gene expression? Molecularly, response dynamics are more informative than steady-state omics profiles (chromatin, RNA, protein, etc.) because the functional state of a cell is not only defined by abundances of molecules but also by the rate constants that determine synthesis, degradation, association, dissociation and catalysis. For example, two cells may have the exact same amount of a ligand receptor at a particular time, but if one cell has higher synthesis or degradation of the receptor than the other, the two would respond differently to the same ligand concentration. Measuring responses integrates both the kinetic and abundance pieces of a cell's molecular profile at the time of stimulation, including availability of receptors, activities of kinases, and dynamics and localization of transcription factors. Capturing responses at the level of gene expression dynamics also captures the cell's chromatin state. Thus, the method of obtaining single cell inducible gene expression trajectories categorizes cell states in a manner unobtainable from any current standard steady-state cell profiling methods and allows a more precise phenotyping of the functional state than single timepoint response data.
Obtaining kinetic information for phenotyping the functional states of other cell types poses several challenges. Producing temporal trajectories is resource intensive as it requires multiple timepoints following a perturbation. In addition, macrophages respond to hundreds of possible stimuli, and stimuli that distinctly activate different combinations of signaling pathways were chosen herein, but the choice of stimulus may be more limited for other cell types. In theory, stimuli that perturb specific regulatory networks, rather than broad-acting drugs that do not act gene- or pathway-specifically, may provide more refined definitions of functional states through greater diversity in the dynamical features of each gene. Ideally then, stimuli should target physiological or pathological pathways crucial to the function of the cell being tested. For example, cancer cells can be treated with small molecule drugs inhibiting cell cycle or cytokines mimicking immunotherapy, Androgen Receptor+ prostate cancer with androgens, or adipocytes with fatty acids/sterols that activate lipid receptors/LXRs (liver X receptors). Other surrogate measurements for obtaining single cell kinetic information have been proposed, such as spliced vs. unspliced RNA, mRNA vs. protein, or mRNA vs. ATACseq data. However, the technical noise of many of these measurements is substantial and may thus deteriorate the additional information gained through these modalities.
The present study was focused on the dynamics of single-cell inducible gene expression in macrophages, key to the initiation of immune responses. Response trajectories were central to providing the high stimulus-response specificity that is essential to macrophage function, and supplied the kinetic information necessary for defining cell functional states. Future applications of defining cell states by response trajectories could allow more robust clustering and subtyping for macrophages derived from undefined cytokine environments of inflammatory diseases, or enable the identification of rare subclusters of single cells in other diseases where the function of outlier cells is pivotal, such as the responses of individual cancer cells to drugs or cell damage signals.
A data bank comprising scREAL-TIME data collected from various patient populations is created, including healthy patients, patients over the time course of their diseases, and patients who have undergone various types of therapy for each disease. It is envisioned that blood draws for routine or patient monitoring will undergo scREAL-TIME testing and an anonymous data bank with such data and patient histories will be amassed. Such scREAL-TIME bank provides the basis for extracting diagnostic and prognostic information and guiding therapeutic intervention, both for therapies that may be effective and those likely to be ineffective.
In one particular use, the data bank is populated from data on transplant recipients, such as kidney, liver or lung transplant recipients, their organ rejection status over time and following treatment with antirejection drugs. A query of the data bank with the new patient's scREAL-TIME assessment provides a recommended therapeutic regimen.
In other examples, uses include monitoring tumor progression, response to anti-inflammatory therapy in autoimmune disease, and progression of autoimmune diseases. Non-limiting examples of autoimmune diseases include but are not limited to rheumatoid arthritis, juvenile dermatomyositis, psoriasis, psoriatic arthritis, sarcoidosis, lupus, Crohn's disease, eczema, vasculitis, ulcerative colitis and multiple sclerosis.
In another example, the effectiveness of tumor immunotherapy can be monitored.
In another example, the health status of a patient with a persistent infection such as methicillin-resistant Staphylococcus aureus (MRSA) can be identified, and adjustment of the therapeutic course made based on scREAL-TIME assessment and comparison with similar patients in the data bank.
While certain features of the invention have been illustrated and described herein, many modifications, substitutions, changes, and equivalents will now occur to those of ordinary skill in the art. It is, therefore, to be understood that the appended claims are intended to cover all such modifications and changes as fall within the true spirit of the invention.
This application claims priority to U.S. Provisional Patent Application Ser. No. 63/332,584, filed Apr. 19, 2022, which is incorporated herein by reference in its entirety.
This invention was made with government support under Grant Number AI127864, awarded by the National Institutes of Health. The government has certain rights in the invention.
Number | Date | Country | |
---|---|---|---|
63332584 | Apr 2022 | US |