The contents of the electronic sequence listing (HDST-HUJI-P-016-US.xml; Size: 50,283 bytes; and Date of Creation: Nov. 7, 2023) are herein incorporated by reference in their entirety.
The present invention is in the field of graft versus host disease diagnostics.
Hematopoietic stem cell transplantation (HCT) is an essential and often the sole curative treatment strategy for high-risk hematologic malignancies. Graft versus host disease (GVHD), the foremost complication of allogeneic HCT, is a major limitation of this procedure, accounting for deleterious effects on quality of life and increased mortality from HCT. Currently, diagnosis of acute (aGVHD) and chronic GVHD (cGVHD) in bone marrow transplant patients is based on inaccurate, operator-dependent clinical markers, and less often on biopsies. These methods are time-consuming, costly and invasive, and yield late-stage diagnoses that negatively affect morbidity and mortality. In addition, current practice lacks accurate biomarkers for prediction of disease occurrence, identification of disease onset, prediction of disease response to treatment and accurate assessment of the actual response to treatment. Multiple prognostic and diagnostic biomarkers for cGVHD have been proposed, including IL2Rα, aminopeptidase N (CD13), IL4, IL6, TNFα, ST2, OPN, chemokine ligands such as CXCL9, CXCL10, and CXCL11, cellular biomarkers including immune cell subpopulations, and miRNA. However, none of these biomarkers have been clinically validated. In addition, all these markers are indicative of immune system derangement and lack information on the damaged tissue targeted by the allo-immune process. Thus, there is an unmet need for simple, objective tools that can aid the treating physician in easier identification and scoring of GVHD and assist personalization of management in patients suffering from GVHD.
Classic liquid biopsies analyze circulating cell-free DNA (cfDNA) via genetic variations or mutations in the DNA of a fetus, a tumor or a transplanted solid organ. However, these approaches are blind to DNA released from cells with a normal genome, as would occur in organs damaged by pathologies such as GVHD. It has been shown that tissue-specific DNA methylation patterns can provide powerful, universal biomarkers for detecting the tissue origins of cfDNA, reflective of elevated turnover or damage in specific organs and regardless of the underlying pathology. For example, it has been shown that genomic loci specifically unmethylated in lung epithelial cells or in hepatocytes can serve as cfDNA biomarkers to detect specific lung or liver injury.
There is a great need for accurate biomarkers for prediction of GVHD occurrence, identification of disease onset, prediction and assessment of disease severity and response to treatment.
The present invention provides methods of diagnosing graft versus host disease (GVHD) comprising receiving a measurement of cell-free DNA (cfDNA) quantity in a sample, wherein a cfDNA quantity above a predetermined threshold indicates the subject suffers from GVHD. Methods of diagnosing GVHD comprising applying a machine learning model to at least two parameters from a sample are also provided.
According to a first aspect, there is provided a method of diagnosing graft versus host disease (GVHD) in a subject in need thereof, the method comprising receiving a measurement of cell-free DNA (cfDNA) quantity in a fluid sample from the subject, wherein a cfDNA quantity above a predetermined threshold indicates the subject has GVHD, thereby diagnosing GVHD.
According to some embodiments, the subject underwent a hematopoietic stem cell transplant (HCT) from a donor.
According to some embodiments, the HCT occurred at least 100 days before the sample was taken.
According to some embodiments, the GVHD is chronic GVHD.
According to some embodiments, the method further comprises extracting the fluid sample from the subject, isolating cfDNA from the extracted sample and quantifying the isolated cfDNA.
According to some embodiments, the fluid is selected from peripheral blood, plasma and serum.
According to some embodiments, the cfDNA quantity is the total quantity of cfDNA in the fluid sample.
According to some embodiments, the cfDNA comprises cfDNA from the subject and cfDNA from the donor.
According to some embodiments, the method does not comprise isolating cfDNA only from the subject.
According to some embodiments, the predetermined threshold is the cfDNA quantity in a fluid sample from a healthy control subject or from a population of healthy control subjects.
According to some embodiments, the predetermined threshold is the cfDNA quantity in a fluid sample from a control subject that underwent an HCT and does not have GVHD.
According to some embodiments, the cfDNA is cfDNA originating from a specific tissue or cell type.
According to some embodiments, the specific tissue or cell type is selected from skin, lung, liver, intestine, B cells, T cells, CD8 positive T cells, eosinophils, monocytes, neutrophils and T regulatory cells (Tregs).
According to some embodiments, the specific tissue or cell type is selected from liver, B cells, T cells, CD8 positive T cells, eosinophils, monocytes, neutrophils and Tregs.
According to some embodiments, the cell type is an immune cell derived from a hematopoietic stem cell.
According to some embodiments, the cfDNA is cfDNA originating from neutrophils, monocytes, eosinophils, T cells, CD8 cells, B cells, skin and liver.
According to some embodiments, the origin of the cfDNA was determined by
According to some embodiments, the measurement of cfDNA was generated by performing methylation sensitive sequencing on the cfDNA to produce the nucleotide sequence of the cfDNA including the methylation status of each cytosine in the nucleotide sequence and assigning sequencing reads as originating from the specific tissue or cell type based on the reads' nucleotide sequence and methylation status.
According to another aspect, there is provided a method of classifying a sample as originating from a subject suffering from GVHD or a subject not suffering from GVHD, the method comprising:
According to some embodiments, the method is a method of diagnosing GVHD in a subject in need thereof, and the sample is from the subject.
According to some embodiments, the ML model was trained on a training set comprising the at least two parameters in subjects suffering and not suffering from GVHD and the ML model outputs a diagnosis of having GVHD or not having GVHD.
According to some embodiments, the liver enzymes are selected from aspartate transaminase (AST), alanine transaminase (ALT), and alkaline phosphatase (ALP). According to some embodiments, the liver enzymes are selected from aspartate transaminase (AST), alanine transaminase (ALT), alkaline phosphatase (ALP) and gamma-glutamyl transpeptidase (GGTp).
According to some embodiments, the at least two parameters are cfDNA from two different tissues, cell types or both.
According to some embodiments, the at least two parameters comprise liver enzyme levels, WBC count and hemoglobin levels.
According to some embodiments, the at least two parameters comprise cfDNA quantity from monocytes, cfDNA quantity from eosinophils, cfDNA quantity from T cells, cfDNA quantity from CD8 cells, cfDNA quantity from B cells, cfDNA quantity from skin and cfDNA quantity from liver.
According to some embodiments, the cfDNA parameter is selected from: total cfDNA quantity, cfDNA quantity from neutrophils, cfDNA quantity from monocytes, cfDNA quantity from eosinophils, cfDNA quantity from Tregs, cfDNA quantity from T cells, cfDNA quantity from CD8 cells, cfDNA quantity from B cells, cfDNA quantity from skin, cfDNA quantity from intestine, cfDNA quantity from lung and cfDNA quantity from liver.
According to some embodiments, the at least two parameters further comprise liver enzyme levels.
According to some embodiments, the liver enzyme level comprises AST levels and ALT levels.
According to some embodiments, the at least two parameters comprise ALT level and total cfDNA quantity.
According to some embodiments, the at least two parameters further comprise a third parameter selected from: cfDNA quantity from monocytes, cfDNA quantity from skin, GGTp level, cfDNA quantity from neutrophils and cfDNA quantity from eosinophils.
According to some embodiments, the at least two parameters comprise ALT level, total cfDNA quantity and cfDNA quantity from monocytes.
According to some embodiments, the method further comprises treating a subject diagnosed with GVHD with an anti-GVHD therapeutic agent.
According to some embodiments, the method further comprises tapering immunosuppression treatment to a subject that is not diagnosed with GVHD.
According to another aspect, there is provided a method comprising:
According to some embodiments, the fluid sample is a plurality of fluid samples comprising fluid samples from both subjects that suffer from GVHD and subjects that do not suffer from GVHD.
According to some embodiments, the cfDNA quantities comprise total cfDNA quantities in the samples and the at least one clinical parameter comprises ALT level.
According to some embodiments, the cfDNA quantities further comprise cfDNA quantities from monocytes.
Further embodiments and the full scope of applicability of the present invention will become apparent from the detailed description given hereinafter. However, it should be understood that the detailed description and specific examples, while indicating preferred embodiments of the invention, are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art from this detailed description.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
The present invention, in some embodiments, provides methods of diagnosing graft versus host disease (GVHD) comprising receiving a measurement of cell-free DNA (cfDNA) quantity in a sample, wherein a cfDNA quantity above a predetermined threshold indicates the subject suffers from GVHD. Methods of diagnosing GVHD comprising applying a machine learning model to at least two parameters from a sample are also provided.
The invention is based on the surprising finding that total cfDNA levels and cfDNA levels from specific organ/cell types can diagnose and prognose cGVHD. cfDNA has been shown to be elevated in many diseases, such as myocardial infarction, stroke, sepsis and autoimmune diseases (e.g., Sjögren's syndrome and SLE). Further, there is a known correlation between cfDNA levels and severity of tissue damage. However, past studies (see Duque-Alfonso et al., “Cell-free DNA characteristics and chimerism analysis in patients after allogeneic cell transplantation”, Clin. Biochem., 2018, February:52:137-141) that compared cfDNA levels in HCT recipients who developed GVHD with those who did not found no predictive value in total cfDNA levels. Indeed, Duque-Alfonso reported a correlation with development of GVHD only for recipient-only cfDNA, that is, only after cfDNA from donor cells was removed. Such a measure does not have broad clinical value as it would require a diagnostic test specifically designed for every donor-recipient pair in order to distinguish their cfDNA. In contrast, the instant invention discloses that total cfDNA, cfDNA from specific tissues and several combined cfDNA and patient data models can accurately predict chronic GVHD without the need to resort to patient-tailored analyses. In particular, a machine learning model trained on either (i) total cfDNA quantities and alanine transaminase (ALT) levels or (ii) total cfDNA quantities, ALT levels and cfDNA quantities from monocytes was able to very accurately predict GVHD presence even before clinical manifestation.
By a first aspect, there is provided a method of diagnosing graft versus host disease (GVHD) in a subject, the method comprising receiving a measurement of cell-free DNA (cfDNA) in a sample from the subject, wherein cfDNA above a predetermined threshold indicates the subject has GVHD, thereby diagnosing GVHD.
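By way of non-limiting illustration, the threshold comparison of this aspect can be sketched in Python as follows. The function names, the use of the mean of control samples as the predetermined threshold, and the numeric values are assumptions made for the example only and are not taken from the specification.

```python
from statistics import mean


def threshold_from_controls(control_quantities):
    """Hypothetical choice: use the mean cfDNA quantity of control samples."""
    return mean(control_quantities)


def indicates_gvhd(cfdna_quantity, threshold):
    """Return True when the measured cfDNA quantity is above the threshold."""
    return cfdna_quantity > threshold


# Made-up quantities in arbitrary units, for illustration only.
control_quantities = [4.0, 5.0, 6.0]
threshold = threshold_from_controls(control_quantities)
subject_has_gvhd = indicates_gvhd(9.0, threshold)
```

Any other rule for deriving the predetermined threshold from a control subject or a control population could be substituted for the mean used in this sketch.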
By another aspect, there is provided a method of classifying a sample as originating from a subject suffering from GVHD or a subject not suffering from GVHD, the method comprising:
By another aspect, there is provided a method of diagnosing GVHD in a subject, the method comprising:
By another aspect, there is provided a method comprising:
By another aspect, there is provided a method comprising: training a machine learning (ML) model to predict the presence of GVHD, on a training set, the method comprising receiving cfDNA quantities from at least two tissues or cell types measured in a fluid sample.
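As a minimal, non-limiting sketch of training such a model, the following Python example fits a simple logistic regression by stochastic gradient descent. The choice of logistic regression, the three features (total cfDNA, ALT level and monocyte-derived cfDNA, each expressed relative to a hypothetical reference value), and all numeric values are assumptions for illustration; the specification does not state which ML model type was used, and the data below are invented.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, labels, epochs=2000, learning_rate=0.1):
    """Fit logistic-regression weights and bias by stochastic gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(rows, labels):
            score = bias + sum(w * v for w, v in zip(weights, features))
            error = sigmoid(score) - label
            weights = [w - learning_rate * error * v
                       for w, v in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


def predict_gvhd(model, features):
    """Return True when the modeled probability of GVHD is at least 0.5."""
    weights, bias = model
    score = bias + sum(w * v for w, v in zip(weights, features))
    return sigmoid(score) >= 0.5


# Invented training set: [total cfDNA, ALT level, monocyte cfDNA],
# each relative to a hypothetical reference value.
TRAINING_ROWS = [
    [2.0, 1.8, 2.2], [1.0, 0.9, 1.1],
    [1.9, 2.1, 1.7], [0.8, 1.0, 0.9],
    [2.4, 1.6, 2.0], [1.1, 1.2, 1.0],
]
TRAINING_LABELS = [1, 0, 1, 0, 1, 0]  # 1 = GVHD, 0 = no GVHD

model = train(TRAINING_ROWS, TRAINING_LABELS)
new_sample_has_gvhd = predict_gvhd(model, [2.5, 2.5, 2.5])
```

In practice the training set would comprise parameters measured in samples from subjects known to suffer or not to suffer from GVHD, and any suitable ML model could replace the one sketched here.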
Graft-versus-host-disease (GVHD) is a life-threatening complication that can arise following allogeneic hematopoietic cell transplantation. GVHD is the leading cause of post-transplantation morbidity and non-relapse mortality in hematopoietic stem cell transplants (HSCTs) and poses the greatest threat to transplantation success. GVHD can be classified as acute or chronic. Acute GVHD refers to GVHD (maculopapular rash, nausea, vomiting, anorexia, profuse diarrhea, ileus, or cholestatic hepatitis) occurring within 100 days after transplantation or Donor Lymphocyte Infusion (DLI). Chronic GVHD is distinct from acute GVHD and is not simply an evolution of acute GVHD. Chronic GVHD is a syndrome of variable clinical features resembling autoimmune and other immunologic disorders such as scleroderma. The pathophysiology of the chronic GVHD syndrome may involve inflammation, cell-mediated immunity, humoral immunity, and fibrosis. Clinical manifestations nearly always present during the first year after transplantation, but some cases develop many years after HCT. Manifestations of chronic GVHD may be restricted to a single organ or site or may be widespread, with profound impact on quality of life. In some embodiments, GVHD is chronic GVHD. In some embodiments, GVHD is acute GVHD. In some embodiments, GVHD is chronic and/or acute GVHD.
In some embodiments, the method is an in vitro method. In some embodiments, the method is an ex vivo method. In some embodiments, the method is a diagnostic method. In some embodiments, the method is a method of detecting GVHD. In some embodiments, the method is a method of predicting GVHD. In some embodiments, the method is a method of predicting the presence of GVHD. In some embodiments, the method is a method of assessing risk. In some embodiments, the risk is risk of developing GVHD. In some embodiments, the risk is risk of having GVHD.
In some embodiments, the subject is a mammal. In some embodiments, the subject is a human. In some embodiments, the subject is in need of a method of the invention. In some embodiments, the subject is in need of treatment. In some embodiments, the subject has received a hematopoietic stem cell transplant (HCT). In some embodiments, the subject underwent an HCT. In some embodiments, the subject has undergone an HCT. In some embodiments, the HCT is allogeneic stem cell transplant. In some embodiments, the HCT is reduced-intensity allogeneic stem cell transplant. In some embodiments, the HCT is from a donor. In some embodiments, the donor is the person that provided the stem cells for transplant to the subject. In some embodiments, the subject underwent the HCT within the last 100 days. In some embodiments, the subject underwent the HCT more than 100 days ago. In some embodiments, the subject underwent the HCT more than 100 days before the sample was taken. In some embodiments, the subject has not been diagnosed with GVHD. In some embodiments, the subject has not been diagnosed with acute GVHD. In some embodiments, the subject has not been diagnosed with chronic GVHD. In some embodiments, the subject has been diagnosed with acute GVHD. In some embodiments, the subject is at risk of developing GVHD.
In some embodiments, the sample is a fluid sample. In some embodiments, the fluid is a bodily fluid. In some embodiments, the bodily fluid is selected from at least one of: blood, serum, plasma, gastric fluid, intestinal fluid, saliva, bile, tumor fluid, breast milk, urine, interstitial fluid, cerebral spinal fluid and stool. In some embodiments, the bodily fluid is selected from blood, serum and plasma. In some embodiments, the bodily fluid is blood. In some embodiments, the blood is peripheral blood. In some embodiments, the sample is a sample that comprises cfDNA. In some embodiments, the sample is depleted of cells. In some embodiments, the sample obtained from the subject comprises cells and the cells are removed. In some embodiments, cfDNA is isolated from the sample. In some embodiments, the sample is a sample of isolated cfDNA. In some embodiments, the sample is isolated cfDNA from a sample obtained from the subject. In some embodiments, the sample is an isolated sample. In some embodiments, the sample is an enriched sample. In some embodiments, enriched is enriched for cfDNA. In some embodiments, the sample is a purified sample.
In some embodiments, the sample is from the subject. In some embodiments, the sample is from a subject. In some embodiments, the method is a method of diagnosing a subject and the sample is from the subject. In some embodiments, classifying a sample comprises diagnosing the subject. In some embodiments, classifying a sample as originating from a subject suffering from GVHD comprises diagnosing the subject as having GVHD. In some embodiments, classifying a sample as originating from a subject that does not suffer from GVHD comprises diagnosing the subject as not having GVHD.
In some embodiments, the method further comprises obtaining the sample from the subject. In some embodiments, the method further comprises extracting the sample from the subject. In some embodiments, the method further comprises receiving the sample. In some embodiments, the method further comprises depleting the sample of cells. In some embodiments, cells are intact cells. In some embodiments, cells are cellular debris. In some embodiments, the method further comprises isolating the cfDNA. In some embodiments, the method further comprises enriching for cfDNA. In some embodiments, the method further comprises purifying the cfDNA. In some embodiments, the method further comprises measuring the cfDNA. In some embodiments, the isolating/enriching/purifying is isolating/enriching/purifying cfDNA from the donor. In some embodiments, the isolating/enriching/purifying is not isolating/enriching/purifying cfDNA from the subject. In some embodiments, the method does not comprise isolating or purifying cfDNA only from the subject. In some embodiments, the method does not comprise measuring cfDNA only from the subject.
As used herein, the terms “separating”, “excluding” and “isolating” are intended to mean that the material has been completely, substantially or partially separated, isolated, excluded or purified from other components, e.g., cells, or cell fragments including but not limited to membranes, proteins or nucleic acid molecules. As used herein, the term “isolated cfDNA” refers to cfDNA that is essentially free from contaminating cellular components, such as carbohydrate, lipid, proteins or other nucleic acid molecules such as genomic DNA or mRNA. In some embodiments, an isolated cfDNA is a purified cfDNA. In some embodiments, an isolated and/or purified cfDNA is at least 80% pure, at least 85% pure, at least 90% pure, at least 95% pure, at least 97% pure, at least 99% pure or 100% pure. Each possibility represents a separate embodiment of the invention. In some embodiments, an isolated and/or purified cfDNA is at least 80% pure. In some embodiments, an isolated and/or purified cfDNA is at least 90% pure.
As used herein, the term “cfDNA” refers to non-encapsulated DNA that is found in an organism outside of a cell. cfDNA generally consists of degraded DNA fragments of a size of about 50-220 nucleotides that are released from dying cells. In some embodiments, cfDNA is a small DNA molecule. In some embodiments, cfDNA is from apoptotic cells. In some embodiments, cfDNA is from necrotic cells. In some embodiments, cfDNA is cfDNA from cells killed by the immune system. In some embodiments, cfDNA is from cells of the immune system. In some embodiments, cfDNA is not cell-free fetal DNA (cffDNA). In some embodiments, cfDNA is not circulating tumor DNA (ctDNA). In some embodiments, cfDNA is not cell free mitochondrial DNA. In some embodiments, cfDNA is double stranded DNA. In some embodiments, cfDNA is single stranded DNA. In some embodiments, cfDNA is cell free nucleosomes. In some embodiments, cfDNA comprises DNA associated proteins. In some embodiments, the DNA associated proteins are histones. In some embodiments, cfDNA is methylated DNA. In some embodiments, cfDNA is unmethylated DNA. In some embodiments, cfDNA is cfDNA from the subject. In some embodiments, cfDNA is not cfDNA only from the subject. In some embodiments, cfDNA is cfDNA from the donor. In some embodiments, cfDNA is cfDNA from the subject and the donor. In some embodiments, cfDNA is cfDNA from a particular tissue or cell type.
In some embodiments, the sample is enriched for small DNA molecules. In some embodiments, small is smaller than 1000, 950, 900, 850, 800, 750, 700, 650, 600, 550, 500, 450, 400, 350, 300, 290, 280, 275, 270, 260, 250, 240, 230, 225, 220, 215, 210, 205, 200, 195, 190, 185, 180, 175, 170, 169, 168, 167, 166, 165, 160, 155 or 150 nucleotides. Each possibility represents a separate embodiment of the invention. In some embodiments, small is less than 500 nucleotides. In some embodiments, small is less than 220 nucleotides. In some embodiments, small is less than 200 nucleotides. In some embodiments, small is less than 169 nucleotides. In some embodiments, small is bigger than 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 17, 20, 25, 30, 35, 40, 45, or 50 nucleotides. Each possibility represents a separate embodiment of the invention.
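For illustration only, the size bounds recited above can be expressed as a simple filter over fragment lengths. The sketch below applies the bounds computationally to a list of lengths; it is not a description of the physical enrichment step. The default bounds of 220 and 50 nucleotides are two of the recited possibilities chosen for the example, and the fragment lengths are made up.

```python
def is_small_fragment(length_nt, max_nt=220, min_nt=50):
    """Return True for fragments smaller than max_nt and bigger than min_nt."""
    return min_nt < length_nt < max_nt


# Made-up fragment lengths in nucleotides.
fragment_lengths = [35, 120, 167, 219, 220, 480]
small_fragments = [n for n in fragment_lengths if is_small_fragment(n)]
```

Both bounds are exclusive here, matching the "smaller than" and "bigger than" wording; other recited values can be passed as max_nt and min_nt.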
In some embodiments, the method comprises a size selection step. In some embodiments, the sample is size selected. In some embodiments, size selection is selection for small DNAs. In some embodiments, the size selection is SPRI bead size selection. In some embodiments, SPRI selection is SPRI bead size exclusion. SPRI beads are well known in the art and can be used to isolate DNA. By altering the concentration of SPRI beads one can alter the size of DNA that tends to bind. Increased numbers of beads lead to binding of smaller DNAs and fewer beads lead to binding of larger DNAs. In some embodiments, the concentration of SPRI beads is increased. In some embodiments, increased is as compared to a standard protocol. In some embodiments, the ratio of bead to sample is increased. In some embodiments, the ratio of bead to sample is at least 1:1, 1.1:1, 1.2:1, 1.25:1, 1.3:1, 1.4:1, 1.5:1, 1.6:1, 1.7:1, 1.75:1, 1.8:1, 1.9:1 or 2:1. Each possibility represents a separate embodiment of the invention. In some embodiments, the ratio of bead to sample is at least 1.8:1. In some embodiments, the ratio of bead to sample is about 1.8:1. In some embodiments, the ratio of bead to sample is at most 1.8:1, 1.9:1, 2:1, 2.1:1, 2.2:1, 2.25:1, 2.3:1, 2.4:1, 2.5:1, 2.6:1, 2.7:1, 2.75:1, 2.8:1, 2.9:1, 3:1, 3.5:1, 4:1, 4.5:1 or 5:1. Each possibility represents a separate embodiment of the invention.
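As a worked, non-limiting example of the bead to sample ratio, the following Python sketch computes the bead volume to add to a given sample volume. It assumes that the ratio is a volume-to-volume ratio of bead suspension to sample, which the specification does not state explicitly; the function name and the 50 microliter sample volume are hypothetical.

```python
def spri_bead_volume(sample_volume_ul, bead_to_sample_ratio=1.8):
    """Bead suspension volume (in microliters) for a volumetric bead:sample ratio."""
    if sample_volume_ul <= 0 or bead_to_sample_ratio <= 0:
        raise ValueError("volume and ratio must be positive")
    return sample_volume_ul * bead_to_sample_ratio


# A hypothetical 50 microliter sample at a ratio of about 1.8:1.
bead_volume_ul = spri_bead_volume(50.0)
```

Under this assumption, a 1.8:1 ratio calls for about 90 microliters of beads for a 50 microliter sample.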
In some embodiments, measuring cfDNA comprises measuring cfDNA quantity. In some embodiments, measuring cfDNA comprises determining the source of the cfDNA. In some embodiments, measuring cfDNA does not comprise determining the source of the cfDNA. In some embodiments, measuring cfDNA does not comprise determining if the cfDNA is from cells from the donor or the subject. In some embodiments, measuring cfDNA comprises identifying the source of the cfDNA. In some embodiments, the source of the cfDNA is the tissue, cell type or cell of origin of the cfDNA. In some embodiments, the source of the cfDNA is the tissue or cell type of origin of the cfDNA. In some embodiments, the source of the cfDNA is the cell that died and released the cfDNA. In some embodiments, measuring cfDNA quantity comprises measuring the quantity of cfDNA from a particular source. In some embodiments, measuring cfDNA quantity comprises measuring the quantity of cfDNA from a plurality of sources. In some embodiments, measuring cfDNA quantity comprises measuring the quantity of cfDNA from the subject. In some embodiments, measuring cfDNA quantity comprises measuring the quantity of cfDNA from the donor. In some embodiments, measuring cfDNA quantity comprises measuring the quantity of cfDNA from the donor and the subject. In some embodiments, cfDNA quantity is the total quantity of cfDNA in the sample.
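One way the quantity of cfDNA from a particular source could be derived from a total cfDNA measurement is sketched below in Python, purely as an illustration. The sketch assumes that the total cfDNA concentration is known in ng/ml, that a haploid human genome weighs approximately 3.3 pg, and that the fraction of cfDNA attributed to the source has been determined separately; these assumptions, the function names and the numeric values are not taken from the specification.

```python
PG_PER_HAPLOID_GENOME = 3.3  # assumed approximate mass of one haploid human genome


def genome_equivalents_per_ml(cfdna_ng_per_ml):
    """Convert a cfDNA concentration in ng/ml to haploid genome equivalents per ml."""
    return cfdna_ng_per_ml * 1000.0 / PG_PER_HAPLOID_GENOME


def source_quantity(total_ge_per_ml, source_fraction):
    """Genome equivalents per ml attributed to one tissue or cell type."""
    if not 0.0 <= source_fraction <= 1.0:
        raise ValueError("source_fraction must be between 0 and 1")
    return total_ge_per_ml * source_fraction


# Made-up example: 3.3 ng/ml total cfDNA, 25% attributed to one source.
total_ge = genome_equivalents_per_ml(3.3)
source_ge = source_quantity(total_ge, 0.25)
```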
In some embodiments, identification of DNA methylation or histone modification at an informative genetic locus indicates the tissue of origin of the DNA. In some embodiments, identification of DNA methylation or histone modification at an informative genetic locus indicates the cell type of origin of the DNA. In some embodiments, identification of DNA methylation or histone modification at an informative genetic locus indicates the DNA originated from a particular tissue. In some embodiments, identification of DNA methylation or histone modification at an informative genetic locus indicates the DNA originated from a particular cell type. In some embodiments, identification of DNA unmethylation or lack of a modified histone at an informative genetic locus indicates the tissue of origin of the DNA. In some embodiments, identification of DNA unmethylation or lack of a modified histone at an informative genetic locus indicates the cell type of origin of the DNA. In some embodiments, identification of DNA unmethylation or lack of a modified histone at an informative genetic locus indicates the DNA originated from a particular tissue. In some embodiments, identification of DNA unmethylation or lack of a modified histone at an informative genetic locus indicates the DNA originated from a specific cell type.
In some embodiments, the origin of the cfDNA was determined by analyzing methylation of CpGs at an informative locus within the cfDNA. In some embodiments, methylation is DNA methylation. In some embodiments, CpGs are CpG dinucleotides. In some embodiments, the genetic locus is selected from a genetic locus provided in any one of International Patent Applications WO2019012542, WO2019012543, WO2019175876 and WO2020212992. In some embodiments, the informative locus is uniquely methylated within the specific tissue or cell type. In some embodiments, the informative locus is uniquely unmethylated within the specific tissue or cell type. In some embodiments, unique is within the specific tissue or cell type and not in other tissues or cell types. In some embodiments, unique methylation/unmethylation allows identification of the origin of the cfDNA. In some embodiments, the identification is unique identification. In some embodiments, the identification is based on the methylation status of the informative locus. In some embodiments, methylation status is methylation or unmethylation. In some embodiments, the genetic locus is selected from a genetic locus provided herein. In some embodiments, the genetic locus is selected from a genetic locus identified by the primers provided in Table 1.
In some embodiments, the measurement of cfDNA comprises measurement of cfDNA methylation. In some embodiments, the measurement of cfDNA methylation comprises performing methylation sensitive sequencing. In some embodiments, the sequencing is next generation sequencing. In some embodiments, the sequencing produces a nucleotide sequence of the cfDNA. In some embodiments, the nucleotide sequence includes methylation status of a cytosine in the sequence. In some embodiments, the nucleotide sequence includes the methylation status of each cytosine in the sequence. In some embodiments, the nucleotide sequence includes the methylation status of all cytosines in the sequence. In some embodiments, the measurement comprises assigning sequencing reads to a specific tissue or cell type of origin. In some embodiments, the measurement comprises quantifying the sequencing reads that originate from a specific tissue or cell type. In some embodiments, the measurement is the quantity of reads. In some embodiments, the measurement is the relative number of reads. In some embodiments, the measurement is the genome equivalent number of reads. In some embodiments, the assigning is based on the combination of the nucleotide sequence and methylation status.
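By way of non-limiting illustration, assigning sequencing reads to a tissue or cell type of origin and quantifying them can be sketched in Python as follows. The marker table, the locus names and the rule that every CpG in a read must match the marker state are hypothetical simplifications chosen for the example; they are not the loci or the assignment criteria of the specification.

```python
from collections import Counter

# Hypothetical marker table: locus -> (cell type or tissue, informative state).
# Locus names are placeholders and do not correspond to any locus provided herein.
MARKERS = {
    "locus_A": ("monocytes", "unmethylated"),
    "locus_B": ("liver", "unmethylated"),
    "locus_C": ("T cells", "methylated"),
}


def assign_read(locus, cpg_methylated):
    """Return the origin of a read, or None if it is not informative.

    cpg_methylated is a list of booleans, one per CpG covered by the read.
    """
    marker = MARKERS.get(locus)
    if marker is None or not cpg_methylated:
        return None
    origin, state = marker
    if state == "unmethylated" and not any(cpg_methylated):
        return origin
    if state == "methylated" and all(cpg_methylated):
        return origin
    return None


def count_reads_by_origin(reads):
    """Count reads assigned to each tissue or cell type of origin."""
    counts = Counter()
    for locus, cpg_methylated in reads:
        origin = assign_read(locus, cpg_methylated)
        if origin is not None:
            counts[origin] += 1
    return counts


# Made-up reads: (locus, methylation status of each CpG in the read).
reads = [
    ("locus_A", [False, False, False]),
    ("locus_A", [True, False, True]),
    ("locus_B", [False, False]),
    ("locus_C", [True, True, True]),
    ("locus_X", [False, False]),
]
counts = count_reads_by_origin(reads)
```

The resulting per-origin counts could then be reported as a quantity of reads, a relative number of reads or a genome equivalent number, depending on the embodiment.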
In some embodiments, the measurement of cfDNA comprises measurement of cfDNA associated proteins. In some embodiments, the measurement of cfDNA associated proteins comprises performing chromatin immunoprecipitation sequencing (ChIP-Seq). In some embodiments, the sequencing is next generation sequencing. In some embodiments, the sequencing produces a nucleotide sequence of the cfDNA associated with specific proteins. In some embodiments, the specific proteins are DNA binding proteins. In some embodiments, the specific proteins are selected from histones, histone variants, post-translationally modified histones, high mobility group (HMG) proteins and transcription factors. In some embodiments, the specific proteins are histones. In some embodiments, the histone is a histone variant. In some embodiments, the histone is a histone comprising a specific post-translational modification. According to some embodiments, the DNA-associated protein is selected from a histone, a high-mobility group (HMG) protein and a member of the transcriptional machinery. According to some embodiments, the histone is a histone variant and/or a modified histone. Examples of modified histones include, but are not limited to, Histone 3 monomethylated lysine 4 (H3K4me1), Histone 3 dimethylated lysine 4 (H3K4me2), Histone 3 trimethylated lysine 36 (H3K36me3) and Histone 3 trimethylated lysine 4 (H3K4me3). In some embodiments, an antibody against the cfDNA associated protein is added to the sample and cfDNA associated with that protein is isolated. In some embodiments, the measurement comprises assigning sequencing reads to a specific tissue or cell type of origin. In some embodiments, the measurement comprises quantifying the sequencing reads that originate from a specific tissue or cell type. In some embodiments, the measurement is the quantity of reads. In some embodiments, the measurement is the relative number of reads.
In some embodiments, the measurement is the genome equivalent number of reads. In some embodiments, the assigning is based on the combination of the nucleotide sequence and the protein associated with the cfDNA containing the sequence.
According to some embodiments, association of the DNA-associated protein with the genomic location is indicative of active transcription and the genomic location is within a tissue or cell type specific gene or enhancer element. According to some embodiments, association of the DNA-associated protein with the genomic location is indicative of silenced transcription and the genomic location is within a repressor element, or a gene silenced in the tissue or cell type.
Methods of measuring cfDNA are well known in the art and any such method may be employed. Measuring cfDNA quantity may be performed, for example, with a nanopore, a sequencer, a spectrophotometer (e.g., a NanoDrop), a fluorometer or electrophoresis, to name but a few. In some embodiments, the sequencer is a next-generation sequencer. In some embodiments, the measuring comprises sequencing the cfDNA. Methods of measuring or identifying the tissue or cell type of origin of cfDNA are also well known in the art. They include sequence analysis, DNA methylation analysis, analysis of DNA associated proteins (e.g., histone modification or variant analysis) and combinations of sequence analysis and epigenetic analysis (e.g., DNA methylation/histone modification). Methods of cfDNA source analysis and lists of informative loci can also be found in, for example, J. Moss, et al., “Comprehensive human cell-type methylation atlas reveals origins of circulating cell-free DNA in health and disease” Nat. Commun. 9, 5068 (2018); F. Mouliere, et al., “Enhanced detection of circulating tumor DNA by fragment size analysis” Sci. Transl. Med. 10 (2018); and International Patent Applications WO2019012542, WO2019012543, WO2019175876 and WO2020212992, all of which are herein incorporated by reference in their entirety.
In some embodiments, the cfDNA is total cfDNA. In some embodiments, total cfDNA comprises cfDNA from the donor and the subject. In some embodiments, the cfDNA is cfDNA originating from the donor. In some embodiments, the cfDNA is cfDNA originating from a specific tissue or cell type of the subject. In some embodiments, the cfDNA is cfDNA originating from a specific tissue or cell type. In some embodiments, the specific cell type is a hematopoietic cell. In some embodiments, the specific cell type is an immune cell. In some embodiments, the specific cell type is a cell derived from a hematopoietic stem cell. In some embodiments, the specific tissue or cell type is selected from skin, lung, liver, colon, B cells, T cells, CD8 positive T cells, eosinophils, monocytes, neutrophils and T regulatory cells (Tregs). In some embodiments, the specific tissue or cell type is selected from skin, lung, liver, intestine, B cells, T cells, CD8 positive T cells, eosinophils, monocytes, neutrophils and T regulatory cells (Tregs). In some embodiments, the specific tissue or cell type is selected from liver, B cells, T cells, CD8 positive T cells, eosinophils, monocytes, neutrophils and Tregs. In some embodiments, the specific tissue is skin. In some embodiments, the specific tissue is lung. In some embodiments, the specific tissue is liver. In some embodiments, the specific tissue is colon. In some embodiments, colon is intestine. In some embodiments, intestine is colon. In some embodiments, colon is the gastrointestinal (GI) tract. In some embodiments, intestine is the GI tract. In some embodiments, the specific cell type is selected from B cells, T cells, CD8 positive T cells, eosinophils, monocytes, neutrophils and T regulatory cells (Tregs). In some embodiments, the specific cell type is eosinophils. In some embodiments, the specific cell type is monocytes. In some embodiments, the specific cell type is neutrophils. In some embodiments, the specific cell type is B cells. 
In some embodiments, the specific cell type is T cells. In some embodiments, the specific cell type is CD8 positive cells. In some embodiments, CD8 positive cells are CD8 positive T cells. In some embodiments, CD8 positive cells are all CD8 positive cells. In some embodiments, the specific cell type is Tregs.
In some embodiments, the cfDNA is cfDNA originating from a plurality of specific tissues and/or cell types. In some embodiments, a plurality is at least 2, 3, 4, 5, 6, 7, 8, 9, 10 or 11 tissues and/or cell types. Each possibility represents a separate embodiment of the invention. In some embodiments, a plurality is at least 3 tissues and/or cell types. In some embodiments, a plurality is at least 5 tissues and/or cell types. In some embodiments, a plurality is at least 7 tissues and/or cell types. In some embodiments, the cfDNA is cfDNA originating from monocytes, eosinophils, T cells, CD8 cells, B cells, skin and liver.
In some embodiments, the predetermined threshold is the cfDNA measurement in a sample from a healthy control. In some embodiments, the predetermined threshold is the average of cfDNA measurements in a group of samples from a population of healthy controls. In some embodiments, the cfDNA measurement is the cfDNA quantity. In some embodiments, a healthy control is a subject without GVHD. In some embodiments, a healthy control is a subject that has not undergone an HCT. In some embodiments, a healthy control is a subject that underwent an HCT and does not have GVHD. In some embodiments, a healthy control is a subject that underwent an HCT and did not develop GVHD.
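By way of non-limiting illustration, the averaging of healthy-control measurements into a predetermined threshold may be sketched as follows. The function names, the example values, and the choice to flag a sample whose quantity lies above the threshold are assumptions made for illustration only and are not taken from the study data.

```python
from statistics import mean


def threshold_from_controls(control_quantities):
    """Return the average cfDNA quantity across healthy-control samples."""
    if not control_quantities:
        raise ValueError("at least one control measurement is required")
    return mean(control_quantities)


def exceeds_threshold(sample_quantity, threshold):
    """Return True when the sample quantity is above the threshold.

    Assumption: a quantity above the control average is the direction of interest.
    """
    return sample_quantity > threshold


# Hypothetical control values (illustrative units, not study data).
controls = [900.0, 1100.0, 1000.0]
threshold = threshold_from_controls(controls)
print(threshold, exceeds_threshold(1500.0, threshold))
```

The same comparison may be repeated per tissue or cell type, with a separate threshold derived for each.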
As used herein, the term “parameter” refers to any measurable characteristic of the sample. In some embodiments, measurements of at least two parameters are received. In some embodiments, at least two is at least 2, 3, 4, 5, 6, 7, 8, 9, or 10 parameters. Each possibility represents a separate embodiment of the invention. In some embodiments, at least 2 is at least 3. In some embodiments, at least 2 is at least 5. In some embodiments, at least 2 is at least 6. In some embodiments, at least 2 is at least 7. In some embodiments, at least 2 is at least 8. In some embodiments, at least 2 is at least 9.
In some embodiments, the parameter is a cfDNA parameter. In some embodiments, a parameter is a quantity of cfDNA. In some embodiments, the quantity of cfDNA is the total quantity of cfDNA. In some embodiments, cfDNA is cfDNA in the sample. In some embodiments, the quantity of cfDNA is quantity of cfDNA from a specific tissue or cell type. In some embodiments, a specific tissue or cell type is a plurality of specific tissues and/or cell types. In some embodiments, the plurality is selected from neutrophils, monocytes, eosinophils, T cells, CD8 cells, B cells, Tregs, skin, lung, intestine and liver. In some embodiments, the plurality is selected from neutrophils, monocytes, eosinophils, T cells, CD8 cells, B cells, skin and liver. In some embodiments, the plurality is selected from monocytes, eosinophils, T cells, CD8 cells, B cells, skin and liver. In some embodiments, the plurality is selected from monocytes, eosinophils, T cells, CD8 cells, and B cells. In some embodiments, the at least two parameters comprise cfDNA from monocytes quantity, cfDNA from eosinophils quantity, cfDNA from T cell quantity, cfDNA from CD8 cells quantity, cfDNA from B cells quantity, cfDNA from skin quantity and cfDNA from liver quantity.
In some embodiments, the cfDNA quantity is a cfDNA parameter. In some embodiments, a cfDNA parameter is a cfDNA amount. In some embodiments, an amount is a quantity. In some embodiments, the cfDNA parameter is selected from: total cfDNA quantity, cfDNA quantity from neutrophils, cfDNA quantity from monocytes, cfDNA quantity from eosinophils, cfDNA from Treg cells, cfDNA quantity from T cells, cfDNA quantity from CD8 cells, cfDNA quantity from B cells, cfDNA quantity from skin, cfDNA quantity from intestine, cfDNA quantity from lung and cfDNA quantity from liver. In some embodiments, the cfDNA parameter is total cfDNA quantity and cfDNA quantity from at least one specific tissue and/or cell type. In some embodiments, the cfDNA parameter is total cfDNA quantity and at least one of: cfDNA quantity from neutrophils, cfDNA quantity from monocytes, cfDNA quantity from eosinophils, cfDNA from Treg cells, cfDNA quantity from T cells, cfDNA quantity from CD8 cells, cfDNA quantity from B cells, cfDNA quantity from skin, cfDNA quantity from intestine, cfDNA quantity from lung and cfDNA quantity from liver. In some embodiments, the cfDNA parameter is total cfDNA quantity and at least one of: cfDNA quantity from monocytes, cfDNA quantity from skin, cfDNA quantity from neutrophils and cfDNA quantity from eosinophils. In some embodiments, the cfDNA parameter is total cfDNA quantity and cfDNA quantity from monocytes.
In some embodiments, the parameter is a clinical parameter. In some embodiments, a clinical parameter is a non-cfDNA parameter. In some embodiments, a clinical parameter is a blood parameter. In some embodiments, a blood parameter is a parameter that can be measured from a blood sample but does not comprise sequencing of cfDNA. In some embodiments, the parameter is a blood parameter. In some embodiments, the parameter is a blood cell count. In some embodiments, the blood cell count is white blood cells count (WBC). In some embodiments, the blood cell count is red blood cell count. In some embodiments, the parameter is a blood protein level. In some embodiments, the blood protein is hemoglobin.
In some embodiments, the blood protein is a liver enzyme. In some embodiments, the parameter is a liver enzyme. In some embodiments, the liver enzyme is a secreted liver enzyme. In some embodiments, the liver enzyme is a circulating liver enzyme. In some embodiments, the liver enzyme is a plurality of liver enzymes. In some embodiments, the liver enzyme is selected from aspartate transaminase (AST), alanine transaminase (ALT), alkaline phosphatase (ALP) and gamma-glutamyl transpeptidase (GGTp). In some embodiments, the liver enzyme is selected from AST and ALT. In some embodiments, the liver enzyme is ALT. In some embodiments, the liver enzyme is AST. In some embodiments, the liver enzyme is ALP. In some embodiments, the liver enzyme is GGTp. In some embodiments, the at least two parameters comprise liver enzyme levels, WBC count and hemoglobin levels. In some embodiments, the at least two parameters comprise liver enzyme levels and WBC count. In some embodiments, the at least two parameters comprise AST and ALT levels, WBC count and hemoglobin levels. In some embodiments, the at least two parameters comprise AST and ALT levels and WBC count.
In some embodiments, the at least two parameters comprise a cfDNA parameter and a clinical parameter. In some embodiments, the at least two parameters comprise a liver enzyme level and a cfDNA parameter. In some embodiments, the at least two parameters comprise a liver enzyme level and total cfDNA quantity. In some embodiments, the at least two parameters are selected from: ALT levels, total cfDNA quantity, cfDNA quantity from monocytes, cfDNA quantity from skin, GGTp levels, cfDNA quantity from neutrophils, and cfDNA quantity from eosinophils. In some embodiments, the at least two parameters comprise ALT level and total cfDNA quantity. In some embodiments, the clinical parameter is ALT level and the cfDNA parameter is total cfDNA quantity. In some embodiments, the at least two parameters comprise ALT level and total cfDNA quantity and further comprise at least one of: cfDNA quantity from monocytes, cfDNA quantity from skin, GGTp levels, cfDNA quantity from neutrophils, and cfDNA quantity from eosinophils. In some embodiments, the at least two parameters further comprise at least a third parameter selected from: cfDNA quantity from monocytes, cfDNA quantity from skin, GGTp level, cfDNA quantity from neutrophils and cfDNA quantity from eosinophils. In some embodiments, the at least two parameters are at least three parameters and comprise ALT level, total cfDNA quantity and cfDNA quantity from monocytes.
In some embodiments, the at least two parameters comprise cfDNA from monocytes quantity, cfDNA from eosinophils quantity, cfDNA from T cell quantity, cfDNA from CD8 cells quantity, cfDNA from B cells quantity, cfDNA from skin quantity, cfDNA from liver quantity and liver enzyme levels. In some embodiments, the at least two parameters comprise cfDNA from monocytes quantity, cfDNA from eosinophils quantity, cfDNA from T cell quantity, cfDNA from CD8 cells quantity, cfDNA from B cells quantity, cfDNA from skin quantity, cfDNA from liver quantity and AST and ALT levels.
In some embodiments, the ML model was trained on a training set. In some embodiments, the training set comprises the at least two parameters. In some embodiments, the training set comprises measurements of the at least two parameters. In some embodiments, the training set comprises at least one cfDNA parameter and at least one clinical parameter. In some embodiments, the parameters in the training set are from subjects suffering from GVHD. In some embodiments, the parameters in the training set are from healthy subjects. In some embodiments, healthy subjects are subjects not suffering from GVHD. In some embodiments, the parameters in the training set are from subjects suffering and not suffering from GVHD. In some embodiments, the parameters in the training set are from subjects suffering GVHD and healthy subjects. In some embodiments, the parameters are labeled. In some embodiments, the label indicates the source of the parameter. In some embodiments, the source is the subject that provided the sample from which the parameter was determined. In some embodiments, the label is GVHD or not GVHD. In some embodiments, the label is from a subject with GVHD or from a subject without GVHD. In some embodiments, the label is from a subject suffering from GVHD or from a healthy subject.
In some embodiments, the machine learning model is trained to predict GVHD presence. In some embodiments, the machine learning model is trained to detect GVHD presence. In some embodiments, the machine learning model is trained to diagnose GVHD. In some embodiments, the ML model outputs a diagnosis. In some embodiments, the ML model outputs a prediction. In some embodiments, diagnosis or prediction is having GVHD or not having GVHD. In some embodiments, the ML model outputs a GVHD probability. In some embodiments, the output is binary and consists of GVHD or no GVHD. In some embodiments, the output is quantitative and provides a probability score.
Machine learning models are well known in the art and any such model may be used. Models include, but are not limited to, artificial neural networks, support vector machine (SVM) classifiers and k-nearest neighbor (k-NN) classifiers. In some embodiments, the machine learning model is a classification model. In some embodiments, the machine learning model is a classifier. In some embodiments, the machine learning model is an SVM classifier. In some embodiments, the machine learning model is a k-NN classifier. In some embodiments, the machine learning model is selected from an SVM classifier and a k-NN classifier. In some embodiments, the algorithm is a boosting algorithm. In some embodiments, the ML model employs the algorithm. In some embodiments, the ML model is the algorithm. In some embodiments, the algorithm is a random forest algorithm. In some embodiments, the machine learning model implements a machine learning algorithm. In some embodiments, the machine learning model is a supervised model. In some embodiments, supervised is self-supervised.
In some embodiments, the method further comprises an inference stage. In some embodiments, the inference stage comprises applying the trained ML model. In some embodiments, the trained ML model is applied to the parameters. In some embodiments, the parameters are the received parameters. In some embodiments, the inference stage comprises diagnosing/predicting GVHD. In some embodiments, the inference stage is for diagnosing/predicting GVHD.
In some embodiments, the training set is generated. In some embodiments, the generating comprises compiling parameters. In some embodiments, compiling is combining. In some embodiments, the method comprises producing the training set. In some embodiments, the method comprises obtaining a sample. In some embodiments, the method comprises extracting cfDNA from the sample. In some embodiments, the method comprises quantifying the cfDNA. In some embodiments, the quantifying is quantifying the amount of cfDNA. In some embodiments, cfDNA from a plurality of tissues and/or cell types is quantified. In some embodiments, the training set is generated by labeling the cfDNA quantities. In some embodiments, the label is the origin of the cfDNA. In some embodiments, the origin is from a subject that suffers from GVHD or a subject that does not suffer from GVHD. In some embodiments, the origin is from a subject that suffers from GVHD or a healthy subject. In some embodiments, the healthy subject is a subject that has had an HSCT. In some embodiments, the generating comprises compiling a plurality of cfDNA quantities. In some embodiments, the cfDNA quantities are from a plurality of tissues and/or cell types. In some embodiments, the generating comprises compiling the labels. In some embodiments, the compiling is compiling the cfDNA quantities and their labels. In some embodiments, the plurality comprises labels from both subjects that suffer from GVHD and subjects that do not suffer from GVHD. In some embodiments, the plurality comprises labels from both subjects that suffer from GVHD and healthy subjects. In some embodiments, the training set compiles parameters and labels from subjects that suffer from GVHD and subjects that do not suffer from GVHD. In some embodiments, the training set compiles parameters and labels from subjects that suffer from GVHD and healthy subjects.
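By way of non-limiting illustration, compiling labeled parameters into a training set may be sketched as follows. The class and function names, the parameter names and the example values are hypothetical. The choice to skip samples that lack a requested parameter is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class LabeledSample:
    features: dict  # parameter name -> measured value
    label: int      # 1 = subject with GVHD, 0 = subject without GVHD


def compile_training_set(samples, feature_names):
    """Build a feature matrix X and a label vector y from labeled samples.

    Samples missing any requested parameter are skipped, so every row of X
    holds a value for every feature.
    """
    X, y = [], []
    for sample in samples:
        if all(name in sample.features for name in feature_names):
            X.append([float(sample.features[name]) for name in feature_names])
            y.append(sample.label)
    if len(set(y)) < 2:
        raise ValueError("training set needs samples from both groups")
    return X, y


# Hypothetical measurements (not study data).
samples = [
    LabeledSample({"ALT": 40, "total_cfDNA": 12.0}, 1),
    LabeledSample({"ALT": 20, "total_cfDNA": 5.0}, 0),
    LabeledSample({"ALT": 25}, 0),  # incomplete, therefore skipped
]
X, y = compile_training_set(samples, ["ALT", "total_cfDNA"])
print(X, y)
```

Requiring both labels to be present reflects the embodiments in which the training set compiles parameters from subjects with and without GVHD.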
In some embodiments, the method further comprises treating a subject diagnosed with GVHD. In some embodiments, the treating is with an anti-GVHD therapy. In some embodiments, the anti-GVHD therapy comprises an anti-GVHD therapeutic agent. In some embodiments, the treating comprises administering the anti-GVHD therapy. In some embodiments, the therapeutic agent is an immunosuppressant. In some embodiments, the therapeutic agent is an anti-lymphocyte agent. In some embodiments, the lymphocyte is a T cell. In some embodiments, the lymphocyte is a natural killer cell. In some embodiments, the therapeutic agent is a steroid. In some embodiments, the steroid is a corticosteroid. In some embodiments, the steroid is prednisone. In some embodiments, the steroid is methylprednisolone. In some embodiments, the therapeutic agent is an anti-tumor necrosis factor alpha (TNFa) agent. In some embodiments, the anti-TNFa agent is an anti-TNFa antibody. In some embodiments, the antibody is a blocking antibody. In some embodiments, the subject is already receiving a steroid and the treating comprises administering an increased dose of the steroid. In some embodiments, the subject is already receiving a steroid and the treating comprises administering a different steroid. In some embodiments, the different steroid is a stronger steroid. In some embodiments, stronger comprises a stronger immunosuppressing effect. Anti-GVHD therapies are well known in the art and any such therapy may be used. Examples of anti-GVHD therapies include, for example, steroids, cannabidiol (CBD), antithymocyte globulin, cyclosporin, mycophenolate mofetil, mesenchymal stem cell transplant, infliximab, alemtuzumab, daclizumab, extracorporeal photopheresis, sirolimus, and pentostatin.
In some embodiments, the method further comprises not treating a subject not diagnosed with GVHD. In some embodiments, the method further comprises reducing immunosuppression of a subject not diagnosed with GVHD. In some embodiments, reducing is tapering. In some embodiments, immunosuppression comprises administering an immunosuppressant. In some embodiments, the immunosuppressant is steroids. In some embodiments, reducing is ceasing. In some embodiments, reducing is stopping.
By another aspect, there is provided a system comprising: at least one hardware processor; and a non-transitory computer-readable storage medium having stored thereon program code, the program code executable by the at least one hardware processor to perform a method of the invention.
By another aspect, there is provided a computer program product comprising a non-transitory computer-readable storage medium having program code embodied therewith, the program code executable by at least one hardware processor to perform a method of the invention.
The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire. Rather, the computer readable storage medium is a non-transient (i.e., not-volatile) medium.
Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.
Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
These computer readable program instructions may be provided to a processor of a general-purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
As used herein, the term “about” when combined with a value refers to plus and minus 10% of the reference value. For example, a length of about 1000 nanometers (nm) refers to a length of 1000 nm ± 100 nm.
It is noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to “a polynucleotide” includes a plurality of such polynucleotides and reference to “the polypeptide” includes reference to one or more polypeptides and equivalents thereof known to those skilled in the art, and so forth. It is further noted that the claims may be drafted to exclude any optional element. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as “solely,” “only” and the like in connection with the recitation of claim elements, or use of a “negative” limitation.
In those instances where a convention analogous to “at least one of A, B, and C, etc.” is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., “a system having at least one of A, B, and C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). It will be further understood by those within the art that virtually any disjunctive word and/or phrase presenting two or more alternative terms, whether in the description, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase “A or B” will be understood to include the possibilities of “A” or “B” or “A and B.”
It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable sub-combination. All combinations of the embodiments pertaining to the invention are specifically embraced by the present invention and are disclosed herein just as if each and every combination was individually and explicitly disclosed. In addition, all sub-combinations of the various embodiments and elements thereof are also specifically embraced by the present invention and are disclosed herein just as if each and every such sub-combination was individually and explicitly disclosed herein.
Additional objects, advantages, and novel features of the present invention will become apparent to one ordinarily skilled in the art upon examination of the following examples, which are not intended to be limiting. Additionally, each of the various embodiments and aspects of the present invention as delineated hereinabove and as claimed in the claims section below finds experimental support in the following examples.
Various embodiments and aspects of the present invention as delineated hereinabove and as claimed in the claims section below find experimental support in the following examples.
Generally, the nomenclature used herein and the laboratory procedures utilized in the present invention include molecular, biochemical, microbiological and recombinant DNA techniques. Such techniques are thoroughly explained in the literature. See, for example, “Molecular Cloning: A Laboratory Manual” Sambrook et al., (1989); “Current Protocols in Molecular Biology” Volumes I-III Ausubel, R. M., ed. (1994); Ausubel et al., “Current Protocols in Molecular Biology”, John Wiley and Sons, Baltimore, Maryland (1989); Perbal, “A Practical Guide to Molecular Cloning”, John Wiley & Sons, New York (1988); Watson et al., “Recombinant DNA”, Scientific American Books, New York; Birren et al. (eds) “Genome Analysis: A Laboratory Manual Series”, Vols. 1-4, Cold Spring Harbor Laboratory Press, New York (1998); methodologies as set forth in U.S. Pat. Nos. 4,666,828; 4,683,202; 4,801,531; 5,192,659 and 5,272,057; “Cell Biology: A Laboratory Handbook”, Volumes I-III Celis, J. E., ed. (1994); “Culture of Animal Cells—A Manual of Basic Technique” by Freshney, Wiley-Liss, N. Y. (1994), Third Edition; “Current Protocols in Immunology” Volumes I-III Coligan J. E., ed. (1994); Stites et al. (eds), “Basic and Clinical Immunology” (8th Edition), Appleton & Lange, Norwalk, CT (1994); Mishell and Shiigi (eds), “Strategies for Protein Purification and Characterization—A Laboratory Course Manual” CSHL Press (1996); all of which are incorporated by reference. Other general references are provided throughout this document.
Methylation analysis: cfDNA was prepared and its concentration measured (in nanograms per milliliter plasma). It was then treated with bisulfite to expose the methylation status, and multiplex PCR was performed to amplify marker loci (Neiman et al., “Multiplexing DNA methylation markers to detect circulating cell-free DNA derived from human pancreatic β cells”, JCI Insight, 2020; 5(14), the contents of which are hereby incorporated by reference in their entirety). PCR products were sequenced on a NextSeq machine, and the fraction of molecules carrying a tissue-specific pattern of methylation was determined. This information, averaged over the markers for each tissue, was used to assess the relative contribution of each tissue/cell type to cfDNA. In addition, by multiplying the proportion of cfDNA from each tissue/cell type by the total concentration of cfDNA in a sample, the absolute concentration of cfDNA from each tissue in plasma was calculated and expressed in genome equivalents per milliliter plasma. Primer sequences of all markers are provided in Table 1.
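By way of non-limiting illustration, the averaging over markers and the conversion to genome equivalents per milliliter may be sketched as follows. The mass-to-genome conversion factor of about 3.3 picograms per haploid human genome is an assumption of this sketch and is not stated above. The function names and example values are likewise hypothetical.

```python
# Assumption: approximate mass of one haploid human genome, in picograms.
PG_PER_HAPLOID_GENOME = 3.3


def mean_marker_fraction(marker_fractions):
    """Average the tissue-specific fraction over all markers of one tissue."""
    return sum(marker_fractions) / len(marker_fractions)


def tissue_ge_per_ml(total_ng_per_ml, tissue_fraction):
    """Convert total cfDNA (ng/ml) to genome equivalents/ml, then scale by
    the fraction of cfDNA attributed to the tissue."""
    total_ge_per_ml = total_ng_per_ml * 1000.0 / PG_PER_HAPLOID_GENOME
    return tissue_fraction * total_ge_per_ml


# Hypothetical values: two liver markers and a total of 6.6 ng/ml plasma.
liver_fraction = mean_marker_fraction([0.04, 0.06])
liver_ge = tissue_ge_per_ml(6.6, liver_fraction)
print(round(liver_ge, 1))
```

Under these assumed inputs, 6.6 ng/ml corresponds to roughly 2000 genome equivalents per milliliter in total, of which about 5% is attributed to liver.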
Clinical assessment of HSCT patients in the chronic setting: To assess the utility of cfDNA for detecting organ damage in chronic GVHD, 101 plasma samples from 101 individuals were collected. These individuals were more than 100 days post allogeneic stem cell transplantation and arrived for clinical follow-up at the BMT day care unit at Hadassah Medical Center. Patients agreeing to participate, regardless of their cGVHD status, signed an informed consent. Upon each visit, blood was drawn for regular blood tests (an extra 10 ml of blood was drawn for cfDNA analysis) and the patient underwent a full assessment by the treating physician, which included cGVHD grading according to the 2014 National Institutes of Health (NIH) criteria. During the course of this 38-month study, 65 patients were diagnosed at any point with clinically evident cGVHD, while 36 were not found to have clinical signs of cGVHD.
Patient Characteristics: The median age of patients was 47 years. Sixty five percent of the patients were males. The majority of patients were transplanted due to acute myeloid leukemia (57%), had a matched sibling (63%), were treated with a myeloablative conditioning regimen (64%) and received stem cells withdrawn from peripheral blood (PBSC, 92%). Most of the patients (55%) underwent a transplant from a matched gender while 25% were transplanted from a mismatched donor gender, in a female to male direction.
Fifty-seven percent of the 101 samples were collected from patients with a history of acute GVHD. None had signs of overlap (both acute and chronic) GVHD at the time of sampling. One patient developed liver GVHD one month post Donor Lymphocyte Infusion (DLI). The median time from transplantation was 783 days (range 101-7878 days). Half of the samples (49) were taken from patients receiving one or more immunosuppressive agents at the time of collection. Only seven patients had evidence of CMV viremia at the time of collection. One patient had biopsy-proven colitis, which did not show CMV inclusion bodies, while none of the remaining six had any evidence for CMV disease. Four patients were treated for CMV infection. Eleven patients had a positive EBV-PCR in peripheral blood (with a median of 300 copies/ml), none of which was clinically significant. One patient was positive in the upper respiratory tract for RSV and one for influenza. One patient had Staphylococcus epidermidis bacteremia. Chimerism levels were routinely monitored. Ninety eight percent of the samples were obtained from patients with a blood driven STR assay indicating 100% donor-derived hematopoietic cells. Two samples exhibited a donor chimerism ranging from 88% to 92%, precluding analysis of the relationship between degree of chimerism, cfDNA methylation profiles and a potential relapse. None of the samples were taken at the time of relapse.
Statistical analysis: Assessment of cfDNA plasma levels in healthy controls versus allogeneic transplanted patients with and without clinical signs of cGVHD was performed using the nonparametric, unpaired Mann-Whitney test. Analyses were performed using GraphPad Prism (version 10.0.1) and results were considered statistically significant for p-values of ≤0.05.
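The unpaired comparison described above can be illustrated with a minimal sketch of a two-sided Mann-Whitney U test. This is not the GraphPad Prism implementation: the p-value here uses the normal approximation with a tie correction and no continuity correction (an assumption made for brevity; statistical packages may report exact p-values for small samples), and the function name and the cfDNA values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U, p) where U is the smaller of the two U statistics.
    """
    n1, n2 = len(x), len(y)
    # U1 counts the pairs in which the x value exceeds the y value (ties count 0.5).
    u1 = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    u = min(u1, n1 * n2 - u1)

    # Tie correction for the variance of U under the null hypothesis.
    n = n1 + n2
    tie_counts = {}
    for value in list(x) + list(y):
        tie_counts[value] = tie_counts.get(value, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    sigma = sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:
        return u, 1.0  # all observations identical: no evidence of a difference

    z = (u - n1 * n2 / 2) / sigma  # z <= 0 because u is the smaller statistic
    return u, min(1.0, 2 * NormalDist().cdf(z))


# Hypothetical total cfDNA concentrations (ng/ml), for illustration only.
no_cgvhd = [4.1, 5.3, 6.0, 7.2, 8.4]
with_cgvhd = [9.8, 12.5, 15.1, 20.3, 31.0]
u_stat, p_value = mann_whitney_u(no_cgvhd, with_cgvhd)
print(f"U = {u_stat}, p = {p_value:.4f}, significant = {p_value <= 0.05}")
```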
Machine learning was used to evaluate the predictive power of both cfDNA and biochemical measurements in relation to clinically evident cGVHD. Multivariate logistic regression (MLR), XGBoost and random forest (RF) classifiers were compared on the data set. MLR, XGBoost, and RF had an average accuracy of 0.74, 0.67, and 0.65, respectively, by Repeated-K-fold cross-validation (K=5), with a standard deviation of 0.23, 0.22, and 0.3, respectively. As the MLR model had higher accuracy with similarly robust results by cross-validation, MLR was applied for further analyses. Furthermore, MLR emerges as the most fitting estimator based on the following considerations:
It was hypothesized that measurements of cfDNA and blood biochemical values possess significant predictive potential for the presence of cGVHD. Shapley values were leveraged to gauge the magnitude of the predictive capability of each feature. This technique offers a principled approach to feature selection, promoting enhanced performance with reduced overfitting. Since higher cell-free DNA levels are expected to indicate cGVHD, the parameter space of the model was constrained to be non-negative for all coefficients. The performance was compared to an unconstrained optimization in order to explore the overfitting potential of the model. A total of 93 samples (for which data was available for all parameters) were used for the analysis. Shapley analysis (Lundberg and Lee, “A unified approach to interpreting model predictions”, Advances in Neural Information Processing Systems. 2017:4765-74, hereby incorporated by reference in its entirety) was performed on a collection of 17 features (consisting of GGT, ALP, ALT, AST, TBil, total cfDNA level [presented in ng/ml], and organ-specific cfDNA: cfSkin, cfLung, cfGI, cfLiver, cfNeutrophils, cfMonocytes, cfEosinophils, cfB cells, cfT cells, cfCD8 cells, cfTregs). Next, to robustly validate the predictive potential of the features, Repeated-K-Fold cross-validation was utilized. Repeated 5-fold cross-validation was conducted across the feature sets with coefficients constrained to be non-negative (constrained optimization). Each set, labeled n=1, . . . , 17, consists of the n highest-ranking features; that is, set n=1 is the single top-ranking feature, set n=2 consists of the 2 top-ranking features, set n=3 of the 3 top-ranking features, and so on. The metrics (Specificity, NPV, PPV, AUC and Precision) for each n were calculated. The best feature set was selected as the one achieving the highest AUC (Area Under the Curve) while demonstrating favorable performance across the other metrics.
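The nested feature sets and the repeated 5-fold splitting described above can be sketched as follows. This is an illustrative outline only: the function names, the importance values, and the number of repeats are assumptions not given in the text, and `score_fn` is a hypothetical callback that would fit a model on the training indices and return a metric such as AUC on the test indices.

```python
import random
from statistics import mean, stdev


def nested_feature_sets(importance):
    """Return [top-1 features, top-2 features, ...] ordered by descending importance."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    return [ranked[:n] for n in range(1, len(ranked) + 1)]


def repeated_k_fold(n_samples, k=5, n_repeats=10, seed=0):
    """Yield (train_idx, test_idx) for k folds, repeated with a fresh shuffle each time."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        folds = [indices[i::k] for i in range(k)]  # k near-equal, disjoint folds
        for held_out in range(k):
            train_idx = [i for f in range(k) if f != held_out for i in folds[f]]
            yield train_idx, folds[held_out]


def evaluate_feature_sets(score_fn, importance, n_samples, k=5, n_repeats=10):
    """Map each set size n to (mean, standard deviation) of score_fn over all splits."""
    results = {}
    for features in nested_feature_sets(importance):
        scores = [score_fn(features, train_idx, test_idx)
                  for train_idx, test_idx in repeated_k_fold(n_samples, k, n_repeats)]
        results[len(features)] = (mean(scores), stdev(scores))
    return results


# Hypothetical importance values (e.g. mean absolute Shapley value per feature).
importance = {"ALT": 0.30, "total_cfDNA": 0.25, "cfMonocytes": 0.12, "cfSkin": 0.10}
for feature_set in nested_feature_sets(importance):
    print(len(feature_set), feature_set)
print(sum(1 for _ in repeated_k_fold(93, k=5, n_repeats=10)), "train/test splits")
```

The set size with the highest mean AUC can then be read from the returned dictionary, for example with `max(results, key=lambda n: results[n][0])`.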
Recall, specificity, AUC, NPV and PPV of logistic regression models trained using only the best feature set were calculated. cfDNA features were compared with blood biochemical features and with the combination of both (meaning the entire set). All analyses were performed using Python 3.10.
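The difference between constrained (non-negative coefficients) and unconstrained logistic regression can be illustrated with a minimal sketch. The text does not state which solver was used; projected gradient descent is substituted here purely for illustration, and the function names, learning rate, iteration count, and toy data are all assumptions.

```python
from math import exp


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, nonnegative=True, lr=0.1, n_iter=2000):
    """Fit logistic regression by gradient descent on the mean log-loss.

    With nonnegative=True, each update is followed by a projection that
    clips the coefficients (not the intercept) to be >= 0.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        if nonnegative:
            w = [max(0.0, wj) for wj in w]  # projection onto the non-negative orthant
        b -= lr * grad_b
    return w, b


# Toy data (hypothetical): feature 0 rises with the label, feature 1 falls with it.
X = [[0.1, 0.9], [0.2, 0.8], [0.3, 0.7], [0.4, 0.9],
     [0.7, 0.2], [0.8, 0.3], [0.9, 0.1], [0.6, 0.2]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

w_con, b_con = fit_logistic(X, y, nonnegative=True)
w_unc, b_unc = fit_logistic(X, y, nonnegative=False)
print("constrained coefficients:  ", [round(v, 3) for v in w_con])
print("unconstrained coefficients:", [round(v, 3) for v in w_unc])
```

In the constrained fit a feature that is inversely related to the label cannot receive a negative coefficient, which encodes the prior expectation that higher cfDNA levels indicate disease.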
Using the model, accuracy ([(TP+TN)/total testing samples]×100%), sensitivity/recall ([TP/(TP+FN)]×100%), specificity ([TN/(TN+FP)]×100%), positive predictive value (PPV)/precision ([TP/(TP+FP)]×100%) and negative predictive value (NPV) ([TN/(TN+FN)]×100%) were measured.
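As a worked illustration, the following sketch computes these metrics from confusion-matrix counts using the standard definitions (sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), PPV = TP/(TP+FP), NPV = TN/(TN+FN)). The function names and the counts are hypothetical.

```python
def pct(numerator, denominator):
    """Return a percentage, or NaN when the denominator is zero."""
    return 100.0 * numerator / denominator if denominator else float("nan")


def classification_metrics(tp, tn, fp, fn):
    """Compute percentage metrics from true/false positive/negative counts."""
    return {
        "accuracy": pct(tp + tn, tp + tn + fp + fn),
        "sensitivity": pct(tp, tp + fn),  # also called recall
        "specificity": pct(tn, tn + fp),
        "ppv": pct(tp, tp + fp),          # also called precision
        "npv": pct(tn, tn + fn),
    }


# Hypothetical counts, for illustration only.
for name, value in classification_metrics(tp=40, tn=30, fp=10, fn=20).items():
    print(f"{name}: {value:.1f}%")
```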
Graphical representation of the tradeoff between specificity and sensitivity was done using the receiver operating characteristic (ROC) curve. Area under the curve (AUC) was calculated in order to determine the ability of the classifier to distinguish positive and negative results. Spearman rank correlation was used to determine the significance of correlation between each pair of variables and other parameters.
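Both quantities mentioned above can be computed from ranks. The sketch below is a minimal illustration with hypothetical function names and data: the AUC is obtained as the fraction of (positive, negative) pairs in which the positive sample scores higher (ties count one half), and the Spearman coefficient as the Pearson correlation of average ranks. Significance testing of the correlation is omitted.

```python
from math import sqrt


def roc_auc(labels, scores):
    """AUC as the probability that a positive sample outscores a negative one."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman_rho(a, b):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    mean_a, mean_b = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / sqrt(var_a * var_b)


# Hypothetical labels (1 = cGVHD) and model scores, for illustration only.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
print(spearman_rho([1, 2, 3, 4, 5], [2, 4, 6, 8, 100]))
```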
By comparing publicly available methylomes of specific human tissues (see Moss et al., “Comprehensive human cell-type methylation atlas reveals origins of circulating cell-free DNA in health and disease”, Nat. Commun., 2018; 9(1):5068, herein incorporated by reference in its entirety), genomic loci containing CpG sites were identified that are uniquely unmethylated in specific tissues or cell types relevant to cGVHD. These included hepatocytes (5 markers), skin (5 markers), lung epithelial cells (10 markers), and intestinal epithelial cells (7 markers). Table 1 provides the primers for amplification of the methylation marker sequences used herein. Multiplex PCR cocktails were designed to amplify all these loci from genomic DNA after bisulfite conversion, and the products were sequenced to determine the fraction of unmethylated DNA molecules present in the starting material.
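The final step, determining the fraction of unmethylated molecules from the sequenced products, can be sketched as follows. After bisulfite conversion and amplification, an unmethylated cytosine is read as T while a methylated cytosine remains C. The sketch assumes forward-strand reads already aligned to one marker locus and scores a molecule as unmethylated only when every CpG site in the locus reads T; this scoring rule, the function names, the reads, and the CpG positions are assumptions for illustration and are not specified in the text.

```python
def is_fully_unmethylated(read, cpg_positions):
    """True when every CpG cytosine position in the read was converted to T."""
    return all(read[pos] == "T" for pos in cpg_positions)


def unmethylated_fraction(reads, cpg_positions):
    """Fraction of informative reads that are unmethylated at all CpG sites.

    Reads that do not cover every CpG position, or that show a base other
    than C or T at one of them, are treated as uninformative and skipped.
    Returns None when no informative read remains.
    """
    informative = [
        read for read in reads
        if all(pos < len(read) and read[pos] in "CT" for pos in cpg_positions)
    ]
    if not informative:
        return None
    unmethylated = sum(is_fully_unmethylated(r, cpg_positions) for r in informative)
    return unmethylated / len(informative)


# Hypothetical reads over a locus with CpG cytosines at 0-based positions 3 and 7.
reads = [
    "ATGTGAATGT",  # both sites converted to T: unmethylated molecule
    "ATGCGAACGT",  # both sites retained as C: methylated molecule
    "ATGTGAACGT",  # mixed pattern: not counted as unmethylated
    "ATGTGAATGT",  # unmethylated molecule
    "ATGNGAATGT",  # ambiguous base at a CpG site: skipped
]
print(unmethylated_fraction(reads, cpg_positions=[3, 7]))
```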
The overall scheme of the experiment is shown in
Total and tissue-specific cfDNA concentrations were compared among samples from healthy individuals (median age 37 years (range 24-68), 58% females), samples from HCT patients that had no evidence of cGVHD at the time of sampling, and samples from HCT patients defined by their treating physician as having clinically evident cGVHD. The NIH 2014 criteria were used by the treating physician for defining disease severity (mild, moderate and severe) and organ scoring (0-3).
Analyzing 101 samples from 101 patients, HCT patients suffering from cGVHD (in any organ) had statistically significant higher concentrations of total cfDNA compared to HCT patients with no clinical evidence of cGVHD (p<0.0001) (
cfDNA signals from skin (p=0.0106,
Analysis of immune-derived cfDNA showed a significantly higher concentration of cfDNA originating from neutrophils, monocytes, eosinophils, and B and T lymphocytes in HCT patients diagnosed with clinical cGVHD as compared to those who were not (
Next, correlations between cfDNA parameters and cGVHD clinical scores among HCT patients were sought. A correlation matrix was produced for all 101 plasma samples for which all tested parameters were available. cfDNA parameters were highly correlated internally (for example, samples with high concentration of total cfDNA tended to also have high levels of organ specific cfDNA) (
An attempt was made to use machine learning to create a model which can aid the treating physician in predicting the likelihood that a patient has active cGVHD. Employing Shapley analysis on all 17 features (detailed in the Materials and Methods section) yielded positive Shapley values for 7 features (alanine transaminase (ALT), total cfDNA, cfDNA of monocytes, cfDNA of skin, gamma glutamyl transpeptidase (GGTp), cfDNA of neutrophils and cfDNA of eosinophils), which are presented as a distribution graph in
Finally, the performances of the models were compared to the exact equivalent set of models, where, instead of using a constrained optimization, an unconstrained optimization (allowing negative coefficients) was used. Shapley values were calculated for all 17 features (
Evidently, both constrained and unconstrained optimization techniques demonstrate comparable performance, suggesting minimal overfitting with either technique. Moreover, this emphasizes the high predictive capability of a small set of features consisting of biochemical as well as cfDNA measurements.
Although the invention has been described in conjunction with specific embodiments thereof, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the art. Accordingly, it is intended to embrace all such alternatives, modifications and variations that fall within the spirit and broad scope of the appended claims.
This application claims the benefit of priority of U.S. Provisional Patent Application No. 63/428,778, filed Nov. 30, 2022, the contents of which are all incorporated herein by reference in their entirety.
Number | Date | Country
---|---|---
63428778 | Nov 2022 | US