The application herein incorporates by reference in its entirety the sequence listing material in the ASCII text file named “22.03.07 P104992W001 Sequence Listing”, created Jul. 31, 2023, and having the size of 30.8 kilobytes, filed with this application.
The invention relates to immunotherapy.
Cancer is a major burden of disease worldwide. Each year, tens of millions of people are diagnosed with cancer around the world, and more than half of the patients eventually die from it. In many countries, cancer ranks as the second most common cause of death, after cardiovascular diseases. With significant improvement in the treatment and prevention of cardiovascular diseases, cancer has become, or will soon become, the number one killer in many parts of the world. As elderly people are most susceptible to cancer and population aging continues in many countries, cancer will remain a major health problem around the globe.
Whilst the primary purpose of the immune system is to fight infections caused by external foreign agents such as pathogens, it also has the important function of attacking and eliminating cancer cells. Immunotherapy of cancer usually works by assisting the immune system in some way to fight cancer cells.
The inventors have identified chromosome conformation signatures that define states of the immune system that are relevant to therapy of cancer. This elucidates the role of this modality in regulation of the immune system and allows a ‘readout’ of how a patient's immune system will respond to immunotherapy. It has also allowed identification of certain types of responder population for which immunotherapy is not appropriate, and may in fact be very harmful. This analysis at the level of the 3D architecture of the genome, as defined by chromosome interactions, offers very early readouts of patient response to immunotherapy, allowing decisions on the most appropriate therapies to be made at early disease stages. Detection of the relevant chromosome interactions according to the invention has been found to be robust, working across different immunotherapies and cancers.
The identified markers are consistent with deregulation in T cells, NK (natural killer) cells, macrophages, B cells and dendritic cells (DC), showing the role played by the specific cellular set-up of the adaptive and innate immune system in individual patients as part of the cancer-host interaction, which defines disease progression (hyper-progressors) and responsiveness to immunotherapy.
Accordingly, the invention provides a method of determining how an individual responds to immunotherapy for cancer comprising detecting the presence or absence in the individual of:
The method of determining how an individual responds to immunotherapy for cancer may comprise detecting the presence or absence in the individual of all of the chromosome interactions shown in Table 1 to thereby determine whether the individual will be responsive to immunotherapy.
Table 1 shows the universal marker set and how each marker relates to responsiveness (R) or non-responsiveness (NR) to immunotherapy.
Table 2 shows the marker set for detection of hyper-progressors and how each marker relates to hyper-progression (HS) or being stable (S).
Table 3 shows immune checkpoint molecules that can be targeted and/or modulated by the immunotherapy.
Tables 4 to 6 provide examples of cancer immunotherapies for which responder status can be determined by the invention; these are also therapies that can be given to the individual based on the outcome of the determination of responder status according to the invention. These tables also show preferred cancers.
Table 7 shows markers relevant to the screens carried out in Example 2 to develop the set of markers shown in Table 8.
Table 8 shows a further universal marker set and how each marker relates to responsiveness (R) or non-responsiveness (NR) to immunotherapy.
Table 9 gives patient data for Example 2. The patients shown with an asterisk (*) were studied in the second screen described in Example 2.
The method of the invention may be referred to as the ‘process’ of the invention herein.
The chromosome interactions which are typed may be referred to as ‘markers’, ‘CCS’, ‘chromosome conformation signature’, ‘epigenetic interaction’ or ‘EpiSwitch markers’ herein.
The word ‘type’ will be interpreted as per the context, but will usually refer to detection of whether a specific chromosome interaction is present or absent. The typing will generally be by physical determination of whether the chromosome interaction is present.
The word ‘responder’ is used to refer to response to immunotherapy, and covers both aspects relating to responsiveness to immunotherapy (the universal marker set) and detection of hyper-progressors. The term ‘responder group’ covers all four of the different groups discussed herein:
The chromosome interactions which are typed in the method of the invention are defined in Tables 2 and 8. The chromosome interactions which are typed in the method of the invention are further defined in Table 1. They are defined by means of the probe sequences which detect the ligated product made by an EpiSwitch method (see
The Epigenetic Interactions Relevant to the Invention
The chromosome interactions which are typed in the invention are typically interactions between distal regions of a chromosome, said interactions being dynamic and altering, forming or breaking depending upon the state of the region of the chromosome. That state will reflect how the immune system interacts with the immunotherapy which is given, and hence which responder group the individual falls into.
A chromosome interaction may, for example, reflect whether the region concerned is being transcribed or repressed. Chromosome interactions which are specific to responder ‘groups’ as defined herein have been found to be stable, thus providing a reliable means of measuring the differences between groups (for example reflecting different responses to immunotherapy).
Chromosome interactions specific to responder groups will normally be present before or in the early stages of a disease process, for example in comparison with other epigenetic markers such as methylation or changes to binding of histone proteins. Thus the process of the invention is able to provide valuable information about the way the immune system will react at an early stage. This allows early intervention (for example treatment), which as a consequence will be more effective, and also allows early choices to be made as to which type of treatment is appropriate for the patient and which treatments should not be used. Chromosome interactions also reflect the current state of the individual and can therefore be used to assess changes to disease status. Furthermore, there is little variation in the relevant chromosome interactions between individuals within the same group.
The chromosome interactions which are detected in the invention could be impacted by changes to the underlying DNA sequence, by environmental factors, DNA methylation, non-coding antisense RNA transcripts, non-mutagenic carcinogens, histone modifications, chromatin remodelling and specific local DNA interactions. However it must be borne in mind that chromosome interactions as defined herein are a regulatory modality in their own right and do not have a one to one correspondence with any genetic marker (DNA sequence change) or any other epigenetic marker.
The changes which lead to the chromosome interactions may be impacted by changes to the underlying nucleic acid sequence which do not themselves directly affect a gene product or the mode of gene expression. Such changes may be, for example, SNPs within and/or outside of the genes, gene fusions and/or deletions of intergenic DNA, microRNA, and non-coding RNA. For example, it is known that roughly 20% of SNPs are in non-coding regions, and therefore the process as described is also informative in non-coding situations. Typically the regions of the chromosome which come together to form the interaction are less than 5 kb, 3 kb, 1 kb, 500 base pairs or 200 base pairs apart on the same chromosome.
The Process of the Invention
The process of the invention comprises a typing system for detecting chromosome interactions relating to responder status. Any suitable typing method can be used, for example a method in which the proximity of the chromosomes in the interaction is detected and/or in which a marker that reflects chromosome interaction status is detected. The typing method may be performed using the EpiSwitch™ system mentioned herein, which for example may be carried out by a method comprising the following steps (for example on DNA and/or a sample from the subject):
Detection of this ligated nucleic acid allows determination of the presence or absence of a particular chromosome interaction. The ligated nucleic acid therefore acts as a marker for the presence of the chromosome interaction. Preferably the ligated nucleic acid is detected by PCR or a probe based method, including a qPCR method.
In the method the chromosomes can be cross-linked by any suitable means, for example by a cross-linking agent, which is typically a chemical compound. In a preferred aspect, the interactions are cross-linked using formaldehyde, but they may also be cross-linked by any aldehyde, or by D-Biotinoyl-ε-aminocaproic acid-N-hydroxysuccinimide ester or Digoxigenin-3-O-methylcarbonyl-ε-aminocaproic acid-N-hydroxysuccinimide ester. Paraformaldehyde can cross-link DNA chains which are 4 Angstroms apart. Preferably the chromosome interactions are on the same chromosome. Typically the interacting chromosome regions are 2 to 10 Angstroms apart.
The cross-linking is preferably in vitro. The cleaving is preferably by restriction digestion with an enzyme, such as TaqI. The ligating may form DNA loops.
Where PCR (polymerase chain reaction) is used to detect or identify the ligated nucleic acid, the size of the PCR product produced may be indicative of the specific chromosome interaction which is present, and may therefore be used to identify the status of the locus. In preferred aspects the primers shown in any table herein are used, for example the primer pairs shown in Table 2 or 8 are used (corresponding to the chromosome interaction which is being detected). The primers shown in Table 1 may be used. Homologues of such primers or primer pairs may also be used, which can have at least 70% identity to the original sequence.
Where a probe is used to detect or identify the ligated nucleic acid, this is generally by Watson-Crick based base-pairing between the probe and ligated nucleic acid. Probe sequences as shown in any table herein may be used, for example the probe sequences shown in Table 2 or 8 (corresponding to the chromosome interaction which is being detected). Probe sequences as shown in Table 1 may be used. Homologues of such probe sequences may also be used, which can have at least 70% identity to the original sequence.
Typing according to the process of the invention may be carried out at multiple time points, for example to monitor the progression of the disease. This may be at one or more defined time points, for example at at least 1, 2, 5, 8 or 10 different time points. The durations between at least 1, 2, 5 or 8 of the time points may be at least 5, 10, 20, 50, 80 or 100 days. Typically there are 3 time points at least 50 days apart.
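By way of illustration only, whether a proposed sampling schedule meets the typical arrangement above (3 time points at least 50 days apart) can be checked with a short helper. The function name and its defaults are hypothetical, taken from the ‘typical’ values stated above; greedy earliest-first selection is used because it maximises the number of dates that can be kept under a minimum-spacing constraint.

```python
from datetime import date


def meets_schedule(sample_dates, min_points=3, min_gap_days=50):
    """Return True if at least min_points dates can be chosen so that every
    consecutive pair is at least min_gap_days apart (hypothetical helper).

    Greedy earliest-first selection is optimal for this spacing constraint.
    """
    kept = []
    for d in sorted(sample_dates):
        if not kept or (d - kept[-1]).days >= min_gap_days:
            kept.append(d)
    return len(kept) >= min_points


if __name__ == "__main__":
    visits = [date(2023, 1, 1), date(2023, 2, 25), date(2023, 4, 20)]
    print(meets_schedule(visits))
```

Extra samples taken between the spaced time points do not invalidate the schedule; they are simply skipped by the greedy pass.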
The Individual to be Tested and/or Treated
The individual who is tested in the method of the invention is preferably a eukaryote, animal, bird or mammal. Most preferably the individual is a human. The individual may be male or female. In the case of a human individual they are typically aged 65 or above.
The invention includes detecting and treating particular groups in a population, typically differing in their responder status, for example their response to immunotherapy. The inventors have discovered that chromosome interactions differ between these groups, and identifying these differences will allow physicians to categorize their patients as part of a particular group of the population. The invention therefore provides physicians with a process of personalizing medicine for an individual based on their epigenetic chromosome interactions. Such testing may be used to select how to subsequently treat the patient, for example the type of drug that will be administered. The process of the invention may be carried out to select treatment for an individual, for example to decide whether or not any specific treatment mentioned herein is administered to the individual.
The individual that is tested in the process of the invention may have been selected in some way, for example based on a risk factor, symptom or physical characteristic. The individual may have been selected based on having a symptom of cancer and/or being in the early stages of cancer.
The individual may be susceptible to any cancer mentioned herein and/or may be in need of any therapy mentioned herein. The individual may be receiving any therapy mentioned herein. In particular, the individual may have, or be suspected of having, cancer, for example any specific cancer mentioned herein.
Types of Cancer
The cancer which is relevant to the invention can include any cancer mentioned herein, and preferably is melanoma, lung cancer, non-small cell lung carcinoma (NSCLC), diffuse large B-cell lymphoma, liver cancer, hepatocellular carcinoma, prostate cancer, breast cancer, leukaemia, acute myeloid leukaemia, pancreatic cancer, thyroid cancer, nasal cancer, brain cancer, bladder cancer, cervical cancer, non-Hodgkin lymphoma, ovarian cancer, colorectal cancer or kidney cancer. The cancer may be one which can be treated by immunotherapy, for example any specific immunotherapy mentioned herein.
Types of Immunotherapy
The invention relates to determining response to immunotherapy, and in particular whether the individual is responsive to immunotherapy and/or whether they are hyper-progressors in whom immunotherapy will cause acceleration of disease.
Preferably the response which is determined is to a therapy which comprises a molecule or cell that is relevant to the immune system, such as a composition that comprises an antibody or immune cell (for example a T cell or dendritic cell) or any therapeutic substance mentioned herein. It may be response to a substance that modulates or stimulates the immune system, such as a vaccine therapy. The immunotherapy may modulate, block or stimulate an immune checkpoint, and thus may target or modulate PD-L1, PD-L2 or CTLA4 or any other immune checkpoint molecule disclosed herein, and thus is preferably immunocheckpoint therapy. Preferably the response is responsiveness to an antibody therapy, or to any specific therapy disclosed herein. The therapy may be a combination therapy, for example any specific combination therapy disclosed herein.
In one embodiment the response is to a PD-1 inhibitor or PD-L1 inhibitor, including an antibody specific for PD-1 or PD-L1. PD-1 is ‘programmed cell death protein 1’ and PD-L1 is ‘programmed death-ligand 1’.
The term ‘antibody’ includes all fragments and derivatives of an antibody that retain the ability to bind the antigen target, for example single-chain variable fragments (scFvs) or Fab fragments.
The therapy may be mono or combination therapy, for example with immunocheckpoint modulators (preferably inhibitors) of PD-1 and/or its ligand, PD-L1. The therapy could comprise administering at least one immunocheckpoint modulator, for example as disclosed herein, such as in any table, figure or example. The therapy could be a combination of an anti-PD-1 or anti-PD-L1 agent with another drug that targets a checkpoint such as CTLA4 (Ipilimumab/Yervoy), or with small molecules. The PD-1 inhibitors could be pembrolizumab (Keytruda) or nivolumab (Opdivo). The modulator of PD-L1 or therapeutic agent could be Atezolizumab (Tecentriq), Avelumab (Bavencio), Durvalumab (Imfinzi), CA-170, Ipilimumab, Tremelimumab, Nivolumab, Pembrolizumab, Pidilizumab, BMS935559, GVAX, MPDL3280A, MEDI4736, MSB0010718C, MDX-1105/BMS-936559, AMP-224 or MEDI0680.
The therapy may comprise administering agents that target and/or modulate interferon gamma or the JAK-STAT pathway.
The therapeutic agent may be any such agent disclosed in any table herein or may target any ‘target’ disclosed herein, including any protein disclosed herein. It is understood that any agent that is disclosed in a combination should be seen as also disclosed for administration individually.
Hyper-Progressors
Hyper-progression in cancer can be recognised in a straightforward manner by the skilled person; it is preferably an increase in disease progression in cancer upon administration of immunotherapy and/or an adverse response to immunotherapy in an individual with cancer. It may be measured using any suitable parameter for disease, such as a 2-fold increase in tumour size. It is typically a more than 50% increase in tumour burden within 60 days of administration of immunotherapy. In one aspect, hyper-progressors can be defined as having less than 60 days progression-free after administration of immunotherapy and/or overall survival of less than 150 days after immunotherapy.
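By way of illustration only, the hyper-progression criteria above can be encoded as a simple rule. In this sketch the criteria are treated as alternatives (any one suffices), which is one reading of the ‘and/or’ above; the field names and the use of a single tumour-burden measure are assumptions made for the example, and in practice the criteria and how they are combined would be fixed per study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Course:
    """Hypothetical record of one patient's course on immunotherapy."""
    baseline_burden: float          # tumour burden before immunotherapy (any consistent measure)
    followup_burden: float          # the same measure at follow-up
    days_to_followup: int           # days from first immunotherapy dose to follow-up
    pfs_days: Optional[int] = None  # progression-free survival after immunotherapy
    os_days: Optional[int] = None   # overall survival after immunotherapy


def is_hyper_progressor(c: Course) -> bool:
    # More than a 50% increase in tumour burden within 60 days of immunotherapy
    rapid_growth = c.days_to_followup <= 60 and c.followup_burden > 1.5 * c.baseline_burden
    # Less than 60 days progression-free, or overall survival below 150 days
    short_pfs = c.pfs_days is not None and c.pfs_days < 60
    short_os = c.os_days is not None and c.os_days < 150
    return rapid_growth or short_pfs or short_os
```

A stricter definition requiring several criteria together would replace the final `or` expression with the desired combination.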
Choice of Treatment
Based on the results of testing by the method of the invention decisions can be made as to what treatments will be administered or not administered to the individual.
If a person is found to be responsive to immunotherapy they could be given any immunotherapy mentioned herein. In one aspect, if an individual is found to be a non-responder they can be given a combination therapy, such as any combination therapy listed herein. Typically a combination therapy comprises an antibody and a small molecule.
The Data in the Tables Provided Herein
Tables 1, 2 and 8 show specific markers which can be used to detect responder status. Their presence or absence can be used in such a detection (i.e. they are ‘disseminating’ markers). Tables 1 and 8 show markers which detect responsiveness to immunotherapy, and indicate which markers are linked to responsiveness and which are linked to non-responsiveness. Table 2 shows markers which detect hyper-progressors, and indicates which markers are linked to being a hyper-progressor and which are linked to stable disease.
The markers are defined using probe sequences (which detect a ligated product as defined herein). The first two sets of Start-End positions show probe positions, and the second two sets of Start-End positions show the relevant 4 kb region.
The following information is provided in the probe data table:
Simple permutation-based estimation is used to determine how likely it is that a given RP value, or a better one, would be observed in a random experiment. This has the following steps:
The rank product statistic ranks chromosome interactions according to intensities within each microarray and calculates the product of these ranks across multiple microarrays. This technique can identify chromosome interactions that are consistently detected among the most differential chromosome interactions in a number of replicated microarrays. Where the p-value is 0, this indicates that there is very little variation in the Rank Product of the CCS across the samples; this is a good example of the signal to noise and effect size of CCS. Where the p-value is 0 and the pfp (percentage of false positives) is 0, this means that the permuted Rank Product does not differ from the actual observed Rank Product. These methods are described in Breitling R and Herzyk P (2005) Rank-based methods as a non-parametric alternative of the t-test for the analysis of biological microarray data. J Bioinf Comp Biol 3, 1171-1189.
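By way of illustration only, the rank product and its permutation-based p-value and pfp can be sketched as follows. The sketch assumes that each replicate array is a list of per-marker values (for example log fold changes) in which a larger value means more differential, uses the geometric mean of ranks (which orders markers identically to the raw product of ranks), and estimates the expected number of equally small RP values from independent shuffles of each array. Function names and the default number of permutations are hypothetical choices; for real analyses an established implementation such as the Bioconductor RankProd package would normally be used.

```python
import bisect
import math
import random


def ranks_desc(values):
    """Rank markers within one array: rank 1 = largest value (most differential)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks


def rank_products(arrays):
    """Geometric mean of each marker's ranks across the replicate arrays."""
    k = len(arrays)
    per_array = [ranks_desc(a) for a in arrays]
    return [math.prod(r[g] for r in per_array) ** (1.0 / k) for g in range(len(arrays[0]))]


def permutation_p_and_pfp(arrays, n_perm=500, seed=0):
    """Return (observed RP, p-values, pfp) estimated from n_perm random permutations."""
    rng = random.Random(seed)
    observed = rank_products(arrays)
    n = len(observed)
    counts = [0] * n
    for _ in range(n_perm):
        shuffled = [rng.sample(a, len(a)) for a in arrays]  # permute each array independently
        perm = sorted(rank_products(shuffled))
        for g, rp in enumerate(observed):
            counts[g] += bisect.bisect_right(perm, rp)  # permuted RPs <= observed RP
    expected = [c / n_perm for c in counts]  # E(RP_g): RPs this small expected by chance
    p_values = [e / n for e in expected]
    obs_sorted = sorted(observed)
    # pfp: expected false positives divided by the marker's rank among observed RPs
    pfp = [e / bisect.bisect_right(obs_sorted, rp) for e, rp in zip(expected, observed)]
    return observed, p_values, pfp
```

A marker ranked first in every replicate has the smallest possible RP of 1 and therefore the smallest p-value in the set.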
FC (fold change) indicates the prevalence of a marker in each comparison: 2 means twice the average of the test, 1.5 means 1.5 times the average of the test, and so on, so that FC indicates the weight of a marker for the phenotype/group. The FC value can be used to give an indication of how many markers are needed for a highly effective test.
The probes are designed to be 30 bp away from the TaqI site. In the case of PCR, PCR primers are typically designed to detect the ligated product, but their locations relative to the TaqI site vary. Probe locations:
Types of Detection
When detection is performed using a probe, typically sequence from both regions of the probe (i.e. from both sites of the chromosome interaction) could be detected. In preferred aspects probes are used in the process which comprise or consist of the same or complementary sequence to a probe shown in any table. In some aspects probes are used which comprise sequence which is homologous to any of the probe sequences shown in the tables.
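By way of illustration only, the sketch below locates TaqI recognition sites (TCGA; TaqI cuts T^CGA, and re-ligation of the compatible ends regenerates the site at the junction) in a ligated sequence and takes a 60-mer centred on each site, so that the probe carries 30 bp from each side of the junction and hence sequence from both sites of the chromosome interaction. Reading ‘30 bp away from the TaqI site’ as a 60-mer centred on the site is an assumption made here, as are the function names; the probe sequences actually used are those given in the tables.

```python
TAQI_SITE = "TCGA"  # TaqI recognition sequence


def find_sites(seq, site=TAQI_SITE):
    """0-based start positions of every occurrence of site in seq."""
    seq = seq.upper()
    hits, i = [], seq.find(site)
    while i != -1:
        hits.append(i)
        i = seq.find(site, i + 1)
    return hits


def junction_probes(ligated_seq, flank=30):
    """Return (site_start, probe) pairs: a 2*flank-mer centred on each TaqI site.

    The centre is the middle of the 4-bp site, so each probe carries `flank`
    bases from either side of the junction. Sites too close to either end of
    the sequence to fit a full-length probe are skipped.
    """
    seq = ligated_seq.upper()
    probes = []
    for start in find_sites(seq):
        centre = start + len(TAQI_SITE) // 2
        lo, hi = centre - flank, centre + flank
        if lo >= 0 and hi <= len(seq):
            probes.append((start, seq[lo:hi]))
    return probes
```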
The Approach Taken to Identify Markers and Panels of Markers
The invention described herein relates to chromosome conformation profile and 3D architecture as a regulatory modality in its own right, closely linked to the phenotype. The discovery of biomarkers was based on annotations through pattern recognition and screening on representative cohorts of clinical samples representing the differences in phenotypes. We annotated and screened significant parts of the genome, across coding and non-coding parts and over large swathes of non-coding sequence 5′ and 3′ of known genes, for identification of statistically disseminating, consistent and conditional chromosome conformations, which for example anchor in non-coding sites within (intronic) or outside of open reading frames.
In selection of the best markers we are driven by statistical data and p values for the marker leads. Selected and validated chromosome conformations within the signature are disseminating stratifying entities in their own right, irrespective of the expression profiles of the genes used in the reference. Further work may be done on relevant regulatory modalities, such as SNPs at the anchoring sites, changes in gene transcription profiles, changes at the level of H3K27ac.
We are taking the question of clinical phenotype differences and their stratification from the basis of fundamental biology and epigenetic controls over phenotype, including for example from the framework of network of regulation. As such, to assist stratification, one can capture changes in the network, and this is preferably done through signatures of several biomarkers, for example by following a machine learning algorithm for marker reduction which includes evaluating the optimal number of markers to stratify the testing cohort with minimal noise. This may result in a panel of 3 to 20 markers.
Selection of markers for panels may be done by cross-validation statistical performance (and not for example by the functional relevance of the neighbouring genes, used for the reference name).
A panel of markers (with names of adjacent genes) is a product of clustered selection from the screening across significant parts of the genome, analysing in a non-biased way the statistical disseminating power of 14,000 to 60,000 annotated EpiSwitch sites across significant parts of the genome. It should not be perceived as a tailored capture of a chromosome conformation on a gene of known functional value for the question of stratification. The total number of sites for chromosome interaction is 1.2 million, and so the potential number of combinations is 1.2 million to the power 1.2 million. The approach that we have followed nevertheless allows the relevant chromosome interactions to be identified.
The specific markers that are provided by this application have passed selection, being statistically (significantly) associated with the condition or subgroup. This is what the data in the relevant table demonstrate. Each marker can be seen as representing a biological epigenetic event forming part of the network deregulation that is manifested in the relevant condition. In practical terms it means that these markers are prevalent across groups of patients when compared to controls. On average, as an example, an individual marker may typically be present in 80% of the relevant responder group and in 10% of controls, and therefore the results of testing by the method of the invention are straightforward to interpret and essentially amount to a ‘binary readout’.
Simple addition of all markers would not directly represent the network interrelationships between some of the deregulations. This is where the standard multivariate biomarker analysis package GLMNET (an R package) can be brought in. The GLMNET package helps to identify interdependence between some of the markers that reflects their joint role in achieving the deregulations leading to the disease phenotype. Modelling, and then testing, the markers with the highest GLMNET scores makes it possible to identify not only the minimal number of markers that accurately identifies the patient cohort, but also the minimal number that gives the fewest false positive results in the control group of patients, which arise from background statistical noise of low prevalence in the control group. Typically a group (combination) of selected markers (such as 3 to 11) offers the best balance between sensitivity and specificity of detection, emerging in the context of multivariate analysis from the individual properties of all the selected statistically significant markers for the condition.
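GLMNET itself is an R package and is not reproduced here. By way of illustration only, the simpler, swapped-in approach sketched below conveys the idea of balancing sensitivity against specificity when choosing a small panel: it searches candidate panels exhaustively, calls a sample positive when at least half of the panel markers (rounded up) are present, and keeps the smallest panel with the highest Youden index (sensitivity + specificity - 1). It assumes markers are oriented so that presence indicates the case group; the names and the calling rule are hypothetical. Exhaustive search grows combinatorially and is only practical for small marker sets.

```python
from itertools import combinations


def panel_positive(profile, panel):
    """Positive call when at least half of the panel markers (rounded up) are present."""
    return sum(profile[m] for m in panel) >= (len(panel) + 1) // 2


def sens_spec(cases, controls, panel):
    """Sensitivity on cases and specificity on controls for one candidate panel."""
    sens = sum(panel_positive(p, panel) for p in cases) / len(cases)
    spec = sum(not panel_positive(p, panel) for p in controls) / len(controls)
    return sens, spec


def smallest_best_panel(cases, controls, markers, max_size=3):
    """Exhaustive search; ties keep the smaller (and earlier) panel."""
    best, best_j = None, -1.0
    for size in range(1, max_size + 1):
        for panel in combinations(markers, size):
            sens, spec = sens_spec(cases, controls, panel)
            j = sens + spec - 1  # Youden's J balances sensitivity and specificity
            if j > best_j:
                best, best_j = panel, j
    return best, best_j
```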
The tables herein show the reference names for the array probes (60-mers) for array analysis that overlap the juncture between the long-range interaction sites, the chromosome number, and the start and end of the two chromosomal fragments that come into juxtaposition.
In a preferred aspect all 11 of the markers of Table 1 are typed. In another preferred aspect all 11 of the markers of Table 2 are typed. In another preferred aspect all 8 of the markers of Table 8 are typed.
Samples and Sample Treatment
The process of the invention will normally be carried out on a sample. The sample may be obtained at a defined time point, for example at any time point defined herein. The sample will normally contain DNA from the individual. It will normally contain cells. In one aspect a sample is obtained by minimally invasive means, and may for example be a blood sample. DNA may be extracted and cut up with a standard restriction enzyme. This can pre-determine which chromosome conformations are retained and will be detected with the EpiSwitch™ platforms. Due to the synchronisation of chromosome interactions between tissues and blood, including horizontal transfer, a blood sample can be used to detect the chromosome interactions in tissues, such as tissues relevant to disease.
Preferred Aspects for Sample Preparation and Chromosome Interaction Detection
Methods of preparing samples and detecting chromosome conformations are described herein. Optimised (non-conventional) versions of these processes can be used, for example as described in this section.
Typically the sample will contain at least 2×10⁵ cells. The sample may contain up to 5×10⁵ cells. In one aspect, the sample will contain 2×10⁵ to 5.5×10⁵ cells.
Crosslinking of epigenetic chromosomal interactions present at the chromosomal locus is described herein. This may be performed before cell lysis takes place. Cell lysis may be performed for 3 to 7 minutes, such as 4 to 6 or about 5 minutes. In some aspects, cell lysis is performed for at least 5 minutes and for less than 10 minutes.
Digesting DNA with a restriction enzyme is described herein. Typically, DNA restriction is performed at about 55° C. to about 70° C., such as at about 65° C., for a period of about 10 to 30 minutes, such as about 20 minutes.
Preferably a frequent-cutter restriction enzyme is used which results in fragments of ligated DNA with an average fragment size of up to 4,000 base pairs. Optionally the restriction enzyme results in fragments of ligated DNA having an average fragment size of about 200 to 300 base pairs, such as about 256 base pairs.
In one aspect, the typical fragment size is from 200 base pairs to 4,000 base pairs, such as 400 to 2,000 or 500 to 1,000 base pairs.
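The figure of about 256 base pairs follows from simple arithmetic: a 4-bp recognition site such as TaqI's TCGA is expected by chance once every 4^4 = 256 bp in a sequence with equal and independent base frequencies. The sketch below illustrates this under that idealised assumption by digesting a random sequence in silico; the function names and the random test sequence are hypothetical. Genomic DNA departs from this ideal (for example, CpG dinucleotides, which TCGA contains, are under-represented in mammalian genomes), so real fragment sizes will differ.

```python
import random

TAQI_SITE = "TCGA"  # TaqI recognition sequence
TAQI_CUT = 1        # TaqI cuts T^CGA, i.e. one base into the site


def digest(seq, site=TAQI_SITE, cut_offset=TAQI_CUT):
    """Return the fragments produced by cutting seq at every occurrence of site."""
    seq = seq.upper()
    cuts = []
    i = seq.find(site)
    while i != -1:
        cuts.append(i + cut_offset)
        i = seq.find(site, i + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def expected_fragment_size(site_length=4):
    """Mean spacing of a site of this length in random, equiprobable sequence: 4**n."""
    return 4 ** site_length


if __name__ == "__main__":
    rng = random.Random(1)
    genome = "".join(rng.choice("ACGT") for _ in range(100_000))
    fragments = digest(genome)
    print(expected_fragment_size(), round(sum(map(len, fragments)) / len(fragments)))
```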
In one aspect of the EpiSwitch process a DNA precipitation step is not performed between the DNA restriction digest step and the DNA ligation step.
DNA ligation is described herein. Typically the DNA ligation is performed for 5 to 30 minutes, such as about 10 minutes.
The protein in the sample may be digested enzymatically, for example using a proteinase, optionally Proteinase K. The protein may be enzymatically digested for a period of about 30 minutes to 1 hour, for example for about 45 minutes. In one aspect after digestion of the protein, for example Proteinase K digestion, there is no cross-link reversal or phenol DNA extraction step.
In one aspect PCR detection is capable of detecting a single copy of the ligated nucleic acid, preferably with a binary read-out for presence/absence of the ligated nucleic acid.
Processes and Uses of the Invention
The process of the invention can be described in different ways. It can be described as a process of making one or more ligated nucleic acids comprising (i) in vitro cross-linking of chromosome regions which have come together in a chromosome interaction; (ii) subjecting said cross-linked DNA to cutting or restriction digestion cleavage; and (iii) ligating said cross-linked cleaved DNA ends to form one or more ligated nucleic acids, wherein optionally detection of the ligated nucleic acid may be used to determine the chromosome state at a locus, and wherein preferably the chromosomal interactions may be 1, 3, 5, 8 or all the chromosome interactions of Table 1 or 2. In this process the chromosomal interactions may be 1, 3, 5 or 8 of the chromosome interactions of Table 8.
Homologues
Homologues of polynucleotide/nucleic acid (e.g. DNA) sequences are referred to herein. Such homologues typically have at least 70% homology, preferably at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98% or at least 99% homology, for example over a region of at least 10, 15, 20, 30, 100 or more contiguous nucleotides, or across the portion of the nucleic acid which is from the region of the chromosome involved in the chromosome interaction. The homology may be calculated on the basis of nucleotide identity (sometimes referred to as “hard homology”).
Therefore, in a particular aspect, homologues of polynucleotide/nucleic acid (e.g. DNA) sequences are referred to herein by reference to percentage sequence identity. Typically such homologues have at least 70% sequence identity, preferably at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98% or at least 99% sequence identity, for example over a region of at least 10, 15, 20, 30, 100 or more contiguous nucleotides, or across the portion of the nucleic acid which is from the region of the chromosome involved in the chromosome interaction. The homologues may have at least 70% sequence identity, preferably at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98% or at least 99% sequence identity across the entire probe, primer or primer pair.
For example the UWGCG Package provides the BESTFIT program which can be used to calculate homology and/or % sequence identity (for example used on its default settings) (Devereux et al (1984) Nucleic Acids Research 12, p387-395). The PILEUP and BLAST algorithms can be used to calculate homology and/or % sequence identity and/or to line up sequences, such as identifying equivalent or corresponding sequences (typically on their default settings), for example as described in Altschul S. F. (1993) J Mol Evol 36:290-300; Altschul, S. F. et al (1990) J Mol Biol 215:403-10.
Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information. This algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence that either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighbourhood word score threshold (Altschul et al, supra). These initial neighbourhood word hits act as seeds for initiating searches to find HSPs containing them. The word hits are extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Extensions for the word hits in each direction are halted when: the cumulative alignment score falls off by the quantity X from its maximum achieved value; the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached. The BLAST algorithm parameters W, T and X determine the sensitivity and speed of the alignment. The BLAST program uses as defaults a word length (W) of 11, the BLOSUM62 scoring matrix (see Henikoff and Henikoff (1992) Proc. Natl. Acad. Sci. USA 89: 10915-10919), alignments (B) of 50, expectation (E) of 10, M=5, N=4, and a comparison of both strands.
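By way of illustration only, the seed-and-extend steps described above can be sketched for nucleotide sequences as follows. This is a simplified teaching sketch, not the BLAST implementation: seeds are exact word matches of length W (no neighbourhood threshold T), extension is ungapped, and the scoring (+1 match, -3 mismatch), W=4 and X=5 are values chosen here for illustration rather than BLAST defaults. Each extension stops on the three conditions listed above, and the reported HSP is trimmed back to the point of maximum cumulative score.

```python
def extend(query, subject, q, s, step, start_score, match, mismatch, x_drop):
    """Ungapped extension from (q, s); step=+1 extends right, step=-1 extends left.

    Returns (gain, n): score gained at the best point and residues needed to reach it.
    Stops when the running score falls x_drop below its maximum, when it reaches
    zero or below, or when either sequence ends.
    """
    score = best = start_score
    n = best_n = 0
    while 0 <= q + step * n < len(query) and 0 <= s + step * n < len(subject):
        qi, si = q + step * n, s + step * n
        score += match if query[qi] == subject[si] else mismatch
        n += 1
        if score > best:
            best, best_n = score, n
        if best - score >= x_drop or score <= 0:
            break
    return best - start_score, best_n


def find_hsps(query, subject, w=4, match=1, mismatch=-3, x_drop=5):
    """Seed with exact length-w word hits, extend both ways, return unique HSPs.

    Each HSP is (query_start, subject_start, length, score), best score first.
    """
    index = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    hsps = set()
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            seed = w * match
            r_gain, r_n = extend(query, subject, i + w, j + w, +1, seed, match, mismatch, x_drop)
            l_gain, l_n = extend(query, subject, i - 1, j - 1, -1, seed + r_gain,
                                 match, mismatch, x_drop)
            hsps.add((i - l_n, j - l_n, l_n + w + r_n, seed + r_gain + l_gain))
    return sorted(hsps, key=lambda h: (-h[3], h))
```

Several seeds on the same diagonal extend to the same HSP, which is why the results are collected in a set.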
The BLAST algorithm performs a statistical analysis of the similarity between two sequences; see e.g. Karlin and Altschul (1993) Proc. Natl. Acad. Sci. USA 90: 5873-5877. One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two polynucleotide sequences would occur by chance. For example, a sequence is considered similar to another sequence if the smallest sum probability in comparison of the first sequence to the second sequence is less than about 1, preferably less than about 0.1, more preferably less than about 0.01, and most preferably less than about 0.001.
The homologous sequence typically differs by 1, 2, 3, 4 or more bases, such as by fewer than 10, 15 or 20 bases (which may be substitutions, deletions or insertions of nucleotides). These changes may be measured across any of the regions mentioned above in relation to calculating homology and/or % sequence identity.
Homology of a ‘pair of primers’ can be calculated, for example, by considering the two sequences as a single sequence (as if the two sequences were joined together) and then comparing it against another primer pair, which is again considered as a single sequence.
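Here is a minimal sketch of this joining convention. It assumes the two primer pairs contain equal-length primers, so the joined sequences can be compared position by position without an alignment step. That is a simplification. The primer sequences are hypothetical.

```python
from typing import Tuple


def pair_identity(pair_a: Tuple[str, str], pair_b: Tuple[str, str]) -> float:
    """% identity between two primer pairs, each joined into one sequence first.

    Assumes equal-length, ungapped sequences so positions can be compared directly.
    """
    joined_a = "".join(pair_a).upper()
    joined_b = "".join(pair_b).upper()
    if len(joined_a) != len(joined_b):
        raise ValueError("this sketch only compares equal-length sequences")
    same = sum(x == y for x, y in zip(joined_a, joined_b))
    return 100.0 * same / len(joined_a)


# Hypothetical 10-mer primers; the variant differs at one base of the reverse primer
reference = ("ACGTACGTAC", "TTGCATGCAA")
variant = ("ACGTACGTAC", "TTGCATGCAT")
print(pair_identity(reference, variant))  # 95.0
```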
The Threshold of Detection
The markers disclosed herein have been found to be ‘disseminating markers’ capable of determining responder status, and Tables 1 and 2 show which responder group each marker is present in (responder/non-responder to immunotherapy, or hyper-progressor/stable disease).
In practical terms, this means that these markers are prevalent across the relevant responder group when compared with controls (as is shown by the FC value, for example). As an example, an individual marker may typically be present in 80% of the relevant responder group and in 10% of controls. Testing an individual yields a combination of ‘present’ and ‘absent’ chromosome interactions for the markers shown in Table 1 or 2, allowing the responder group of the individual to be determined. Typically, agreement with the ‘ideal’ result shown in the table at a minimum of 8 of the 11 markers can be used to assign the individual to a responder group.
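The assignment rule can be sketched as a concordance count. In this hypothetical Python example the marker names, the ideal profiles and the use of two complementary profiles are illustrative assumptions. In practice the ideal presence/absence pattern would be taken from Table 1 or Table 2, and a trained classifier may be used instead.

```python
from typing import Dict, Optional


def concordance(observed: Dict[str, bool], ideal: Dict[str, bool]) -> int:
    """Number of markers whose present/absent call matches the ideal profile."""
    return sum(observed[m] == ideal[m] for m in ideal)


def assign_group(observed: Dict[str, bool],
                 ideal_profiles: Dict[str, Dict[str, bool]],
                 min_agree: int = 8) -> Optional[str]:
    """Best-matching group if it agrees at >= min_agree markers, otherwise None."""
    scores = {group: concordance(observed, ideal)
              for group, ideal in ideal_profiles.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_agree else None


# Hypothetical 11-marker profiles (True = interaction present)
markers = [f"M{i}" for i in range(1, 12)]
responder = {m: i % 2 == 0 for i, m in enumerate(markers)}
non_responder = {m: not present for m, present in responder.items()}
profiles = {"responder": responder, "non-responder": non_responder}

patient = dict(responder)
for m in ("M1", "M2"):  # two calls differ from the ideal responder profile
    patient[m] = not patient[m]
print(assign_group(patient, profiles))  # responder (9 of 11 markers agree)
```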
Therapeutic Agents and Treatments
This section is relevant both to immunotherapies which define the responder group of the individual and also to therapy which may be given to individuals based on the results of the testing method of the invention.
The invention provides therapeutic agents for use in preventing or treating any condition mentioned herein. This may comprise administering to an individual in need thereof a therapeutically effective amount of the agent. The invention provides use of the agent in the manufacture of a medicament to prevent or treat the condition, for example in individuals tested by the method of the invention.
The formulation of the agent will depend upon the nature of the agent. The agent will be provided in the form of a pharmaceutical composition containing the agent and a pharmaceutically acceptable carrier or diluent. Suitable carriers and diluents include isotonic saline solutions, for example phosphate-buffered saline. Typical oral dosage compositions include tablets, capsules, liquid solutions and liquid suspensions. The agent may be formulated for parenteral, intravenous, intramuscular, subcutaneous, transdermal or oral administration.
The dose of an agent may be determined according to various parameters, especially according to the substance used; the age, weight and condition of the individual to be treated; the route of administration; and the required regimen. A physician will be able to determine the required route of administration and dosage for any particular agent. A suitable dose may however be from 0.1 to 100 mg/kg body weight such as 1 to 40 mg/kg body weight, for example, to be taken from 1 to 3 times daily.
The invention provides an immunotherapeutic agent, preferably selected from any of tables 4 to 6, for use in a method of treating an individual identified as being responsive to immunotherapy, optionally said method comprising:
The invention provides (i) a combination immunotherapy or (ii) a therapeutic agent which is not an immunotherapy for use in a method of treating an individual identified as being non-responsive to immunotherapy, optionally said method comprising:
Screening for Therapeutic Agents
The invention provides a screening method to identify therapeutic agents for cancer comprising determining whether a candidate agent is able to cause a change to all of the chromosome interactions shown in Table 1 and/or Table 2. This screening method may comprise determining whether a candidate agent is able to cause a change to all of the chromosome interactions shown in Table 8.
Nucleic Acids of the Invention
The invention provides certain nucleic acids, including probes and primers. Preferably the nucleic acids are DNA. It is understood that where a specific sequence is provided the invention may use the complementary sequence as required in the particular aspect.
The primers or probes shown in Table 1 or 2 may be used in the invention. In one aspect probes or primers are used which comprise any of: the sequences shown in Table 1 or 2; or fragments and/or homologues of any sequence shown in Table 1 or 2. The primers or probes shown in Table 8 may be used in the invention. In one aspect probes or primers are used which comprise any of: the sequences shown in Table 8; or fragments and/or homologues of any sequence shown in Table 8.
Labelled Nucleic Acids and Pattern of Hybridisation
The nucleic acids mentioned herein may be labelled, preferably using an independent label such as a fluorophore (fluorescent molecule) or radioactive label which assists detection of successful hybridisation. Certain labels can be detected under UV light.
Forms of the Substance Mentioned Herein
Any of the substances, such as nucleic acids or therapeutic agents, mentioned herein may be in purified or isolated form. They may be in a form which is different from that found in nature, for example they may be present in combination with other substances with which they do not occur in nature. The nucleic acids (including portions of sequences defined herein) may have sequences which are different to those found in nature, for example having at least 1, 2, 3, 4 or more nucleotide changes in the sequence as described in the section on homology. The nucleic acids may have heterologous sequence at the 5′ or 3′ end. The nucleic acids may be chemically different from those found in nature, for example they may be modified in some way, but preferably are still capable of Watson-Crick base pairing. Where appropriate the nucleic acids will be provided in double stranded or single stranded form. The invention provides all of the specific nucleic acid sequences mentioned herein in single or double stranded form, and thus includes the complementary strand to any sequence which is disclosed.
The invention provides a kit for carrying out any process of the invention, including detection of a chromosomal interaction relating to prognosis. Such a kit can include a specific binding agent capable of detecting the relevant chromosomal interaction, such as agents capable of detecting a ligated nucleic acid generated by processes of the invention. Preferred agents present in the kit include probes capable of hybridising to the ligated nucleic acid, or primer pairs capable of amplifying the ligated nucleic acid in a PCR reaction, for example as described herein. Preferred agents include any of the specific primers and probes disclosed herein and/or homologues of such primers and probes.
The invention provides a device that is capable of detecting the relevant chromosome interactions. The device preferably comprises any specific binding agent, probe or primer pair capable of detecting the chromosome interaction, such as any such agent, probe or primer pair described herein.
Detection Process
In one aspect quantitative detection of the ligated sequence which is relevant to a chromosome interaction is carried out using a probe which is detectable upon activation during a PCR reaction, wherein said ligated sequence comprises sequences from two chromosome regions that come together in an epigenetic chromosome interaction, wherein said process comprises contacting the ligated sequence with the probe during a PCR reaction, and detecting the extent of activation of the probe, and wherein said probe binds the ligation site. The process typically allows particular interactions to be detected in a MIQE compliant manner using a dual labelled fluorescent hydrolysis probe.
The probe is generally labelled with a detectable label which has an inactive and active state, so that it is only detected when activated. The extent of activation will be related to the extent of template (ligation product) present in the PCR reaction. Detection may be carried out during all or some of the PCR, for example for at least 50% or 80% of the cycles of the PCR.
The probe can comprise a fluorophore covalently attached to one end of the oligonucleotide, and a quencher attached to the other end of the oligonucleotide, so that the fluorescence of the fluorophore is quenched by the quencher. In one aspect the fluorophore is attached to the 5′ end of the oligonucleotide, and the quencher is covalently attached to the 3′ end of the oligonucleotide. Fluorophores that can be used in the process of the invention include FAM, TET, JOE, Yakima Yellow, HEX, Cyanine3, ATTO 550, TAMRA, ROX, Texas Red, Cyanine 3.5, LC610, LC 640, ATTO 647N, Cyanine 5, Cyanine 5.5 and ATTO 680. Quenchers that can be used with the appropriate fluorophore include TAM, BHQ1, DAB, Eclip, BHQ2 and BBQ650, optionally wherein said fluorophore is selected from HEX, Texas Red and FAM. Preferred combinations of fluorophore and quencher include FAM with BHQ1 and Texas Red with BHQ2.
Use of the Probe in a qPCR Assay
Hydrolysis probes of the invention are typically temperature gradient optimised with concentration matched negative controls. Preferably single-step PCR reactions are optimized. More preferably a standard curve is calculated. An advantage of using a specific probe that binds across the junction of the ligated sequence is that specificity for the ligated sequence can be achieved without using a nested PCR approach. The processes described herein allow accurate and precise quantification of low copy number targets. The target ligated sequence can be purified, for example gel-purified, prior to temperature gradient optimization. The target ligated sequence can be sequenced. Preferably PCR reactions are performed using about 10 ng, or 5 to 15 ng, or 10 to 20 ng, or 10 to 50 ng, or 10 to 200 ng template DNA. Forward and reverse primers are designed such that one primer binds to the sequence of one of the chromosome regions represented in the ligated DNA sequence, and the other primer binds to the other chromosome region represented in the ligated DNA sequence, for example, by being complementary to the sequence.
Choice of Ligated DNA Target
The invention includes selecting primers and a probe for use in a PCR process as defined herein comprising selecting primers based on their ability to bind and amplify the ligated sequence and selecting the probe sequence based on properties of the target sequence to which it will bind, in particular the curvature of the target sequence.
Probes are typically designed/chosen to bind to ligated sequences which are juxtaposed restriction fragments spanning the restriction site. In one aspect of the invention, the predicted curvature of possible ligated sequences relevant to a particular chromosome interaction is calculated, for example using a specific algorithm referenced herein. The curvature can be expressed as degrees per helical turn, e.g. 10.5° per helical turn. Ligated sequences are selected for targeting where the ligated sequence has a curvature propensity peak score of at least 5° per helical turn, typically at least 10°, 15° or 20° per helical turn, for example 5° to 20° per helical turn. Preferably the curvature propensity score per helical turn is calculated for at least 20, 50, 100, 200 or 400 bases, such as for 20 to 400 bases upstream and/or downstream of the ligation site. Thus in one aspect the target sequence in the ligated product has any of these levels of curvature. Target sequences can also be chosen based on lowest thermodynamic structure free energy.
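The selection rule can be sketched as a filter over curvature profiles. The curvature algorithm itself is not reproduced here. The sketch assumes that a per-base curvature propensity profile, in degrees per helical turn, has already been computed for each candidate ligated sequence. Candidate names and values are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def peak_curvature(profile: Sequence[float], ligation_site: int, flank: int = 200) -> float:
    """Highest curvature score (degrees per helical turn) within +/- flank bases of the site."""
    lo = max(0, ligation_site - flank)
    hi = min(len(profile), ligation_site + flank + 1)
    return max(profile[lo:hi])


def select_targets(candidates: Dict[str, Tuple[Sequence[float], int]],
                   min_peak: float = 5.0, flank: int = 200) -> List[str]:
    """Names of candidate ligated sequences whose peak curvature reaches min_peak."""
    return [name for name, (profile, site) in candidates.items()
            if peak_curvature(profile, site, flank) >= min_peak]


# Hypothetical pre-computed profiles: (per-base curvature scores, ligation-site index)
candidates = {
    "ligation_A": ([1.0, 2.0, 12.5, 3.0], 1),
    "ligation_B": ([0.5, 4.9, 2.0], 1),
}
print(select_targets(candidates, min_peak=5.0, flank=2))  # ['ligation_A']
```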
Particular Aspects
In particular aspects certain chromosome interactions are not typed, for example any specific interaction not mentioned herein. In some aspects only the markers of Table 1 or Table 2 are typed and no other markers are typed. In some aspects only the markers of Table 2 or Table 8 are typed and no other markers are typed. In some aspects only the markers of Table 1 and Table 2 are typed and no other markers are typed. In some aspects only the markers of Table 2 and Table 8 are typed and no other markers are typed.
Paragraphs Describing the Invention
The invention includes aspects described in the following numbered paragraphs:
Disclosure in Publications and Priority Applications
The contents of all publications mentioned herein are incorporated by reference into the present specification and may be used to further define the features relevant to the invention. The contents of all priority applications are incorporated by reference into the present specification and may be used to define the features relevant to the invention.
Techniques Used to Identify the Specific Relevant Chromosome Interactions
The EpiSwitch™ platform technology detects epigenetic signatures of regulatory changes between normal and abnormal conditions at loci. The EpiSwitch™ platform identifies and monitors the fundamental epigenetic level of gene regulation associated with regulatory high order structures of human chromosomes, also known as chromosome conformation signatures. Chromosome signatures are a distinct primary step in a cascade of gene deregulation. They are high order biomarkers with a unique set of advantages over biomarker platforms that utilise late epigenetic and gene expression biomarkers, such as DNA methylation and RNA profiling.
EpiSwitch™ Array Assay
The custom EpiSwitch™ array-screening platforms come in four densities of 15K, 45K, 100K and 250K unique chromosome conformations. Each chimeric fragment is repeated on the arrays 4 times, making the effective densities 60K, 180K, 400K and 1 million, respectively.
Custom Designed EpiSwitch™ Arrays
The 15K EpiSwitch™ array can screen the whole genome, including around 300 loci interrogated with the EpiSwitch™ Biomarker discovery technology. The EpiSwitch™ array is built on the Agilent SurePrint G3 Custom CGH microarray platform; this technology offers 4 densities: 60K, 180K, 400K and 1 million probes. The density per array is reduced to 15K, 45K, 100K and 250K because each EpiSwitch™ probe is presented in quadruplicate, thus allowing statistical evaluation of reproducibility. The average number of potential EpiSwitch™ markers interrogated per genetic locus is 50; as such, the numbers of loci that can be investigated are 300, 900, 2000 and 5000, respectively.
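The link between the physical array formats and the number of loci follows from two factors stated above. Each probe is spotted in quadruplicate, and each locus carries an average of 50 potential markers. A short Python check of that arithmetic, using the nominal 1 million feature figure:

```python
REPLICATES = 4          # each EpiSwitch probe is spotted in quadruplicate
MARKERS_PER_LOCUS = 50  # average number of potential markers interrogated per locus


def array_capacity(physical_features: int) -> tuple:
    """Return (unique chromosome conformations, loci) for a given array format."""
    unique_probes = physical_features // REPLICATES
    loci = unique_probes // MARKERS_PER_LOCUS
    return unique_probes, loci


for features in (60_000, 180_000, 400_000, 1_000_000):
    print(features, *array_capacity(features))
# 60000 15000 300
# 180000 45000 900
# 400000 100000 2000
# 1000000 250000 5000
```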
EpiSwitch™ Custom Array Pipeline
The EpiSwitch™ array is a dual colour system. After EpiSwitch™ library generation, one set of samples is labelled in Cy5 and the other set of samples (controls) to be compared/analysed is labelled in Cy3. The arrays are scanned using the Agilent SureScan Scanner and the resultant features are extracted using the Agilent Feature Extraction software. The data are then processed using the EpiSwitch™ array processing scripts in R. The arrays are processed using standard dual colour packages in Bioconductor in R: Limma*. The arrays are normalised using the normalizeWithinArrays function in Limma*, against the on-chip Agilent positive controls and EpiSwitch™ positive controls. The data are filtered based on the Agilent Flag calls, the Agilent control probes are removed and the technical replicate probes are averaged, so that they can be analysed using Limma*. The probes are modelled based on their difference between the 2 scenarios being compared, and the resulting p-values are corrected for the False Discovery Rate (FDR). Probes with a Coefficient of Variation (CV) ≤ 30%, a fold change of ≤ −1.1 or ≥ 1.1 and an FDR-adjusted p-value ≤ 0.1 are used for further screening. To reduce the probe set further, Multiple Factor Analysis is performed using the FactoMineR package in R.
* Note: LIMMA is Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments. Limma is an R package for the analysis of gene expression data arising from microarray or RNA-Seq experiments.
The pool of probes is initially selected based on adjusted p-value, FC and CV<30% (arbitrary cut off point) parameters for final picking. Further analyses and the final list are drawn based only on the first two parameters (adj. p-value; FC).
Statistical Pipeline
EpiSwitch™ screening arrays are processed using the EpiSwitch™ Analytical Package in R in order to select high value EpiSwitch™ markers for translation on to the EpiSwitch™ PCR platform.
Step 1
Probes are selected based on their corrected p-value (False Discovery Rate, FDR), which is derived from a modified linear regression model. Probes with an FDR-corrected p-value ≤ 0.1 are selected and then further filtered by their Epigenetic Ratio (ER): a probe must have an ER ≤ −1.1 or ≥ 1.1 to be selected for further analysis. The last filter is the coefficient of variation (CV): a probe must have a CV ≤ 0.3.
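Step 1 can be sketched as follows. The source specifies an FDR correction but does not name the procedure. This hypothetical Python example assumes the Benjamini-Hochberg method, which is the default adjustment in limma's topTable. It treats the ER cut-off as |ER| ≥ 1.1 and expresses CV as a fraction. The probe names and values are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Probe:
    name: str
    p_value: float  # raw p-value from the linear model
    er: float       # epigenetic ratio (signed fold change)
    cv: float       # coefficient of variation, as a fraction


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def step1_filter(probes: List[Probe], fdr_max: float = 0.1,
                 er_cut: float = 1.1, cv_max: float = 0.3) -> List[str]:
    """Names of probes passing the FDR, |ER| and CV filters."""
    fdr = benjamini_hochberg([p.p_value for p in probes])
    return [p.name for p, q in zip(probes, fdr)
            if q <= fdr_max and abs(p.er) >= er_cut and p.cv <= cv_max]


probes = [
    Probe("A", 0.001, 1.5, 0.10),
    Probe("B", 0.010, -1.3, 0.20),
    Probe("C", 0.020, 1.05, 0.10),  # fails the ER cut-off
    Probe("D", 0.030, 2.0, 0.45),   # fails the CV cut-off
    Probe("E", 0.500, 1.8, 0.10),   # fails the FDR cut-off
]
print(step1_filter(probes))  # ['A', 'B']
```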
Step 2
The top 40 markers from the statistical lists are selected, based on their ER, as markers for PCR translation. The list is formed from the 20 markers with the highest negative ER load and the 20 markers with the highest positive ER load.
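Step 2 amounts to taking the extremes of the ER distribution on each side. Here is a minimal sketch. It assumes the input maps each probe name to its ER for the probes retained after Step 1. The names are hypothetical.

```python
from typing import Dict, List


def top_by_er(er_by_probe: Dict[str, float], n_each: int = 20) -> List[str]:
    """Top n_each probes by positive ER followed by top n_each by most negative ER."""
    positive = sorted((p for p, er in er_by_probe.items() if er > 0),
                      key=lambda p: er_by_probe[p], reverse=True)[:n_each]
    negative = sorted((p for p, er in er_by_probe.items() if er < 0),
                      key=lambda p: er_by_probe[p])[:n_each]
    return positive + negative


# Hypothetical Step 1 output; n_each=2 instead of 20 to keep the example small
ers = {"a": 1.2, "b": 3.0, "c": -1.5, "d": -2.2, "e": 1.9, "f": -1.1}
print(top_by_er(ers, n_each=2))  # ['b', 'e', 'd', 'c']
```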
Step 3
The statistically significant probes resulting from step 1 form the basis of an enrichment analysis using hypergeometric enrichment (HE). This analysis reduces the number of markers in the significant probe list. Together with the markers from step 2, the reduced list forms the set of probes translated on to the EpiSwitch™ PCR platform.
The statistical probes are processed by HE to determine which genetic locations have an enrichment of statistically significant probes, indicating which genetic locations are hubs of epigenetic difference.
The most significantly enriched loci, based on a corrected p-value, are selected for probe list generation. Genetic locations with a p-value below 0.3 or 0.2 are selected. The statistical probes mapping to these genetic locations, together with the markers from step 2, form the high value markers for EpiSwitch™ PCR translation.
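For each genetic location, the hypergeometric enrichment of Step 3 asks whether the location carries more statistically significant probes than chance would predict, given how many probes it contributes to the array. Below is a minimal Python sketch that uses only the standard library. The locus names and counts are hypothetical. The sketch returns raw tail probabilities. The procedure above selects on a corrected p-value, so a multiple-testing correction would be applied before the cut-off.

```python
from math import comb
from typing import Dict


def hypergeom_sf(k: int, total: int, in_locus: int, significant: int) -> float:
    """P(X >= k) where X counts significant probes falling in a locus by chance.

    total: probes on the array; in_locus: probes mapping to the locus;
    significant: statistically significant probes overall.
    """
    upper = min(in_locus, significant)
    hits = sum(comb(in_locus, i) * comb(total - in_locus, significant - i)
               for i in range(k, upper + 1))
    return hits / comb(total, significant)


def enriched_loci(probes_per_locus: Dict[str, int],
                  significant_per_locus: Dict[str, int],
                  p_max: float = 0.2) -> Dict[str, float]:
    """Loci whose (uncorrected) enrichment p-value falls below p_max."""
    total = sum(probes_per_locus.values())
    significant = sum(significant_per_locus.values())
    result = {}
    for locus, n in probes_per_locus.items():
        k = significant_per_locus.get(locus, 0)
        p = hypergeom_sf(k, total, n, significant)
        if p < p_max:
            result[locus] = p
    return result


# Hypothetical counts: locus_A holds 10 of the 12 significant probes
probes_per_locus = {"locus_A": 50, "locus_B": 50, "locus_C": 100}
significant_per_locus = {"locus_A": 10, "locus_B": 1, "locus_C": 1}
print(enriched_loci(probes_per_locus, significant_per_locus))
```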
Array Design and Processing
Array Design
Genetic loci are processed using the SII software (currently v3.2) to:
Array Processing
EpiSwitch™ biomarker signatures demonstrate high robustness, sensitivity and specificity in the stratification of complex disease phenotypes. This technology takes advantage of the latest breakthroughs in the science of epigenetics, monitoring and evaluating chromosome conformation signatures (CCSs) as a highly informative class of epigenetic biomarkers. Current research methods deployed in academic environments require 3 to 7 days of biochemical processing of cellular material in order to detect CCSs. Those procedures have limited sensitivity and reproducibility and, furthermore, do not have the benefit of the targeted insight provided by the EpiSwitch™ Analytical Package at the design stage.
EpiSwitch™ Array in Silico Marker Identification
CCS sites across the genome are directly evaluated by the EpiSwitch™ Array on clinical samples from testing cohorts for identification of all relevant stratifying lead biomarkers. The EpiSwitch™ Array platform is used for marker identification due to its high-throughput capacity, and its ability to screen large numbers of loci rapidly. The array used was the Agilent custom-CGH array, which allows markers identified through the in silico software to be interrogated.
EpiSwitch™ PCR
Potential markers identified by the EpiSwitch™ Array are then validated either by EpiSwitch™ PCR or by DNA sequencers (e.g. Roche 454, Nanopore MinION). The top PCR markers, which are statistically significant and display the best reproducibility, are selected for further reduction into the final EpiSwitch™ Signature Set and validated on an independent cohort of samples. EpiSwitch™ PCR can be performed by a trained technician following an established standard operating procedure. All protocols and manufacture of reagents are performed under ISO 13485 and ISO 9001 accreditation to ensure the quality of the work and the ability to transfer the protocols. The EpiSwitch™ PCR and EpiSwitch™ Array biomarker platforms are compatible with analysis of both whole blood and cell lines. The tests are sensitive enough to detect abnormalities at very low copy numbers using small volumes of blood.
Use of a Classifier
The method of the invention may include analysis of the chromosome interactions identified in the individual, for example using a classifier, which may increase performance, such as sensitivity or specificity. The classifier is typically one that has been ‘trained’ on samples from the population and such training may assist the classifier to detect any responder group mentioned herein.
The invention is illustrated by the following:
In working on populations of patients undergoing immunotherapy for cancer two distinct marker sets were developed: one is a universal marker set that allows responsiveness to therapy to be detected across a range of cancers and specific therapies, and the second is a marker set that detects hyper-progressors who should never be treated with particular types of immunotherapy.
We have now defined a specialised, distinct and optimised panel of 11 biomarkers (the universal set), selected from a few hundred identified in an original array screen and later tested on specific cohorts of patients. The unique feature of these 11 markers is that each of them is statistically significant (as part of the discovered core) across all PD-1/PD-L1 cases in all tested oncological indications, defining a universal core of response/non-response to treatment by PD-1/PD-L1 inhibitors. A classifier using these 11 markers works very robustly as a distinct performance entity across all tested patient cohorts.
As background to the present work,
In contrast, for the universal 11 marker set, the treatments with various PD-1/PD-L1 therapeutic assets and the oncological indications we have worked with are shown in List A below: PD-1 and PD-L1 assets, such as pembrolizumab, durvalumab, avelumab and atezolizumab, in melanoma, NSCLC, Lung, HCC, Bladder, Prostate, NPC, Parotid Gland and Alveolar Soft Part Sarcoma. The universal 11 marker classifier works well across all those cohorts and identifies a universal profile that delivers robust baseline classification of response/non-response, irrespective of the specific PD-1 or PD-L1 treatment and of the type of cancer tested. With 11 markers we capture a very specific systemic epigenetic network of features, conducive or non-conducive to response, which defines outcomes in immune-checkpoint therapies.
Turning to the second set of markers, a very serious issue in cancer immunotherapy is the consistent presence of a subgroup of patients who should never be treated with PD-1/PD-L1 therapies. They are called hyper-progressors (or super-progressors), where progression refers to progression of disease. Upon treatment these patients react very differently: their rate of tumour growth increases sharply and they die very quickly, in a matter of weeks.
Hyper-progressors can be defined as patients who respond adversely to immuno-checkpoint immuno-oncology treatment by demonstrating significant reduction in either progression-free survival (as a measure of survival in response to drug treatment) (PFS<60 days) or overall survival (OS<150 days).
In a typical trial, between 8% and 15% of patients show a super-progressor profile. Currently, there is no means to identify and exclude these patients in order to prevent serious adverse effects of the immunotherapy. In most studies, hyper-progressors are categorised within the bigger group of non-responders, also termed progressors/progressive disease (PD). Most non-responders are patients who do not benefit from immunotherapy. Today, the use of checkpoint inhibitors is justified by the overall benefit among the percentage of patients who respond to immunotherapy (10-70%).
Here we used patients from immunotherapy cohorts, focusing on those with a PFS/OS within the hyper-progressor range (H) and those beyond these survival time limits (marked S, for “Standard”, in the slides and tables). The hyper-progressor markers identify a patient profile that is predicted to show short PFS/OS upon treatment. In a group of 32 patients (approximately equal arms: 17 H and 15 S) tested before treatment, when compared with PFS and OS after treatment, the 11 super-progressor markers correctly predicted 15 of 17 H patients (sensitivity 0.88) and 14 of 15 S patients (specificity 0.93), with a PPV of 15/16 (0.94) and an NPV of 14/16 (0.875).
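The performance figures above follow from the 2x2 table implied by the counts, with H treated as the positive class. Fifteen of 17 H patients and 14 of 15 S patients were predicted correctly, which leaves 2 false negatives and 1 false positive. A short check:

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Standard 2x2 performance measures, with H (hyper-progressor) as the positive class."""
    return {
        "sensitivity": tp / (tp + fn),  # correctly flagged H / all H
        "specificity": tn / (tn + fp),  # correctly cleared S / all S
        "ppv": tp / (tp + fp),          # true H among predicted H
        "npv": tn / (tn + fn),          # true S among predicted S
    }


# Counts implied by the text: 15 of 17 H and 14 of 15 S predicted correctly
metrics = diagnostic_metrics(tp=15, fn=2, tn=14, fp=1)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```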
These markers are specifically selected to identify and exclude patients prior to treatment on the basis of a predicted severe reduction in their survival as a consequence of immunotherapy. This could be seen as a subgroup of the bigger cohort of non-responders that could be identified and predicted by the universal 11 marker set.
Whilst the present work has been carried out on patients with melanoma, lung cancer, hepatocellular carcinoma (liver cancer), bladder cancer, prostate cancer, nasopharyngeal cancer, parotid gland cancer (salivary gland cancer) and alveolar soft part sarcoma (soft tissue cancer), it also applies to other cancers where PD-1/PD-L1 immune-checkpoint inhibitors are used for therapy, such as breast cancer, cervical cancer, colon cancer, head and neck cancer, Hodgkin lymphoma, kidney cancer, stomach cancer, rectal cancer and any solid tumour.
Mechanism of Detection
The marker sets that have been identified capture a network of deregulations at the level of cellular 3D genomics, which reflects a network of deregulated cell types acting in conjunction to sustain and advance the pathological or physiological phenotype of cancer. So far, by observing statistically significant chromosome conformations as evidence of deregulation in conjunction with cell sub-typing CD loci, we can state that the observed universal signatures contain and represent deregulations in T cells, NK (natural killer) cells, macrophages, B cells and dendritic cells (DC). This emphasizes the role played by the specific set-up, at cellular level, of the adaptive and innate immune system in individual patients as part of the cancer-host interaction. That set-up defines disease progression (hyper-progressors) and responsiveness to PD-1/PD-L1 immune-checkpoint inhibitors.
Methods
Initial studies of chromosome interactions were carried out on the following populations (work described in
List A—Initial Work
Observational Longitudinal:
The marker sets were developed using the following patients:
Training 80 patients all NSCLC
Test: 38 Patients all NSCLC
Test: 20 Samples Mixture
For testing the universal markers, a blind cohort was collected to test the feature of Non-Response. This cohort was collected in Malaysia and consisted of 21 patients, all of whom provided blood samples at baseline prior to immunotherapy. All patients had received previous lines of therapy. Three checkpoint inhibitors were used: Atezolizumab (anti-PD-L1), Durvalumab (anti-PD-L1) and Pembrolizumab (anti-PD-1). There were 7 disease indications: Lung, HCC, Bladder, Prostate, NPC, Parotid Gland and Alveolar soft part sarcoma. Eleven of the patients had multiple collections (2-4 each), with 3 patients having up to 4 collections. The patients were of either Han Chinese or Indonesian ethnicity.
Patient 12, on Durvalumab, shows a Responder profile over 4 collections, with the score at the second collection entering the R/NR grey zone. Overall, the probability profile for Responder actually gets stronger over time.
Patient 1 on Atezolizumab, shows an interesting initial non-response, but becomes a late responder.
Patient 17 on Pembrolizumab is an NPC patient, and shows Response profile over both samplings.
This emphasizes the ability of EpiSwitch™ markers to identify Non-Responders and Responders. They do so by capturing common features of the host response profile for multiple checkpoint inhibitors under diverse oncological conditions.
For the work relating to hyper-progressors,
Approach to Analysing the Patients
A specific approach was taken to the way the patient population was analysed. The relevant patient cohorts had either been given prior therapy (one round of cisplatin-based chemotherapy) or had been treated with check-point inhibitors only. We also only looked at patients with a defined response, so either complete response, partial response or no response. We removed from the analysis patients who experienced stable disease.
The EpiSwitch™ nested PCR platform data output was analysed with multiple statistical techniques. These included, but were not limited to, established univariate (Fisher's exact test) and multivariate (permutated GLMNET, Random Forest with SHapley Additive exPlanations (SHAP) values) procedures.
For development of the diagnostic and prognostic EpiSwitch™ classifiers, the following statistical analyses were used: (i) XGBoost: a gradient boosted decision tree algorithm, in which an ensemble of weak decision tree models is generated and combined to produce one strong classification model (level-wise tree growth); (ii) Logistic Principal Component Analysis (PCA): principal component analysis optimised for binary data; (iii) GLMNET: a generalized linear model fitted via a penalized maximum likelihood technique; and (iv) LightGBM: a gradient boosting framework that uses tree-based learning algorithms (vertical, leaf-wise tree growth).
A SHAP analysis is shown below for the universal marker set.
Markers are ranked by their SHAP scores, with the best marker being OBD117_029_0.31. The SHAP (SHapley Additive exPlanations) value is a unified approach to explaining the output of any machine learning model. It has three important benefits.
The first one is global interpretability—the collective SHAP values can show how much each predictor contributes, either positively or negatively, to the target variable. This is like the variable importance plot but it is able to show the positive or negative relationship for each variable with the target.
The second benefit is local interpretability: each observation gets its own set of SHAP values. This greatly increases transparency. We can explain why a case receives its prediction and the contributions of the predictors. Traditional variable importance algorithms only show the results across the entire population, not for each individual case. Local interpretability enables us to pinpoint and contrast the impacts of the factors.
Third, SHAP values can be calculated for any tree-based model (our model is an XGBoost boosted-tree model), whereas other methods use linear or logistic regression models as surrogate models.
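The sketch below makes the additivity behind SHAP concrete. It computes exact Shapley values for a tiny hypothetical scoring function over three binary markers by enumerating every feature coalition, with absent features set to a baseline value. This is not how SHAP values are obtained for a boosted-tree model in practice, because tree-specific algorithms such as TreeSHAP avoid the exponential enumeration. The model, baseline and markers are assumptions for illustration only.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(model: Callable[[List[float]], float],
                   x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values by enumerating every coalition (cost grows as 2**n)."""
    n = len(x)

    def value(coalition: set) -> float:
        # Features outside the coalition are replaced by their baseline value.
        return model([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_score(z: List[float]) -> float:
    """Hypothetical score over three binary markers, with one interaction term."""
    return 2 * z[0] + z[1] - z[2] + z[0] * z[1]


phi = shapley_values(toy_score, x=[1, 1, 1], baseline=[0, 0, 0])
print([round(v, 6) for v in phi])  # [2.5, 1.5, -1.0]
```

The interaction term is split equally between the first two markers. The three values sum to toy_score(x) - toy_score(baseline) = 3, which is the additive property that gives both the global and the local readings described above.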
This represents the analytical pipeline for marker selection. The high performance of the two marker sets is shown in the figures and tables. In particular for the universal marker set all 7 types of cancer shown in
Immune checkpoint inhibitors are a class of drugs targeting a narrow set of proteins in a specific regulatory network present in immune cells, such as T cells, and in some cancer cells of the patients. The checkpoint protein targets and the network they control help keep immune responses from being too strong and provide additional protection from autoimmune conditions, but in the case of cancer can keep T cells from killing cancer cells. Use of immune checkpoint inhibitors helps to reactivate the immune response in cancer patients, with efficacious outcomes and improved survival.
Immune checkpoint inhibitors act by resetting and activating the immune response, targeting either: 1) PD-1 (Nivolumab, Pembrolizumab, Cemiplimab, Camrelizumab, Tislelizumab, Sasanlimab); or 2) the PD-1 ligand, PD-L1 (Avelumab, Atezolizumab and Durvalumab).
The present work asked several questions. In particular, given the role played by the patient's immune system in a successful response to immune checkpoint inhibitors, and the limited number of therapeutic targets (particularly the PD-1 receptor and its ligand PD-L1), can EpiSwitch biomarkers be discovered and validated in a qPCR detection format for baseline patients that universally predict response or non-response to treatment in advance, irrespective of the type of checkpoint inhibitor used and across the spectrum of oncological conditions?
MIQE-compliant qPCR is the standard format for clinical PCR-based tests. This format is very different from the nested PCR or array formats because of its constraints on primer and probe sequence design and its continuous range of detection, traditionally measured through Cq (quantification cycle) values.
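Because Cq is a continuous readout (a lower Cq means more starting template), a qPCR assay is usually characterised with a standard curve from a dilution series, and the MIQE guidelines list PCR efficiency, typically derived from the slope of such a curve, among the information to report. The sketch below fits Cq against log10 input by ordinary least squares and converts the slope to efficiency with E = 10^(-1/slope) - 1, so a slope of about -3.32 corresponds to roughly 100% efficiency (a doubling per cycle). The dilution series values are hypothetical.

```python
from statistics import mean


def standard_curve(log10_input: list[float], cq: list[float]) -> tuple[float, float, float]:
    """Fit Cq = slope * log10(input) + intercept; return (slope, intercept, efficiency)."""
    mx, my = mean(log10_input), mean(cq)
    sxx = sum((x - mx) ** 2 for x in log10_input)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log10_input, cq))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Efficiency of 1.0 means the template exactly doubles every cycle.
    efficiency = 10 ** (-1 / slope) - 1
    return slope, intercept, efficiency


def quantity_from_cq(cq_value: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to estimate the input quantity for a measured Cq."""
    return 10 ** ((cq_value - intercept) / slope)


# Hypothetical ten-fold dilution series (log10 copies) and measured Cq values.
log10_copies = [5.0, 4.0, 3.0, 2.0, 1.0]
cq_values = [18.00, 21.32, 24.64, 27.96, 31.28]

slope, intercept, eff = standard_curve(log10_copies, cq_values)
print(f"slope={slope:.2f} intercept={intercept:.2f} efficiency={eff:.1%}")
print(quantity_from_cq(24.64, slope, intercept))
```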
The following steps were undertaken as part of discovery and validation of these biomarkers (table 9 shows patient data):
The best markers at stages 2 and 3 were selected using a linear model. The linear model is fitted to the Cq values for each marker, comparing PR v PD, PR v SD and PD v SD. The coefficients of the fitted models describe the between-group differences in CCSs for each comparison. The linear models are then used to compute moderated t-statistics and log-odds of differential CCSs by empirical Bayes moderation of the standard errors towards a global value (0 log or 1 linear). The markers are then ranked by adjusted p-value and by their difference in CCS abundance between the groups, with the PR v PD comparison given more weight.
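The empirical Bayes step can be illustrated with a simplified two-group sketch in the spirit of limma's moderated t-statistic. Each marker's pooled Cq variance s_g^2 (with d_g residual degrees of freedom) is shrunk toward a global prior variance s_0^2 carrying d_0 prior degrees of freedom, giving s_tilde^2 = (d_0 s_0^2 + d_g s_g^2) / (d_0 + d_g), and the difference in mean Cq is divided by the resulting standard error. This is not the authors' pipeline: limma fits a full linear model with contrasts and estimates d_0 and s_0^2 from all markers, whereas here d_0 is fixed at a hypothetical value of 4, s_0^2 is crudely taken as the mean pooled variance, and the marker names and Cq values are invented. Converting the statistics to p-values and adjusting them (limma's `topTable` uses Benjamini-Hochberg by default) is omitted.

```python
from statistics import mean, variance


def pooled_variance(a: list[float], b: list[float]) -> float:
    """Pooled within-group variance with len(a) + len(b) - 2 degrees of freedom."""
    return ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / (len(a) + len(b) - 2)


def moderated_t(a: list[float], b: list[float], s0_sq: float, d0: float) -> float:
    """Two-group t-statistic whose variance is shrunk toward the prior s0_sq."""
    d_g = len(a) + len(b) - 2
    s_tilde_sq = (d0 * s0_sq + d_g * pooled_variance(a, b)) / (d0 + d_g)
    se = (s_tilde_sq * (1 / len(a) + 1 / len(b))) ** 0.5
    return (mean(a) - mean(b)) / se


def rank_markers(
    data: dict[str, tuple[list[float], list[float]]], d0: float = 4.0
) -> list[tuple[str, float]]:
    """Rank markers by |moderated t| for one comparison (e.g. PR v PD)."""
    # Crude plug-in prior variance: mean of the per-marker pooled variances (assumption).
    s0_sq = mean(pooled_variance(a, b) for a, b in data.values())
    scores = [(name, moderated_t(a, b, s0_sq, d0)) for name, (a, b) in data.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical Cq values: (PR group, PD group) per marker.
pr_vs_pd = {
    "marker_A": ([24.1, 24.3, 23.9, 24.0], [26.0, 26.2, 25.8, 26.1]),
    "marker_B": ([25.0, 27.0, 23.0, 26.0], [25.5, 26.5, 24.0, 25.0]),
    "marker_C": ([22.0, 22.1, 21.9, 22.0], [22.0, 22.2, 21.8, 22.1]),
}
for name, t in rank_markers(pr_vs_pd):
    print(f"{name}: moderated t = {t:.2f}")
```

With `d0=0` the statistic reduces to an ordinary pooled t-test. Because marker_A's own variance is smaller than the prior, moderation enlarges its standard error, while a noisy marker such as marker_B is pulled the other way; this stabilises rankings when only a few patients per group are available.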
The immunotherapy checkpoint classifier is built using CatBoost, a member of the gradient boosted decision tree (GBDT) family of machine learning ensemble techniques (see Hancock and Khoshgoftaar; J. Big Data (2020) 7:94).
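The core idea of gradient boosting can be sketched briefly: starting from the log-odds of the base rate, each round fits a small regression tree to the negative gradient of the log-loss (y - p) and adds a shrunken copy of it to the model's raw score. The sketch below uses depth-1 trees (stumps) with mean-residual leaf values; it deliberately omits what distinguishes CatBoost (ordered boosting, oblivious trees, categorical feature handling), and the learning rate, number of rounds and two-marker Cq data are hypothetical.

```python
import math

Stump = tuple[int, float, float, float]  # (feature index, threshold, left value, right value)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_stump(X: list[list[float]], residuals: list[float]) -> Stump:
    """Least-squares depth-1 regression tree fitted to the residuals."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, threshold, lv, rv)
    if best is None:  # every feature is constant: predict the mean residual
        m = sum(residuals) / len(residuals)
        return (0, math.inf, m, m)
    return best[1:]


def predict_stump(stump: Stump, row: list[float]) -> float:
    j, threshold, lv, rv = stump
    return lv if row[j] <= threshold else rv


def fit_boosted_stumps(X, y, n_rounds=50, learning_rate=0.3):
    """Gradient boosting for binary log-loss; y must contain 0/1 labels of both classes."""
    p0 = sum(y) / len(y)
    f0 = math.log(p0 / (1 - p0))  # initial raw score: log-odds of the base rate
    scores = [f0] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the log-loss with respect to the raw score is y - p.
        residuals = [yi - sigmoid(fi) for yi, fi in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [f + learning_rate * predict_stump(stump, row) for f, row in zip(scores, X)]
    return f0, learning_rate, stumps


def predict_proba(model, row):
    f0, learning_rate, stumps = model
    return sigmoid(f0 + learning_rate * sum(predict_stump(s, row) for s in stumps))


# Hypothetical Cq values for two markers; 1 = responder, 0 = non-responder.
X = [[24.0, 30.1], [24.5, 29.8], [23.8, 30.5], [24.2, 29.9],
     [26.1, 30.0], [26.4, 30.3], [25.9, 29.7], [26.3, 30.2]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
model = fit_boosted_stumps(X, y)
print(round(predict_proba(model, [24.1, 30.0]), 3))
```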
Types of Format for the Test
Both nested and qPCR formats impose stringent limitations on which array-based marker leads can be successfully translated into either format. This is particularly true for the qPCR format, where we have to determine whether two primers and a fluorescent probe spanning the 3C junction (the probe being similar to the array probe) can be designed that 1) have unique sequences across the whole genome for specific detection, 2) have very similar annealing temperatures, and 3) amplify with high efficiency in a single PCR procedure, rather than in the two sequential reactions required by nested PCR (the first with one pair of primers, the second with another pair). The requirements for qPCR are therefore much more stringent and selective.
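Two of these criteria can be screened computationally before wet-lab testing. The sketch below checks that each oligo has exactly one binding site on either strand of a reference sequence and that the two primers have similar melting temperatures estimated with the Wallace rule (2 °C per A/T plus 4 °C per G/C, a coarse approximation for short oligos). The oligo sequences, the toy reference and the 2 °C tolerance are hypothetical; a real genome-wide uniqueness check would search the full reference assembly with an alignment tool such as BLAST, and the toy does not model the 3C ligation junction or amplification efficiency. The probe's Tm is not compared, on the assumption that hydrolysis-probe designs often set it deliberately above the primers'.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    # Assumes an unambiguous A/C/G/T sequence.
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def wallace_tm(seq: str) -> float:
    """Wallace-rule melting temperature in deg C: 2*(A+T) + 4*(G+C)."""
    s = seq.upper()
    return 2.0 * (s.count("A") + s.count("T")) + 4.0 * (s.count("G") + s.count("C"))


def count_sites(oligo: str, reference: str) -> int:
    """Count exact (possibly overlapping) matches on both strands of the reference."""
    reference = reference.upper()
    total = 0
    for target in {oligo.upper(), reverse_complement(oligo)}:
        start = reference.find(target)
        while start != -1:
            total += 1
            start = reference.find(target, start + 1)
    return total


def check_assay(fwd: str, probe: str, rev: str, reference: str,
                max_tm_diff: float = 2.0) -> list[str]:
    """Return a list of problems; an empty list means both screens passed."""
    problems = []
    for name, oligo in (("forward primer", fwd), ("probe", probe), ("reverse primer", rev)):
        sites = count_sites(oligo, reference)
        if sites != 1:
            problems.append(f"{name} matches {sites} sites (expected exactly 1)")
    tm_diff = abs(wallace_tm(fwd) - wallace_tm(rev))
    if tm_diff > max_tm_diff:
        problems.append(f"primer Tm values differ by {tm_diff:.1f} C (limit {max_tm_diff} C)")
    return problems


# Hypothetical oligos and a toy reference containing each binding site once.
fwd = "ACGTGCTAGCATCG"
probe = "GACCTGAGCGTCAG"
rev = "TGCATGCAGTCAGC"
reference = fwd + "TTTTTTTT" + probe + "AAAAAAAA" + reverse_complement(rev)

print(check_assay(fwd, probe, rev, reference))                # clean design
print(check_assay(fwd, probe, rev, reference + "TTTT" + fwd))  # forward site duplicated
```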
Based on qPCR evaluation of 8 conditional chromatin conformations as blood-based regulatory biomarkers, individuals can be evaluated for likelihood of response to immune checkpoint inhibitor (ICI) monotherapies. In the cross-talk between the patient's tumour microenvironment and immune system, the PD-1 pathway, comprising the receptor programmed death 1 (PD-1) and its ligand PD-L1, mediates local immunosuppression in the tumour microenvironment. The present work relates directly to ICIs that are antagonists targeting PD-1 (Pembrolizumab) or its ligand PD-L1 (Atezolizumab, Avelumab and Durvalumab). Stratification based on the 8-marker classifier places patients into groups of likely responders or non-responders to ICI monotherapy before the therapy is applied. The classification applies across all ICI monotherapies against PD-1 and its ligand PD-L1, in the context of all oncological indications treated with ICI monotherapy.
This application is a 371 National Stage filing and claims the benefit under 35 U.S.C. § 120 of International Application No. PCT/GB2022/050561, filed 3 Mar. 2022, which claims priority to U.S. Provisional Application No. 63/156,659, filed 4 Mar. 2021 and U.S. Provisional Application No. 63/282,284 filed 23 Nov. 2021, each of which is incorporated herein by reference in its entirety.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/GB2022/050561 | 3/3/2022 | WO | |

Number | Date | Country |
---|---|---|
63156659 | Mar 2021 | US |
63282284 | Nov 2021 | US |