This invention relates to biomarkers useful for the diagnosis and/or prognosis of endometriosis. The biomarkers are also useful for monitoring treatment of the disease and can also be used as targets for therapeutic intervention.
Endometriosis is a common gynaecological condition in which uterine or uterine-like endometrial and/or stromal cells are found exterior to the uterus in various sites and tissues of the body. These sites are numerous and can vary from pelvic infiltrations (superficial or deeply infiltrating) to ovarian cysts (also known as endometriomas), vaginal, vulval, bowel and bladder infiltrations, skin and scar lesions and lesions in the lymphatic system, brain and lungs (though these are rarer). It can potentially be found in any site of the body and presents with non-specific signs and symptoms. Symptoms of endometriosis vary widely and include pain (e.g. pelvic pain, dysmenorrhea, dyspareunia, dyschesia and dysuria), fatigue and infertility, causing numerous physical and psychosocial co-morbidities and severely limiting the patient's quality of life [1]. Endometriosis is a chronic and possibly hormonally responsive condition which recurs, potentially leaving scars and fibrosis in the tissue it affects.
Endometriosis is found almost exclusively in women, generally in their reproductive years. The average age at diagnosis is 25-29 years, but there is an estimated delay in diagnosis of around 6.7 years in one multicentre trial [2]. US studies report an average delay in diagnosis of 11.7 years and UK studies estimate 8 years of diagnostic delay [3]. The disease affects around 5%-15% of women of reproductive age, with prevalence varying between populations. It is estimated to affect over 70 million women worldwide [4]. Diagnostic incidence rates have been quoted from 28.4% (95% CI: 20-32.7%) in African countries to 54.5% (95% CI: 44.2-64.7%) in South American countries [5]. Endometriosis is prevalent in 0.5-5% of fertile women and 25-40% of infertile women, making it a leading cause of infertility [6,7]. In one Norwegian study the lifetime prevalence was documented at 2.2% [8].
There are no defined causative factors but risk factors associated with disease vary, and suggestions include age (rare before and after menarche and menopause respectively), social class and race [9] (increased incidence in higher social classes). In utero exposure to multiple pregnancies and diethylstilbestrol [10], infertility [11], low parity [11], oral contraceptive use and age of commencement or stopping of oral contraceptive pill [12], family history [13,14], smoking [15], diet [7], exercise [16], body mass index [17], dioxin [18], a history of immune disorders [19], association with non-Hodgkin's lymphoma [20], association with pigmentary traits [21], links with melanoma [22] and association with ovarian (serous, mucinous, endometroid, clear cell, other subtypes) and endometrial cancer have all been published. Most studies describing links to diet, exercise and familial inheritance show conflicting evidence and are by no means definitive.
Theories of causation vary, with literature supporting or refuting each. Theories include those of: in situ development, mullerianosis [23], accentuation by genital tract anomalies [24], genetic predisposition [25], coelomic metaplasia [26], induction [27], transplantation [28], retrograde menstruation [29], endometrial stem cells [30], physiological phenomenon [31], alterations in the endometrium [32], exogenous endometrial hormonal production [33], angiogenesis [34], the evasion of endometrial tissue from the immune system [35], cellular protection from apoptosis [36,37], the potential of endometrium to implant [38] and invade [39], and differences in peritoneal fluid [40].
The diagnosis of endometriosis is a combination of physician vigilance and knowledge, patient history, clinical symptoms/signs at examination and use of radiology (including trans-vaginal and trans-abdominal ultrasound to assess for endometriomas or macroscopic lesions). MRI and CT scanning are used for deep lesions and the most utilised serum marker is the non-specific CA125, which has poor correlation with presence or severity of disease. Biomarkers reported in the literature range from cytokines, non-cytokines, serum and endometrial biomarkers and the presence of nerve fibres within tissues [41] to gene aberrations and mRNAs. However, none of them is currently used and validated in the clinical setting.
There is the postulated use of proteomic analysis of endometrium and patient blood using 2-DIGE and SELDI-TOF MS technology [42, 43, 44, 45]. No biomarkers have so far been identified from this. Peripheral blood studies using SELDI TOF MS report proteins altered in endometriosis [46, 47, 48, 49, 50], but none of them provides sufficient sensitivity and specificity for use in clinical settings.
The gold standard for diagnosis and treatment of endometrial peritoneal lesions is invasive surgical laparoscopy enabling the visualisation of lesions. Endometriosis is removed by diathermy and/or peritoneal and/or cyst stripping. This has substantial patient morbidity and possibly mortality.
An early non-invasive test would enable easier diagnosis and treatment, prevent chronic effects of disease that include pain, scarring, psychological trauma and infertility; reduce surgical patient morbidity and mortality and enable the monitoring of the response of disease to therapeutics and management.
There is a need for new or improved in vitro tests with good specificity and sensitivity to enable non-invasive diagnosis of endometriosis. It is an object of the invention to provide further and improved biomarkers for the diagnosis of endometriosis e.g. using in vitro detection techniques.
The invention is based on the identification of correlations between endometriosis and the increased or decreased levels of certain proteins and small non-coding miRNAs.
The inventors have identified miRNAs for which the expression profiles can be used to indicate that a subject has endometriosis. These miRNAs are present at significantly different levels in subjects with endometriosis and without endometriosis. Detection of the presence or absence of these miRNAs, and/or of changes in their levels over time, can thus be used to indicate that a subject has endometriosis, or has the potential to develop endometriosis in the future. These miRNAs function as biomarkers of endometriosis.
The inventors have also identified antigens for which the level of auto-antibodies can be used to indicate that a subject has endometriosis. Auto-antibodies against these antigens are present at significantly different levels in subjects with endometriosis and without endometriosis. Detection of the presence or absence of these auto-antibodies, and/or of changes in their levels over time, can thus be used to indicate that a subject has endometriosis. The auto-antibodies and their antigens also function as biomarkers of endometriosis.
Detection of these biomarkers in a subject sample can be used to improve the diagnosis, prognosis and monitoring of endometriosis. Advantageously, the invention can be used to distinguish between endometriosis and other forms of intra-abdominal inflammation.
The inventors have identified 343 such biomarkers (Table 1) and the invention uses at least one of these to assist in the diagnosis of endometriosis by measuring level(s) of the biomarker(s). The biomarker can be a protein or a miRNA. A protein biomarker can be (i) an auto-antibody which binds to an antigen in Table 1 and/or (ii) an antigen in Table 1, but is preferably the former.
Thus the invention provides a method for analysing a subject sample, comprising a step of determining the level of a Table 1 biomarker in the sample, wherein the level of the biomarker provides a diagnostic indicator of whether the subject has endometriosis.
Analysis of a single Table 1 biomarker can be performed, and detection of the auto-antibody/antigen or miRNA can provide a useful diagnostic indicator for endometriosis even without considering any of the other Table 1 biomarkers.
The sensitivity and specificity of diagnosis can be improved, however, by combining data for multiple biomarkers. It is thus preferred to analyse more than one Table 1 biomarker. Analysis of two or more different biomarkers (a “panel”) can enhance the sensitivity and/or specificity of diagnosis compared to analysis of a single biomarker. Each different biomarker in a panel is shown in a different row in Table 1, e.g. measuring both an auto-antibody which binds to an antigen listed in Table 1 and the antigen itself is measurement of a single biomarker rather than of a panel.
The inventors found that the combination of the information from protein and miRNA biomarkers provides an enhancement in diagnostic utility, as measured by sensitivity, specificity and/or area under the Receiver Operating Characteristic (ROC) curve. Thus, the invention preferably uses at least one protein biomarker from Table 1 and at least one miRNA biomarker from Table 1 to assist in the diagnosis of endometriosis.
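The measures named above (sensitivity, specificity and area under the ROC curve) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration only: the scores are invented, the names `protein`, `mirna` and `combined` are hypothetical, and averaging two scores is merely one simple way of combining information from two biomarkers, not a combination method taken from this disclosure.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Labels: 1 = endometriosis, 0 = control. A score >= threshold is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(scores, labels):
    """AUC as the probability that a random case outscores a random control (ties count 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if c > n else 0.5 if c == n else 0.0
               for c in cases for n in controls)
    return wins / (len(cases) * len(controls))


# Invented example scores for eight samples (hypothetical, not measured data).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
protein = [0.9, 0.8, 0.3, 0.7, 0.4, 0.2, 0.6, 0.1]   # hypothetical protein biomarker score
mirna = [0.6, 0.2, 0.9, 0.8, 0.3, 0.5, 0.1, 0.2]     # hypothetical miRNA biomarker score
combined = [(p + m) / 2 for p, m in zip(protein, mirna)]  # assumed combination rule

for name, scores in [("protein", protein), ("miRNA", mirna), ("combined", combined)]:
    sens, spec = sensitivity_specificity(scores, labels, threshold=0.45)
    print(name, "sensitivity", sens, "specificity", spec, "AUC", roc_auc(scores, labels))
```

In this invented data set each marker misclassifies a different sample, so the averaged score separates the two groups better than either marker alone. Real data need not behave this way.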
Thus the invention provides a method for analysing a subject sample, comprising a step of determining the levels of x different biomarkers of Table 1, wherein the levels of the biomarkers provide a diagnostic indicator of whether the subject has endometriosis. The value of x is 2 or more e.g. 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 or more (e.g. up to 343). These panels may include (i) any specific one of the 343 biomarkers in Table 1 in combination with (ii) any of the other 342 biomarkers in Table 1. Preferably the panel includes at least one protein biomarker (e.g. auto-antibodies or an antigen) from Table 1 and at least one miRNA biomarker from Table 1. Preferably the panel includes protein biomarkers only (e.g. auto-antibodies or antigens) from Table 1. Preferably the panel includes miRNA biomarkers only from Table 1.
Suitable panels are described below and panels of particular interest include those listed in Tables 11 to 15. Preferred panels have from 2 to 6 biomarkers, as using >6 of them adds little to sensitivity and specificity.
The Table 1 biomarkers can be used in combination with one or more of: (a) known biomarkers for endometriosis, which may be auto-antibodies, antigens or miRNAs; and/or (b) other information about the subject from whom a sample was taken e.g. age, genotype (genetic variations can affect auto-antibody profiles [51] and considerable progress on the elucidation of the genetics of endometriosis has been made [52]), weight, other clinically-relevant data or phenotypic information; and/or (c) other diagnostic tests or clinical indicators for endometriosis. Such combinations can enhance the sensitivity and/or specificity of diagnosis. Known endometriosis biomarkers of particular interest include, but are not limited to, auto-antibodies against CA125, CA19-9 and/or any of the antigens listed in Table 5.
Thus the invention provides a method for analysing a subject sample, comprising a step of determining:
The samples used in (a) and (b) may be the same or different.
The value of y is 1 or more e.g. 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 (e.g. up to 343). When y>1 the invention uses a panel of different Table 1 biomarkers. When y>1, preferably the panel includes at least one protein biomarker (e.g. auto-antibodies or an antigen) from Table 1 and at least one miRNA biomarker from Table 1.
The invention also provides, in a method for diagnosing if a subject has endometriosis, an improvement consisting of determining in a sample from the subject the level(s) of y biomarker(s) of Table 1, wherein the level(s) of the biomarker(s) provide a diagnostic indicator of whether the subject has endometriosis. The biomarker(s) of Table 1 can be used in combination with known endometriosis biomarkers, as discussed above.
The invention also provides a method for diagnosing a subject as having endometriosis, comprising steps of: (i) determining the levels of y biomarkers of Table 1 in a sample from the subject; and (ii) comparing the determination from step (i) to data obtained from samples from subjects without endometriosis and/or from subjects with endometriosis, wherein the comparison provides a diagnostic indicator of whether the subject has endometriosis. The comparison in step (ii) can use a classifier algorithm as discussed in more detail below. The biomarkers measured in step (i) can be used in combination with known endometriosis biomarkers, as discussed above.
The invention also provides a method for monitoring development of endometriosis in a subject, comprising steps of: (i) determining the levels of z1 biomarker(s) of Table 1 in a first sample from the subject taken at a first time; and (ii) determining the levels of z2 biomarker(s) of Table 1 in a second sample from the subject taken at a second time, wherein: (a) the second time is later than the first time; (b) one or more of the z biomarker(s) were present in the first sample; and (c) a change in the level(s) of the biomarker(s) in the second sample compared with the first sample indicates that endometriosis is in remission or is progressing. Thus the method monitors the biomarker(s) over time, with changing levels indicating whether the disease is getting better or worse.
The disease development can be either an improvement or a worsening of the disease, and this method may be used in various ways e.g. to monitor the natural progress of a disease, or to monitor the efficacy of a therapy being administered to the subject. Thus a subject may receive a therapeutic agent or may receive surgery for removing endometriosis before the first time, at the first time, or between the first time and the second time. Increased levels of antibodies against a particular auto-antigen may be due to “epitope spreading”, in which additional antibodies or antibody classes are raised to antigens against which an antibody response has already been mounted [53].
The value of z1 is 1 or more e.g. 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 (e.g. up to 343). The value of z2 is 1 or more e.g. 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 (e.g. up to 343). The values of z1 and z2 may be the same or different. If they are different, it is usual that z1>z2 as the later analysis (z2) can focus on biomarkers which were already detected in the earlier analysis; in other embodiments, however, z2 can be larger than z1, e.g. if previous data have indicated that an expanded panel should be used; in other embodiments z2=z1, e.g. so that, for convenience, the same panel can be used for both analyses. When z1>1 or z2>1, the biomarkers are different biomarkers. The z1 and/or z2 biomarker(s) can be used in combination with known endometriosis biomarkers, as discussed above.
The invention also provides a method for monitoring development of endometriosis in a subject, comprising steps of: (i) determining the level of at least w1 Table 1 biomarkers in a first sample taken at a first time from the subject; and (ii) determining the level of at least w2 Table 1 biomarkers in a second sample taken at a second time from the subject, wherein: (a) the second time is later than the first time; (b) at least one biomarker is common to both the w1 and w2 biomarkers; (c) the level of at least one biomarker common to both the w1 and w2 biomarkers is different in the first and second samples, thereby indicating that the endometriosis is progressing or regressing. Thus the method monitors the range of biomarkers over time, with a broadening in the number of detected biomarkers indicating that the disease is getting worse. As mentioned above, this method may be used to monitor disease development in various ways.
The value of w1 is 1 or more e.g. 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 (e.g. up to 343). The value of w2 is 1 or more e.g. 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 (e.g. up to 343). The values of w1 and w2 may be the same or different. If they are different, it is usual that w2≥w1, as the later analysis should focus on a biomarker panel that is at least as wide as the number already detected in the earlier analysis. There will usually be an overlap between the w1 and w2 biomarkers (including situations where they are the same, such that the same biomarkers are measured at two time points) but it is also possible for w1 and w2 to have no biomarkers in common. The w1 and/or w2 biomarker(s) can be used in combination with known endometriosis biomarkers, as discussed above.
Where the methods involve a first time and a second time, these times may differ by at least 1 day, 1 week, 1 month or 1 year. Samples may be taken regularly. The methods may involve measuring biomarkers in more than 2 samples taken at more than 2 time points i.e. there may be a 3rd sample, a 4th sample, a 5th sample, etc.
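The comparison of biomarker levels between a first and a second sample can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the function name `level_changes`, the 1.5-fold cut-off and the example levels are all hypothetical, and the code reports only that a level has changed. Whether a given change points to progression or remission depends on the biomarker and is not decided here.

```python
def level_changes(first, second, min_fold=1.5):
    """Return fold changes (second / first) for biomarkers measured at both times.

    Only biomarkers whose level rose or fell by at least `min_fold` are reported.
    `min_fold` is a hypothetical cut-off chosen for illustration.
    """
    changes = {}
    for name in first.keys() & second.keys():      # biomarkers common to both samples
        a, b = first[name], second[name]
        if a <= 0 or b <= 0:
            continue                               # fold change is undefined
        fold = b / a
        if fold >= min_fold or fold <= 1 / min_fold:
            changes[name] = fold
    return changes


# Invented levels at two time points (arbitrary units).
first_sample = {"hsa-miR-223": 2.0, "hsa-let-7f": 1.0, "hsa-miR-197": 4.0}
second_sample = {"hsa-miR-223": 4.0, "hsa-let-7f": 1.1, "hsa-miR-630": 3.0}

print(level_changes(first_sample, second_sample))
```

Here only hsa-miR-223 is reported: hsa-let-7f changed by less than the cut-off, and the other two miRNAs were each measured at one time point only.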
The invention also provides a diagnostic device for use in diagnosis of endometriosis, wherein the device permits determination of the level(s) of y Table 1 biomarkers. The value of y is defined above. The device may also permit determination of whether a sample contains one or more of the known endometriosis biomarkers mentioned above.
The invention also provides a kit comprising (i) a diagnostic device of the invention and (ii) instructions for using the device to detect y of the Table 1 biomarkers. The value of y is defined above. The kit is useful in the diagnosis of endometriosis.
The invention also provides a kit comprising reagents for measuring the levels of x different Table 1 biomarkers. The kit may also include reagents for determining whether a sample contains one or more of the known endometriosis biomarkers mentioned above. The value of x is defined above. The kit is useful in the diagnosis of endometriosis.
The invention also provides a kit comprising components for preparing a diagnostic device of the invention. For instance, the kit may comprise individual detection reagents for x different biomarkers, such that an array of those x biomarkers can be prepared.
The invention also provides a product comprising (i) one or more detection reagents which permit measurement of x different Table 1 biomarkers, and (ii) a sample from a subject.
The invention also provides a software product comprising (i) code that accesses data attributed to a sample, the data comprising measurement of y Table 1 biomarkers, and (ii) code that executes an algorithm for assessing the data to represent a level of y of the biomarkers in the sample. The software product may also comprise (iii) code that executes an algorithm for assessing the result of step (ii) to provide a diagnostic indicator of whether the subject has endometriosis. As discussed below, suitable algorithms for use in part (iii) include support vector machine algorithms, artificial neural networks, tree-based methods, genetic programming, etc. The algorithm can preferably classify the data of part (ii) to distinguish between subjects with endometriosis and subjects without based on measured biomarker levels in samples taken from such subjects. The invention also provides methods for training such algorithms. The y biomarker(s) can be used in combination with known endometriosis biomarkers, as discussed above.
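The three parts of such a software product can be sketched in Python. The sketch is hedged in several ways: it uses a nearest-centroid rule purely as a simple stand-in for the classifier families named above (it is not a support vector machine, neural network or tree-based method), the CSV layout and the names `marker_a`, `marker_b`, `load_levels`, `train_centroids` and `classify` are hypothetical, and the training values are invented.

```python
import csv
import io
from statistics import mean


def load_levels(csv_text, biomarkers):
    """Parts (i) and (ii): read sample data and return (level vector, label) pairs."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [([float(row[b]) for b in biomarkers], row["label"]) for row in rows]


def train_centroids(samples):
    """Training: compute the mean level vector of each class."""
    by_label = {}
    for vector, label in samples:
        by_label.setdefault(label, []).append(vector)
    return {label: [mean(column) for column in zip(*vectors)]
            for label, vectors in by_label.items()}


def classify(vector, centroids):
    """Part (iii): return the class whose centroid is nearest (squared Euclidean distance)."""
    def distance(centroid):
        return sum((v - c) ** 2 for v, c in zip(vector, centroid))
    return min(centroids, key=lambda label: distance(centroids[label]))


# Invented training data with two hypothetical biomarkers.
TRAINING = """label,marker_a,marker_b
endometriosis,5.0,1.0
endometriosis,6.0,2.0
control,1.0,4.0
control,2.0,5.0
"""
BIOMARKERS = ["marker_a", "marker_b"]

centroids = train_centroids(load_levels(TRAINING, BIOMARKERS))
print(classify([5.5, 1.2], centroids))
print(classify([1.0, 5.0], centroids))
```

A nearest-centroid rule was chosen only because it fits in a few lines of standard-library Python. Any classifier with a train step and a predict step could take its place.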
The invention also provides a computer which is loaded with and/or is running a software product of the invention.
The invention also extends to methods for communicating the results of a method of the invention. This method may involve communicating assay results and/or diagnostic results. Such communication may be to, for example, technicians, physicians or patients. In some embodiments, detection methods of the invention will be performed in one country and the results will be communicated to a recipient in a different country.
The invention also provides an isolated antibody (preferably a human antibody) which recognises one of the antigens listed in Table 1. The invention also provides an isolated nucleic acid encoding the heavy and/or light chain of the antibody. The invention also provides a vector comprising this nucleic acid, and a host cell comprising this vector. The invention also provides a method for expressing the antibody comprising culturing the host cell under conditions which permit production of the antibody. The invention also provides derivatives of the human antibody e.g. F(ab′)2 and F(ab) fragments, Fv fragments, single-chain antibodies such as single chain Fv molecules (scFv), minibodies, dAbs, etc.
The invention also provides the use of a Table 1 biomarker as a biomarker for endometriosis.
The invention also provides the use of x different Table 1 biomarkers as biomarkers for endometriosis. The value of x is defined above. These may include (i) any specific one of the 343 biomarkers in Table 1 in combination with (ii) any of the other 342 biomarkers in Table 1. Preferably the invention uses at least one protein biomarker (e.g. auto-antibodies or an antigen) from Table 1 and at least one miRNA biomarker from Table 1. Preferably the panel includes protein biomarkers only (e.g. auto-antibodies or antigens) from Table 1. Preferably the panel includes miRNA biomarkers only from Table 1.
The invention also provides the use as combined biomarkers for endometriosis of (a) at least y Table 1 biomarker(s) and (b) biomarkers including auto-antibodies against CA125, CA19-9 and/or any of the antigens from Table 5 (and optionally, any other known biomarkers e.g. see above). The value of y is defined above. When y>1 the invention uses a panel of biomarkers of the invention. When y>1, preferably the panel includes at least one protein biomarker (e.g. auto-antibodies or an antigen) from Table 1 and at least one miRNA biomarker from Table 1. Such combinations include those discussed above. Panels of particular interest include those listed in Tables 10 to 15.
Biomarkers of the Invention
Antigens
The inventors identified auto-antibodies against 121 different human antigens (as listed in Table 1) and these can be used as endometriosis biomarkers. Further details of the 121 antigens are given in Table 2. Within the 121 antigens, the human antigens mentioned in Tables 6 and 7 are particularly useful for distinguishing between samples from subjects with endometriosis and from subjects without endometriosis. Further auto-antibody biomarkers can be used in addition to these 121 (e.g. any of the biomarkers listed in Table 5).
The sequence listing provides an example of a natural coding sequence for these antigens. These specific coding sequences are not limiting on the invention, however, and auto-antibody biomarkers may recognise variants of polypeptides encoded by these natural sequences (e.g. allelic variants, polymorphic forms, mutants, splice variants, or gene fusions), provided that the variant has an epitope recognised by the auto-antibody. Details on allelic variants of or mutations in human genes are available from various sources, such as the ALFRED database [54] or, in relation to disease associations, the OMIM [55] and HGMD [56] databases. Details of splice variants of human genes are available from various sources, such as ASD [57].
miRNAs
The inventors identified 222 individual human miRNAs (as listed in Table 1), and these can be used as endometriosis biomarkers. Further details of these miRNAs are given in Table 3 and Table 4. Within the 222 miRNAs, the miRNAs mentioned in Tables 8, 9 and 16-21 are particularly useful for distinguishing between samples from subjects with endometriosis and from subjects without endometriosis.
Preferably, the invention uses any of the miRNAs mentioned in Tables 9 and 16-18.
Preferably, the invention uses any of the miRNAs listed in the group consisting of: ebv-miR-BART2-5p, hsa-let-7f, hsa-let-7g, hsa-miR-1260, hsa-miR-142-3p, hsa-miR-197, hsa-miR-215, hsa-miR-223, hsa-miR-30b, hsa-miR-320c, hsa-miR-34a, hsa-miR-497, hsa-miR-630, hsa-miR-663 and hsa-miR-720.
The specific sequences in Tables 3 and 4 are not limiting on the invention. The invention includes detecting and measuring the levels of polymorphic variants of these miRNAs. A database outlining in more detail the miRNAs listed herein is available: miRBase [58, 59, 60, 61] or, in relation to target prediction, the DIANA-microT [62, 63], microRNA.org [64], miRDB [65, 66], TargetScan [67] and PicTar [68] databases.
As mentioned above, detection of a single Table 1 biomarker can provide useful diagnostic information, but each biomarker might not individually provide information which is useful i.e. auto-antibodies against a Table 1 antigen may be present in some, but not all, subjects with endometriosis. An inability of a single biomarker to provide universal diagnostic results for all subjects does not mean that this biomarker has no diagnostic utility; rather, any such inability means that the test results (as in all diagnostic tests) have to be properly understood and interpreted.
To address the possibility that a single biomarker might not provide universal diagnostic results, and to increase the overall confidence that an assay is giving sensitive and specific results across a disease population, it is advantageous to analyse a plurality of the Table 1 biomarkers (i.e. a panel). For instance, although a negative signal for a particular Table 1 antigen is not necessarily indicative of the absence of endometriosis, confidence that a subject does not have endometriosis increases as the number of negative results increases. For example, if all 343 biomarkers are tested and are negative then the result provides a higher degree of confidence than if only 1 biomarker is tested and is negative. Thus biomarker panels are most useful for enhancing the distinction seen between diseased and non-diseased samples.
As mentioned above, though, preferred panels have from 2 to 6 biomarkers as the burden of measuring a higher number of markers is usually not rewarded by better sensitivity or specificity.
The inventors found that the combination of the information from protein and miRNA biomarkers improves diagnostic power, as measured by sensitivity, specificity and/or area under the Receiver Operating Characteristic (ROC) curve. Thus, the invention preferably uses at least one protein biomarker from Table 1 and at least one miRNA biomarker from Table 1 to assist in the diagnosis of endometriosis.
Preferred panels are given below, and are listed in Tables 11-15.
Some biomarkers of the invention have different relative differential expression profiles in endometriosis samples compared to a control. Pairs of these biomarkers, where one is up-regulated and the other is down-regulated relative to the same control sample, may provide a useful way of diagnosing or predicting endometriosis. For example, the inventors found that ebv-miR-BART2-5p is up-regulated in endometriosis samples vs. non-endometriosis samples (i.e. a negative control) and hsa-miR-564 is down-regulated in endometriosis samples vs. non-endometriosis samples (i.e. a negative control), so this pair would be useful. This divergent behaviour can enhance diagnosis or prediction of endometriosis when the pair of the biomarkers is assessed in the same sample.
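One way to assess such a divergent pair in a single sample is to take the ratio of the two levels. The Python sketch below is an illustration only: the log2 ratio is an assumed scoring choice rather than a method stated in this disclosure, the function name `pair_score` is hypothetical, and the levels are invented.

```python
import math


def pair_score(up_level, down_level):
    """log2 ratio of the up-regulated marker to the down-regulated marker in one sample.

    Both levels must be positive. A higher score means the sample looks more like
    the pattern described for endometriosis (first marker high, second marker low).
    """
    if up_level <= 0 or down_level <= 0:
        raise ValueError("levels must be positive")
    return math.log2(up_level / down_level)


# Invented levels (arbitrary units) for ebv-miR-BART2-5p and hsa-miR-564.
endometriosis_like = pair_score(up_level=8.0, down_level=2.0)
control_like = pair_score(up_level=2.0, down_level=4.0)
print(endometriosis_like, control_like)
```

Because the score is a ratio taken within one sample, a factor that scales both levels equally (for example the amount of input material) cancels out: `pair_score(16.0, 4.0)` equals `pair_score(8.0, 2.0)`.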
The Subject
The invention is used, either alone or in combination with other measurements or data concerning the subject, for diagnosing disease in a subject.
The subject may be pre-symptomatic for endometriosis or may already be displaying clinical symptoms, e.g. pelvic pain, dysmenorrhea, dyspareunia, dyschesia, fatigue and infertility. For pre-symptomatic subjects the invention is useful for predicting that symptoms may develop in the future if no preventative action is taken. For subjects already displaying clinical symptoms, the invention may be used to confirm or resolve another diagnosis. The subject may already have begun treatment for endometriosis.
In some embodiments the subject may already be known to be predisposed to development of endometriosis e.g. due to family or genetic links. In other embodiments, the subject may have no such predisposition, and may develop the disease as a result of environmental factors e.g. as a result of exposure to particular chemicals (such as toxins or pharmaceuticals), as a result of diet, as a result of infection, etc.
The subject will typically be a human being. In some embodiments, however, the invention is useful in non-human mammals, e.g. mouse, rat, rabbit, guinea pig, cat, dog, horse, pig, cow, or non-human primate (monkeys or apes, such as macaques or chimpanzees). In non-human embodiments, any detection antigens used with the invention will typically be based on the relevant non-human ortholog of the human auto-antigens disclosed herein. In some embodiments animals can be used experimentally to monitor the impact of a therapeutic on a particular biomarker.
The Sample
The invention analyses samples from subjects. Many types of sample can include auto-antibodies and/or antigens or miRNAs suitable for detection by the invention. The sample can be a tissue sample, e.g. from regions of uterine or vaginal tissues. Alternatively, the sample can be a body fluid sample, e.g. cervical discharge or peritoneal fluid.
In some embodiments, a method of the invention involves a step of obtaining the sample from the subject. In other embodiments, however, the sample is obtained separately from and prior to performing a method of the invention.
Detection of the biomarkers of the invention may be performed directly on a sample taken from a subject, or the sample may be treated between being taken from a subject and being analysed. For example, a blood sample may be treated to remove cells, leaving antibody-containing plasma for analysis, or to remove cells and various clotting factors, leaving antibody-containing serum for analysis. A bodily fluid sample, e.g. a blood sample, can be treated to extract circulating endometrial cells for analysis. Various types of sample may be subjected to treatments such as dilution, aliquoting, sub-sampling, heating, freezing, irradiation, etc. between being taken from the body and being analysed. Addition of processing reagents is also typical for various sample types e.g. addition of anticoagulants to blood samples.
Biomarker Detection
Auto-Antibody Detection
The invention involves determining in a sample the level of auto-antibodies that bind to the antigens listed in Table 1. Immunochemical techniques for detecting antibodies against specific antigens are well known in the art, as are techniques for detecting specific antigens themselves. Detection of an antibody will typically involve contacting a sample with a detection antigen, wherein a binding reaction between the sample and the detection antigen indicates the presence of the antibody of interest. Detection of an antigen will typically involve contacting a sample with a detection antibody, wherein a binding reaction between the sample and the detection antibody indicates the presence of the antigen of interest. Detection of an antigen can also be determined by non-immunological methods, depending on the nature of the antigen e.g. if the antigen is an enzyme then its enzymatic activity can be assayed, or if the antigen is a receptor then its binding activity can be assayed, etc. For example, the CLK1 kinase can be assayed using methods known in the art.
A detection antigen can be a natural antigen recognised by the auto-antibody (e.g. a mature human protein disclosed in Table 1), or it may be an antigen comprising an epitope which is recognised by the auto-antibody. Where a detection antigen is a polypeptide, its amino acid sequence can vary from the natural sequences disclosed above, provided that it has the ability to specifically bind to an auto-antibody of the invention (i.e. the binding is not non-specific and so the detection antigen will not arbitrarily bind to antibodies in a sample). It may even have little in common with the natural sequence (e.g. a mimotope, an aptamer, etc.). Typically, though, a detection antigen will comprise an amino acid sequence (i) having at least 90% (e.g. ≥91%, ≥92%, ≥93%, ≥94%, ≥95%, ≥96%, ≥97%, ≥98%, ≥99%) sequence identity to the relevant SEQ ID NO disclosed herein (i.e. any one of SEQ ID NOs: 1-121) across the length of the detection antigen, and/or (ii) comprising at least one epitope from the relevant SEQ ID NO disclosed herein (i.e. any one of SEQ ID NOs: 1-121). Thus the detection antigen may be any of the variants discussed above.
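The percentage identity "across the length of the detection antigen" can be illustrated with a short Python sketch. It assumes that the two sequences have already been aligned to equal length by some external tool, with "-" marking gaps, and it counts identity over the positions occupied by the detection antigen. That denominator is one possible reading of the phrase above, not a definition taken from this disclosure. The function name `percent_identity` and the two short sequences are hypothetical and are not taken from the sequence listing.

```python
def percent_identity(aligned_antigen, aligned_reference):
    """Percent identity over the positions where the detection antigen has a residue.

    Both inputs must be pre-aligned strings of equal length; '-' denotes a gap.
    """
    if len(aligned_antigen) != len(aligned_reference):
        raise ValueError("sequences must be aligned to equal length")
    positions = [(a, r) for a, r in zip(aligned_antigen, aligned_reference) if a != "-"]
    if not positions:
        raise ValueError("detection antigen contains no residues")
    matches = sum(1 for a, r in positions if a == r)
    return 100.0 * matches / len(positions)


# Invented ten-residue example with a single substitution at the last position.
reference = "MKTAYIAKQR"
antigen = "MKTAYIAKQK"
identity = percent_identity(antigen, reference)
print(identity, identity >= 90.0)
```

With nine of ten residues matching, this invented antigen sits exactly at the 90% threshold mentioned above.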
Epitopes are the parts of an antigen that are recognised by and bind to the antigen binding sites of antibodies and are also known as “antigenic determinants”. An epitope-containing fragment may contain a linear epitope from within a SEQ ID NO and so may comprise a fragment of at least n consecutive amino acids of the SEQ ID NO, wherein n may be 7 or more (e.g. 8, 10, 12, 14, 16, 18, 20, 25, 30, 35, 40, 50, 80, 70, 80, 90, 100, 150, 200, 250 or more). B-cell epitopes can be identified empirically (e.g. using PEPSCAN [89,70] or similar methods), or they can be predicted e.g. using the Jameson-Wolf antigenic index [71], ADEPT [72], hydrophilicity [73], antigenic index [74], MAPITOPE [75], SEPPA [76], matt-based approaches [77], the amino acid pair antigenicity scale [78], or any other suitable method e.g. see ref. 79. Predicted epitopes can readily be tested for actual immunochemical reactivity with samples.
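By way of illustration only, candidate linear fragments of n consecutive amino acids could be enumerated with a simple sliding window, as in the following minimal sketch. The function name and the example sequence are hypothetical and are not taken from any SEQ ID NO; whether any fragment actually contains an epitope would still have to be established empirically or by one of the prediction methods mentioned above.

```python
def linear_fragments(sequence, n=7):
    """Return every fragment of exactly n consecutive residues.

    A sequence shorter than n yields an empty list.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


# Hypothetical 10-residue sequence used only to show the windowing.
example_sequence = "MKTAYIAKQR"
fragments = linear_fragments(example_sequence, n=7)
print(fragments)
```

A sequence of length L gives L − n + 1 overlapping fragments, so the 10-residue example gives four 7-mers.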
Detection antigens can be purified from human sources, but it is more typical to use recombinant antigens (e.g. particularly where the detection antigen uses sequences which are not present in the natural antigen e.g. for attachment). Various systems are available for recombinant expression, and the choice of system may depend on the auto-antibody to be detected. For example, prokaryotic expression (e.g. using E. coli) is useful for detecting many auto-antibodies, but if an auto-antibody recognises a glycoprotein then eukaryotic expression may be required. Similarly, if an auto-antibody recognises a specific discontinuous epitope then a recombinant expression system which provides correct protein folding may be required.
The detection antigen may be a fusion polypeptide with a first region and a second region, wherein the first region can react with an auto-antibody in a sample and the second region can react with a substrate to immobilise the fusion polypeptide thereon.
A detection antibody for a biomarker antigen can be a monoclonal antibody or a polyclonal antibody. Typically it will be a monoclonal antibody. The detection antibody should have the ability to specifically bind to a Table 1 antigen (i.e. the binding is not non-specific and so the detection antibody will not arbitrarily bind to other antigens in a sample).
Various assay formats can be used for detecting biomarkers in samples. For example, the invention may use one or more of western blot, immunoprecipitation, silver staining, mass spectrometry (e.g. MALDI-MS), conductivity-based methods, dot blot, slot blot, colorimetric methods, fluorescence-based detection methods, or any form of immunoassay, etc. The binding of antibodies to antigens can be detected by any means, including enzyme-linked assays such as ELISA, radioimmunoassays (RIA), immunoradiometric assays (IRMA), immunoenzymatic assays (IEMA), DELFIA™ assays, surface plasmon resonance or other evanescent light techniques (e.g. using planar waveguide technology), label-free electrochemical sensors, etc. Sandwich assays are typical for immunological methods.
In embodiments where multiple biomarkers are to be detected an array-based assay format is preferable, in which a sample that potentially contains the biomarkers is simultaneously contacted with multiple detection reagents (antibodies and/or antigens) in a single reaction compartment. Antigen and antibody arrays are well known in the art e.g. see references 80-88, including arrays for detecting auto-antibodies. Such arrays may be prepared by various techniques, such as those disclosed in references 87-91, which are particularly useful for preparing microarrays of correctly-folded polypeptides to facilitate binding interactions with auto-antibodies. It has been estimated that most B-cell epitopes are discontinuous and such epitopes are known to be important in diseases with an autoimmune component. For example, in autoimmune thyroid diseases, auto-antibodies arise to discontinuous epitopes on the immunodominant region on the surface of thyroid peroxidase and in Goodpasture disease auto-antibodies arise to two major conformational epitopes. Protein arrays which have been developed to present correctly-folded polypeptides displaying native structures and discontinuous epitopes are therefore particularly well suited to studies of diseases where auto-antibody responses occur [84].
Methods and apparatuses for detecting binding reactions on antigen arrays are now standard in the art. Preferred detection methods are fluorescence-based detection methods. To detect auto-antibodies which have bound to immobilised antigens, a sandwich assay is typical, in which the primary antibody is an auto-antibody from the sample and the secondary antibody is a labelled anti-sample antibody (e.g. an anti-human antibody).
Where a biomarker is an auto-antibody the invention will generally detect IgG antibodies, but detection of auto-antibodies with other subtypes is also possible e.g. by using a detection reagent which recognises the appropriate class of auto-antibody (IgA, IgM, IgE or IgD rather than IgG). The assay format may be able to distinguish between different antibody subtypes and/or isotypes. Different subtypes [92] and isotypes [93] can influence auto-antibody repertoires. For instance, a sandwich assay can distinguish between different subtypes by using differentially-labelled secondary antibodies e.g. different labels for anti-IgG and anti-IgM.
As mentioned above, the invention provides a diagnostic device which permits determination of whether a sample contains Table 1 biomarkers. Such devices will typically comprise one or more antigen(s) and/or antibodies immobilised on a solid substrate (e.g. on glass, plastic, nylon, etc.). Immobilisation may be by covalent or non-covalent bonding (e.g. non-covalent bonding of a fusion polypeptide, as discussed above, to an immobilised functional group such as an avidin [89] or a bleomycin-family antibiotic [91]). Antigen arrays are a preferred format, with detection antigens being individually addressable. The immobilised antigens will be able to react with auto-antibodies which recognise a Table 1 antigen.
In some embodiments, the solid substrate may comprise a strip, a slide, a bead, a well of a microtitre plate, a conductive surface suitable for performing mass spectrometry analysis [94], a semiconductive surface [95,96], a surface plasmon resonance support, a planar waveguide technology support, a microfluidic device, or any other device or technology suitable for detection of antibody-antigen binding.
Where the invention provides or uses an antigen array for detecting a panel of auto-antibodies as disclosed herein, in some embodiments the array may include only antigens for detecting these auto-antibodies. In other embodiments, however, the array may include polypeptides in addition to those useful for detecting the auto-antibodies. For example, an array may include one or more control polypeptides. Suitable positive control polypeptides include an anti-human immunoglobulin antibody, such as an anti-IgM antibody, an anti-IgG antibody, an anti-IgA antibody, an anti-IgE antibody or combinations thereof. Other suitable positive control polypeptides which can bind to sample antibodies include protein A or protein G, typically in recombinant form. Suitable negative control polypeptides include, but are not limited to, β-galactosidase, serum albumins (e.g. bovine serum albumin (BSA) or human serum albumin (HSA)), protein tags, bacterial proteins, yeast proteins, citrullinated polypeptides, etc. Negative control features on an array can also be polypeptide-free e.g. buffer alone, DNA, etc. An array's control features are used during performance of a method of the invention to check that the method has performed as expected e.g. to ensure that expected proteins are present (e.g. a positive signal from serum proteins in a serum sample) and that unexpected substances are not present (e.g. a positive signal from an array spot of buffer alone would be unexpected).
In an antigen array of the invention, at least 10% (e.g. ≥20%, ≥30%, ≥40%, ≥50%, ≥60%, ≥70%, ≥80%, ≥90%, ≥95%, or more) of the total number of different proteins present on the array may be for detecting auto-antibodies as disclosed herein.
An antigen array of the invention may include one or more replicates of a detection antigen and/or control feature e.g. duplicates, triplicates or quadruplicates. Replicates provide redundancy, provide intra-array controls, and facilitate inter-array comparisons.
An antigen array of the invention may include detection antigens for more than just the 121 different auto-antibodies described here, but preferably it can detect antibodies against fewer than 10000 antigens (e.g. <5000, <4000, <3000, <2000, <1000, <500, <250, <100, etc.).
An array is advantageous because it allows simultaneous detection of multiple biomarkers in a sample. Such simultaneous detection is not mandatory, however, and a panel of biomarkers can also be evaluated in series. Thus, for instance, a sample could be split into sub-samples and the sub-samples could be assayed in series. In this embodiment it may not be necessary to complete analysis of the whole panel e.g. the diagnostic indicators obtained on a subset of the panel may indicate that a patient has endometriosis without requiring analysis of any further members of the panel. Such incomplete analysis of the panel is encompassed by the invention because of the intention or potential of the method to analyse the complete panel.
As mentioned above, some embodiments of the invention can include a contribution from known tests for endometriosis, such as CA125, CA19-9 and/or any of the antigens listed in Table 5. Any known tests can be used e.g. laparoscopy, ultrasound, magnetic resonance imaging (MRI), etc.
miRNA Detection
Table 1 lists 222 human miRNAs, and methods of the invention can involve detecting and determining the level of these miRNA biomarker(s) in a sample. Further details of these miRNAs are provided in Tables 3 and 4. Tables 3 and 4 also include nucleotide sequences for these miRNAs, but polymorphisms of miRNA are known in the art and so the invention can also involve detecting and determining the level of a polymorphic miRNA variant of these listed miRNA sequences.
Techniques for detecting specific miRNAs are well known in the art, e.g. microarray analysis and NanoString's nCounter technology, polymerase chain reaction (PCR)-based methods (e.g. reverse transcription PCR, RT-PCR), in-situ hybridisation (ISH)-based methods (e.g. fluorescent ISH, FISH), northern blotting, sequencing (e.g. next-generation sequencing), fluorescence-based detection methods, etc. Any of the detection techniques mentioned above can be used with the invention. Where prognosis is the primary interest, a quantitative detection technique is preferred, e.g. real-time quantitative PCR (qPCR), TaqMan® or SYBR® Green.
Detection of a miRNA typically involves contacting (‘hybridising’) a sample with a complementary detection probe (e.g. a synthetic oligonucleotide strand), wherein a specific (rather than non-specific) binding reaction between the sample and the complementary probe indicates the presence of the miRNA of interest. In some instances, the miRNA in the sample is amplified prior to detection, e.g. by reverse transcription of the miRNA to produce a complementary DNA (cDNA) strand, and the derived cDNA can be used as a template in the subsequent PCR reaction.
Thus, the invention provides nucleic acids, which can be used, for example, as hybridization probes for specific detection of miRNA in biological samples or as single-stranded primers to amplify the miRNA.
The term ‘nucleic acid’ in general means a polymeric form of nucleotides of any length, which contain deoxyribonucleotides, ribonucleotides, and/or their analogs. It includes DNA, RNA and DNA/RNA hybrids. It also includes DNA or RNA analogs, such as those containing modified backbones (e.g. peptide nucleic acids (PNAs) or phosphorothioates) or modified bases. Nucleic acid according to the invention can take various forms (e.g. single-stranded, primers, probes, labelled, etc.). Primers and probes are generally single-stranded.
The nucleic acid can be identical or complementary to the mature miRNA sequences listed in Tables 3 and 4 (i.e. any one of SEQ ID NOs:122-343).
The nucleic acid can comprise a nucleotide sequence that has ≥50%, ≥60%, ≥70%, ≥75%, ≥80%, ≥85%, ≥90%, ≥95%, ≥96%, ≥97%, ≥98%, ≥99% or more identity to any one of SEQ ID NOs: 122-343. Identity between sequences is preferably determined by the Smith-Waterman homology search algorithm as described above.
The nucleic acid can comprise a nucleotide sequence that has ≥50%, ≥60%, ≥70%, ≥75%, ≥80%, ≥85%, ≥90%, ≥95%, ≥96%, ≥97%, ≥98%, ≥99% or more complementarity to any one of SEQ ID NOs: 122-343.
The term “complementarity” when used in relation to nucleic acids refers to Watson-Crick base pairing. Thus the complement of C is G, the complement of G is C, the complement of A is T (or U), and the complement of T (or U) is A. It is also possible to use bases such as I (the purine inosine) e.g. to complement pyrimidines (C or T).
Where a nucleic acid is DNA, it will be appreciated that “U” in a RNA sequence will be replaced by “T” in the DNA. Similarly, where a nucleic acid is RNA, it will be appreciated that “T” in a DNA sequence will be replaced by “U” in the RNA.
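The base-pairing and U/T substitution rules above can be illustrated with the following minimal sketch, which derives a DNA probe sequence complementary to an RNA target. The function names and the 10-nucleotide example are hypothetical and are not taken from Tables 3 and 4. The sketch assumes that the probe is written 5′ to 3′, so the complement is reversed to reflect antiparallel hybridisation, and it handles only the standard bases (not inosine or other analogs).

```python
# Watson-Crick partners for DNA bases.
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def rna_to_dna(sequence):
    """Replace U with T to express an RNA sequence as DNA."""
    return sequence.upper().replace("U", "T")


def dna_to_rna(sequence):
    """Replace T with U to express a DNA sequence as RNA."""
    return sequence.upper().replace("T", "U")


def reverse_complement_dna(sequence):
    """Return the DNA reverse complement of a DNA or RNA sequence."""
    dna = rna_to_dna(sequence)
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna))


# Hypothetical RNA target used only for illustration.
target_rna = "UAGCUUAUCA"
probe_dna = reverse_complement_dna(target_rna)
print(probe_dna)
```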
The nucleic acid may be 12 or more, e.g. 12, 13, 14, 15, 16, 17 or 18, etc. (e.g. up to 50) nucleotides in length. The nucleic acid may be 15-40 nucleotides in length, 10-25 nucleotides in length, 15-25 nucleotides in length, or 20-25 nucleotides in length.
The nucleic acid may include sequences that do not hybridise to the miRNA biomarkers, and/or amplified products thereof. For example, the nucleic acid may contain additional sequences at the 5′ end or at the 3′ end. The additional sequences can be a linker, e.g. for cloning or PCR purposes.
Nucleic acid of the invention may be attached to a solid support (e.g. a bead, plate, filter, film, slide, microarray support, resin, etc.). Nucleic acid of the invention may be labelled e.g. with a radioactive or fluorescent label, or a biotin label. This is particularly useful where the nucleic acid is to be used in detection techniques e.g. where the nucleic acid is used as a primer or as a probe. Methods for preparing fluorescently labelled probes, e.g. for fluorescent in-situ hybridisation (FISH) analysis, are known in the art, and FISH probes can be obtained commercially, e.g. from Exiqon.
The invention may use in-situ hybridisation (ISH)-based methods, e.g. fluorescent in-situ hybridisation (FISH). Hybridization reactions can be performed under conditions of different “stringency” followed by washing. Preferably, the nucleic acid of the invention hybridizes under high stringency conditions, such that the nucleic acid specifically hybridizes to a miRNA in an amount that is detectably stronger than non-specific hybridization. Relatively high stringency conditions include, for example, low salt and/or high temperature conditions, such as provided by about 0.02-0.1 M NaCl or the equivalent at temperatures of about 50-70° C. A stringent wash removes nonspecific probe binding and overloaded probes. Relatively stringent wash conditions include, for example, low salt and/or presence of detergent, e.g. 0.02% SDS in 1× Saline-Sodium Citrate (SSC) at about 50° C.
In embodiments where multiple biomarkers are to be detected, an array-based assay or PCR format is preferable, in which a sample that potentially contains the biomarkers is simultaneously contacted with multiple oligonucleotide complementary detection probes or PCR primers/probes (‘multiplexed’) in a single reaction compartment, whereby a reaction compartment is defined as, but not limited to, a microtitre well, microfluidic chamber or detection pore. In other embodiments each of these multiple biomarkers could be contacted with its complementary detection probe in separate, individual reaction compartments, and/or experiments could be separated over time and use different platform technologies in either multiplexed single reaction compartments or separate, individual reaction compartments. Microarray and PCR usage for the detection of miRNAs is well known in the art e.g. see reference 115. Microarrays may be prepared by various techniques, such as those disclosed in references 116, 117, 118. Methods based on nucleic acid amplification are also well known in the art.
Methods and apparatus for detecting binding reactions on DNA microarrays are now standard in the art. Preferred detection methods are fluorescence-based detection methods. To detect biomarkers, binding to oligonucleotide strands immobilised on a glass substrate is typical, e.g. in which the target miRNA is fluorescently labelled and then hybridised to a complementary oligonucleotide strand (probe).
An array is advantageous because it allows simultaneous detection of multiple biomarkers in a sample. Such simultaneous detection is not mandatory, however, and a panel of biomarkers can also be evaluated in series. Thus, for instance, a sample could be split into sub-samples and the sub-samples could be assayed in series. In this embodiment it may not be necessary to complete analysis of the whole panel e.g. the diagnostic indicators obtained on a subset of the panel may indicate that a patient has endometriosis without requiring analysis of any further members of the panel. Such incomplete analysis of the panel is encompassed by the invention because of the intention or potential of the method to analyse the complete panel.
Data Interpretation
The invention involves a step of determining the level of Table 1 biomarker(s). In some embodiments of the invention this determination for a particular marker can be a simple yes/no determination, whereas other embodiments may require a quantitative or semi-quantitative determination. Still other embodiments may involve a relative determination (e.g. a ratio relative to another marker, or a measurement relative to the same marker in a control sample), and other embodiments may involve a threshold determination (e.g. a yes/no determination whether a level is above or below a threshold). Usually biomarkers will be measured to provide quantitative or semi-quantitative results (whether as relative concentration, absolute concentration, fold changes, titre, relative fluorescence etc.) as this gives more data for use with classifier algorithms.
Usually the raw data obtained from an assay for determining the presence, absence, or level (absolute or relative) require some sort of manipulation prior to their use. For instance, the nature of most detection techniques means that some signal will sometimes be seen even if no biomarker is actually present and so this noise may be removed before the results are interpreted. Similarly, there may be a background level of the biomarker in the general population which needs to be compensated for. Data may need scaling or standardising to facilitate inter-experiment comparisons. These and similar issues, and techniques for dealing with them, are well known in the immunodiagnostic area.
Various techniques are available to compensate for background signal in a particular experiment. For example, replicate measurements will usually be performed (e.g. using multiple features of the same detection probe on a single array) to determine intra-assay variation, and average values from the replicates can be compared (e.g. the median value of binding to quadruplicate array features).
Furthermore, standard markers can be used to determine inter-assay variation and to permit calibration and/or normalisation e.g. an array can include one or more standards for indicating whether measured signals should be proportionally increased or decreased. For example, an assay might include a step of analysing the level of one or more control marker(s) in a sample e.g. levels of an antigen or antibody unrelated to endometriosis. Signal may be adjusted according to distribution in a single experiment. For instance, signals in a single array experiment may be expressed as a percentage of interquartile differences e.g. as [observed signal−25th percentile]/[75th percentile−25th percentile]. This percentage may then be normalised e.g. using a standard quantile normalization matrix, such as disclosed in reference 97, in which all percentage values on a single array are ranked and replaced by the average of percentages for antigens with the same rank on all arrays. Overall, this process gives data distributions with identical median and quartile values. Data transformations of this type are standard in the art for permitting valid inter-array comparisons despite variation between different experiments.
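The two transformations described in the preceding paragraph can be sketched as follows. This is a minimal illustration rather than the procedure of reference 97: the function names and example values are hypothetical, quartiles are computed with the "inclusive" method of the Python standard library (other quartile conventions give slightly different values), every array is assumed to carry the same antigens in the same order, and tied values are ranked in their order of appearance.

```python
import statistics


def interquartile_scale(signals):
    """Express each signal as (signal - Q1) / (Q3 - Q1) for one array."""
    q1, _median, q3 = statistics.quantiles(signals, n=4, method="inclusive")
    if q3 == q1:
        raise ValueError("interquartile difference is zero")
    return [(s - q1) / (q3 - q1) for s in signals]


def quantile_normalise(arrays):
    """Replace each value by the mean, across arrays, of the values of equal rank."""
    n_features = len(arrays[0])
    if any(len(a) != n_features for a in arrays):
        raise ValueError("all arrays must have the same number of features")
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [
        sum(sa[rank] for sa in sorted_arrays) / len(arrays)
        for rank in range(n_features)
    ]
    normalised = []
    for a in arrays:
        # Indices of the features ordered from lowest to highest signal.
        order = sorted(range(n_features), key=lambda i: a[i])
        row = [0.0] * n_features
        for rank, feature_index in enumerate(order):
            row[feature_index] = rank_means[rank]
        normalised.append(row)
    return normalised


# Hypothetical signals, for illustration only.
scaled = interquartile_scale([1, 2, 3, 4, 5])
normalised = quantile_normalise([[5, 2, 3], [4, 1, 6]])
print(scaled)
print(normalised)
```

After quantile normalisation every array contains the same set of values, which is why the resulting distributions share identical median and quartile values.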
The level of an auto-antibody relative to a single baseline level may be defined as a fold difference. Normally it is desirable to use techniques that can indicate a change of at least 1.5-fold e.g. ≥1.75-fold, ≥2-fold, ≥2.5-fold, ≥5-fold, etc.
A control sample can also be used for data normalisation. For example, levels of an antibody or a miRNA unrelated to endometriosis can be measured in both the sample and a negative control, and signal from other antibodies can be adjusted accordingly. A negative control sample is from a subject that does not have any clinical presentation of endometriosis.
In some embodiments, rather than (or in addition to) comparing a biomarker against a ‘normal’ baseline, it will be compared to levels seen in a sample from a subject known to have endometriosis and known to have a particular biomarker (i.e. comparison to a positive control). For some biomarkers this comparison may be easier than using a lower negative control level. A preference for comparison against a negative or positive control may depend on the dynamic range between negative and positive signals.
Levels of the biomarkers in a control may be determined in parallel to determining levels in the sample. Rather than making a parallel determination, however, it can be more convenient to use an absolute control level based on empirical data. For example, the level of a particular biomarker may be measured in samples taken from a range of subjects without endometriosis. Those levels can be used to build a baseline across the range of subjects. This may involve normalization relative to a reference antibody. A population of negative control subjects can be used to provide a collection of baseline levels for subjects of different genders, ages, ethnicities, habits (e.g. smokers, non-smokers), etc., so that, if there is variation across the population, a control can be matched to a particular subject as closely as possible. Thus, by analysing non-diseased samples from a sufficiently large number of subjects it is possible to establish an empirical baseline for any particular auto-antibody or miRNA, which can serve as the control level for comparison according to the invention. The control level is not necessarily a single value, but could be a range, against which a test value can be compared. For instance, if a particular auto-antibody titre is variable across non-diseased subjects, but is always in the range of 20-100 (arbitrary) units, a titre of 400 units in a sample would indicate a disease state.
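The comparison of a test value against an empirical baseline range can be sketched as below, using the 20-100 unit example given above. The function names and the individual control titres are hypothetical, and taking the observed minimum and maximum as the range is only one possible way of summarising the control population.

```python
def baseline_range(control_levels):
    """Return the (lowest, highest) level observed in negative controls."""
    if not control_levels:
        raise ValueError("at least one control level is required")
    return min(control_levels), max(control_levels)


def compare_to_baseline(test_level, control_levels):
    """Report whether a test level is below, within or above the baseline range."""
    low, high = baseline_range(control_levels)
    if test_level > high:
        return "above"
    if test_level < low:
        return "below"
    return "within"


# Hypothetical titres (arbitrary units) from non-diseased subjects.
control_titres = [20, 35, 60, 100]
print(compare_to_baseline(400, control_titres))
print(compare_to_baseline(50, control_titres))
```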
As well as compensating for variation which is inherent between different experiments, it can also be important to compensate for background levels of a biomarker which are present in the general population. Again, suitable techniques are well known. For example, levels of a particular biomarker in a sample will usually be measured quantitatively or semi-quantitatively to permit comparison to the background level of that biomarker. Various controls can be used to provide a suitable baseline for comparison, and choosing suitable controls is routine in the diagnostic field. Further details of suitable controls are given below.
The measured level(s) of biomarker(s), after any compensation/normalisation/etc., can be transformed into a diagnostic result in various ways. This transformation may involve an algorithm which provides a diagnostic result as a function of the measured level(s). Where a panel is used then each individual biomarker may make a different contribution to the overall diagnostic result and so two biomarkers may be weighted differently.
The creation of algorithms for converting measured levels or raw data into scores or results is well known in the art. For example, linear or non-linear classifier algorithms can be used. These algorithms can be trained using data from any particular technique for measuring the marker(s). Suitable training data will have been obtained by measuring the biomarkers in “case” and “control” samples i.e. samples from subjects known to suffer from endometriosis and from subjects known not to suffer from endometriosis. Most usefully the control samples will also include samples from subjects with a related disease which is to be distinguished from the disease of interest e.g. it is useful to train the algorithm with data from rheumatoid arthritis subjects and/or with data from subjects with connective tissue diseases other than endometriosis. The classifier algorithm is modified until it can distinguish between the case and control samples e.g. by adding or removing markers from the analysis, by changes in weighting, etc. Thus a method of the invention may include a step of analysing biomarker levels in a subject's sample by using a classifier algorithm which distinguishes between endometriosis subjects and non-endometriosis subjects based on measured biomarker levels in samples taken from such subjects.
Various suitable classifier algorithms are available e.g. linear discriminant analysis, naïve Bayes classifiers, perceptrons, support vector machines (SVM) [98] and genetic programming (GP) [99]. GP is particularly useful as it generally selects relatively small numbers of biomarkers and overcomes the problem of trapping in a local maximum which is inherent in many other classification methods. SVM-based approaches have previously been applied to endometriosis datasets [100]. The inventors have previously confirmed that both SVM and GP approaches can be trained on the same biomarker panels to distinguish the auto-antibody/antigen biomarker profiles of case and control cohorts with similar sensitivity and specificity i.e. auto-antibody biomarkers are not dependent on a single method of analysis. Moreover, these approaches can potentially distinguish endometriosis subjects from subjects with (i) other forms of autoimmune disease and (ii) rheumatoid arthritis. The biomarkers in Table 1 can be used to train such algorithms to reliably make such distinctions. The classification performance (sensitivity and specificity, ROC analysis) of any putative biomarkers can be rigorously assessed using nested cross validation and permutation analyses prior to further validation. Biological support for putative biomarkers can be sought using tools and databases including Genespring (version 11.5.1), Biopax pathway for GSEA analysis and Pathway Studio (version 9.1).
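To illustrate how a linear classifier assigns a weight to each biomarker, the following minimal sketch trains a perceptron, one of the classifier types listed above, on a toy two-marker dataset. It is not the SVM or GP analysis referred to above. All names and values are hypothetical, the toy data are deliberately linearly separable, and a real analysis would use normalised measurements and would be assessed by cross validation rather than on the training data.

```python
def predict(weights, bias, levels):
    """Return +1 (case) if the weighted sum of levels exceeds zero, else -1."""
    score = sum(w * x for w, x in zip(weights, levels)) + bias
    return 1 if score > 0 else -1


def train_perceptron(samples, labels, max_epochs=100):
    """Adjust one weight per biomarker until no training sample is misclassified."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(max_epochs):
        errors = 0
        for levels, label in zip(samples, labels):
            if predict(weights, bias, levels) != label:
                # Move the decision boundary towards the misclassified sample.
                weights = [w + label * x for w, x in zip(weights, levels)]
                bias += label
                errors += 1
        if errors == 0:
            break
    return weights, bias


# Hypothetical normalised levels of two biomarkers per sample.
cases = [(3.0, 2.5), (2.8, 3.1), (3.5, 2.9)]
controls = [(1.0, 0.8), (0.7, 1.2), (1.1, 1.0)]
samples = cases + controls
labels = [1] * len(cases) + [-1] * len(controls)

weights, bias = train_perceptron(samples, labels)
print(predict(weights, bias, (3.0, 3.0)))
print(predict(weights, bias, (0.9, 0.9)))
```

The learned weights need not be equal, which reflects the point made above that individual biomarkers in a panel may contribute differently to the overall result.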
It will be appreciated that, although there may be some biomarkers in Table 1 which always give a negative absolute signal when contacted with negative control samples (and thus any positive signal is immediately indicative of endometriosis), it is more common that a biomarker will give at least a low absolute signal (and thus that a disease-indicating positive signal requires detection of auto-antibody levels above that background level). Thus references herein to detecting a biomarker may not be references to absolute detection but rather (as is standard in the art) to a level above the levels seen in an appropriate negative control. Such controls may be assayed in parallel to a test sample but it can be more convenient to use an absolute control level based on empirical data, or to analyse data using an algorithm which can (e.g. by previous training) use biomarker levels to distinguish samples from disease patients vs. non-disease patients.
The level of a particular biomarker in a sample from an endometriosis-diseased subject may be above or below the level seen in a negative control sample. Antibodies that react with self-antigens occur naturally in healthy individuals and it is believed that these are necessary for survival of T- and B-cells in the peripheral immune system [101]. In a control population of healthy individuals there may thus be significant levels of circulating auto-antibodies against some of the antigens disclosed in Table 1 and these may occur at a significant frequency in the population. The level and frequency of these biomarkers may be altered in a disease cohort, compared with the control cohort. An analysis of the level and frequency of these biomarkers in the case and control populations may identify differences which provide diagnostic information. The level of auto-antibodies directed against a specific antigen may increase or decrease in an endometriosis sample, compared with a healthy sample.
When detecting combinations of protein and nucleic acid biomarkers, each class of biomarker is assayed and treated (e.g. data normalisation) as appropriate for that class, and then both the protein and the nucleic acid biomarkers are analysed together using an algorithm which classifies the sample.
In general, therefore, a method of the invention will involve determining whether a sample contains a biomarker level which is associated with endometriosis. Thus a method of the invention can include a step of comparing biomarker levels in a subject's sample to levels in (i) a sample from a patient with endometriosis and/or (ii) a sample from a patient without endometriosis. The comparison provides a diagnostic indicator of whether the subject has endometriosis. An aberrant level of one or more biomarker(s), as compared to known or standard expression levels of those biomarker(s) in a sample from a patient without endometriosis, indicates that the subject has endometriosis.
The level of a biomarker should be significantly different from that seen in a negative control. Advanced statistical tools (e.g. principal component analysis, unsupervised hierarchical clustering and linear modeling) can be used to determine whether two levels are the same or different. For example, an in vitro diagnosis will rarely be based on comparing a single determination. Rather, an appropriate number of determinations will be made with an appropriate level of accuracy to give a desired statistical certainty with an acceptable sensitivity and/or specificity. Antigen and/or antibody levels can be measured quantitatively to permit proper comparison, and enough determinations will be made to ensure that any difference in levels can be assigned a statistical significance to a level of p≤0.05 or better. The number of determinations will vary according to various criteria (e.g. the degree of variation in the baseline, the degree of up-regulation in disease states, the degree of noise, etc.) but, again, this falls within the normal design capabilities of a person of ordinary skill in this field. For example, interquartile differences of normalised data can be assessed, and the threshold for a positive signal (i.e. indicating the presence of a particular auto-antibody) can be defined as requiring that antibodies in a sample react with a diagnostic antigen at least 2.5-fold more strongly than the interquartile difference above the 75th percentile.
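One possible reading of the interquartile threshold described above is that a signal is positive when it is at least 2.5 interquartile differences above the 75th percentile, i.e. signal ≥ Q3 + 2.5 × (Q3 − Q1). The minimal sketch below applies that reading; the interpretation, the function names, the quartile convention ("inclusive" method) and the example values are all assumptions made for illustration.

```python
import statistics


def positive_threshold(signals, fold=2.5):
    """Return Q3 + fold * (Q3 - Q1) for the signals on one array."""
    q1, _median, q3 = statistics.quantiles(signals, n=4, method="inclusive")
    return q3 + fold * (q3 - q1)


def positive_features(signals, fold=2.5):
    """Return the indices of features whose signal reaches the threshold."""
    threshold = positive_threshold(signals, fold)
    return [i for i, s in enumerate(signals) if s >= threshold]


# Hypothetical normalised signals; the last feature is a strong responder.
signals = [1, 2, 3, 4, 5, 40]
print(positive_threshold(signals))
print(positive_features(signals))
```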
Other criteria are familiar to those skilled in the art and, depending on the assays being used, they may be more appropriate than quintile normalisation. Other methods to normalise data include data transformation strategies known in the an e.g. scaling, log normalisation, median normalisation, etc. For example, raw protein array data can be normalized by consolidating the replicates, transforming the data and applying median normalization which has been demonstrated to be appropriate for this type of analysis. Gene expression data can be subjected to background correction via 2D spatial correction and dye bias normalization via MvA lowess [102,103,104]. Normalized gene expression and proteomic data can be analysed for any potential signatures relating to differences between patient cohorts referring to levels of statistical significance (generally p<0.05), multiple testing correction and told changes within the expression data that could be indicative of biological effect (generally 2 fold in mRNA compared with a reference value).
The underlying aim of these data interpretation techniques is to distinguish between the presence of a Table 1 biomarker and of an arbitrary control biomarker, and also to distinguish between the response of sample from a endometriosis subject from a control subject. Methods of the invention may have sensitivity of at least 70% (e.g. >70%, >75%, >80%, >85%, >90%, >95%, >96%, >97%, >98%, >99%). Methods of the invention may have specificity of at least 70% (e.g. >70%, >75%, >60%, >85%, >90%, >95%, >96%, >97%, >98%, >99%). Advantageously, methods of the invention may have both specificity and sensitivity of at least 70% (e.g. >70%, >75%, >80%, >85%, >90%, >95%, >96%, >97%, >98%, >99%). As shown in the examples, the invention can consistently provide specificities above approximately 70% and sensitivities greater than approximately 70%.
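As a minimal sketch of how the sensitivity and specificity figures quoted above are derived, the following counts true and false calls against known disease status. The function name and the example data are hypothetical.

```python
def sensitivity_specificity(has_disease, tested_positive):
    """Return (sensitivity, specificity) from paired boolean lists.

    has_disease[i] is the known status of subject i and
    tested_positive[i] is the assay call for the same subject.
    """
    tp = fn = tn = fp = 0
    for diseased, positive in zip(has_disease, tested_positive):
        if diseased and positive:
            tp += 1
        elif diseased:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one control")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical cohort: 4 endometriosis subjects and 5 controls.
status = [True, True, True, True, False, False, False, False, False]
calls = [True, True, True, False, False, False, False, False, True]
sens, spec = sensitivity_specificity(status, calls)  # 3/4 and 4/5
```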
Data obtained from methods of the invention, and/or diagnostic information based on those data, may be stored in a computer medium (e.g. in RAM, in non-volatile computer memory, on CD-ROM) and/or may be transmitted between computers e.g. over the kismet.
If a method of the invention indicates that a subject has endometriosis, further steps may then follow. For instance, the subject may undergo confirmatory diagnostic procedures, such as those involving physical inspection of the subject, and/or may be treated with therapeutic agent(s) suitable for treating endometriosis.
Preferably, the biomarkers of the invention can determine if a subject has peritoneum endometriosis. Thus, the invention also provides a method for detecting peritoneum endometriosis. The method can use any of the biomarkers listed in any of Tables 16, 18 and 20-21, preferably in Table 16 or 18.
Alternatively, the biomarkers of the invention can determine if a subject has ovarian endometriosis, such as endometriomas. Thus, the invention also provides a method for detecting ovarian endometriosis. The method can use any of the biomarkers listed in any of Tables 17-21, preferably in Table 17 or 18.
Monitoring the Efficacy of Therapy
As mentioned above, some methods of the invention involve testing samples from the same subject at two or more different points in time. In general, where the above text refers to the presence or absence of antibody(s), the invention also includes an increasing or decreasing level of the antibody(s) over time, or to a spread of antibodies in which additional antibodies or antibody classes are raised against a single auto-antigen. Methods which determine changes in antibody(s) over time can be used, for instance, to monitor the efficacy of a therapy being administered to the subject (e.g. in theranostics). The therapy may be administered before the first sample is taken, at the same time as the first sample is taken, or after the first sample is taken.
The invention can be used to monitor a subject who is receiving hormonal therapies. Hormonal treatments include the combined oral contraceptive pill, progesterone-only pills, intrauterine progesterone (levonorgestrel IUS), progestins, gestrinone, danazol (despite its masculinising side effects) and GnRH analogues.
In related embodiments of the invention, the results of monitoring a therapy are used for future therapy prediction. For example, if treatment with a particular therapy is effective in reducing or eliminating disease symptoms in a subject, and is also shown to decrease levels of a particular auto-antibody in that subject, detection of that auto-antibody in another subject may indicate that this other subject will respond to the same therapy. Conversely, if a particular therapy was not effective in reducing or eliminating disease symptoms in a subject who had a particular auto-antibody or auto-antibody profile, detection of that auto-antibody or profile in another subject may indicate that this other subject will also fail to respond to the same therapy.
In other embodiments, the presence of auto-antibodies against a particular auto-antigen can be used as the basis of proposing or initiating a particular therapy. For instance, if it is known that levels of a particular auto-antibody can be reduced by administering a particular therapy then that auto-antibody's detection may suggest that the therapy should begin. Thus the invention is useful in a theranostic setting. Normally at least one sample will be taken from a subject before a therapy begins.
Immunotherapy
Where the development of auto-antibodies to a newly-exposed auto-antigen is causative for a disease, early priming of the immune response can prepare the body to remove antigen-exposing cells when they arise, thereby removing the cause of disease before auto-antibodies develop dangerously. The auto-antigens listed in Table 2 are thus therapeutic targets for treating endometriosis. For example, one antigen known to be recognised by auto-antibodies is p53, and this protein is considered to be both a vaccine target and a therapeutic target for the modulation of cancer [105-107].
Thus the invention provides a method for raising an antibody response in a subject, comprising administering to the subject an immunogen which elicits antibodies which recognise an auto-antigen listed in Table 2. The method is suitable for immunoprophylaxis of endometriosis.
The invention also provides an immunogen for use in medicine, wherein the immunogen can elicit antibodies which recognise an auto-antigen listed in Table 2. Similarly, the invention also provides the use of an immunogen in the manufacture of a medicament for immunoprophylaxis of endometriosis, wherein the immunogen can elicit antibodies which recognise an auto-antigen listed in Table 2.
As discussed above for detection antigens, the immunogen may be the auto-antigen itself or may comprise an amino acid sequence having identity and/or comprising an epitope from the auto-antigen. Thus the immunogen may comprise an amino acid sequence (i) having at least 90% (e.g. ≥91%, ≥92%, ≥93%, ≥94%, ≥95%, ≥96%, ≥97%, ≥98%, ≥99%) sequence identity to the relevant SEQ ID NO disclosed herein (i.e. any of SEQ ID NOs: 1-121), and/or (ii) comprising at least one epitope from the relevant SEQ ID NO disclosed herein (i.e. any of SEQ ID NOs: 1-121). Other immunogens may also be used, provided that they can elicit antibodies which recognise the auto-antigen of interest.
As an alternative to immunising a subject with a polypeptide immunogen, it is possible to administer a nucleic acid (e.g. DNA or RNA) immunogen encoding the polypeptide, for in situ expression in the subject, thereby leading to the development of an antibody response.
The immunogen may be delivered in conjunction (e.g. in admixture) with an immunological adjuvant. Such adjuvants include, but are not limited to, insoluble aluminium salts, water-in-oil emulsions, oil-in-water emulsions such as MF59 and AS03, saponins, ISCOMs, 3-O-deacylated MPL (3dMPL), immunostimulatory oligonucleotides (e.g. including one or more CpG motifs), bacterial ADP-ribosylating toxins and detoxified derivatives thereof, cytokines, chitosan, biodegradable microparticles, liposomes, imidazoquinolones, phosphazenes (e.g. PCPP), aminoalkylglucosaminide phosphates, gamma inulins, etc. Combinations of such adjuvants can also be used. The adjuvant(s) may be selected to elicit an immune response involving CD4 or CD8 T cells. The adjuvant(s) may be selected to bias an immune response towards a TH1 phenotype or a TH2 phenotype.
The immunogen may be delivered by any suitable route. For example, it may be delivered by parenteral injection (e.g. subcutaneously, intraperitoneally, intravenously, intramuscularly), or mucosally, such as by oral (e.g. tablet, spray), topical, transdermal, transcutaneous, intranasal, ocular, aural, pulmonary or other mucosal administration.
The immunogen may be administered in a liquid or solid form. For example, the immunogen may be formulated for topical administration (e.g. as an ointment, cream or powder), for oral administration (e.g. as a tablet or capsule, as a spray, or as a syrup), for pulmonary administration (e.g. as an inhaler, using a fine powder or a spray), as a suppository or pessary, as drops, or as an injectable solution or suspension.
RNA-Based Therapy
The miRNAs listed in Table 3 and 4 can be useful for RNA-based therapy, e.g., antisense therapy. There is literature precedent outlining the use of antisense therapy to manage cancer [108]. A synthetic nucleic acid complementary to a miRNA listed in Table 3 or Table 4 could be used to stimulate cell death of cancerous cells (either associated with endometriosis and/or aggressive endometriosis). Additionally, in vivo antisense therapy could be used to introduce a nucleic acid complementary to a miRNA listed in Table 3 or Table 4 to specifically bind to, and abrogate, overexpression of specific miRNA(s) associated with endometriosis and/or aggressive endometriosis.
Thus the invention provides a nucleic acid which hybridises to miRNA(s) listed in Table 3 or Table 4 (i.e. any of SEQ ID NOs: 122-343), and which is conjugated to a cytotoxic agent. The miRNA is preferably a human miRNA. Any suitable cytotoxic agent can be used. These conjugated nucleic acids can be used in methods of therapy.
Thus the invention provides a complementary nucleic acid which recognises a miRNA listed in Table 3 or Table 4 for the purposes of RNA-based therapies, e.g. antisense therapy.
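For illustration, the sequence of a nucleic acid fully complementary to a given miRNA can be derived as its reverse complement. The sketch below assumes an unmodified RNA antisense strand and Watson-Crick pairing only. The input sequence is a made-up placeholder and is not one of the SEQ ID NOs of Table 3 or Table 4.

```python
_RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def antisense_rna(mirna_5_to_3):
    """Return the reverse complement (written 5' to 3') of an RNA sequence."""
    sequence = mirna_5_to_3.upper()
    unknown = set(sequence) - set(_RNA_COMPLEMENT)
    if unknown:
        raise ValueError(f"unexpected bases: {sorted(unknown)}")
    return "".join(_RNA_COMPLEMENT[base] for base in reversed(sequence))


# Placeholder sequence for demonstration only.
probe = antisense_rna("UAGCAG")
```

A DNA or chemically modified oligonucleotide would use the same pairing logic with a different alphabet.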
Imaging
The biomarkers listed in Table 1 can be useful for imaging.
A labelled antibody against an auto-antigen listed in Table 2 can be injected in vivo and the distribution of the antigen can then be detected. This method may identify the source of the auto-antigen (e.g. an area in the body where there is a high concentration of the antigen), potentially offering early identification of a pathological condition. Imaging techniques can also be used to monitor the progress or remission of disease, or the impact of a therapy.
The invention also provides a labelled antibody which recognises an auto-antigen listed in Table 2. The antibody may be a human antibody, as discussed above. Any suitable label can be used e.g. quantum dots, spin labels, fluorescent labels, etc.
A labelled, synthetic nucleic acid complementary to a miRNA(s) listed in Table 3 or Table 4 could be used for the identification, in ex vivo (e.g. tissue samples taken from biopsies) and in vivo (e.g. magnetic resonance imaging (MRI), positron emission tomography (PET) or computed tomography (CT) scans of patients) samples, of miRNAs associated with endometriosis and/or aggressive endometriosis. This may potentially offer a method for the early identification of endometriosis and/or aggressive endometriosis. Imaging techniques can also be used to monitor the progress or remission of disease, or the impact of a therapy.
The miRNA listed in Table 3 or Table 4 can be useful for analysing tissue samples by staining e.g. using standard FISH. A fluorescently labelled nucleic acid, complementary in sequence to the miRNAs outlined in Table 3 or Table 4 can be contacted with a tissue sample to visualise the location of the miRNA. A single sample could be stained against multiple miRNAs, and these different miRNAs may be differentially labeled to enable them to be distinguished. As an alternative, a plurality of different samples can each be stained with a single, labeled miRNA.
Thus the invention provides a labeled nucleic acid which can hybridise to miRNA(s) listed in Table 3 or Table 4. The miRNA is preferably a human miRNA. Any suitable label can be used e.g. quantum dots, spin labels, fluorescent labels, dyes, etc. These labelled nucleic acids can be used in methods of in vivo and/or in vitro imaging.
Alternative Biomarkers
The invention refers to auto-antibody and antigen biomarkers, with assays of auto-antibodies against an antigen being used in preference to assays of the antigen itself. In addition to these biomarkers, however, the invention can be used with other biological manifestations of the Table 2 antigens. For example, the level of mRNA transcripts encoding a Table 2 antigen can be measured, particularly in tissues where that gene is not normally transcribed (such as in the potential disease tissue). Similarly, the chromosomal copy number of a gene encoding a Table 2 antigen can be measured e.g. to check for a gene duplication event. The level of a regulator of a Table 2 antigen can be measured e.g. to look at a microRNA regulator of a gene encoding the antigen. Furthermore, things which are regulated by or respond to a Table 2 antigen can be assessed e.g. if an antigen is a regulator of a metabolic pathway then disturbances in that pathway can be measured.
The invention also refers to miRNA biomarkers. In addition to these biomarkers, however, the invention can be used with other biological manifestations of the Table 3 and Table 4 miRNAs. For example, the expression level of mRNA transcripts which are a target of a Table 3 or a Table 4 miRNA can be measured, particularly in tissues where changes in transcription level can easily be determined (such as in the potential disease tissue). Similarly, the copy number variation of a chromosomal location of a Table 3 or a Table 4 miRNA can be measured e.g. to check for a chromosomal deletion or duplication event.
The level of a regulator of transcription for a Table 3 or a Table 4 miRNA can be measured e.g. the methylation status of the miRNA chromosomal region.
A single pre-miRNA precursor may lead to one or more mature miRNA sequences, such as sequences excised from the 5′ and 3′ arms of the hairpin. The invention can be used to look for other mature miRNA sequences from the same pre-miRNA precursor. For example, other mature miRNA sequences from the same precursor may be appropriate biomarkers as well.
Further possibilities will be apparent to the skilled reader.
Preferred Panels
Preferred embodiments of the invention are based on at least two different biomarkers i.e. a panel. Panels of particular interest consist of or comprise combinations of one or more biomarkers listed in Table 1, optionally in combination with at least 1 further biomarker(s) e.g. from Table 5 etc. Preferred panels have from 2 to 6 biomarkers in total. Panels of particular interest consist of or comprise the combinations of biomarkers listed in any of Tables 10 to 15. The panels useful for the invention (e.g. the panels listed in Tables 10 to 15) can be expanded by adding further (i.e. one or more) biomarker(s) to create a larger panel. The further biomarkers can usefully be selected from known biomarkers (as discussed above e.g. see Table 5). In general the addition does not decrease the sensitivity or specificity of the panel shown in the Tables. Such panels include, but are not limited to:
This panel is particularly useful for diagnosis.
Preferred panels have between 2 and 6 biomarkers in total.
A preferred panel comprises hsa-miR-150 and hsa-miR-574-5p. Another preferred panel comprises hsa-miR-342-3p and hsa-miR-574-5p. Another preferred panel comprises hsa-miR-150 and hsa-miR-342-3p. Another preferred panel comprises hsa-miR-150, hsa-miR-122 and hsa-miR-574-5p. Another preferred panel comprises TPM1, hsa-miR-150 and hsa-miR-574-5p.
Different biomarkers can have different relative differential expression profiles in an endometriosis sample compared to a control sample. Pairs of these biomarkers (i.e. where one is up-regulated and the other is down-regulated relative to the same control) may provide a useful way of diagnosing endometriosis. For example, the inventors found that ebv-miR-BART2-5p and hsa-miR-564 are up- and down-regulated, respectively, in endometriosis samples vs. control samples (healthy, non-endometriosis samples), so this pair would be useful. This divergent behaviour can enhance diagnosis of endometriosis when a pair of the biomarkers is assessed in the same sample.
Thus, a method of the invention can include a step of determining the expression levels of a first and a second biomarker of the invention in a subject's sample, wherein the first biomarker is up-regulated in an endometriosis sample compared to a non-endometriosis sample and the second biomarker is down-regulated in an endometriosis sample compared to the same non-endometriosis sample.
A method of the invention can include: (i) determining the expression level of a first biomarker of the invention in a subject's sample, (ii) determining the expression level of a second biomarker of the invention in the subject's sample, wherein the first biomarker is up-regulated in an endometriosis sample compared to a non-endometriosis sample and the second biomarker is down-regulated in an endometriosis sample compared to the same non-endometriosis sample, and (iii) comparing the determinations of (i) and (ii) with a non-endometriosis sample, an endometriosis sample and/or an absolute value, wherein the comparison provides a diagnostic indicator of whether the subject has endometriosis. Aberrant levels of the first and the second biomarkers, as compared to the known or standard expression levels of them in the non-endometriosis sample or endometriosis sample, and/or the absolute value, indicate that the subject has endometriosis.
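One hypothetical way to combine such a divergent pair into a single indicator is to take the log2 ratio of the up-regulated marker to the down-regulated marker and compare it with the same ratio in a non-endometriosis reference. The sketch below is an assumption made for illustration: the scoring rule, the function names and the cut-off of one log2 unit are not taken from the examples.

```python
import math


def divergence_score(up_level, down_level):
    """log2 ratio of the up-regulated marker to the down-regulated marker."""
    return math.log2(up_level) - math.log2(down_level)


def suggests_endometriosis(up_level, down_level,
                           control_up, control_down, min_shift=1.0):
    """Compare the subject's score with a non-endometriosis reference.

    min_shift is an illustrative cut-off in log2 units, not a validated value.
    """
    shift = (divergence_score(up_level, down_level)
             - divergence_score(control_up, control_down))
    return shift >= min_shift


# Hypothetical normalised expression levels.
flagged = suggests_endometriosis(8.0, 1.0, control_up=2.0, control_down=2.0)
```

Because the two markers move in opposite directions, their ratio changes more than either marker alone, which is the intuition behind assessing the pair in the same sample.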
General
The term “comprising” encompasses “including” as well as “consisting” e.g. a composition “comprising” X may consist exclusively of X or may include something additional e.g. X+Y.
References to an antibody's ability to “bind” an antigen mean that the antibody and antigen interact strongly enough to withstand standard washing procedures in the assay in question. Thus non-specific binding will be minimised or eliminated.
An assay's “sensitivity” is the proportion of true positives which are correctly identified i.e. the proportion of endometriosis subjects with auto-antibodies against the relevant antigen who test positive.
An assay's “specificity” is the proportion of true negatives which are correctly identified i.e. the proportion of subjects without endometriosis who test negative for antibodies against the relevant antigen.
Unless specifically stated, a method comprising a step of mixing two or more components does not require any specific order of mixing. Thus components can be mixed in any order. Where there are three components then two components can be combined with each other, and then the combination may be combined with the third component, etc.
References to a percentage sequence identity between two amino acid sequences means that, when aligned, that percentage of amino acids are the same in comparing the two sequences. This alignment and the percent homology or sequence identity can be determined using software programs known in the art, for example those described in section 7.7.18 of ref. 109. A preferred alignment is determined by the Smith-Waterman homology search algorithm using an affine gap search with a gap open penalty of 12 and a gap extension penalty of 2, BLOSUM matrix of 62. The Smith-Waterman homology search algorithm is disclosed in ref. 110.
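As a sketch of the final step only, the percentage identity can be computed once two sequences have already been aligned (for example by a Smith-Waterman implementation, which is not reproduced here). The denominator convention varies between tools; this example assumes identity is reported over the number of alignment columns, and the sequences shown are arbitrary placeholders.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of alignment columns holding the same residue.

    Both inputs must be the same length, with gaps marked by `gap`.
    Assumption: columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment has no residue columns")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


# Placeholder peptide alignment: nine of ten positions match.
identity = percent_identity("ACDEFGHIKL", "ACDEFGHIKV")
```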
Study 1
1. Detection of Auto-Antibodies in Serum of Subjects Suffering from Endometriosis
a. Array Preparation
The examples refer to use of a “functional protein” array technology which has the ability to display native, discontinuous epitopes [111,112]. Proteins are full-length, expressed with a folding tag in insect cells and screened for correct folding before being arrayed in a specific, oriented manner designed to conserve native epitopes. Each array contains approximately 1550 human proteins representing ˜1500 distinct genes chosen from multiple functional and disease pathways printed in quadruplicate together with control proteins. In addition to the proteins on each array, four control proteins for the BCCP-myc tag (BCCP, BCCP-myc, β-galactosidase-BCCP-myc and β-galactosidase-BCCP) were arrayed, along with additional controls including Cy3-labeled biotin-BSA, dilution series of biotinylated-IgG and biotinylated-IgM and buffer-only spots.
Incubation of the arrays with serum samples allows detection of binding of serum immunoglobulins to specific proteins on the arrays, enabling the identification of both autoantibodies and their cognate antigens [112].
b. Biomarker Confirmation
Serum samples were obtained from two groups of subjects:
For autoantibody profiling, serum samples were incubated with arrays separately. All arrays were incubated for 2 hours at room temperature (RT, 20° C.), followed by washing three times in fresh Triton-BSA buffer at RT for 20 minutes. The washed slides were incubated in a labelled anti-human IgG antibody at RT for 2 hours. Slides were washed three times in Triton-BSA buffer for 5 minutes at RT, rinsed, and centrifuged for 2 minutes at 240 g.
The probed and dried arrays were scanned using an Agilent High-Resolution microarray scanner at 10 μm resolution. The resulting 20-bit tiff images were feature extracted using Agilent's Feature Extraction software version 10.5 or 10.7.3.1. The microarray scans produced images for each array that were used to determine the intensity of fluorescence bound to each protein spot which were used to normalize and score array data.
The local median background intensity was subtracted from the raw median signal intensity (also referred to as the relative fluorescent unit, RFU) of each protein feature (also referred to as a spot or antigen) on the array. Alternative analyses use other measures of spot intensity such as the mean fluorescence or total fluorescence, as known in the art. The results of QC analyses showed that the platform performed well within expected parameters with relatively low technical variation.
The raw array data was normalized by consolidating the replicates (median consolidation), followed by normal transformation and then global median normalisation. Outliers were identified and removed. There is no method of normalisation which is universally appropriate and factors such as study design and sample properties must be considered. For the current study median normalisation was used. Other normalisation methods include, amongst others, SAM, quantile normalisation [113], multiplication of net fluorescent intensities by a normalisation factor consisting of the product of the 1st quartile of all intensities of a sample and the mean of the 1st quartiles of all samples and the “VSN” method [114]. Such normalisation methods are known in the art of microarray analysis.
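The normalisation sequence described above (replicate consolidation, transformation, global median normalisation) can be sketched as follows. The transformation step is assumed here to be a log2 transform, and the floor of 1.0 applied before taking logarithms is likewise an assumption; neither detail is stated in the text. Protein names and intensities are hypothetical.

```python
import math
from statistics import median


def normalise_array(replicate_signals):
    """Normalise one array.

    replicate_signals maps a protein name to its background-subtracted
    replicate intensities (four per protein on the arrays described).
    """
    # 1. Median consolidation of the replicate spots.
    consolidated = {name: median(values)
                    for name, values in replicate_signals.items()}
    # 2. Transformation (assumed log2; values floored at 1.0 to keep logs finite).
    logged = {name: math.log2(max(value, 1.0))
              for name, value in consolidated.items()}
    # 3. Global median normalisation: centre the array on its own median.
    array_median = median(logged.values())
    return {name: value - array_median for name, value in logged.items()}


# Hypothetical intensities for three proteins printed in quadruplicate.
example = normalise_array({
    "protein_A": [4.0, 4.0, 4.0, 4.0],
    "protein_B": [16.0, 16.0, 8.0, 32.0],
    "protein_C": [64.0, 64.0, 64.0, 64.0],
})
```

Centring each array on its own median makes arrays comparable even when overall serum reactivity or scanner gain differs between them.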
This normalised data was then used for the identification of individual candidate biomarkers and for the development of combinations of biomarkers (“panels”). Tools such as volcano plots (
2. Detection of miRNA in Serum of Subjects Suffering from Endometriosis Using Microarrays
a. Array Preparation
For microarray fabrication and usage, Agilent Technologies' (‘Agilent’) miRNA microarray was used. The content of the microarray is continuously aligned with releases from the miRBase database [115, 116, 117, 118], representing all known miRNAs from human beings, as well as all known human viral miRNAs. These arrays are printed using Agilent's ink-jet in situ synthesis microarray fabrication machines.
b. Biomarker Confirmation
A set of 71 serum samples, sourced from patients with endometriosis (“case”; n=36) and normal (“control”; n=35) patients were processed to extract total RNA (including miRNA) using standard column filtration methodologies. The extracted serum samples were analysed using the Agilent miRNA microarray (G4870A-031181), according to their standard protocol, (manual part number G4170-90011, version 2.4). However, deviations from the standard protocol included labeling of the samples using 2.25 μl Cyanine 3-pCp, and hybridising the microarray slides for 44 hours.
The probed and dried arrays were then scanned using a microarray scanner capable of using an excitation wavelength suitable for the detection of the labelled miRNAs and to determine magnitude of miRNA binding to the complementary detection probe. The microarray scans produced images for each array that were used to determine the intensity of fluorescence bound to each oligonucleotide spot which were used to normalise and score array data.
The raw microarray scan image contains raw signal intensity (also referred to as the relative fluorescent unit, RFU) for each oligonucleotide spot (also referred to as a feature) on the array. These images were then feature extracted using Agilent's proprietary feature extraction software. Alternative analyses use other measures of spot intensity such as the mean fluorescence, total fluorescence, as known in the art.
The resulting average intensities of all oligonucleotide features on each array were then normalised to reduce the influence of technical bias (e.g. laser power variation, surface variation, input miRNA concentration, etc.) by a percentile normalisation procedure. Other methods for data normalisation suitable for the data include, amongst others, quantile normalisation [97]. Such normalisation methods are known in the art of microarray analysis.
A linear model was fitted to evaluate statistical differences in miRNA expression between cases and controls. Volcano plot analysis of the miRNA microarray data is shown in
The hierarchical clustering of the significant miRNAs according to the type of tissue (i.e. case vs. control) is shown in
3. Validation of miRNA in Serum of Subjects Suffering from Endometriosis Using qPCR
For quantitative PCR (“qPCR”) usage, Life Technologies' (“LifeTech”) TaqMan® miRNA assays were used. These assays are continuously aligned with releases from the miRBase database, representing all known miRNAs from human beings, as well as all known human viral miRNAs. These TaqMan® miRNA assays employ a novel target-specific stem-loop reverse transcription primer to address the challenge of the short length of mature miRNA. The primer extends the 3′ end of the target to produce a template that can be used in standard TaqMan® assay-based real-time PCR. Also, the stem-loop structure in the tail of the primer confers a key advantage to these assays: specific detection of the mature, biologically active miRNA.
Using the significant markers identified in the miRNA microarray experiments, a sub-selection of miRNAs were analysed using the LifeTech TaqMan® miRNA qPCR assays, according to their standard protocol, (manual part number 4465407, revision date 30 Mar. 2012 (Rev. B)).
The TaqMan® miRNA assays were scanned using a ViiA™ 7 Real-Time PCR System using an excitation wavelength suitable for the detection of the labelled miRNAs (for example, but not limited to, 6-FAM™ Dye). The qPCR scans produced traces for each TaqMan® miRNA assay which can be used, if applicable, to determine the amount of specific miRNA within a given sample, relative to a passive reference dye (for example, but not limited to, ROX).
The raw qPCR traces contain raw signal intensity (also referred to as ΔRn) to which an assay threshold (horizontal line) is applied after the raw traces have been baseline normalised, which is necessary to remove aberrant signal. This threshold line intersects the qPCR trace at the point on the qPCR trace where the trace is logarithmic. From this, the qPCR cycle (Ct) can be determined. These qPCR traces are analysed using LifeTech's proprietary analysis software. Alternative analyses and analysis techniques are known in the art.
The median Ct and mean quantity were taken across the three sample replicates for each TaqMan® miRNA assay. Testing for statistically significant associations between the two groups (case vs control) was carried out by applying linear models to the normalised sample data to identify general miRNA changes between case samples and control samples. Statistical differences were calculated using a t-test.
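The principle of reading Ct from the point where a baseline-normalised trace crosses the assay threshold can be sketched as below. This is not the vendor's algorithm: it assumes one ΔRn value per cycle and uses simple linear interpolation between the two cycles that bracket the threshold. The trace values are invented.

```python
def threshold_cycle(delta_rn, threshold):
    """Return the fractional cycle at which the trace first crosses threshold.

    delta_rn[0] is the baseline-normalised signal at cycle 1, delta_rn[1]
    at cycle 2, and so on. Returns None if the threshold is never reached.
    """
    for index in range(1, len(delta_rn)):
        previous, current = delta_rn[index - 1], delta_rn[index]
        if previous < threshold <= current:
            fraction = (threshold - previous) / (current - previous)
            # `previous` belongs to cycle number `index` (1-based numbering).
            return index + fraction
    return None


# Invented trace that doubles each cycle once amplification is detectable.
trace = [0.0, 0.1, 0.2, 0.4, 0.8, 1.6]
ct = threshold_cycle(trace, threshold=0.6)
```

A lower Ct corresponds to a higher starting amount of the target miRNA, since fewer cycles are needed to reach the threshold.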
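A minimal sketch of the replicate summary and group comparison follows. The text states only that a t-test was used; the unequal-variance (Welch) form shown here is an assumption, and only the test statistic is computed because a p-value would require the t distribution from a statistics package. All Ct values are invented.

```python
import math
from statistics import mean, median, variance


def median_ct(replicate_cts):
    """Summarise the technical replicates of one sample by their median Ct."""
    return median(replicate_cts)


def welch_t(group_a, group_b):
    """Welch's t statistic for two independent groups (assumed test variant)."""
    standard_error = math.sqrt(variance(group_a) / len(group_a)
                               + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / standard_error


# Invented triplicate Ct values for three cases and three controls.
cases = [median_ct(r) for r in ([20.1, 20.0, 19.8],
                                [22.0, 22.3, 21.9],
                                [24.0, 23.7, 24.2])]
controls = [median_ct(r) for r in ([26.0, 26.1, 25.8],
                                   [28.0, 27.9, 28.4],
                                   [30.0, 30.2, 29.9])]
t_statistic = welch_t(cases, controls)
```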
The data from the microarray and TaqMan® miRNA qPCR assays were analysed for Pearson correlation to assess the cross-platform robustness of the observed results (
4. Multivariate Analysis: Combination of miRNA and Autoantibody Biomarkers in Serum of Subjects Suffering from Endometriosis
Panels of putative biomarkers were developed consisting of either autoantibodies alone, miRNAs alone or combinations containing both autoantibodies and miRNA species. Multivariate analysis was also performed incorporating data for galectin-3 and CA125 as variables; however, their inclusion did not improve on the performance of the miRNA and autoantibodies identified here. It is not possible to predict a priori which classifier will perform best with a given dataset, therefore data analysis was performed with 5 different feature ranking methods (1-5) plus forward and backward feature selection:
Other classification methods as known in the art could be used. Classifiers were then assessed for performance by referring to the combined sensitivity and specificity (S+S score) and area under the curve (AUC). Data were repeatedly split and analysis cycles repeated until a stable set of classifiers (“panels”) was identified. Nested cross validation was applied to the classification procedures in order to avoid overfitting of the study data. The performance of the classification was compared to a randomized set of case-control status samples (permutation assay) which should give no predictive performance and provides an indication of the background in the analysis. A figure close to 1.0 is expected for the null assay (equivalent to a sensitivity+specificity (S+S) score of 0.5+0.5, respectively) whereas an S+S score of 2.0 would indicate 100% sensitivity and 100% specificity. The difference between the values for the permutation analysis and the classifier performance indicates the relative strength of the classifier. The antigens and miRNAs identified from this study are provided in Tables 2 and 3.
Table 6 shows the protein biomarkers that provided good performance, as judged by p value, fold-change, sensitivity, specificity, AUC. The best performing protein biomarkers are shown in Table 7. Table 8 shows the miRNA biomarkers that provided good performance, as judged by p value, fold-change, sensitivity, specificity, AUC. The best performing miRNA biomarkers are shown in Table 9. The ROC curves for some of the best performing miRNA biomarkers (hsa-miR-150, hsa-miR-574-5p and hsa-miR-342-3p) are shown in
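The two performance measures referred to above can be sketched as follows. The AUC is computed here by the pairwise (Mann-Whitney) definition, and the S+S score is evaluated at a single, caller-supplied cut-off. The classifier scores, the cut-offs and the function names are hypothetical, and the cross-validation and permutation steps are not reproduced.

```python
def auc(case_scores, control_scores):
    """Probability that a random case scores higher than a random control."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5  # ties count as half
    return wins / (len(case_scores) * len(control_scores))


def s_plus_s(case_scores, control_scores, cutoff):
    """Sensitivity plus specificity when scores >= cutoff are called positive."""
    sensitivity = sum(s >= cutoff for s in case_scores) / len(case_scores)
    specificity = sum(s < cutoff for s in control_scores) / len(control_scores)
    return sensitivity + specificity


# Hypothetical classifier outputs for three cases and three controls.
panel_auc = auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
panel_score = s_plus_s([0.9, 0.8, 0.4], [0.5, 0.3, 0.2], cutoff=0.45)
perfect = s_plus_s([0.9, 0.8], [0.1, 0.2], cutoff=0.5)
```

Applying the same two functions to scores obtained after shuffling the case and control labels gives the permutation baseline, for which an S+S score near 1.0 and an AUC near 0.5 would be expected.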
The analysis methods described above were used to build, test and identify combinations of biomarkers with greater sensitivity, specificity or AUC than the individual biomarkers disclosed in Table 1.
For each analysis, multiple combinations of putative biomarkers were derived and the performance of the derived panels was then ranked (Tables 11-15). Tables 11-15 show 2-mer, 3-mer, 4-mer, 5-mer and 6-mer panels that provide good performance. The ROC curves for some of the best performing combinations are shown in
The biomarkers with the greatest diagnostic power, as judged by p value, fold-change, sensitivity, specificity, AUC and/or frequency of appearance in the panels derived were identified and combined into a single list of antigens and miRNAs.
Study 2
1. Detection of miRNA in Tissue Samples of Subjects Suffering from Endometriosis Using NanoString Technology
The fresh tissue samples used in this study include: peritoneal endometriosis (ES), ovarian endometriomas (OvES), normal endometrium (NE) and normal peritoneum (NP).
miRNA Extraction
Total RNA from specimens was extracted using the mirVana™ PARIS™ kit (Ambion) as per their supplied protocol. RNA quantification and integrity were assessed using the Eukaryote total RNA nano assay by Agilent Technologies™. Following validation, 100 ng of total RNA was used in the nCounter® miRNA Expression Assay (NanoString Technologies) enabling an ultrasensitive miRNA detection in total RNA across all biological levels of expression without the use of reverse transcription or amplification. 735 human and human-associated viral miRNAs derived from miRBase were scanned. Following unique multiplexed annealing, specific oligonucleotide tags were ligated onto their target miRNA, followed by an enzymatic purification to remove all unligated tags. Excess unbound probes and RNA were washed using a two-step magnetic bead-based purification system on the nCounter Prep Station. Remaining miRNA was attached to a cartridge surface and polarised. The cartridge was scanned and data collection was performed on the nCounter digital analyser. Digital images were processed and miRNA counts were tabulated.
Statistical Analysis
The R project (R version 2.12.1) (http://www.R-project.org) was used for statistical and clustering analysis. Quantified gene expression signal levels derived from the nCounter® miRNA Expression Assay (NanoString Technologies) were logarithmically transformed (base 2) and quantile-normalized prior to further exploration. To determine the miRNA expression detectability threshold, the log2 signal values were ranked in increasing order and binned into expression level categories 0.5 wide. The differences between successive expression level frequencies were then calculated, and the expression value following the most significant signal increase was taken as the miRNA expression detectability threshold. miRNAs expressed at any level (above a log2 expression value of 3.25) were counted in each group, and a chi-square test was then performed to assess whether the counts differed significantly from the expected average. The global expression of miRNAs was not found to differ significantly between the groups (chi-square p-value 0.62664147).
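The pre-processing steps above (log2 transformation, quantile normalisation and the binned detectability threshold) can be sketched as follows. The study used R; this Python version is only an illustration. The function names, the pseudocount of 1 and the reading of "the expression value following the most significant signal increase" as the lower edge of the bin after the largest rise in frequency are all assumptions, not details given in the text.

```python
import math
from collections import Counter

def log2_transform(counts, pseudocount=1.0):
    """Log2-transform raw counts; the pseudocount (assumed) avoids log2(0)."""
    return [[math.log2(c + pseudocount) for c in sample] for sample in counts]

def quantile_normalize(samples):
    """Quantile-normalise equal-length samples (one list of values per sample).

    Ties are broken by position rather than averaged, a simplification.
    """
    n = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [sum(s[k] for s in sorted_samples) / len(samples) for k in range(n)]
    normalized = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])  # indices by ascending value
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized

def detect_threshold(values, bin_width=0.5):
    """Return the lower edge of the bin that follows the largest frequency jump."""
    bins = Counter(math.floor(v / bin_width) for v in values)
    lo, hi = min(bins), max(bins)
    freqs = [bins.get(b, 0) for b in range(lo, hi + 1)]
    if len(freqs) < 2:
        return lo * bin_width  # only one occupied bin, nothing to compare
    diffs = [freqs[i + 1] - freqs[i] for i in range(len(freqs) - 1)]
    jump = max(range(len(diffs)), key=lambda i: diffs[i])
    return (lo + jump + 1) * bin_width
```

A threshold obtained this way depends on the bin width and on how ties between equally large jumps are resolved (here the first one wins), so it is best checked against a histogram of the data.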
Analysis of variance (ANOVA) was performed for each gene across all samples, assigned to their appropriate groups, in order to identify differentially expressed miRNAs. The Benjamini & Hochberg multiple hypothesis testing correction was applied to control the false discovery rate (FDR). Genes that remained statistically significant (corrected p-value <0.05) were selected for unsupervised hierarchical clustering, performed on distances calculated from the Spearman correlation coefficient using the Ward agglomeration method.
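As an illustration of the multiple-testing step, the sketch below computes Benjamini & Hochberg adjusted p-values from a list of per-gene p-values and keeps those below 0.05. It is a plain Python re-implementation of the standard step-up procedure (comparable to `p.adjust(p, method = "BH")` in R); the function name is hypothetical, and the ANOVA p-values themselves are assumed to have been computed beforehand.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini & Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

def significant_indices(pvalues, alpha=0.05):
    """Indices of tests whose adjusted p-value is below alpha."""
    return [i for i, q in enumerate(benjamini_hochberg(pvalues)) if q < alpha]
```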
Subsequently, pairwise t-tests with corrections for multiple testing were applied as a post hoc analysis. A t-test was performed between each sample group and all other sample groups. For example, ES samples were compared against all other groups (OvES, NP and NE), OvES samples were compared against all other groups (ES, NP and NE) etc. Computed t-statistics and signal intensity fold changes were further used to determine significantly up- and down-regulated miRNAs in each sample group and to generate Venn diagrams depicting the overlap between miRNAs differentially expressed in investigated groups (
Area A in
Area C in
Area F in
Area B in
Area D in
Area E in
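The one-group-versus-all-others comparison described in the post hoc analysis can be sketched for a single miRNA as follows. This is an assumption-laden illustration: the text does not state which t-test variant was used, so Welch's unequal-variance statistic is used here, the fold change is taken as the difference of mean log2 values, and the function names are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic for two independent samples (each needs >= 2 values).

    Raises ZeroDivisionError if both samples have zero variance.
    """
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def one_vs_rest(groups):
    """Compare each group's log2 values with all other groups pooled.

    groups: dict mapping a group label (e.g. "ES") to a list of log2 values.
    Returns {label: (t_statistic, log2_fold_change)}.
    """
    results = {}
    for name, values in groups.items():
        rest = [v for other, vals in groups.items() if other != name for v in vals]
        results[name] = (welch_t(values, rest), mean(values) - mean(rest))
    return results
```

A positive t-statistic and log2 fold change indicate higher expression in the named group than in the pooled remainder; the resulting p-values would still need correction for multiple testing.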
2. Validation of miRNA in Tissue Samples of Subjects Suffering from Endometriosis Using qPCR
PCR was performed to validate the microarray data. TaqMan® MicroRNA Assays (Applied Biosystems) were used.
cDNA was synthesized from total RNA using a commercially available specific ebv-miR-BART2-5p assay. RNU-44 and hsa-miR-26b were used as the mature miRNA endogenous controls. miR-26b is a commonly expressed vertebrate miRNA and was used as a control primer for the reactions. Real-time PCR amplification was performed on triplicates of each sample using the TaqMan 2X Universal PCR Master Mix (Applied Biosystems).
Significant differential expression of the ebv-miR-BART2-5p miRNA was observed between normal endometrium from controls, normal endometrium from subjects having endometriosis and endometriosis (
It will be understood that the invention has been described by way of example only and modifications may be made whilst remaining within the scope and spirit of the invention.
Columns
(i) This number is the SEQ ID NO as shown in the sequence listing. For an auto-antigen biomarker, the SEQ ID NO in the sequence listing provides the coding sequence for the auto-antigen biomarker. For a miRNA biomarker, the SEQ ID NO in the sequence listing provides the sequence of the mature, expressed miRNA biomarker, as shown in Tables 3 and 4.
(ii) The “Symbol” column gives the gene symbol which has been approved by the HGNC. The HGNC aims to give unique and meaningful names to every miRNA and human gene. An additional dash-number suffix indicates pre-miRNAs that lead to identical mature miRNAs but that are located at different places in the genome.
(iii) The names for auto-antigens are taken from the Official Full Name provided by NCBI. An auto-antigen may have been referred to by one or more pseudonyms in the prior art. The invention relates to these auto-antigens regardless of their nomenclature. The names of the miRNA are taken from the specialist database, miRBase, according to version 16 (released, August 2010).
Homo sapiens actin beta
Homo sapiens adducin 1 (alpha)
Homo sapiens adenylosuccinate lyase
Homo sapiens adenylate kinase 2 transcript
Homo sapiens kallikrein 3 (prostate specific
Homo sapiens activating transcription factor 1
Homo sapiens creatine kinase muscle
Homo sapiens CDC-like kinase 2 transcript
Homo sapiens damage-specific DNA binding
Homo sapiens deoxythymidylate kinase
Homo sapiens dual specificity phosphatase 4
Homo sapiens E2F transcription factors
Homo sapiens exostoses (multiple) 2
Homo sapiens fibroblast growth factor
Homo sapiens c-fos induced growth factor
Homo sapiens FK506 binding protein 3 25 kDa
Homo sapiens galactokinase 1
Homo sapiens glycerol kinase 2
Homo sapiens growth factor receptor-bound
Homo sapiens general transcription factor IIB
Homo sapiens general transcription factor IIH
Homo sapiens general transcription factor IIH
Homo sapiens heat shock 60 kDa protein 1
Homo sapiens isopentenyl-diphosphate delta
Homo sapiens interferon, gamma-inducible
Homo sapiens lactate dehydrogenase A
Homo sapiens lymphoblastic leukemia derived
Homo sapiens MAP/microtubule affinity-
Homo sapiens membrane protein
Homo sapiens tripartite motif-containing 37,
Homo sapiens neutrophil cytosolic factor 2
Homo sapiens ribosomal protein L10a
Homo sapiens nuclear transcription factor Y
Homo sapiens neuroblastoma RAS viral (v-
Homo sapiens neurotrophic tyrosine kinase
Homo sapiens 2′-5′-oligoadenylate synthetase
Homo sapiens pyruvate carboxylase
Homo sapiens PCTAIRE protein
Homo sapiens
Homo sapiens
Homo sapiens PHD finger protein 1 transcript
Homo sapiens pyruvate kinase muscle
Homo sapiens mitogen-activated protein
Homo sapiens phosphoribosyl pyrophosphate
Homo sapiens RAN member RAS oncogene
Homo sapiens retinoblastoma 1 (including
Homo sapiens RNA binding
Homo sapiens ret proto-oncogene (multiple
Homo sapiens RAR-related orphan receptor C
Homo sapiens ribosomal protein L18
Homo sapiens ribosomal protein L18a
Homo sapiens ribosomal protein L28
Homo sapiens ribosomal protein L31
Homo sapiens ribosomal protein L32
Homo sapiens S100 calcium binding protein
Homo sapiens sterol carrier protein 2
Homo sapiens seven in absentia homolog 1
Homo sapiens SWI/SNF related matrix
Homo sapiens SWI/SNF related matrix
Homo sapiens superoxide dismutase 2
Homo sapiens SRY (sex determining region
Homo sapiens SFRS protein kinase 1
Homo sapiens tropomyosin 1 (alpha)
Homo sapiens nuclear receptor subfamily 2
Homo sapiens thyroid hormone receptor
Homo sapiens ubiquitin-
Homo sapiens vinculin
Homo sapiens zinc finger protein 41 transcript
Homo sapiens paired box gene 8 transcript
Homo sapiens nuclear receptor interacting
Homo sapiens phosphatidylinositol
Homo sapiens ubiquitously-expressed
Homo sapiens apoptosis inhibitor 5
Homo sapiens MAP kinase-interacting
Homo sapiens succinate-CoA ligase ADP-
Homo sapiens LIM domain binding 1
Homo sapiens amyloid beta precursor protein
Homo sapiens 3′-phosphoadenosine 5′-
Homo sapiens ubiquitin specific protease 10
Homo sapiens PRP4 pre-
Homo sapiens absent in melanoma 2
Homo sapiens TBP-like 1
Homo sapiens TNF receptor-associated factor
Homo sapiens suppressor of cytokine
Homo sapiens zinc finger protein 305
Homo sapiens CDNA
Homo sapiens KIAA0101 gene product
Homo sapiens inositol hexaphosphate kinase
Homo sapiens ring finger protein 40 transcript
Homo sapiens gephyrin
Homo sapiens translational inhibitor protein
Homo sapiens STIP1 homology and U-Box
Homo sapiens TRAF interacting protein
Homo sapiens serine/threonine kinase 18
Homo sapiens cyclin I
Homo sapiens interleukin 24 transcript variant
Homo sapiens cell cycle related kinase
Homo sapiens poly(A) binding protein
Homo sapiens vitamin D receptor interacting
Homo sapiens non-metastatic cells 7 protein
Homo sapiens PHD finger protein 11
Homo sapiens interleukin-1 receptor-
Homo sapiens thioredoxin domain containing
Homo sapiens STE20-like kinase
Homo sapiens dual specificity phosphatase 24
Homo sapiens ankyrin repeat and SOCS box-
Homo sapiens Mst3 and SOK1-related kinase
Homo sapiens pelota homolog (Drosophila)
Homo sapiens ethanolamine kinase 2
Homo sapiens riboflavin kinase
Homo sapiens chromosome 9 open reading
Homo sapiens DEAD (Asp-Glu-Ala-Asp) box
Homo sapiens cDNA
Homo sapiens kelch-like protein C3IP1
Homo sapiens Salvador homolog 1
Homo sapiens hypothetical protein MGC8407
Homo sapiens nucleosomal binding protein 1,
Homo sapiens alpha-kinase 1 mRNA (cDNA
Homo sapiens RNA binding motif and ELMO
Homo sapiens apoptosis-inducing factor (AIF)-
Homo sapiens ras homolog gene family
Homo sapiens pygopus 2
Columns
(i) This number is the SEQ ID NO: for the coding sequence for the auto-antigen biomarker, as shown in the sequence listing.
(ii) The “ID” column shows the Entrez GeneID number for the antigen marker. An Entrez GeneID value is unique across all taxa.
(iii) The “Symbol” column gives the gene symbol which has been approved by the HGNC. The HGNC aims to give unique and meaningful names to every miRNA and human gene.
(iv) This name is taken from the Official Full Name provided by NCBI. An auto-antigen may have been referred to by one or more pseudonyms in the prior art. The invention relates to these auto-antigens regardless of their nomenclature.
(v) The HUGO Gene Nomenclature Committee aims to give unique and meaningful names to every human gene. The HGNC number thus identifies a unique human gene. An additional dash-number suffix indicates pre-miRNAs that lead to identical mature miRNAs but that are located at different places in the genome.
(vi) A “GI” number, or “GenInfo Identifier”, is a series of digits assigned consecutively to each sequence record processed by NCBI when sequences are added to its databases. The GI number bears no resemblance to the accession number of the sequence record. When a sequence is updated (e.g. for correction, or to add more annotation or information) it receives a new GI number. Thus the sequence associated with a given GI number is never changed.
Columns for Tables 3 and 4
(i) The SEQ ID NO: for the sequence of the mature, expressed miRNA biomarker.
(ii) The “miRNA name” column gives the name of the human miRNA as provided by the specialist database, miRBase, according to version 16 (released August 2010).
(iii) The “Symbol” column gives the gene symbol which has been approved by the HGNC. The HGNC aims to give unique and meaningful names to every miRNA and human gene. An additional dash-number suffix indicates pre-miRNAs that lead to identical mature miRNAs but that are located at different places in the genome.
(iv) The HGNC aims to give unique and meaningful names to every miRNA (and human gene). The HGNC number thus identifies a unique human gene. Inclusion in HUGO is for human genes only.
Homo sapiens cell division cycle 42 (GTP
Homo sapiens epidermal growth factor receptor
Homo sapiens v-kit Hardy-Zuckerman 4 feline
Homo sapiens peroxisome proliferative
Homo sapiens Wilms tumor 1
Columns
(i)-(vi) are the same as those in Table 2.
Columns (Tables 6 & 7)
(i) The “ID” column shows the Entrez GeneID number for the antigen marker. An Entrez GeneID value is unique across all taxa.
(ii) The “Symbol” column gives the gene symbol which has been approved by the HGNC. The HGNC aims to give unique and meaningful names to every miRNA and human gene.
(iii) This column gives the p-value of a microarray t-test derived from comparing case with control, as determined in study 1.
(iv) The biomarkers can be up-regulated (i.e. an increase in fold-change, when compared to control samples) or down-regulated (i.e. a decrease in fold-change, when compared to control samples), as determined in study 1.
Columns (Tables 8 & 9)
(i) The “miRNA name” column gives the name of the human miRNA as provided by the specialist database, miRBase, according to version 16 (released August 2010).
(ii) The “p-value” represents the p-value of a microarray T test derived from comparing case with control, as determined in study 1.
(iii) The biomarkers can be up-regulated (i.e. an increase in fold-change, when compared to control samples) or down-regulated (i.e. a decrease in fold-change, when compared to control samples), as determined in study 1.
Columns (Tables 10 to 14)
(i) S+S is the sum of the sensitivity and specificity columns, as determined in study 1.
(ii) and (iii) These two columns show the sensitivity and specificity of a test based solely on the relevant biomarker (or, for Tables 11-15, panel) shown in the left-hand column in the same row when applied to the samples used in study 1.
(iv) For miRNA analysis, data was generated using either a microarray (“microarray”) or qPCR (“qPCR”) platform, as described in study 1. Autoantibody (“autoAb”) biomarkers were identified using the protein array platform described in study 1. Where panels were developed incorporating both miRNA and autoantibody biomarkers as variables (see study 1), these are described as “combiAbMir.”
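For reference, the sensitivity, specificity and S+S values described in the column notes above can be computed from per-sample test calls as in the following sketch. The function name and the boolean encoding (True for a case or a positive call) are assumptions made for illustration.

```python
def sensitivity_specificity(predicted, actual):
    """Return (sensitivity, specificity, S+S) for boolean test calls.

    predicted: test result per sample (True = called positive).
    actual:    true status per sample (True = case, False = control).
    Requires at least one case and one control, otherwise a
    ZeroDivisionError is raised.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # cases called positive
    fn = sum(1 for p, a in pairs if not p and a)      # cases missed
    tn = sum(1 for p, a in pairs if not p and not a)  # controls called negative
    fp = sum(1 for p, a in pairs if p and not a)      # controls called positive
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, sensitivity + specificity
```

S+S ranges from 0 to 2, and a value of 1 is what an uninformative test is expected to achieve, so markers or panels are of interest as S+S approaches 2.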
Number | Date | Country | Kind |
---|---|---|---|
1403489.6 | Feb 2014 | GB | national |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 16874831 | May 2020 | US |
Child | 17720503 | | US |
Parent | 16132844 | Sep 2018 | US |
Child | 16874831 | | US |
Parent | 15121861 | Aug 2016 | US |
Child | 16132844 | | US |