The present disclosure relates to the development of prognostic and diagnostic cancer biomarkers in biological material and the characterization of tumor subtype, vulnerabilities and therapeutic strategies, from the resurfacing of nullomers.
Cancer is the second leading cause of death worldwide (“Cancer” n.d.), and for most cancer types, survival is significantly higher if the tumor is detected at an early stage (Hawkes 2019; Etzioni et al. 2003). Currently, mass population screening is applicable only for breast and cervical cancers and relies on physical tests such as mammography and cytology screens. Detection of other cancer types, both en masse and in low-resource, affordable settings, still poses a major challenge for the scientific and clinical communities (“Cancer” n.d.). In particular, a major hurdle is to single out cancer biomarkers that detect cancer development at its earliest stage, enabling patient stratification and improving patient outcomes through personalized treatments.
Circulating cell-free DNA (cfDNA) is an emerging and promising resource for cancer diagnostics and prognostics (Bronkhorst, Ungerer, and Holdenrieder 2019; Heitzer, Auinger, and Speicher 2020). It has a short life span (16 minutes to 2.5 hours), which makes it a highly temporal indicator of various processes occurring in the subject's body, and with advances in sequencing technologies it can be rapidly analyzed. Analysis of cell-free tumor DNA (ctDNA, liquid biopsy) has become a prospective minimally invasive tool to screen the population and to monitor patients already diagnosed with cancer. To distinguish cancerous cells, their tissue of origin and cancer type, current technologies rely on sequencing to resolve somatic mutations (Zill et al. 2018) and epigenetic marks, such as DNA methylation or histone modifications, that can determine the cancerous tissue (Saghafinia et al. 2018; Sadeh et al. 2021). However, ctDNA analysis still has many hurdles and caveats that need to be overcome (Barbany et al. 2019). Some of the major hurdles include: 1) cfDNA is fragmented (180-360 base pairs), making its collection and extraction more challenging, and the tumor-derived DNA makes up only a small portion (estimated to be around 0.4%), warranting the need for extremely sensitive biomarkers that can easily detect the presence of cancerous cells; 2) prior knowledge of specific mutations or methylation marks is required for targeted screening, and consequently the main focus has been on coding mutations, which constitute only a small fraction of mutations; 3) cfDNA mutation and epigenetic diagnosis could be confounded by somatic alterations in white blood cells (Razavi et al. 2019); 4) the diagnostic techniques used to detect methylation or histone marks are technologically complex and can have low sensitivity and specificity (Ji et al. 2014; Worm Ørntoft 2018; Warton and Samimi 2015; Bronkhorst, Ungerer, and Holdenrieder 2019); and 5) to provide optimal cancer treatment, the cancer needs to be diagnosed at preliminary stages when the tumor is small (~5 mm in diameter). At these stages, the tumor produces minute levels of ctDNA that are difficult to detect using current methods (Bronkhorst, Ungerer, and Holdenrieder 2019).
Nullomers are short DNA sequences (11-18 base pairs) that are absent from the human genome (Hampikian and Andersen 2006; Vergni and Santoni 2016). While the absence of nullomeric sequences could be due to chance, we and others have shown that a significant proportion of them is under negative selection pressures (Georgakopoulos-Soares et al. 2020; Vergni and Santoni 2016), suggesting that they could have a deleterious effect on the genome. Experimental evidence was also provided through the observation that two out of three nullomers led to lethality in several cancerous cell types when delivered as synthetic peptides (Alileche et al. 2012; Alileche and Hampikian 2017). It has also been shown that these sequences could be used as DNA “fingerprints” identifying specific human populations and used for phylogenetic analyses between species (Georgakopoulos-Soares et al. 2020).
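By way of non-limiting illustration, the notion of a sequence that is absent from a genome can be sketched in a few lines of Python. The function names, the toy reference string, and the choice of k = 2 are assumptions made only for readability (the nullomers discussed above are 11-18 base pairs long and are defined against the whole human genome), and treating both strands as "observed" is likewise an assumption of this sketch.

```python
from itertools import product

# Translation table for complementing A/C/G/T; other characters (e.g. N) pass through.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_nullomers(reference: str, k: int) -> set[str]:
    """Return every k-mer over A/C/G/T that is absent from both strands of `reference`."""
    observed = set()
    for strand in (reference, reverse_complement(reference)):
        for i in range(len(strand) - k + 1):
            observed.add(strand[i:i + k])
    all_kmers = {"".join(p) for p in product("ACGT", repeat=k)}
    return all_kmers - observed


# Toy reference, hypothetical and far shorter than any real genome.
toy_reference = "ACGTACGGTCA"
toy_nullomers = find_nullomers(toy_reference, 2)
print(sorted(toy_nullomers))
```

Exhaustive enumeration as shown grows as 4^k and is practical only for small k; a genome-scale search would typically rely on a dedicated k-mer counting tool instead.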
As nullomers do not exist in a human genome, their appearance due to mutagenesis followed by clonal expansion could be exploited as a diagnostic method for diseases associated with a mutational burden, such as cancer.
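The idea of a nullomer resurfacing through mutation can be illustrated with a minimal Python sketch. The sequences, the single substitution, and the helper names below are hypothetical, and for brevity only the forward strand is considered; this is one possible way to picture the concept, not a description of any particular embodiment.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def resurfaced_kmers(reference: str, sample: str, k: int) -> set[str]:
    """Return k-mers present in `sample` but absent from `reference` (forward strand only)."""
    return kmers(sample, k) - kmers(reference, k)


# Hypothetical toy sequences: the sample differs from the reference by one C>T substitution.
reference = "ACGTACGGTCA"
mutated = "ACGTATGGTCA"

new_kmers = resurfaced_kmers(reference, mutated, 3)
print(sorted(new_kmers))
```

A single substitution changes up to k overlapping k-mers, so one mutation can give rise to several sequences that were absent from the reference.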
The disclosure relates to methods and compositions for the detection, identification, classification and characterization of cancer in general and cancer types in biological material.
The disclosure provides a method of identifying one or a plurality of nullomers in a sample comprising: (a) isolating a plurality of nucleic acids from the sample; (b) contacting the nucleic acids to one or a plurality of probes specific for one or a plurality of nullomers; (c) detecting the presence of the probes associated with the one or plurality of nullomers; and (d) correlating the presence or quantity of probes with the likelihood of the presence or quantity of nullomers in the sample. In some embodiments, the one or plurality of probes comprise a complementary nucleic acid sequence bound to or associated with a fluorescent molecule, radioactive isotope or chemiluminescent molecule. In some embodiments, the step of detecting is performed by mass spectrometry.
In some embodiments, the method further comprises, prior to step (b), disassociating a plurality of double stranded nucleic acid sequences comprising at least one nullomer by exposing the double-stranded nucleic acid sequences to a predetermined melting temperature for a period of time sufficient to create single stranded nullomer, annealing at least one primer to the nullomer, and allowing a sufficient period of time to extend the primer in the presence of dNTPs and DNA polymerase. In some embodiments, the steps of disassociating a plurality of double stranded nucleic acid sequences comprising at least one nullomer by exposing the double-stranded nucleic acid sequences to a predetermined melting temperature for a period of time sufficient to create single stranded nullomer, annealing at least one primer to the nullomer, and allowing a sufficient period of time to extend the primer in the presence of dNTPs and polymerase are repeated multiple times such that copies of the at least one nullomer are produced.
The disclosure further provides a method of identifying one or a plurality of nullomers in a sample comprising: (a) isolating a plurality of nucleic acids from the sample; (b) contacting the nucleic acids to one or a plurality of probes specific for one or a plurality of nullomers; (c) detecting the presence of the probes associated with the one or plurality of nullomers; (d) correlating the presence or quantity of probes with the likelihood of the presence or quantity of nullomers in the sample; and (e) comparing the sequence of the nullomer with the sequence of a library of known nullomer sequences. In some embodiments, the probe or plurality of probes comprise a complementary nucleic acid sequence bound to or associated with a fluorescent molecule, radioactive isotope or chemiluminescent molecule. In some embodiments, the method further comprises a step of performing polymerase chain reaction (PCR) with one or a plurality of primers specific for the one or plurality of nullomers.
The disclosure also provides a computer-implemented method of identifying a mutation associated with a hyperproliferative disorder comprising: (a) isolating one or a plurality of nucleic acid molecules from a sample associated with the hyperproliferative disorder; (b) contacting the nucleic acids to one or a plurality of probes specific for one or a plurality of nullomers; (c) in a system configured to compile data and detect the presence or quantify the presence of a nucleic acid sequence, detecting the presence of the probes associated with the one or plurality of nullomers; (d) correlating the presence or quantity of the nullomer to the likelihood of a specific mutation serving as a biomarker for a hyperproliferative disorder. In some embodiments, the method further comprises, prior to step (a), in a system configured to compile data and detect the presence or quantity of nucleic acids in a sample: compiling genetic data about a population of subjects including the subject that has a mutation candidate that is a biomarker for a hyperproliferative disorder. In some embodiments, the method further comprises, after step (d), a step of: (e) selecting a cancer treatment for the subject based upon identification of the hyperproliferative disorder. In some embodiments, the hyperproliferative disorder is breast cancer, pancreatic cancer, or liver cancer. In some embodiments, the hyperproliferative disorder is breast cancer, pancreatic cancer, esophagus cancer, lymphoid cancer, kidney cancer, ovary cancer, head and neck cancer, lung cancer, stomach cancer, CNS cancer, uterus cancer, skin cancer, colorectal cancer, prostate cancer, bladder cancer, bone and soft tissue cancer, biliary cancer, cervix cancer, thyroid cancer, myeloid cancer, or liver cancer. In some embodiments, the hyperproliferative disorder is a malignant tumor. 
In some embodiments, the sample is a brush biopsy, puncture biopsy, fluid from a needle biopsy, blood, blood cells, cells from a hair sample, nucleic acids from a hair sample, saliva, or spit. In some embodiments, the probe or plurality of probes comprise a complementary nucleic acid sequence bound to or associated with a fluorescent molecule, radioactive isotope or chemiluminescent molecule. In some embodiments, the method further comprises a step of performing PCR with one or a plurality of primers specific for the one or plurality of nullomers.
The disclosure additionally provides a method of treating a hyperproliferative disorder in a subject in need thereof comprising: (a) exposing a sample from the subject to a probe specific for at least one nullomer chosen from Table 1; (b) detecting the presence, absence or quantity of the at least one nullomer in the sample; (c) normalizing the presence, absence, or quantity of the at least one nullomer in the sample against the presence, absence or quantity of the at least one nullomer in a sample of a healthy subject or a sample of a subject known to have the hyperproliferative disorder; (d) correlating the presence, absence, or quantity of the at least one nullomer in the sample to the subject having the hyperproliferative disorder; and (e) administering a therapeutically effective amount of one or a plurality of active agents to the subject. In some embodiments, the method further comprises obtaining the sample from the subject prior to the step of exposing. In some embodiments, the one or plurality of active agents is chosen from one or a combination of the agents identified in Table 3. In some embodiments, the sample is plasma, serum, whole blood, respiratory tissue, respiratory mucosal sample, saliva, urine, blood cells, cells from a hair sample, nucleic acids from a hair sample, or spit. 
In some embodiments, step (b) further comprises calculating one or more scores based upon the presence, absence, or quantity of the at least one nullomer, and step (d) further comprises correlating the one or more scores to the presence, absence, or quantity of the at least one nullomer such that, if the amount of the at least one nullomer is greater than the quantity of the at least one nullomer in a control sample; or, if the amount of the at least one nullomer is substantially equal to the quantity of the at least one nullomer in a sample taken from a subject known to have a hyperproliferative disorder, then the subject is diagnosed as having a hyperproliferative disorder. In some embodiments, the probe is a radioactive probe, a chemiluminescent probe, or a fluorescent probe. In some embodiments, the sample is free of cells.
In some embodiments, the at least one nullomer is detected by next generation sequencing, quantitative real-time reverse transcription-PCR (qRT-PCR), isothermal amplification, microarray, multiplex nullomer profiling assay, RNA in situ hybridization (RNA-ISH), or northern blotting. In some embodiments, the at least one nullomer is detected by qRT-PCR. In some embodiments, the step of quantifying at least one quantity of the at least one nullomer in the sample comprises using fluorescence and/or digital imaging.
In some embodiments, the step of analysing comprises detecting a presence, absence, or quantity of at least 2 different nullomers. In some embodiments, the step of analysing comprises detecting the presence, absence, or quantity of the at least one nullomer by PCR amplification using one or a plurality of primers specific for the at least one nullomer chosen from Table 1. In some embodiments, the step of analysing comprises detecting presence, absence, or quantity of the at least one nullomer by a probe comprising a nucleic acid sequence complementary to the nucleic acid sequence of the at least one nullomer.
The disclosure further provides a method of diagnosing a subject with cancer comprising: (a) contacting a plurality of nucleic acids from a sample to a system comprising a probe specific for one or a plurality of nullomers; and (b) detecting the presence of or quantifying the amount of one or more nucleic acids from the sample. In some embodiments, the method comprises detecting the presence, absence or quantity of one or a plurality of the nullomers provided in Table 1. In some embodiments, the method comprises detecting the presence, absence or quantity of nullomers that comprise at least 93% sequence identity to one or a plurality of the nullomers provided in Table 1. In some embodiments, the at least one nullomer is detected by qRT-PCR. In some embodiments, the at least one nullomer is detected by CRISPR diagnosis. In some embodiments, the at least one nullomer is detected by CRISPR diagnosis and Cas9, Cas12 or Cas13 protein is used.
In some embodiments, the method further comprises, after the step of detecting, normalizing the quantity of the probe as compared to a quantity of signal from a negative control. In some embodiments, the method further comprises, after the step of detecting, correlating the one or more scores to the presence, absence, or quantity of the at least one nullomer such that, if the amount of the at least one nullomer is greater than the quantity of the at least one nullomer in a control sample; or, if the amount of the at least one nullomer is substantially equal to the quantity of the at least one nullomer in a sample taken from a subject known to have a hyperproliferative disorder, then the subject is diagnosed as having a hyperproliferative disorder. In some embodiments, the hyperproliferative disorder is breast cancer, pancreatic cancer, or liver cancer. In some embodiments, the hyperproliferative disorder is breast cancer, pancreatic cancer, esophagus cancer, lymphoid cancer, kidney cancer, ovary cancer, head and neck cancer, lung cancer, stomach cancer, CNS cancer, uterus cancer, skin cancer, colorectal cancer, prostate cancer, bladder cancer, bone and soft tissue cancer, biliary cancer, cervix cancer, thyroid cancer, myeloid cancer, or liver cancer.
Also provided is a kit comprising one or more probes or primers for detecting the presence, absence or quantity of one or a plurality of the nullomers provided in Table 1 or nullomers that comprise at least 93% sequence identity to one or a plurality of the nullomers provided in Table 1. In some embodiments, the one or more probes comprised in the disclosed kit comprise one or a combination of the nullomer sequences of Table 1 or complements thereof.
Further provided is a computer program product encoded on a computer-readable storage medium, wherein the computer program product comprises instructions for: a) detecting the presence, absence or quantity of at least one nullomer in a sample of a subject; b) normalizing the presence, absence, or quantity of the at least one nullomer in the sample against the presence, absence or quantity of the at least one nullomer in a control sample; and c) correlating the presence, absence, or quantity of the at least one nullomer in the sample to a likelihood that the subject has a hyperproliferative disorder. In some embodiments, the computer program product further comprises instructions for calculating a score associated with the presence, absence or quantity of the at least one nullomer in the sample and correlating the score to a likelihood that the subject has a hyperproliferative disorder. In some embodiments, the computer program product further comprises instructions for: a) detecting and normalizing the presence, absence or quantity of a second nullomer in the sample; b) calculating a combined score associated with the presence, absence or quantity of the at least one nullomer and the second nullomer in the sample; and c) correlating the combined score to a likelihood that the subject has a hyperproliferative disorder. In some embodiments, at least 2 different nullomers in the sample are detected, normalized and correlated by the computer program product. In some embodiments, the computer program product detects the presence, absence, or quantity of the at least one nullomer by qRT-PCR amplification. In some embodiments, the control sample used in the computer program product is obtained from a subject free of a hyperproliferative disorder.
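As a non-limiting sketch of how the detecting, normalizing and scoring instructions might be organized, the following Python example counts reads containing each nullomer of a panel, normalizes by the number of reads, and combines the sample-minus-control differences into one score. The panel sequences, the reads, the choice of read fraction as the normalization, the additive score, and the threshold value are all hypothetical assumptions of this sketch and are not taken from Table 1 or from any described embodiment.

```python
def nullomer_read_fraction(reads: list[str], panel: list[str]) -> dict[str, float]:
    """Detect and normalize: fraction of reads containing each panel nullomer."""
    if not reads:
        raise ValueError("at least one read is required")
    total = len(reads)
    return {n: sum(n in read for read in reads) / total for n in panel}


def combined_score(sample: dict[str, float], control: dict[str, float]) -> float:
    """Sum, over the panel, of the sample read fraction minus the control read fraction."""
    return sum(sample[n] - control[n] for n in sample)


# Hypothetical panel and reads, for illustration only.
panel = ["TAT", "TGG"]
sample_reads = ["ACGTATGG", "GGTCA", "CGTATG", "ACGTAC"]
control_reads = ["ACGTACGG", "GGTCA", "CGTACG", "ACGTAC"]

sample_fraction = nullomer_read_fraction(sample_reads, panel)
control_fraction = nullomer_read_fraction(control_reads, panel)
score = combined_score(sample_fraction, control_fraction)

THRESHOLD = 0.1  # assumed cut-off; a real cut-off would be set from validation data
flagged = score > THRESHOLD
print(sample_fraction, control_fraction, score, flagged)
```

Exact substring matching is used here for simplicity; a read-out such as qRT-PCR would supply quantities directly, in which case only the normalization and scoring steps would apply.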
The disclosure also provides a system comprising: a) the computer program product of any one of claims 54 to 59; and b) a processor operable to execute programs; and/or a memory associated with the processor.
The disclosure further provides a system for detecting the presence or quantity of a nullomer in a sample of a subject comprising: a processor operable to execute programs, a memory associated with the processor, a database associated with said processor and said memory, and a program stored in the memory and executable by the processor, the program being operable for: a) detecting the presence, absence or quantity of at least one nullomer in a sample of a subject; b) normalizing the presence, absence, or quantity of the at least one nullomer in the sample against the presence, absence or quantity of the at least one nullomer in a control sample; and c) correlating the presence, absence, or quantity of the at least one nullomer in the sample to a likelihood that the subject has a hyperproliferative disorder. In some embodiments, the program is further operable for calculating a score associated with the presence, absence or quantity of the at least one nullomer in the sample and correlating the score to a likelihood that the subject has a hyperproliferative disorder. In some embodiments, the program is further operable for detecting and normalizing the presence, absence or quantity of a second nullomer in the sample.
In some embodiments, the one or plurality of probes used in any of the disclosed methods, systems, or computer program product, or comprised in any of the disclosed kits comprise a nucleic acid sequence chosen from Table 1. In some embodiments, the one or plurality of probes used in any of the disclosed methods, systems, or computer program product, or comprised in any of the disclosed kits comprise a nucleic acid sequence comprising at least about 93% sequence identity to any of the sequences in Table 1.
In some embodiments, the one or plurality of probes used in any of the disclosed methods, systems, or computer program product, or comprised in any of the disclosed kits comprise a nucleic acid sequence that is complementary to any of the nullomer sequences provided in Table 1, or a fragment thereof. In some embodiments, the one or plurality of probes used in any of the disclosed methods, systems, or computer program product, or comprised in any of the disclosed kits comprise a nucleic acid sequence that is complementary to a nullomer comprising at least about 93% sequence identity to any of the nullomer sequences provided in Table 1, or a fragment thereof.
In some embodiments, the one or plurality of primers specific for the one or plurality of nullomers used in any of the disclosed methods, systems, or computer program product, or comprised in any of the disclosed kits comprise a nucleic acid sequence that is complementary to any of the nullomer sequences provided in Table 1, or a fragment thereof. In some embodiments, the one or plurality of primers specific for the one or plurality of nullomers used in any of the disclosed methods, systems, or computer program product, or comprised in any of the disclosed kits comprise a nucleic acid sequence that is complementary to a nullomer comprising at least about 93% sequence identity to any of the nullomer sequences provided in Table 1, or a fragment thereof.
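To give a sense of what a 93% identity threshold means for sequences of nullomer length, the following Python sketch computes percent identity between two equal-length sequences. The disclosure does not state here how identity is calculated, so the ungapped, position-by-position comparison is an assumption of this sketch, and the two 16-mers are hypothetical sequences that are not taken from Table 1.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of positions at which two equal-length sequences carry the same base."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Hypothetical 16-mers, for illustration only.
nullomer = "ACGTTGCAACGTTGCA"
one_mismatch = "ACGTTGCAACGTTGCT"
two_mismatches = "ACGTTGCAACGTTGTT"

print(percent_identity(nullomer, one_mismatch))    # 15 of 16 positions match
print(percent_identity(nullomer, two_mismatches))  # 14 of 16 positions match
```

Under this assumption a 16-mer tolerates one mismatch at the 93% level (15/16 = 93.75%) but not two (14/16 = 87.5%).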
Before the present methods and systems are described, it is to be understood that the present disclosure is not limited to the particular processes, compositions, or methodologies described, as these may vary. It is also to be understood that the terminology used in the description is for the purposes of describing the particular versions or embodiments only, and is not intended to limit the scope of the present disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meanings as commonly understood by one of ordinary skill in the art. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of embodiments of the present disclosure, the methods, devices, and materials in some embodiments are now described. All publications mentioned herein are incorporated by reference in their entireties. Nothing herein is to be construed as an admission that the present disclosure is not entitled to antedate such disclosure by virtue of prior invention.
Unless specifically defined otherwise, all technical and scientific terms used herein shall be taken to have the same meaning as commonly understood by one of ordinary skill in the art (e.g., in cell culture, molecular genetics, microRNA and detection thereof, immunology, immunohistochemistry, protein chemistry, and biochemistry). The meaning and scope of the terms should be clear; however, in the event of any latent ambiguity, definitions provided herein take precedence over any dictionary or extrinsic definition. Further, unless otherwise required by context, singular terms shall include pluralities and plural terms shall include the singular.
The indefinite articles “a” and “an,” as used herein in the specification and in the claims, unless clearly indicated to the contrary, should be understood to mean “at least one.”
The phrase “and/or,” as used herein in the specification and in the claims, should be understood to mean “either or both” of the elements so conjoined, i.e., elements that are conjunctively present in some cases and disjunctively present in other cases. Other elements may optionally be present other than the elements specifically identified by the “and/or” clause, whether related or unrelated to those elements specifically identified unless clearly indicated to the contrary. Thus, as a non-limiting example, a reference to “A and/or B,” when used in conjunction with open-ended language such as “comprising” can refer, in one embodiment, to A without B (optionally including elements other than B); in another embodiment, to B without A (optionally including elements other than A); in yet another embodiment, to both A and B (optionally including other elements); etc.
As used herein in the specification and in the claims, “or” should be understood to have the same meaning as “and/or” as defined above. For example, when separating items in a list, “or” or “and/or” shall be interpreted as being inclusive, i.e., the inclusion of at least one, but also including more than one, of a number or list of elements, and, optionally, additional unlisted items. Only terms clearly indicated to the contrary, such as “only one of” or “exactly one of,” or, when used in the claims, “consisting of,” will refer to the inclusion of exactly one element of a number or list of elements. In general, the term “or” as used herein shall only be interpreted as indicating exclusive alternatives (i.e. “one or the other but not both”) when preceded by terms of exclusivity, “either,” “one of,” “only one of,” or “exactly one of.” “Consisting essentially of,” when used in the claims, shall have its ordinary meaning as used in the field of patent law.
The term “about” is used herein to mean within the typical ranges of tolerances in the art. For example, “about” can be understood as about 2 standard deviations from the mean. According to certain embodiments, when referring to a measurable value such as an amount and the like, “about” is meant to encompass variations of ±20%, ±10%, ±5%, ±1%, ±0.9%, ±0.8%, ±0.7%, ±0.6%, ±0.5%, ±0.4%, ±0.3%, ±0.2% or ±0.1% from the specified value as such variations are appropriate to perform the disclosed methods. When “about” is present before a series of numbers or a range, it is understood that “about” can modify each of the numbers in the series or range.
As used herein, the term “animal” includes, but is not limited to, humans and non-human vertebrates such as wild animals, rodents (such as rats), ferrets, and domesticated animals and farm animals, such as dogs, cats, horses, pigs, cows, sheep, and goats. In some embodiments, the animal is a mammal. In some embodiments, the animal is a human. In some embodiments, the animal is a non-human mammal.
An “algorithm,” “formula,” or “model” is any mathematical equation, algorithmic, analytical or programmed process, or statistical technique that takes one or more continuous or categorical inputs (herein called “parameters”) and calculates an output value, sometimes referred to as an “index” or “index value.” Non-limiting examples of “formulas” include sums, ratios, and regression operators, such as coefficients or exponents, biomarker (e.g., nullomers disclosed herein) value transformations and normalizations (including, without limitation, those normalization schemes based on clinical parameters, such as gender, age, or ethnicity), rules and guidelines, statistical classification models, and neural networks trained on historical populations. Of particular use in combining markers are linear and non-linear equations and statistical classification analyses to determine the relationship between levels of the biomarkers detected in a subject sample and the subject's risk of disease (for example). In panel and combination construction, of particular interest are structural and syntactic statistical classification algorithms, and methods of risk index construction, utilizing pattern recognition features, including established techniques such as cross correlation, Principal Components Analysis (PCA), factor rotation, Logistic Regression (LogReg), Linear Discriminant Analysis (LDA), Eigengene Linear Discriminant Analysis (ELDA), Support Vector Machines (SVM), Random Forest (RF), Recursive Partitioning Tree (RPART), as well as other related decision tree classification techniques, Shrunken Centroids (SC), StepAIC, Kth-Nearest Neighbor, Boosting, Decision Trees, Neural Networks, Bayesian Networks, and Hidden Markov Models, among others. 
Many of these techniques are useful either combined with a biomarker selection technique, such as forward selection, backwards selection, stepwise selection, complete enumeration of all potential panels of a given size, or genetic algorithms, or they may themselves include biomarker selection methodologies in their own technique. These may be coupled with information criteria, such as Akaike's Information Criterion (AIC) or Bayes Information Criterion (BIC), in order to quantify the tradeoff between additional biomarkers and model improvement, and to aid in minimizing overfitting. The resulting predictive models may be validated in other studies, or cross-validated in the study they were originally trained in, using such techniques as Leave-One-Out (LOO) and 10-Fold cross-validation (10-Fold-CV).
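The tradeoff that information criteria quantify can be shown with a short Python sketch using the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of fitted parameters, n the number of samples and ln L the maximized log-likelihood. The two candidate panels and their log-likelihood values below are hypothetical numbers chosen only to illustrate the comparison.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike's Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_samples: int) -> float:
    """Bayes Information Criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_samples) - 2 * log_likelihood


# Hypothetical fits: a 3-biomarker panel and a 5-biomarker panel on 100 samples.
n_samples = 100
small_panel = {"log_likelihood": -50.0, "n_params": 3}
large_panel = {"log_likelihood": -49.0, "n_params": 5}

for name, fit in (("3-marker", small_panel), ("5-marker", large_panel)):
    print(name,
          aic(fit["log_likelihood"], fit["n_params"]),
          round(bic(fit["log_likelihood"], fit["n_params"], n_samples), 2))
```

With these assumed numbers the two extra biomarkers improve the log-likelihood by only one unit, so both criteria favor the smaller panel.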
The term “at least” prior to a number or series of numbers (e.g. “at least two”) is understood to include the number adjacent to the term “at least,” and all subsequent numbers or integers that could logically be included, as clear from context. When “at least” is present before a series of numbers or a range, it is understood that “at least” can modify each of the numbers in the series or range.
The term “biomarker” as used herein refers to a biological molecule present in an individual at varying concentrations useful in predicting the cancer status of an individual. A biomarker may include but is not limited to, nucleic acids, proteins and variants and fragments thereof. A biomarker may be DNA comprising the entire or partial nucleic acid sequence encoding the biomarker, or the complement of such a sequence. Biomarker nucleic acids useful in the disclosure are considered to include both DNA and RNA comprising the entire or partial sequence of any of the nucleic acid sequences of interest. In some embodiments, the biomarker of the disclosure is any of the nullomers disclosed herein.
The term “bodily fluid” as used herein refers to a bodily fluid including blood (or a fraction of blood such as plasma or serum), lymph, mucus, tears, saliva, sweat, sputum, urine, semen, stool, cerebrospinal fluid (CSF), breast milk, and ascites fluid. In some embodiments, the bodily fluid is blood. In some embodiments, the bodily fluid is a fraction of blood. In some embodiments, the bodily fluid is plasma. In some embodiments, the bodily fluid is serum. In some embodiments, the bodily fluid is urine.
The terms “cancer” and “cancerous” as used herein refer to or describe a physiological condition in mammals in which a population of cells is characterized by unregulated cell growth. Thus, the term “cancer” refers to a group of diseases involving abnormal cell growth with the potential to invade or spread to other parts of the body. Examples of cancer include, but are not limited to, lung cancer, bone cancer, blood cancer, chronic myelomonocytic leukemia (CMML), bile duct cancer, cervical cancer, liver cancer, pancreatic cancer, skin cancer, cancer of the head and neck, cancer of the eye, cutaneous or intraocular melanoma, uterine cancer, ovarian cancer, rectal cancer, cancer of the anal region, stomach cancer, colon cancer, breast cancer, testicular cancer, gynecologic tumors (e.g., uterine sarcomas, carcinoma of the fallopian tubes, carcinoma of the endometrium, carcinoma of the cervix, carcinoma of the vagina or carcinoma of the vulva), Hodgkin's disease, cancer of the esophagus, cancer of the small intestine, cancer of the endocrine system (e.g., cancer of the thyroid, parathyroid or adrenal glands), sarcomas of soft tissues, cancer of the urethra, cancer of the penis, prostate cancer, chronic or acute leukemia, solid tumors of childhood, lymphocytic lymphomas, cancer of the bladder, cancer of the kidney or ureter (e.g., renal cell carcinoma, carcinoma of the renal pelvis), or neoplasms of the central nervous system (e.g., primary CNS lymphoma, spinal axis tumors, brain stem gliomas or pituitary adenomas).
As used herein, the term “characterizing cancer in a subject” refers to the identification of one or more properties of a cancer sample in a subject, including but not limited to, the presence of benign, pre-cancerous or cancerous tissue, the stage of the cancer, the type of the cancer, the tissue of origin of the cancer, and the subject's prognosis. Cancers may be characterized by the identification of the expression of one or more cancer marker genes, including but not limited to, the nullomers disclosed herein. As used herein, the term “stage of cancer” refers to a qualitative or quantitative assessment of the level of advancement of a cancer. Criteria used to determine the stage of a cancer include, but are not limited to, the size of the tumor and the extent of metastases (e.g., localized or distant). In some embodiments, the subject has been previously diagnosed with having a cancer and received, or is currently receiving, cancer treatment, including but not limited to surgical intervention and cancer therapy, and in such embodiments, the term “characterizing cancer in a subject” refers to monitoring the progress of the cancer treatment.
The terms “complementary” or “complementarity” refer to polynucleotides (i.e., a sequence of nucleotides) related by base-pairing rules. For example, the sequence “5′-AGT-3′” is complementary to the sequence “5′-ACT-3′.” Complementarity may be “partial,” in which only some of the nucleic acids' bases are matched according to the base pairing rules, or there may be “complete” or “total” complementarity between the nucleic acids. The degree of complementarity between nucleic acid strands can have significant effects on the efficiency and strength of hybridization between nucleic acid strands under defined conditions. This is of particular importance for methods that depend upon binding between nucleic acid bases.
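The base-pairing rules above can be illustrated with a minimal sketch. The helper names `reverse_complement` and `complementarity` are hypothetical and introduced only for illustration; the sketch assumes unmodified DNA bases (A, C, G, T) and strands of equal length aligned antiparallel.

```python
# Watson-Crick pairing for DNA; illustrative helper names, not part of the disclosure.
_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the fully complementary strand, written 5'->3'."""
    return "".join(_PAIR[base] for base in reversed(seq.upper()))


def complementarity(a, b):
    """Fraction of positions at which a and b pair when aligned antiparallel.

    1.0 corresponds to "complete" complementarity; values between 0 and 1
    correspond to "partial" complementarity.
    """
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    b_antiparallel = b.upper()[::-1]
    paired = sum(_PAIR.get(x) == y for x, y in zip(a.upper(), b_antiparallel))
    return paired / len(a)
```

For instance, `reverse_complement("AGT")` gives `"ACT"`, matching the example in the definition, and `complementarity("AGT", "ACA")` reports two of three positions paired.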
As used herein, the terms “comprising” (and any form of comprising, such as “comprise,” “comprises,” and “comprised”), “having” (and any form of having, such as “have” and “has”), “including” (and any form of including, such as “includes” and “include”), or “containing” (and any form of containing, such as “contains” and “contain”), are inclusive or open-ended and do not exclude additional, unrecited elements or method steps.
The term “correlate” or “correlating” as used herein refers to a statistical association between instances of two events, where events may include numbers, data sets, and the like. For example, when the events involve numbers, a positive correlation (also referred to herein as a “direct correlation”) means that as one increases, the other increases as well. A negative correlation (also referred to herein as an “inverse correlation”) means that as one increases, the other decreases. The disclosure provides nullomers, the levels of which are correlated with a particular outcome measure, such as between the presence of a particular nullomer and the likelihood of developing a particular type of cancer. For example, the increased level of a nullomer may be negatively correlated with a likelihood of good clinical outcome for the patient. In this case, for example, the patient may have a decreased likelihood of long-term survival without recurrence of the cancer and/or a positive response to a chemotherapy, and the like. Such a negative correlation indicates that the patient likely has a poor prognosis or will respond poorly to a chemotherapy, and this may be demonstrated statistically in various ways, e.g., by a high hazard ratio.
As used herein, the terms “detect,” “detecting” or “detection” refer to either the general act of discovering or discerning or the specific observation of a composition. Detecting a composition may comprise determining the presence or absence of a composition. Detecting may comprise quantifying a composition. For example, detecting comprises determining the expression level of a composition. The composition may comprise a nucleic acid molecule. For example, the composition may comprise one or a plurality of the nullomers disclosed herein. Alternatively, or additionally, the composition may be a detectably labeled composition.
The term “diagnosis” or “prognosis” as used herein refers to the use of information (e.g., genetic information or data from other molecular tests on biological samples, signs and symptoms, physical exam findings, cognitive performance results, etc.) to anticipate the most likely outcomes, timeframes, and/or response to a particular treatment for a given disease, disorder, or condition, based on comparisons with a plurality of individuals sharing common nucleotide sequences, symptoms, signs, family histories, or other data relevant to consideration of a patient's health status.
The term “functional fragment” means any portion of a full-length polypeptide or nucleic acid sequence that is of a sufficient length and has a sufficient structure to confer a biological effect that is similar or substantially similar to that of the full-length polypeptide or nucleic acid upon which the fragment is based. In some embodiments, a functional fragment is a portion of a full-length or wild-type nucleic acid sequence that encodes any one of the nucleic acid sequences disclosed herein, and said portion encodes a polypeptide of a certain length and/or structure that is less than full-length but encodes a domain that is still biologically functional as compared to the full-length or wild-type protein. In some embodiments, the functional fragment may have a reduced biological activity, about equivalent biological activity, or an enhanced biological activity as compared to the wild-type or full-length polypeptide sequence upon which the fragment is based. In some embodiments, the functional fragment is derived from the sequence of an organism, such as a human. In such embodiments, the functional fragment may retain about 99%, 98%, 97%, 96%, 95%, 94%, 93%, 92%, 91%, or 90% sequence identity to the wild-type or given sequence from which the sequence is derived. In some embodiments, the functional fragment may retain about 85%, 80%, 75%, 70%, 65%, or 60% sequence identity to the wild-type sequence from which the sequence is derived. In some embodiments, the given sequence is a nullomer sequence of Table 1. In other embodiments, the given sequence is a complementary sequence of any of the nullomer sequences of Table 1.
The term “hyperproliferation” as used herein is defined as clonal expansion, in which daughter cells share a set of somatic mutations that were not originally present in the germline and which could include but are not limited to driver mutations. Clonal expansion could include but is not limited to resistance to cell death, evasion of growth suppressors, sustaining proliferative signaling, enabling replicative immortality, activating invasion and metastasis or inducing angiogenesis.
The term “hyperproliferative cell” refers to a cell located in a tissue or organ having a “hyperproliferative disorder,” a disease or disorder characterized by abnormal proliferation, abnormal growth, abnormal senescence, abnormal quiescence, or abnormal removal of cells in an organism, and includes all forms of hyperplasias, neoplasias, and cancer. In some embodiments, the “hyperproliferative cell” is a precancerous cell in the form of a hyperplasia. In some embodiments, the “hyperproliferative cell” is a precancerous cell in the form of a neoplasia. In some embodiments, the “hyperproliferative cell” is a cancerous cell. In some embodiments, the hyperproliferative disorder or disease is a cancer derived from the gastrointestinal tract or urinary system. In some embodiments, a hyperproliferative disorder or disease is a cancer of the adrenal gland, bile ducts, bladder, blood, bone, bone marrow, brain, breast, cervix, colon, esophagus, eye, gall bladder, ganglia, gastrointestinal tract, heart, lymphatic system, liver, lung, kidney, muscle, ovary, pancreas, parathyroid, penis, prostate, prostate glands, rectum, salivary glands, skin, spine, stomach, spleen, testis, thymus, thyroid, or uterus.
In some embodiments, the term hyperproliferative disorder or disease is a cancer chosen from: lung cancer, bone cancer, blood cancer, chronic myelomonocytic leukemia (CMML), bile duct cancer, cervical cancer, liver cancer, pancreatic cancer, skin cancer, cancer of the head and neck, cancer of the eye, cutaneous or intraocular melanoma, uterine cancer, ovarian cancer, rectal cancer, cancer of the anal region, stomach cancer, colon cancer, breast cancer, testicular cancer, gynecologic tumors (e.g., uterine sarcomas, carcinoma of the fallopian tubes, carcinoma of the endometrium, carcinoma of the cervix, carcinoma of the vagina or carcinoma of the vulva), Hodgkin's disease, cancer of the esophagus, cancer of the small intestine, cancer of the endocrine system (e.g., cancer of the thyroid, parathyroid or adrenal glands), sarcomas of soft tissues, cancer of the urethra, cancer of the penis, prostate cancer, chronic or acute leukemia, solid tumors of childhood, lymphocytic lymphomas, cancer of the bladder, cancer of the kidney or ureter (e.g., renal cell carcinoma, carcinoma of the renal pelvis), or neoplasms of the central nervous system (e.g., primary CNS lymphoma, spinal axis tumors, brain stem gliomas or pituitary adenomas). In some embodiments, the hyperproliferative disorder or disease is a breast cancer, pancreatic cancer, esophagus cancer, lymphoid cancer, kidney cancer, ovary cancer, head and neck cancer, lung cancer, stomach cancer, CNS cancer, uterus cancer, skin cancer, colorectal cancer, prostate cancer, bladder cancer, bone and soft tissue cancer, biliary cancer, cervix cancer, thyroid cancer, myeloid cancer, or liver cancer. In some embodiments, the hyperproliferative disorder or disease comprises one or a plurality of mutations in one or a plurality of genes selected from Table A.
As used herein, the phrase “in need thereof” means that the animal or mammal has been identified or suspected as having a need for the particular method or treatment. In some embodiments, the identification can be by any means of diagnosis or observation. In any of the methods and treatments described herein, the animal or mammal can be in need thereof.
The term “label” as used herein refers to any atom or molecule that can be used to provide a detectable (preferably quantifiable) effect, and that can be attached to a nucleic acid or protein. Labels include but are not limited to dyes; radiolabels such as ³²P; binding moieties such as biotin; haptens such as digoxigenin; luminogenic, phosphorescent or fluorogenic moieties; and fluorescent dyes alone or in combination with moieties that can suppress or shift emission spectra by fluorescence resonance energy transfer (FRET). Labels may provide signals detectable by fluorescence, radioactivity, colorimetry, gravimetry, X-ray diffraction or absorption, magnetism, enzymatic activity, and the like. A label may be a charged moiety (positive or negative charge) or alternatively, may be charge neutral. Labels can include or consist of nucleic acid or protein sequence, so long as the sequence comprising the label is detectable. In some embodiments, nucleic acids are detected directly without a label (e.g., directly reading a sequence).
The term “level” as used herein refers to a qualitative or quantitative measure of the number of copies of a nullomer. A nullomer exhibits an “increased level” when the level of the nullomer is higher in a first sample, such as in a clinically relevant subpopulation of patients (e.g., patients who have cancer), than in a second control sample, such as in a related subpopulation (e.g., patients who do not have cancer). In the context of an analysis of a level of a nullomer in a tumor sample obtained from an individual patient, a nullomer exhibits an “increased level” when the level of the nullomer in the subject trends toward, or more closely approximates, the level characteristic of a clinically relevant subpopulation of patients.
The term “measuring” or “measurement” means assessing the presence, absence, quantity or amount (which can be an effective amount) of either a given substance within a clinical or subject-derived sample, including the derivation of qualitative or quantitative concentration levels of such substances, or otherwise evaluating the values or categorization of a subject's clinical parameters. Alternatively, the term “detecting” or “detection” may be used and is understood to cover all measuring or measurement as described herein.
The term “metastasis” as used herein refers to the process by which a cancer spreads or transfers from the site of origin to other regions of the body. A “metastatic” or “metastasizing” cell is one that loses adhesive contacts with neighboring cells and migrates (e.g., via the bloodstream or lymph) from the primary site of disease to secondary sites.
The particular use of terms “nucleic acid,” “oligonucleotide,” and “polynucleotide” should in no way be considered limiting and may be used interchangeably herein. “Oligonucleotide” is used when the relevant nucleic acid molecules typically comprise less than about 100 bases. “Polynucleotide” is used when the relevant nucleic acid molecules typically comprise more than about 100 bases. Both terms are used to denote a DNA, RNA, modified or synthetic DNA or RNA sequence (including, but not limited to nucleic acids comprising synthetic and naturally-occurring base analogs, dideoxy or other sugars, thiols or other non-natural or natural polymer backbones), or other nucleobase containing polymers capable of hybridizing to DNA and/or RNA. Accordingly, the terms should not be construed to define or limit the length of the nucleic acids referred to and used herein, nor should the terms be used to limit the nature of the polymer backbone to which the nucleobases are attached.
The term “nucleic acid sequence” or “polynucleotide sequence” refers to a contiguous string of nucleotide bases and in particular contexts also refers to the particular placement of nucleotide bases in relation to each other as they appear in a polynucleotide.
“Nucleobase” means a heterocyclic moiety capable of non-covalently pairing with another nucleobase.
“Nucleoside” means a nucleobase linked to a sugar moiety.
“Nucleotide” means a nucleoside having a phosphate group covalently linked to the sugar portion of a nucleoside. In some embodiments, the nucleotide is characterized as being modified if the 3′ phosphate group is covalently linked to a contiguous nucleotide by any linkage other than a phosphodiester bond.
“Compound comprising a modified oligonucleotide consisting of a number of linked nucleosides” means a compound that includes a modified oligonucleotide having the specified number of linked nucleosides. Thus, the compound may include additional substituents or conjugates. Unless otherwise indicated, the compound does not include any additional nucleosides beyond those of the modified oligonucleotide.
“Modified oligonucleotide” means an oligonucleotide having one or more modifications relative to a naturally occurring terminus, sugar, nucleobase, and/or internucleoside linkage. A modified oligonucleotide may comprise unmodified nucleosides.
“Single-stranded modified oligonucleotide” means a modified oligonucleotide which is not hybridized to a complementary nucleic acid strand.
“Modified nucleoside” means a nucleoside having any change from a naturally occurring nucleoside. A modified nucleoside may have a modified sugar, and an unmodified nucleobase. A modified nucleoside may have a modified sugar and a modified nucleobase. A modified nucleoside may have a natural sugar and a modified nucleobase. In some embodiments, a modified nucleoside is a bicyclic nucleoside. In some embodiments, a modified nucleoside is a non-bicyclic nucleoside.
The term “nullomers” as used herein refers to expressed oligonucleotide sequences in a species, the genetic templates of which are congenitally absent in the species. In some embodiments, the nullomers of the disclosure are nullomers not present in the published human genome sequences. In some embodiments, the nullomers of the disclosure are nullomers not present in the published human genome sequences and associated with one or a plurality of cancers.
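As an illustration of this definition, the following minimal sketch enumerates the k-mers absent from a reference sequence and then reports which of them appear in a sample sequence. The function names, the toy sequences, and the choice to count both strands are assumptions made for illustration only; exhaustive enumeration is practical only for short k, and this is not presented as the method used to build Table 1.

```python
from itertools import product

_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    return "".join(_PAIR[base] for base in reversed(seq))


def find_nullomers(reference, k, both_strands=True):
    """Return the set of length-k DNA sequences that never occur in `reference`."""
    reference = reference.upper()
    observed = set()
    for i in range(len(reference) - k + 1):
        kmer = reference[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip windows containing N or other symbols
            observed.add(kmer)
            if both_strands:
                observed.add(reverse_complement(kmer))
    all_kmers = {"".join(p) for p in product("ACGT", repeat=k)}
    return all_kmers - observed


def resurfaced_nullomers(sample, nullomers, k):
    """Return the reference nullomers of length k that are present in `sample`."""
    sample = sample.upper()
    sample_kmers = {sample[i:i + k] for i in range(len(sample) - k + 1)}
    return sample_kmers & nullomers
```

With the toy reference `"ACGT"` and k = 2, three 2-mers are observed (AC, CG, GT) and the remaining thirteen are nullomers; a sample `"ACAT"` would then contain the nullomers CA and AT.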
As used herein, “one or more of” includes at least one of the recited components, or 2, 3, 4, 5, etc. of the recited components. In some embodiments, the phrase includes all of the recited components.
Ranges provided herein are understood to include all individual integer values and all subranges within the ranges.
As used herein, the term “sample” refers to a biological sample obtained or derived from a source of interest, as described herein. In some embodiments, a source of interest comprises an organism, such as an animal or human. In some embodiments, a biological sample comprises biological tissue or fluid. In some embodiments, a biological sample may be or comprise bone marrow, blood, blood cells, cells from a hair sample, ascites, tissue or fine needle biopsy samples, cell-containing body fluids, free floating nucleic acids, sputum, saliva or spit, urine, cerebrospinal fluid, peritoneal fluid, pleural fluid, feces, lymph, gynecological fluids, skin swabs, vaginal swabs, oral swabs, nasal swabs, washings or lavages such as ductal lavages or bronchoalveolar lavages, aspirates, scrapings, bone marrow specimens, tissue biopsy specimens, surgical specimens, other body fluids, secretions and/or excretions, and/or cells therefrom, etc. In some embodiments, the sample is a brush biopsy, puncture biopsy, or fluid from a needle biopsy. In some embodiments, the sample is blood or blood cells. In some embodiments, the sample is cells from a hair sample or nucleic acids from a hair sample. In some embodiments, the sample is sputum, saliva or spit. In some embodiments, a biological sample is or comprises cells obtained from an individual. In some embodiments, a sample is a “primary sample” obtained directly from a source of interest by any appropriate means. For example, in some embodiments, a primary biological sample is obtained by methods selected from the group consisting of biopsy (e.g., fine needle aspiration or tissue biopsy), surgery, collection of body fluid (e.g., blood, lymph, feces etc.), etc. In some embodiments, as will be clear from context, the term “sample” refers to a preparation that is obtained by processing (e.g., by removing one or more components of and/or by adding one or more agents to) a primary sample, for example, by filtering using a semi-permeable membrane. Such a “processed sample” may comprise, for example, nucleic acids or proteins extracted from a sample or obtained by subjecting a primary sample to techniques such as amplification or reverse transcription of mRNA, isolation and/or purification of certain components, etc.
As used herein, the term “minimal residual disease” refers to a small number of cancer cells remaining in the body after treatment or surgical intervention. These cells cannot usually be detected by standard scans or tests, due to lower abundance than detection sensitivity thresholds.
A “score” is a value or set of values selected so as to provide a normalized quantitative measure of a variable or characteristic of a subject's condition, and/or to discriminate, differentiate or otherwise characterize a subject's condition. The value(s) comprising the score can be based on, for example, quantitative data resulting in a measured amount of one or more sample constituents obtained from the subject, or from clinical parameters, or from clinical assessments, or any combination thereof. In certain embodiments, the score can be derived from a single constituent, parameter or assessment, while in other embodiments the score is derived from multiple constituents, parameters and/or assessments. The score can be based upon or derived from an interpretation function; e.g., an interpretation function derived from a particular predictive model using any of various statistical algorithms known in the art. A “change in score” can refer to the absolute change in score, e.g. from one time point to the next, or the percent change in score, or the change in the score per unit time (i.e., the rate of score change). In some embodiments, the score is calculated through an interpretation function or algorithm. In some embodiments, the subject is suspected of having expression of a gene that promotes or contributes to the likelihood of acquiring a disease state or whose expression is correlative to the presence of a pathogen. Calculation of score can be accomplished using known algorithms executable in computer program products within equipment used in sequencing or analyzing samples. In some embodiments, the methods disclosed herein comprise substeps of detecting the presence, absence or quantity of a given biomarker by calculating the quantity of a probe in a control sample, calculating the quantity of a probe in the subject sample, and normalizing the signal obtained from the subject sample by subtracting the signal obtained from the control sample.
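The control-subtraction substep and the three forms of “change in score” described above can be sketched as follows. The function names and the dictionary keys are hypothetical, and the sketch assumes signals and scores are plain numeric values; it is one possible reading, not the interpretation function of any particular predictive model.

```python
def normalized_signal(subject_signal, control_signal):
    """Normalize a subject-sample signal by subtracting the control-sample signal."""
    return subject_signal - control_signal


def change_in_score(previous, current, elapsed=None):
    """Return the absolute change, percent change, and rate of change of a score.

    `elapsed` is the time between the two measurements, in any consistent unit.
    Percent change is undefined when the previous score is zero, and the rate
    is undefined when no elapsed time is given; both are reported as None.
    """
    absolute = current - previous
    percent = 100.0 * absolute / previous if previous != 0 else None
    rate = absolute / elapsed if elapsed else None
    return {"absolute": absolute, "percent": percent, "rate": rate}
```

For example, a probe signal of 12.5 in the subject sample and 2.5 in the control sample gives a normalized signal of 10.0.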
As used herein, “sequence identity” is determined by using the stand-alone executable BLAST engine program for blasting two sequences (bl2seq), which can be retrieved from the National Center for Biotechnology Information (NCBI) ftp site, using the default parameters (Tatusova and Madden, FEMS Microbiol Lett., 1999, 174, 247-250; which is incorporated herein by reference in its entirety). Alternatively, “% sequence identity” can be determined using the EMBOSS Pairwise Alignment Algorithms tool available from The European Bioinformatics Institute (EMBL-EBI), which is part of the European Molecular Biology Laboratory (EMBL). This tool is accessible at the website ebi.ac.uk/Tools/emboss/align/. This tool utilizes the Needleman-Wunsch global alignment algorithm (Needleman, S. B. and Wunsch, C. D. (1970) J. Mol. Biol. 48, 443-453; Kruskal, J. B. (1983) An overview of sequence comparison, In D. Sankoff and B. Kruskal, (ed.), Time warps, string edits and macromolecules: the theory and practice of sequence comparison, pp. 1-44, Addison Wesley). Default settings are utilized, which include Gap Open: 10.0 and Gap Extend: 0.5. The default matrix “Blosum62” is utilized for amino acid sequences and the default matrix “DNAfull” is utilized for nucleic acid sequences.
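To make the idea of a global-alignment identity concrete, the following is a simplified sketch of Needleman-Wunsch alignment followed by a percent-identity calculation. It is not a reimplementation of bl2seq or of the EMBOSS tool: it assumes a flat match/mismatch score and a linear gap penalty (the values 1, -1 and -2 are arbitrary illustrative choices) rather than the affine Gap Open/Gap Extend penalties and the DNAfull or Blosum62 matrices named above, and it assumes identity is reported over the full alignment length.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align a and b; return the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a, b):
    """Percent of alignment columns holding identical residues."""
    aligned_a, aligned_b = needleman_wunsch(a.upper(), b.upper())
    if not aligned_a:
        return 0.0
    same = sum(x == y for x, y in zip(aligned_a, aligned_b) if x != "-")
    return 100.0 * same / len(aligned_a)
```

Under these assumed scores, `percent_identity("ACGT", "ACCT")` and `percent_identity("ACGT", "ACT")` both evaluate to 75.0, the first through a single mismatch and the second through a single gap column.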
As used herein, the term “statistically significant” means an observed alteration is greater than what would be expected to occur by chance alone (e.g., a “false positive”). Statistical significance can be determined by any of various methods well-known in the art. An example of a commonly used measure of statistical significance is the p-value. The p-value represents the probability of obtaining a result at least as extreme as a particular datapoint if that result were due to random chance alone. A result is often considered significant (not random chance) at a p-value less than or equal to about 0.05.
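One of the various methods for obtaining a p-value is an exact permutation test, sketched below for small samples. The function name and the example values are hypothetical; the sketch assumes a one-sided comparison of group means and enumerates every relabeling, so it is suitable only for small groups.

```python
from itertools import combinations


def exact_permutation_p_value(group_a, group_b):
    """One-sided exact permutation p-value for mean(group_a) > mean(group_b).

    The p-value is the fraction of all relabelings of the pooled values whose
    difference in means is at least as large as the observed difference.
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    observed = sum(group_a) / n_a - sum(group_b) / n_b
    at_least_as_extreme = 0
    n_relabelings = 0
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = sum_a / n_a - (total - sum_a) / n_b
        if diff >= observed - 1e-12:  # tolerance for floating-point rounding
            at_least_as_extreme += 1
        n_relabelings += 1
    return at_least_as_extreme / n_relabelings
```

With hypothetical nullomer counts of 5, 6 and 7 in one group and 1, 2 and 3 in the other, only one of the 20 possible relabelings is as extreme as the observed split, giving a p-value of 0.05.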
The terms “subject,” “individual,” and “patient” are used interchangeably herein to refer to a vertebrate, preferably a mammal, more preferably a human. Mammals include, but are not limited to, murine, simians, humans, farm animals, cows, pigs, goats, sheep, horses, dogs, sport animals, and pets. Tissues, cells and their progeny obtained in vivo or cultured in vitro are also encompassed by the definition of the term “subject.” In some embodiments, the subject is a human. For treatment of those conditions which are specific for a specific subject, such as a human being, the term “patient” may be interchangeably used. In some instances in the description of the present disclosure, the term “patient” will refer to human patients suffering from a particular disease or disorder. In some embodiments, the subject may be a non-human animal. The term “mammal” encompasses both humans and non-humans and includes but is not limited to humans, non-human primates, canines, felines, murine, bovines, equines, caprine, and porcines.
By “substantially identical” is meant a nucleic acid molecule (or polypeptide) comprises at least about 50% sequence identity to a reference nucleic acid sequence (for example, any one of the nucleic acid sequences described herein) or amino acid sequence. In some embodiments, such a sequence is at least about 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, or even 99% identical at the nucleic acid level or amino acid level to the reference sequence used for comparison.
As used herein, the term “therapeutic” means an agent utilized to treat, combat, ameliorate, prevent or improve an unwanted condition or disease of a patient.
The term “therapeutically effective amount” means a quantity sufficient to achieve a desired therapeutic effect, for example, an amount which results in the prevention or amelioration of or a decrease in the symptoms associated with a disease that is being treated, e.g., disorders associated with cancer growth or a hyperproliferative disorder. The amount of compound administered to the subject will depend on the type and severity of the disease and on the characteristics of the individual, such as general health, age, sex, body weight and tolerance to drugs. It will also depend on the degree, severity and type of disease. The skilled artisan will be able to determine appropriate dosages depending on these and other factors. The regimen of administration can affect what constitutes an effective amount. Further, several divided dosages, as well as staggered dosages, can be administered daily or sequentially, or the dose can be continuously infused, or can be a bolus injection. Further, the dosages of the compound(s) of the disclosure can be proportionally increased or decreased as indicated by the exigencies of the therapeutic or prophylactic situation. Typically, an effective amount of the compounds of the present disclosure, sufficient for achieving a therapeutic effect, ranges from about 0.000001 mg per kilogram body weight per day to about 10,000 mg per kilogram body weight per day. Preferably, the dosage ranges from about 0.0001 mg per kilogram body weight per day to about 100 mg per kilogram body weight per day. The compounds disclosed herein can also be administered in combination with each other, or with one or more additional therapeutic compounds.
The terms “treatment” or “treating” as used herein refer to an approach for obtaining beneficial or desired results including clinical results for the subject. For purposes herein, beneficial or desired clinical results include, but are not limited to, one or more of the following: (1) preventing or delaying the appearance of clinical symptoms of the state, disorder, or condition developing in a person who may be afflicted with or predisposed to the state, disorder or condition but does not yet experience or display clinical symptoms of the state, disorder or condition; (2) inhibiting the state, disorder or condition, i.e., arresting, reducing or delaying the development of the disease or a relapse thereof (in case of maintenance treatment) or at least one clinical symptom, sign, or test, thereof; or (3) relieving the disease, i.e., causing regression of the state, disorder or condition or at least one of its clinical or sub-clinical symptoms or signs. In some embodiments, a subject is successfully “treated” according to the methods of the present disclosure if the patient shows one or more of the following: a reduction in the number of and/or complete absence of cancer cells; a reduction in the tumor size; an inhibition of tumor growth; inhibition of and/or an absence of cancer cell infiltration into peripheral organs including the spread of cancer cells into soft tissue and bone; inhibition of and/or an absence of tumor or cancer cell metastasis; inhibition and/or an absence of cancer growth; relief of one or more symptoms associated with the specific cancer; reduced morbidity and mortality; improvement in quality of life; reduction in tumorigenicity; reduction in the number or frequency of cancer stem cells; or some combination of such effects.
The term “tumor” as used herein, refers to all neoplastic cell growth and proliferation, whether malignant or benign, and all pre-cancerous and cancerous cells and tissues. A “benign” tumor is not cancerous and it does not invade nearby tissue or spread to other parts of the body. A “premalignant” tumor is a tumor which is not yet cancerous but has the potential to become malignant. A “malignant” tumor, on the other hand, is cancerous and can grow and spread to other parts of the body.
The term “tumor sample” as used herein refers to a sample comprising tumor material obtained from a cancer patient. The term encompasses tumor tissue samples, for example, tissue obtained by surgical resection and tissue obtained by biopsy, such as for example, a core biopsy or a fine needle biopsy. In some embodiments, the tumor sample is a fixed, wax-embedded tissue sample, such as a formalin-fixed, paraffin-embedded tissue sample. Additionally, the term “tumor sample” encompasses a sample comprising tumor cells obtained from sites other than the primary tumor, e.g., circulating tumor cells. The term also encompasses cells that are the progeny of the patient's tumor cells, e.g. cell culture samples derived from primary tumor cells or circulating tumor cells. The term further encompasses samples that may comprise protein or nucleic acid material shed from tumor cells in vivo, e.g., bone marrow, blood, plasma, serum, and the like. The term also encompasses samples that have been enriched for tumor cells or otherwise manipulated after their procurement and samples comprising polynucleotides and/or polypeptides that are obtained from a patient's tumor material.
The identification of nullomers can be performed using any methods known in the art. In some embodiments, the identification of nullomers of the disclosure is performed as previously described in Georgakopoulos-Soares et al., published in bioRxiv, available at biorxiv.org/content/10.1101/2020.03.02.972422v1, incorporated by reference herein. As a first step, a dataset is obtained. In some embodiments, the dataset is obtained from WGS cancers from ICGC under the project PanCancer Analysis of Whole Genomes (ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes, Nature, 2020, 578:82-93), which includes 46 cancer projects from 21 organs. WGS patients were analyzed using the GRCh37 (hg19) reference assembly of the human genome.
In some embodiments, somatic indel calls are performed using three pipelines from four somatic variant callers. These are the Wellcome Sanger Institute pipeline, the DKFZ/EMBL pipeline and the Broad Institute pipeline, with a somatic variant false discovery rate of about 2.5%. In some embodiments, indel calling is performed by those algorithms and only indels called by at least two of the callers are analyzed, thereby generating a conservative dataset. As a result, the false negative rate of indel detection can be higher than that of other methods, and of each pipeline separately, which implies that many indels present in the samples are not identified. For a small subset of indels, in some embodiments, the indel calls are visually examined using the JBrowse Genome Browser to inspect the number of reads reporting the indel and to check whether the indel calls are biased towards the ends of the sequencing reads or whether there are other systematic biases between the normal and tumor sequencing reads; no such biases were identified.
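The "called by at least two of the callers" rule can be sketched as a simple vote over call sets. The caller labels and the `(chrom, pos, ref, alt)` tuple representation are assumptions for illustration; real pipelines would first normalize indel representations (e.g., left-alignment) before comparing calls, which this sketch does not do.

```python
from collections import Counter


def consensus_indels(calls_by_caller, min_callers=2):
    """Keep indels reported by at least `min_callers` distinct callers.

    `calls_by_caller` maps a caller name to an iterable of indel calls, each
    represented here as a hashable (chrom, pos, ref, alt) tuple.
    """
    support = Counter()
    for calls in calls_by_caller.values():
        support.update(set(calls))  # each caller votes at most once per indel
    return {indel for indel, n in support.items() if n >= min_callers}


# Hypothetical calls from three callers.
example_calls = {
    "caller_1": [("chr1", 100, "AT", "A"), ("chr2", 50, "G", "GC")],
    "caller_2": [("chr1", 100, "AT", "A")],
    "caller_3": [("chr2", 50, "G", "GC"), ("chr3", 7, "C", "CT")],
}
kept = consensus_indels(example_calls)
```

Here the chr3 insertion is dropped because only one caller reports it, which illustrates why the resulting dataset is conservative and can miss true indels.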
In some embodiments, Bedtools intersect utility is used to measure overlap between indels and polyN tracts. The term overlap in this context refers to deleted bases occurring at any position across the entire length of the repeat or inserted bases occurring at any position across the length of the repeat and immediately before or after the repeat. Indel density is defined as the number of indel mutations for a given number of bases.
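The overlap rule and the indel density defined above can be expressed as a small sketch in plain Python, independent of bedtools. It assumes BED-style 0-based, half-open coordinates, with an insertion described by the position of the base it precedes; the function names and the per-megabase scaling default are illustrative assumptions.

```python
def deletion_overlaps(deletion, tract):
    """True if any deleted base falls within the repeat tract.

    Both arguments are (start, end) pairs in 0-based, half-open coordinates.
    """
    d_start, d_end = deletion
    t_start, t_end = tract
    return d_start < t_end and d_end > t_start


def insertion_overlaps(position, tract):
    """True if bases are inserted within, immediately before, or immediately
    after the tract. `position` is the 0-based index of the base the insertion
    precedes, so t_start means "just before" and t_end means "just after".
    """
    t_start, t_end = tract
    return t_start <= position <= t_end


def indel_density(n_indels, n_bases, per=1_000_000):
    """Number of indels per `per` bases (per megabase by default)."""
    return n_indels / n_bases * per
```

For a tract spanning positions 10 to 20, a deletion of bases 10 and 11 overlaps, a deletion ending at position 10 does not, and insertions at positions 10 and 20 both count because they sit immediately before and after the repeat.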
In some embodiments, the distance between each pair of consecutive indels is calculated per patient. In some embodiments, indels in different chromosomes are excluded because their pairwise distance cannot be defined. In some embodiments, the same analysis is performed separately for insertions and deletions.
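A minimal sketch of this per-patient distance calculation follows. The `(patient, chrom, pos)` record layout and the function name are assumptions; distances are computed only between consecutive indels on the same chromosome of the same patient, so pairs spanning chromosomes never arise.

```python
from collections import defaultdict


def consecutive_indel_distances(indels):
    """Return {patient: [distances between consecutive indels]}.

    `indels` is an iterable of (patient, chrom, pos) records. To analyze
    insertions and deletions separately, call this once per indel type.
    """
    positions_by_key = defaultdict(list)
    for patient, chrom, pos in indels:
        positions_by_key[(patient, chrom)].append(pos)

    distances = defaultdict(list)
    for (patient, _chrom), positions in sorted(positions_by_key.items()):
        positions.sort()
        distances[patient].extend(b - a for a, b in zip(positions, positions[1:]))
    return dict(distances)


# Hypothetical records for two patients.
example_indels = [
    ("P1", "chr1", 100), ("P1", "chr1", 250), ("P1", "chr2", 40),
    ("P1", "chr1", 130), ("P2", "chr1", 10),
]
example_distances = consecutive_indel_distances(example_indels)
```

Patient P1 yields distances of 30 and 120 on chr1, while the lone indels on P1's chr2 and in patient P2 contribute no distances.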
In some embodiments, substitution calling is performed using four somatic mutation-calling algorithms, with mutation calls being shared by at least two algorithms. In the embodiments for lung cancers, C>A substitutions can be examined with respect to transcriptional strand asymmetries at polyG tracts and replication timing.
In some embodiments, the numbers of indels overlapping motifs found in the template or non-template strands are obtained using the bedtools intersect command. In some embodiments, strand bias is calculated for the vector of genes, reporting the number of polyN motif occurrences and the number of overlapping motifs as:
In some embodiments, bootstrapping with replacement is performed over multiple iterations: in each iteration, an equal number of genes is randomly selected, and the indels overlapping motifs at the template and non-template strands of each selected gene are randomly sampled. The standard deviation of the strand bias can be calculated from these iterations.
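The bootstrap can be sketched as below. Because the strand bias formula is not reproduced in this passage, the sketch assumes, purely for illustration, that strand bias is the ratio of the overlap rate on the template strand to the overlap rate on the non-template strand; any other statistic could be substituted. The sketch also simplifies the procedure to resampling whole genes with replacement. The per-gene tuple layout and all names are hypothetical.

```python
import random
import statistics


def strand_bias(genes):
    """ASSUMED statistic: template overlap rate / non-template overlap rate.

    Each gene is (template_motifs, template_hits, nontemplate_motifs,
    nontemplate_hits); "hits" are indels overlapping polyN motifs.
    """
    t_motifs = sum(g[0] for g in genes)
    t_hits = sum(g[1] for g in genes)
    nt_motifs = sum(g[2] for g in genes)
    nt_hits = sum(g[3] for g in genes)
    return (t_hits / t_motifs) / (nt_hits / nt_motifs)


def bootstrap_sd(genes, n_iter=200, seed=0):
    """Standard deviation of the statistic over gene-level bootstrap resamples."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_iter):
        # Resample the same number of genes, with replacement.
        sample = [rng.choice(genes) for _ in genes]
        try:
            values.append(strand_bias(sample))
        except ZeroDivisionError:
            # Resample with no motifs or no non-template hits: ratio undefined.
            continue
    return statistics.stdev(values)


# Toy per-gene counts, invented for the example.
genes = [(10, 2, 10, 1), (20, 1, 20, 4), (15, 3, 15, 3)]
sd = bootstrap_sd(genes)
```

Fixing the seed makes the estimate reproducible; in practice the number of iterations would be chosen so that the standard deviation stabilizes.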
The nullomers can be of any length. In some embodiments, the nullomers are in a length of from about 8 to about 50 nucleotides. In some embodiments, the nullomers are in a length of from about 10 to about 45 nucleotides. In some embodiments, the nullomers are in a length of from about 12 to about 40 nucleotides. In some embodiments, the nullomers are in a length of from about 14 to about 30 nucleotides. In some embodiments, the nullomers are in a length of from about 16 to about 20 nucleotides. In some embodiments, the nullomers are in a length of about 8 nucleotides. In some embodiments, the nullomers are in a length of about 10 nucleotides. In some embodiments, the nullomers are in a length of about 11 nucleotides. In some embodiments, the nullomers are in a length of about 12 nucleotides. In some embodiments, the nullomers are in a length of about 13 nucleotides. In some embodiments, the nullomers are in a length of about 14 nucleotides. In some embodiments, the nullomers are in a length of about 15 nucleotides. In some embodiments, the nullomers are in a length of about 16 nucleotides. In some embodiments, the nullomers are in a length of about 17 nucleotides. In some embodiments, the nullomers are in a length of about 18 nucleotides. In some embodiments, the nullomers are in a length of about 19 nucleotides. In some embodiments, the nullomers are in a length of about 20 nucleotides. In some embodiments, the nullomers are in a length of about 25 nucleotides. In some embodiments, the nullomers are in a length of about 30 nucleotides. In some embodiments, the nullomers are in a length of about 35 nucleotides. In some embodiments, the nullomers are in a length of about 40 nucleotides. In some embodiments, the nullomers are in a length of about 45 nucleotides. In some embodiments, the nullomers are in a length of about 50 nucleotides. In some embodiments, the nullomers are in a length of more than about 50 nucleotides.
Nullomers as Biomarkers for Cancer
The disclosure provides nullomers identified in cancers of numerous organs or tissues, including pancreas, esophagus, lymphoid, kidney, ovary, head and neck, lung, stomach, liver, CNS, uterus, skin, colorectal, prostate, bladder, bone and soft tissue, breast, biliary, cervix, thyroid and myeloid. The nullomers of the disclosure are provided in Table 1.
In some embodiments, the disclosure relates to a nullomer comprising at least about 60%, 65%, 70%, 75%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to any of the sequences provided in Table 1. In some embodiments, the disclosure relates to a nullomer comprising any of the sequences provided in Table 1. In some embodiments, the disclosure relates to a nucleic acid sequence that is complementary to any of the sequences provided in Table 1.
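For illustration, percent sequence identity and complementarity can be computed as sketched below. The sketch assumes an ungapped, position-by-position comparison of two equal-length DNA sequences, which is a simplification; alignment-based identity would be needed for sequences of different length. The sequences are invented and are not taken from Table 1.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def percent_identity(a, b):
    """Ungapped percent identity of two equal-length sequences (assumption)."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


# Hypothetical 8-mers differing at the last position.
identity = percent_identity("ACGTACGT", "ACGTACGA")
complement = reverse_complement("AACG")
```

Under this definition, a 16-nucleotide sequence with a single mismatch has 93.75% identity, so identity thresholds translate directly into a permitted number of mismatches for a given nullomer length.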
The expression level of one or more disclosed nullomers can be determined in a biological sample obtained from a subject. A sample of a subject is one that originates from a subject. Such a sample may be further processed after it is obtained from the subject. For example, DNA or RNA may be isolated from a sample. In this example, the DNA or RNA isolated from the sample is also a sample obtained from the subject. A biological sample useful for determining the level of one or more disclosed nullomers may be obtained from essentially any source, including cells, blood, hair, tissues, and fluids throughout the body.
In some embodiments, the biological sample used for determining the level of one or more disclosed nullomers is a sample comprising circulating nullomers, e.g., extracellular nullomers. Extracellular nullomers freely circulate in a wide range of biological material, including bodily fluids, such as fluids from the circulatory system, e.g., a blood sample or a lymph sample, or from another bodily fluid such as urine or saliva or serum. Accordingly, in some embodiments, the biological sample used for determining the level of one or more disclosed nullomers is a bodily fluid, for example, blood, fractions thereof, serum, plasma, urine, saliva, tears, sweat, semen, vaginal secretions, lymph, bronchial secretions, CSF, whole blood, etc. In some embodiments, the sample is a sample that is obtained non-invasively. In some embodiments, the sample is whole blood or blood cells. In some embodiments, the sample is cells from a hair sample or nucleic acids from a hair sample. In some embodiments, the sample is sputum, saliva or spit. In some embodiments, the sample is a serum sample from a human. In some embodiments, the sample is a bodily fluid from a human. In some embodiments, the sample is a liquid biopsy from a human.
In some embodiments, any of the methods disclosed herein comprise using a small volume of sample for detection and/or diagnosis. In some embodiments, the sample used in any of the disclosed methods has a volume of no more than about 100 microliters of fluid. In some embodiments, the sample has a volume of no more than about 90 microliters of fluid. In some embodiments, the sample has a volume of no more than about 80 microliters of fluid. In some embodiments, the sample has a volume of no more than about 70 microliters of fluid. In some embodiments, the sample has a volume of no more than about 60 microliters of fluid. In some embodiments, the sample has a volume of no more than about 50 microliters of fluid. In some embodiments, the sample has a volume of no more than about 40 microliters of fluid. In some embodiments, the sample has a volume of no more than about 30 microliters of fluid. In some embodiments, the sample has a volume of no more than about 20 microliters of fluid. In some embodiments, the sample has a volume of no more than about 10 microliters of fluid. In some embodiments, the sample has a volume of no more than about 5 microliters of fluid. In some embodiments, the sample has a volume of no more than about 1 microliter of fluid.
In some embodiments, the disclosed methods comprise isolating total DNA or RNA and/or amplifying nullomers in a sample of no more than about 5 microliters, no more than about 10 microliters, no more than about 20 microliters, no more than about 40 microliters, no more than about 80 microliters, no more than about 100 microliters, no more than about 200 microliters, no more than about 300 microliters, no more than about 400 microliters, no more than about 500 microliters, no more than about 600 microliters, no more than about 700 microliters, no more than about 800 microliters, no more than about 900 microliters, no more than about 1 milliliter, no more than about 1.1 milliliters, no more than about 1.2 milliliters, no more than about 1.3 milliliters, no more than about 1.4 milliliters, no more than about 1.5 milliliters, no more than about 1.6 milliliters, no more than about 1.7 milliliters, no more than about 1.8 milliliters, no more than about 1.9 milliliters, or no more than about 2.0 milliliters. In some embodiments, the sample size is from about 1 microliter to about 2 milliliters, from about 20 microliters to about 2 milliliters, from about 5 microliters to about 1.5 milliliters, from about 10 microliters to about 500 microliters, from about 15 microliters to about 300 microliters, from about 20 microliters to about 200 microliters, from about 30 microliters to about 100 microliters, from about 1 microliter to about 100 microliters, from about 5 microliters to about 75 microliters, or from about 10 microliters to about 50 microliters of liquid sample in the form of subject plasma, whole blood, blood cells, cells from a hair sample, saliva or spit, or serum.
In some embodiments, the methods disclosed herein comprise isolating total DNA or RNA and/or amplifying nullomers in a sample of no more than about 5 microliters of serum, no more than about 10 microliters of serum, no more than about 20 microliters of serum, no more than about 40 microliters of serum, no more than about 80 microliters of serum, no more than about 100 microliters of serum, no more than about 200 microliters of serum, no more than about 300 microliters of serum, no more than about 400 microliters of serum, no more than about 500 microliters of serum, no more than about 600 microliters of serum, no more than about 700 microliters of serum, no more than about 800 microliters of serum, no more than about 900 microliters of serum, no more than about 1 milliliter of serum, no more than about 1.1 milliliters of serum, no more than about 1.2 milliliters of serum, no more than about 1.3 milliliters of serum, no more than about 1.4 milliliters of serum, no more than about 1.5 milliliters of serum, no more than about 1.6 milliliters of serum, no more than about 1.7 milliliters of serum, no more than about 1.8 milliliters of serum, no more than about 1.9 milliliters of serum, or no more than about 2.0 milliliters of serum.
Circulating nullomers include nullomers in cells, extracellular nullomers in microvesicles, in exosomes and extracellular nullomers that are not associated with cells or microvesicles (extracellular, non-vesicular nullomers). In some embodiments, the biological sample used for determining the level of one or more nullomers (e.g., a sample containing circulating nullomers) may contain cells. In other embodiments, the biological sample may be free or substantially free of cells (e.g., a serum sample). In some embodiments, a sample containing circulating nullomers, e.g., extracellular nullomers, is a blood-derived sample. Exemplary blood-derived sample types include, e.g., a plasma sample, a serum sample, a blood sample, etc. In other embodiments, a sample containing circulating nullomers is a lymph sample. Circulating nullomers are also found in urine and saliva, and biological samples derived from these sources are likewise suitable for determining the level of one or more disclosed nullomers.
In some embodiments, any of the methods of the disclosure comprises a step of isolating total DNA or RNA from a sample or cell or exosome or microvesicle. Methods of isolating DNA or RNA for expression analysis from blood, plasma and/or serum (see for example, Tsui NB et al. (2002) Clin. Chem. 48,1647-53, incorporated by reference in its entirety herein) and from urine (see for example, Boom R et al. (1990) J Clin Microbiol. 28, 495-503, incorporated by reference in its entirety herein) have been described and routinely used by the skilled person.
The level of one or more disclosed nullomers in a biological sample can be determined by any suitable method. Any reliable method for measuring the level or amount of a nullomer in a sample can be used. Generally, nullomers can be detected and quantified from a sample (including fractions thereof), such as samples of isolated DNA or RNA by various methods known for DNA or mRNA, including, for example, amplification-based methods (e.g., Polymerase Chain Reaction (PCR), Real-Time Polymerase Chain Reaction (RT-PCR), Quantitative Polymerase Chain Reaction (qPCR), rolling circle amplification, etc.), hybridization-based methods (e.g., hybridization arrays (e.g., microarrays), NanoString analysis, Northern Blot analysis, branched DNA (bDNA) signal amplification, in situ hybridization, etc.), and sequencing-based methods (e.g., next-generation sequencing methods, for example, using the Illumina or IonTorrent platforms). Other exemplary techniques include ribonuclease protection assay (RPA) and mass spectroscopy.
In some embodiments where RNA is used as the sample, the RNA is converted to DNA (cDNA) prior to analysis. cDNA can be generated by reverse transcription of isolated RNA using conventional techniques. In some embodiments, the nullomer is amplified prior to measurement. In other embodiments, the level of the nullomer is measured during the amplification process. In still other embodiments, the nullomer is not amplified prior to measurement. Some exemplary methods suitable for determining the level of a nullomer in a sample are described in greater detail below. These methods are provided by way of illustration only, and it will be apparent to a skilled person that other suitable methods may likewise be used.
Many amplification-based methods exist for detecting the level of nullomers, including, but not limited to, PCR, RT-PCR, qPCR, and rolling circle amplification. Other amplification-based techniques include, for example, ligase chain reaction, multiplex ligatable probe amplification, in vitro transcription (IVT), strand displacement amplification, transcription-mediated amplification, RNA (Eberwine) amplification, and other methods that are known to persons skilled in the art.
A typical PCR reaction includes multiple steps, or cycles, that selectively amplify target nucleic acid species: a denaturing step, in which a target nucleic acid is denatured; an annealing step, in which a set of PCR primers (i.e., forward and reverse primers) anneal to complementary DNA strands; and an elongation step, in which a thermostable DNA polymerase elongates the primers. By repeating these steps multiple times, a DNA fragment is amplified to produce an amplicon corresponding to the target sequence. Typical PCR reactions include 20 or more cycles of denaturation, annealing, and elongation. In many cases, the annealing and elongation steps can be performed concurrently, in which case the cycle contains only two steps. A reverse transcription reaction (which produces a cDNA sequence having complementarity to an RNA) may be performed prior to PCR amplification. Reverse transcription reactions include the use of, e.g., an RNA-dependent DNA polymerase (reverse transcriptase) and a primer.
Kits for quantitative real-time PCR of nullomers are known and are commercially available. Examples of suitable kits include, but are not limited to, the TaqMan mRNA Assay (Applied Biosystems) and the mirVana qRT-PCR nullomer detection kit (Ambion). The RNA can be ligated to a single-stranded oligonucleotide containing universal primer sequences, a polyadenylated sequence, or an adaptor sequence prior to reverse transcription, and amplified using a primer complementary to the universal primer sequence, a poly(T) primer, or a primer comprising a sequence that is complementary to the adaptor sequence.
In some instances, custom qRT-PCR assays can be developed for determination of nullomer levels. Custom qRT-PCR assays to measure nullomers in a biological sample, e.g., a body fluid, can be developed using, for example, methods that involve an extended reverse transcription primer and locked nucleic acid modified PCR. Custom nullomer assays can be tested by running the assay on a dilution series of chemically synthesized nullomer corresponding to the target sequence. This permits determination of the limit of detection and linear range of quantitation of each assay. Furthermore, when used as a standard curve, these data permit an estimate of the absolute abundance of nullomers measured in biological samples.
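The use of a dilution series as a standard curve can be sketched as follows. The sketch assumes the conventional linear relationship between Ct and the base-10 logarithm of the input copy number, fitted by ordinary least squares; the dilution points and Ct values are invented for illustration and do not come from any assay in this disclosure.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amplification_efficiency(slope):
    """Per-cycle efficiency implied by the slope (1.0 means perfect doubling)."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_from_ct(ct, slope, intercept):
    """Estimate absolute copy number of an unknown from its Ct."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical ten-fold dilution series of a synthetic nullomer.
copies = [1e6, 1e5, 1e4, 1e3, 1e2]
cts = [18.08, 21.40, 24.72, 28.04, 31.36]
slope, intercept = fit_standard_curve(copies, cts)
efficiency = amplification_efficiency(slope)
estimate = copies_from_ct(24.72, slope, intercept)
```

The lowest dilution that still falls on the fitted line gives a working estimate of the limit of detection, and the span of dilutions on the line gives the linear range of quantitation.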
Amplification curves may optionally be checked to verify that Ct values are assessed in the linear range of each amplification plot. Typically, the linear range spans several orders of magnitude. For each candidate nullomer assayed, a chemically synthesized version of the nullomer can be obtained and analyzed in a dilution series to determine the limit of sensitivity of the assay, and the linear range of quantitation. Relative expression levels may be determined, for example, as described by Livak et al., Methods (2001) December; 25(4):402-8.
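The relative quantification method of Livak et al. cited above is commonly written as fold change = 2^(-ΔΔCt). A minimal sketch follows; it assumes approximately 100% amplification efficiency and the availability of a reference (normalizer) target, the choice of which is left open here. The Ct values are invented.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of the target in a sample relative to a control.

    delta Ct       = Ct(target) - Ct(reference), within each condition
    delta delta Ct = delta Ct(sample) - delta Ct(control)
    fold change    = 2 ** (-delta delta Ct)
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_sample - delta_control)


# Hypothetical Ct values: the target crosses threshold 3 cycles earlier
# in the sample than in the control, with an unchanged reference.
fold = relative_expression(24.0, 18.0, 27.0, 18.0)
```

A three-cycle difference corresponds to an eight-fold difference in abundance under the perfect-doubling assumption.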
In some embodiments, two or more nullomers are amplified in a single reaction volume. For example, multiplex q-PCR, such as qRT-PCR, enables simultaneous amplification and quantification of at least two nullomers of interest in one reaction volume by using more than one pair of primers and/or more than one probe. The primer pairs comprise at least one amplification primer that specifically binds each nullomer, and the probes are labeled such that they are distinguishable from one another, thus allowing simultaneous quantification of multiple nullomers.
Rolling circle amplification is a DNA-polymerase driven reaction that can replicate circularized oligonucleotide probes with either linear or geometric kinetics under isothermal conditions (see, for example, Lizardi et al., Nat. Gen. (1998) 19(3):225-232; Gusev et al., Am. J. Pathol. (2001) 159(1):63-69; Nallur et al., Nucleic Acids Res. (2001) 29(23):E118). In the presence of two primers, one hybridizing to the (+) strand of DNA, and the other hybridizing to the (−) strand, a complex pattern of strand displacement results in the generation of over 10^9 copies of each DNA molecule in 90 minutes or less. Tandemly linked copies of a closed circle DNA molecule may be formed by using a single primer. The process can also be performed using a matrix-associated DNA. The template used for rolling circle amplification may be reverse transcribed. This method can be used as a highly sensitive indicator of nullomer sequence and expression level at very low nullomer concentrations (see, for example, Cheng et al., Angew Chem. Int. Ed. Engl. (2009) 48(18):3268-72; Neubacher et al., Chembiochem. (2009) 10(8):1289-91).
In some embodiments, the disclosure provides a method for identifying the presence, absence, or quantity of one or a plurality of the disclosed nullomers comprising: a) isolating nucleic acids from a sample; and b) mixing the nucleic acids with one or a plurality of primers under conditions and for a period of time sufficient to allow amplification of the one or plurality of nullomers, wherein the one or plurality of primers comprises sequences that are complementary to any of the nullomers provided in Table 1. In some embodiments, the nucleic acid from a sample is cell-free DNA (cfDNA). In some embodiments, the nucleic acid from a sample is circulating tumor DNA (ctDNA). In some embodiments, the primer used in the disclosed method comprises from about 6 to about 16 nucleotides. In some embodiments, the primer used in the disclosed method comprises from about 7 to about 15 nucleotides. In some embodiments, the primer used in the disclosed method comprises from about 8 to about 14 nucleotides. In some embodiments, the primer used in the disclosed method comprises about 6 nucleotides. In some embodiments, the primer used in the disclosed method comprises about 7 nucleotides. In some embodiments, the primer used in the disclosed method comprises about 8 nucleotides. In some embodiments, the primer used in the disclosed method comprises about 9 nucleotides. In some embodiments, the primer used in the disclosed method comprises about 10 nucleotides. In some embodiments, the primer used in the disclosed method comprises about 11 nucleotides. In some embodiments, the primer used in the disclosed method comprises about 12 nucleotides. In some embodiments, the primer used in the disclosed method comprises about 13 nucleotides. In some embodiments, the primer used in the disclosed method comprises about 14 nucleotides. In some embodiments, the primer used in the disclosed method comprises about 15 nucleotides. In some embodiments, the primer used in the disclosed method comprises about 16 nucleotides.
In some embodiments, the identification of the presence or quantity of one or a plurality of the disclosed nullomers is indicative that the subject from which the sample is obtained has the cancer type corresponding to the particular nullomer identified in Table 1.
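When sequencing reads are available, the presence of listed nullomers can also be checked computationally. The following is a minimal sketch under stated assumptions: the two 10-nucleotide sequences and their cancer-type labels are invented placeholders and are not entries of Table 1, reads are plain DNA strings, and both strands are scanned because a nullomer may be read from either strand.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement; non-ACGT characters such as N are left unchanged."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_nullomers(reads, nullomer_to_cancer):
    """Count occurrences of each listed nullomer across reads, on both strands.

    Note: a nullomer equal to its own reverse complement would be counted
    once per strand in this simple sketch.
    """
    lengths = {len(n) for n in nullomer_to_cancer}
    hits = Counter()
    for read in reads:
        read = read.upper()
        for strand in (read, revcomp(read)):
            for k in lengths:
                for i in range(len(strand) - k + 1):
                    kmer = strand[i:i + k]
                    if kmer in nullomer_to_cancer:
                        hits[kmer] += 1
    return hits


# Placeholder table and reads, invented for the example.
table = {"ACGTTAGGCA": "hypothetical type A", "TTGACCGTAA": "hypothetical type B"}
reads = ["GGACGTTAGGCATT", "CCTTACGGTCAACC", "AAAAAAAAAAAAAA"]
hits = count_nullomers(reads, table)
indicated = sorted({table[n] for n in hits})
```

The second read contains the reverse complement of the second placeholder sequence, so it is only found because both strands are scanned. In practice a read-count threshold would likely be applied before calling a nullomer present, to limit the effect of sequencing errors.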
Nullomers may be detected using hybridization-based methods, including but not limited to hybridization arrays (e.g., microarrays), NanoString analysis, Southern Blot analysis, Northern Blot analysis, branched DNA (bDNA) signal amplification, and in situ hybridization.
Microarrays can be used to measure the levels of large numbers of nullomers simultaneously. Microarrays can be fabricated using a variety of technologies, including printing with fine-pointed pins onto glass slides, photolithography using pre-made masks, photolithography using dynamic micromirror devices, ink-jet printing, or electrochemistry on microelectrode arrays. Also useful are microfluidic TaqMan Low-Density Arrays, which are based on an array of microfluidic qRT-PCR reactions, as well as related microfluidic qRT-PCR based methods.
Axon B-4000 scanner and Gene-Pix Pro 4.0 software or other suitable software can be used to scan images. Non-positive spots after background subtraction, and outliers detected by the ESD procedure, are removed. The resulting signal intensity values are normalized to per-chip median values and then used to obtain geometric means and standard errors for each nullomer. Each signal can be transformed to log base 2, and a one-sample t test can be conducted. Independent hybridizations for each sample can be performed on chips with each nullomer spotted multiple times to increase the robustness of the data.
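The numerical steps above (background subtraction, removal of non-positive spots, per-chip median normalization, log base 2 transformation, geometric mean and one-sample t test) can be sketched as follows. The spot identifiers and intensities are invented, the outlier removal by the ESD procedure is not implemented, and the t statistic is returned without a p-value.

```python
import math
import statistics


def normalize_chip(spots):
    """Background-subtract, drop non-positive spots, divide by the chip median.

    spots maps a (hypothetical) spot identifier to (signal, background).
    """
    corrected = {k: s - b for k, (s, b) in spots.items()}
    positive = {k: v for k, v in corrected.items() if v > 0}
    median = statistics.median(positive.values())
    return {k: v / median for k, v in positive.items()}


def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def one_sample_t(values, mu=0.0):
    """One-sample t statistic testing whether the mean differs from mu."""
    se = statistics.stdev(values) / math.sqrt(len(values))
    return (statistics.mean(values) - mu) / se


# Toy chip: three usable spots and one spot below background.
chip = {"n1_a": (400, 100), "n1_b": (700, 100), "n2_a": (250, 100), "neg": (90, 100)}
normalized = normalize_chip(chip)
log2_values = {k: math.log2(v) for k, v in normalized.items()}
gmean = geometric_mean(normalized.values())
t_stat = one_sample_t([1.0, 2.0, 3.0])
```

Working on the log base 2 scale makes the one-sample t test against zero equivalent to asking whether a nullomer's normalized signal differs from the chip median.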
Microarrays can be used for the expression profiling of nullomers in diseases. For example, DNA or RNA can be extracted from a sample and, optionally, the nullomers are size-selected from total DNA or RNA. Oligonucleotide linkers can be attached to the 5′ and 3′ ends of the nullomers and the resulting ligation products are used as templates for an RT-PCR reaction. The sense strand PCR primer can have a fluorophore attached to its 5′ end, thereby labeling the sense strand of the PCR product. The PCR product is denatured and then hybridized to the microarray. A PCR product, referred to as the target nucleic acid, that is complementary to the corresponding nullomer capture probe sequence on the array will hybridize, via base pairing, to the spot at which the capture probes are affixed. The spot will then fluoresce when excited using a microarray laser scanner. In some embodiments, probes of the disclosure are nucleic acid sequences comprising from about 10 to about 20 nucleotides in length and are DNA or RNA or DNA/RNA hybrid sequences complementary to a nullomer of Table 1, Table 4, Table 5 or Table 7. In some embodiments, the disclosure relates to a composition comprising one or a plurality of such probes. In some embodiments, those probes comprise a fluorescent probe detectable when exposed to light emitted onto the probe.
The fluorescence intensity of each spot is then evaluated in terms of the number of copies of a particular nullomer, using a number of positive and negative controls and array data normalization methods, which will result in assessment of the level of expression of a particular nullomer.
Total RNA containing the nullomers extracted from a body fluid sample can also be used directly without size-selection of the nullomers. For example, the RNA can be 3′ end labeled using T4 RNA ligase and a fluorophore-labeled short RNA linker. Fluorophore-labeled nullomers complementary to the corresponding nullomer capture probe sequences on the array hybridize, via base pairing, to the spot at which the capture probes are affixed. The fluorescence intensity of each spot is then evaluated in terms of the number of copies of a particular nullomer, using a number of positive and negative controls and array data normalization methods, which will result in assessment of the level of expression of a particular nullomer.
Several types of microarrays can be employed including, but not limited to, spotted oligonucleotide microarrays, pre-fabricated oligonucleotide microarrays or spotted long oligonucleotide arrays.
Nullomers can also be detected without amplification using the nCounter Analysis System (NanoString Technologies, Seattle, Wash.). This technology employs two nucleic acid-based probes that hybridize in solution (e.g., a reporter probe and a capture probe). After hybridization to a nullomer disclosed herein, excess probes are removed, and probe/target complexes are analyzed in accordance with the manufacturer's protocol. nCounter nullomer assay kits are available from NanoString Technologies, which are capable of distinguishing between highly similar nullomers with great specificity.
Nullomers can also be detected using branched DNA (bDNA) signal amplification (see, for example, Urdea, Nature Biotechnology (1994), 12:926-928). RNA assays based on bDNA signal amplification are commercially available. One such assay is the QuantiGene® 2.0 nullomer Assay (Affymetrix, Santa Clara, Calif.). Southern Blot, Northern Blot and in situ hybridization may also be used to detect nullomers. Suitable methods for performing Southern Blot, Northern Blot and in situ hybridization are known in the art.
In some embodiments, biomarker expression is determined by an assay known to those of skill in the art, including but not limited to, multi-analyte profile test, enzyme-linked immunosorbent assay (ELISA), radioimmunoassay, Western blot assay, immunofluorescent assay, enzyme immunoassay, immunoprecipitation assay, chemiluminescent assay, immunohistochemical assay, dot blot assay, or slot blot assay. In some embodiments, wherein an antibody is used in the assay the antibody is detectably labeled. The antibody labels may include, but are not limited to, immunofluorescent label, chemiluminescent label, phosphorescent label, enzyme label, radiolabel, avidin/biotin, colloidal gold particles, colored particles, and magnetic particles. In some embodiments, biomarker expression is determined by an IHC assay.
In some embodiments, biomarker expression is determined using an agent that specifically binds the biomarker. Any molecular entity that displays specific binding to a biomarker can be employed to determine the level of that biomarker protein in a sample. Specific binding agents include, but are not limited to, antibodies, antibody fragments, antibody mimetics, and polynucleotides (e.g., aptamers). One of skill understands that the degree of specificity required is determined by the particular assay used to detect the biomarker protein. In some embodiments, the disclosure relates to a system comprising a solid support (such as an ELISA plate, gel, bead or column) comprising an antibody, antibody fragment, antibody mimetic, and/or polynucleotide capable of binding to T3p or a salt thereof.
Advanced sequencing methods can likewise be used as available. For example, nullomers can be detected using Illumina Next Generation Sequencing (e.g., Sequencing-By-Synthesis or TruSeq methods, using, for example, the HiSeq, HiScan, GenomeAnalyzer, or MiSeq systems (Illumina, Inc., San Diego, Calif.)). Nullomers can also be detected using Ion Torrent Sequencing (Ion Torrent Systems, Inc., Guilford, Conn.), or other suitable methods of semiconductor sequencing.
Mass spectroscopy can be used to quantify nullomers using RNase mapping. Isolated RNAs can be enzymatically digested with RNA endonucleases (RNases) having high specificity (e.g., RNase T1, which cleaves at the 3′-side of all unmodified guanosine residues) prior to their analysis by MS or tandem MS (MS/MS) approaches. The first approach developed utilized the on-line chromatographic separation of endonuclease digests by reversed phase HPLC coupled directly to ESI-MS. The presence of posttranscriptional modifications can be revealed by mass shifts from those expected based upon the RNA sequence. Ions of anomalous mass/charge values can then be isolated for tandem MS sequencing to locate the sequence placement of the posttranscriptionally modified nucleoside.
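The fragments expected from such a digest can be predicted in silico before comparison with measured masses. The sketch below assumes every guanosine is unmodified and therefore cleaved on its 3′ side, and it only enumerates fragment sequences; fragment masses are not computed. The input sequence is invented.

```python
def rnase_t1_digest(rna):
    """Split an RNA sequence after every G (cleavage at the 3' side of G)."""
    fragments, current = [], []
    for base in rna.upper():
        current.append(base)
        if base == "G":
            # Cleavage occurs after this guanosine, closing the fragment.
            fragments.append("".join(current))
            current = []
    if current:
        # Trailing 3' fragment that does not end in G.
        fragments.append("".join(current))
    return fragments


# Hypothetical RNA sequence.
fragments = rnase_t1_digest("AUGGCAUCGUA")
```

Every fragment except possibly the last ends in G, so an observed fragment whose mass does not match any predicted sequence is a candidate for carrying a modified nucleoside.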
Matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS) has also been used as an analytical approach for obtaining information about posttranscriptionally modified nucleosides. MALDI-based approaches can be differentiated from ESI-based approaches by the separation step. In MALDI-MS, the mass spectrometer is used to separate the nullomers.
To analyze a limited quantity of intact nullomers, a system of capillary LC coupled with nanoESI-MS can be employed, by using a linear ion trap-orbitrap hybrid mass spectrometer (LTQ Orbitrap XL, Thermo Fisher Scientific) or a tandem-quadrupole time-of-flight mass spectrometer (QSTAR XL, Applied Biosystems) equipped with a custom-made nanospray ion source, a Nanovolume Valve (Valco Instruments), and a splitless nano HPLC system (DiNa, KYA Technologies). Analyte/TEAA is loaded onto a nano-LC trap column, desalted, and then concentrated. Intact nullomers are eluted from the trap column and directly injected into a C18 capillary column, and chromatographed by RP-HPLC using a gradient of solvents of increasing polarity. The chromatographic eluent is sprayed from a sprayer tip attached to the capillary column, using an ionization voltage that allows ions to be scanned in the negative polarity mode.
Additional methods for nullomer detection and measurement include, for example, strand invasion assay (Third Wave Technologies, Inc.), surface plasmon resonance (SPR), cDNA, MTDNA (metallic DNA; Advance Technologies, Saskatoon, SK), and single-molecule methods such as the one developed by US Genomics. Multiple nullomers can be detected in a microarray format using a novel approach that combines a surface enzyme reaction with nanoparticle-amplified SPR imaging (SPRI). The surface reaction of poly(A) polymerase creates poly(A) tails on nullomers hybridized onto locked nucleic acid (LNA) microarrays. DNA-modified nanoparticles are then adsorbed onto the poly(A) tails and detected with SPRI. This ultrasensitive nanoparticle-amplified SPRI methodology can be used for nullomer profiling at attomole levels. In some embodiments, CRISPR-Cas9 complexes can be used to detect the presence of nullomers in vitro based upon exposure of a sample from a patient to an sgRNA-Cas protein complex, wherein the sgRNA is complementary to at least a portion of the nullomer sequence. In some embodiments, the exposure is to genomic DNA within a cancer cell.
In some embodiments, the disclosure relates to a composition or system comprising one or a plurality of sgRNAs that comprise about 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to the sequences of Table 6. In some embodiments, the disclosure relates to a composition or system comprising one or a plurality of sgRNAs that comprise from about 98 to about 110 nucleotides in length with at least one portion of the sgRNA complementary to a nucleic acid sequence of from about 8 to about 18 nucleotides of any nullomer disclosed in Table 1, Table 4, Table 5 or Table 7.
As used herein, the term “mutagen” means any molecule, a nucleic acid sequence, amino acid sequence, or hybrid amino acid or nucleic acid sequence that causes a mutation or modification in one or more regions of endogenous nucleic acid when exposed for a time period sufficient to cause the mutation. In some embodiments, the mutation is a point mutation, frameshift mutation, deletion, truncation, or addition. In some embodiments, the mutagen is a vector or a gene-modifying enzyme.
The term “vector” as used herein refers to any genetic element, such as a plasmid, phage, transposon, cosmid, chromosome, artificial chromosome, virus, virion, etc., which is capable of replication when associated with the proper control elements and which can transfer gene sequences between cells. Thus, the term includes cloning and expression vehicles, as well as viral vectors.
The term “gene-modifying enzyme” as used herein refers to an enzyme that is capable of modifying a gene by introducing a mutation (e.g., point mutation, frameshift mutation, deletion, or truncation) causing gene inactivation or introducing heterologous nucleotides (e.g., genes) through non-homologous end joining or homologous recombination. Exemplary gene-modifying enzymes include, but are not limited to, a Cas protein, a meganuclease, a transcription activator-like effector nuclease (TALEN), a transposon, a zinc-finger nuclease (ZFN), or a recombinase. In some embodiments, the gene-modifying enzyme suitable for the methods disclosed herein is a Cas protein, a meganuclease, a TALEN, a ZFN, or a recombinase. In some embodiments, the gene-modifying enzyme suitable for the methods disclosed herein is a Cas protein. In some preferred embodiments, the gene-modifying enzyme suitable for the methods disclosed herein is a Cas9 protein.
The term “Cas9 protein” refers to the “clustered, regularly interspaced, short palindromic repeats (CRISPR)-associated protein 9.” This term is well known in the art and has been described, e.g. in Makarova et al. (2011) Nat. Rev. Microbiol., 9:467-477, and in Makarova et al. (2011) Biol. Direct., 6:38. Cas proteins are endonucleases that form part of an adaptive defense mechanism evolved by bacteria and archaea to protect them from invading viruses and plasmids. Cas9 protein or gene information can be obtained from a known database such as the GenBank of NCBI (National Center for Biotechnology Information), but is not limited thereto. Moreover, the Cas9 protein may comprise not only wild-type Cas9, but also deactivated Cas9 (dCas9), or Cas9 variants such as Cas9 nickase. The deactivated Cas9 may be RFN (RNA-guided FokI nuclease) comprising a FokI nuclease domain bound to dCas9, or may be dCas9 to which a transcription activator or repressor domain is bound. In addition, the Cas9 protein is not limited in its origin. For example, the Cas9 protein may be derived from Streptococcus pyogenes, Francisella novicida, Streptococcus thermophilus, Legionella pneumophila, Listeria innocua, or Streptococcus mutans.
Cas9 protein is the major protein element of the CRISPR/Cas9 system, which forms a complex with crRNA (CRISPR RNA) and tracrRNA (trans-activating crRNA) to form activated endonuclease or nickase. “CRISPR system” refers collectively to transcripts or synthetically produced transcripts and other elements involved in the expression of or directing the activity of CRISPR-associated (“Cas”) genes, including sequences encoding a Cas gene, a tracr (trans-activating CRISPR) sequence (e.g. tracrRNA or an active partial tracrRNA), a tracr-mate sequence (encompassing a “direct repeat” and a tracrRNA-processed partial direct repeat in the context of an endogenous CRISPR system), a guide sequence (also referred to as a “spacer” in the context of an endogenous CRISPR system), or other sequences and transcripts from a CRISPR locus. In some embodiments, one or more elements of a CRISPR system is derived from a type I, type II, or type III CRISPR system. In some embodiments, one or more elements of a CRISPR system is derived from a particular organism comprising an endogenous CRISPR system, such as Streptococcus pyogenes. In general, a CRISPR system is characterized by elements that promote the formation of a CRISPR complex at the site of a target sequence (also referred to as a protospacer in the context of an endogenous CRISPR system). In the context of formation of a CRISPR complex, “target sequence” refers to a nucleic acid sequence to which a guide sequence is designed to have complementarity, where hybridization between a target sequence and a guide sequence promotes the formation of a CRISPR complex. Full complementarity is not necessarily required, provided there is sufficient complementarity to cause hybridization and promote formation of a CRISPR complex. 
A target sequence may comprise any polynucleotide, such as DNA or RNA polynucleotides, but in some embodiments, the target sequence is a nullomer or a region of a nullomer that is from about 10 to about 35 nucleotides of the nullomer sequence of any nullomer from Table 1. In some embodiments, the target sequence is a DNA polynucleotide and is referred to as a DNA target sequence. In some embodiments, a target sequence comprises at least three nucleic acid sequences that are recognized by a Cas protein when the Cas protein is associated with a CRISPR complex or system which comprises at least one sgRNA or one tracrRNA/crRNA duplex at a concentration and within a microenvironment suitable for association of such a system. In some embodiments, the target DNA comprises at least one or more proto-spacer adjacent motifs, the sequences of which are known in the art and are dependent upon the Cas protein system being used in conjunction with the sgRNA or crRNA/tracrRNAs employed by this work. In some embodiments, the target DNA comprises NNG, where G is a guanine and N is any naturally occurring nucleic acid. In some embodiments the target DNA comprises any one or combination of NNG, NNA, GAA, NNAGAAW and NGGNG, where G is a guanine, A is adenine, and N is any naturally occurring nucleic acid from one nullomer in Table 1.
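As an illustration of how the motifs named above (NNG, NNA, GAA, NNAGAAW and NGGNG) might be located in a candidate target, the following minimal sketch scans a DNA string for a motif written in IUPAC notation. It assumes the standard IUPAC meanings (N is any base, W is adenine or thymine); the function name find_pam_sites and the example sequence are hypothetical and are not taken from Table 1.

```python
import re

# Standard IUPAC codes needed for the motifs named in the text.
# N = any base, W = A or T (assumption: standard IUPAC meaning).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "W": "[AT]"}

def find_pam_sites(sequence, motif):
    """Return 0-based start positions where `motif` matches `sequence`."""
    pattern = "".join(IUPAC[base] for base in motif.upper())
    # A zero-width lookahead reports overlapping matches as well.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", sequence.upper())]

# Made-up sequence for illustration only; not a nullomer from Table 1.
example = "ACGTTAGGCGAAT"
for motif in ("NNG", "GAA", "NNAGAAW", "NGGNG"):
    print(motif, find_pam_sites(example, motif))
```

Only the forward strand is scanned here; a fuller treatment would also scan the reverse complement and apply whatever spacing rule the chosen Cas protein requires.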
Typically, in the context of an endogenous CRISPR system, formation of a CRISPR complex (comprising a guide sequence hybridized to a target sequence and complexed with one or more Cas proteins) results in cleavage of one or both strands in or near (e.g. within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, or more base pairs from) the target sequence. Without wishing to be bound by theory, the tracr sequence, which may comprise or consist of all or a portion of a wild-type tracr sequence (e.g. about or more than about 20, 26, 32, 45, 48, 54, 63, 67, 85, or more nucleotides of a wild-type tracr sequence), may also form part of a CRISPR complex, such as by hybridization along at least a portion of the tracr sequence to all or a portion of a tracr mate sequence that is operably linked to the guide sequence. In some embodiments, the tracr sequence has sufficient complementarity to a tracr mate sequence to hybridize and participate in formation of a CRISPR complex. As with the target sequence, it is believed that complete complementarity is not needed, provided there is sufficient complementarity to be functional (bind the Cas protein or functional fragment thereof). In some embodiments, the tracr sequence has at least 50%, 60%, 70%, 80%, 90%, 95% or 99% of sequence complementarity along the length of the tracr mate sequence when optimally aligned. In some embodiments, one or more vectors driving expression of one or more elements of a CRISPR system are introduced into a host cell such that the presence and/or expression of the elements of the CRISPR system direct formation of a CRISPR complex at one or more target sites. For example, a Cas enzyme, a guide sequence linked to a tracr-mate sequence, and a tracr sequence could each be operably linked to separate regulatory elements on separate vectors.
Alternatively, two or more of the elements expressed from the same or different regulatory elements, may be combined in a single vector, with one or more additional vectors providing any components of the CRISPR system not included in the first vector. In some embodiments, the target site is a genomic DNA of a cancer cell within the host or a cancer cell isolated from the subject in a sample or within a system independent of a tumor.
With at least some of the modifications contemplated by this disclosure, in some embodiments, the guide sequence or RNA or DNA sequences that form a CRISPR complex are at least partially synthetic. The CRISPR system elements that are combined in a single vector may be arranged in any suitable orientation, such as one element located 5′ with respect to (“upstream” of) or 3′ with respect to (“downstream” of) a second element. In some embodiments, the disclosure relates to a composition comprising a chemically synthesized guide sequence. In some embodiments, the chemically synthesized guide sequence is used in conjunction with a vector comprising a coding sequence that encodes a CRISPR enzyme, such as a type II Cas9 protein. In some embodiments, the chemically synthesized guide sequence is used in conjunction with one or more vectors, wherein each vector comprises a coding sequence that encodes a CRISPR enzyme, such as a type II Cas9 protein. The coding sequence of one element may be located on the same or opposite strand of the coding sequence of a second element, and oriented in the same or opposite direction. In some embodiments, a single promoter drives expression of a transcript encoding a CRISPR enzyme and one or more additional (second, third, fourth, etc.) guide sequences, tracr mate sequence (optionally operably linked to the guide sequence), and a tracr sequence embedded within one or more intron sequences (e.g. each in a different intron, two or more in at least one intron, or all in a single intron). In some embodiments, the CRISPR enzyme, one or more additional guide sequences, tracr mate sequence, and tracr sequence are each a component of different nucleic acid sequences.
For instance, in the case of tracr and tracr mate sequences and in some embodiments, the disclosure relates to a composition comprising at least a first and second nucleic acid sequence, wherein the first nucleic acid sequence comprises a tracr sequence and the second nucleic acid sequence comprises a tracr mate sequence, wherein the first nucleic acid sequence is at least partially complementary to the second nucleic acid sequence such that the first and second nucleic acids form a duplex and wherein the first nucleic acid and the second nucleic acid either individually or collectively comprise a DNA-targeting domain, a Cas protein binding domain, and a transcription terminator domain. In some embodiments, the CRISPR enzyme, one or more additional guide sequences, tracr mate sequence, and tracr sequence are operably linked to and expressed from the same promoter. In some embodiments, the disclosure relates to compositions comprising any one or combination of the disclosed domains on one guide sequence or two separate tracrRNA/crRNA sequences with or without any of the disclosed modifications. Any methods disclosed herein also relate to the use of a tracrRNA/crRNA sequence interchangeably with the use of a guide sequence, such that a composition may comprise a single synthetic guide sequence and/or a synthetic tracrRNA/crRNA with any one or combination of modified domains disclosed herein.
The CRISPR system suitable for the present disclosure can also comprise a modified CRISPR enzyme (or “Cas protein”) or a nucleotide sequence encoding one or more Cas proteins. Any protein capable of enzymatic activity in cooperation with a guide sequence is a Cas protein. In some embodiments, the disclosure relates to a system comprising a vector comprising a regulatory element operably linked to an enzyme-coding sequence encoding a CRISPR enzyme, such as a Cas protein from the Cas family of enzymes. In some embodiments, the disclosure relates to a system, composition, or pharmaceutical composition comprising any one or plurality of Cas proteins either individually or in combination with one or a plurality of guide sequences. Compositions of one or a plurality of Cas proteins may be administered to a subject with any of the disclosed guide sequences sequentially or contemporaneously. Non-limiting examples of Cas proteins include Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9 (also known as Csn1 and Csx12), Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, type V CRISPR-Cas systems (e.g., Cas12), and Type VI CRISPR-Cas systems (e.g., Cas13), and variants and fragments thereof, or modified versions thereof having at least 70% sequence identity to any of the above Cas proteins. These enzymes are known; for example, the amino acid sequence of S. pyogenes Cas9 protein may be found in the SwissProt database under accession number Q99ZW2. In some embodiments, the unmodified CRISPR enzyme has DNA cleavage activity, such as Cas9. In some embodiments the CRISPR enzyme is Cas9, and may be Cas9 from S. pyogenes or S. pneumoniae.
In some embodiments, the CRISPR enzyme directs cleavage of one or both strands at the location of a target sequence, such as within the target sequence and/or within the complement of the target sequence. In some embodiments, the CRISPR enzyme directs cleavage of one or both strands within about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 50, 100, 200, 500, or more base pairs from the first or last nucleotide of a target sequence. In some embodiments, a vector encodes a CRISPR enzyme or Cas protein that is mutated with respect to a corresponding wild-type enzyme such that the mutated CRISPR enzyme lacks the ability to cleave one or both strands of a target polynucleotide containing a target sequence. For example, an aspartate-to-alanine substitution (D10A) in the RuvC I catalytic domain of Cas9 from S. pyogenes converts Cas9 from a nuclease that cleaves both strands to a nickase (cleaves a single strand). Other examples of mutations that render Cas9 a nickase include, without limitation, H840A, N854A, and N863A. In some embodiments, a Cas9 nickase may be used in combination with guide sequence(s), e.g., two guide sequences, which target respectively sense and antisense strands of the DNA target. This combination allows both strands to be nicked and used to induce NHEJ.
As a further example, two or more catalytic domains of Cas9 (RuvC I, RuvC II, and RuvC III) may be mutated to produce a mutated Cas9 substantially lacking all DNA cleavage activity. In some embodiments, a D10A mutation is combined with one or more of H840A, N854A, or N863A mutations to produce a Cas9 enzyme substantially lacking all DNA cleavage activity. In some embodiments, a CRISPR enzyme is considered to substantially lack all DNA cleavage activity when the DNA cleavage activity of the mutated enzyme is less than about 25%, 10%, 5%, 1%, 0.1%, 0.01%, or lower with respect to its non-mutated form. Other mutations may be useful; where the Cas9 or other CRISPR enzyme is from a species other than S. pyogenes, mutations in corresponding amino acids may be made to achieve similar effects.
The disclosure relates to a method of detecting the presence of a nullomer by exposing a Cas protein and sgRNA specific to a target nullomer sequence to a nullomer target sequence. In some embodiments, the nullomer target sequence is any nullomer from Table 1 and the sgRNA sequence specific for the nullomer is any RNA molecule that comprises from about 10 to about 35 nucleotides complementary to a nullomer in Table 1. In some embodiments, the method further comprises allowing a time period sufficient for the sgRNA to associate with the nullomer and the Cas protein to excise the nullomer from the genomic DNA of a host cell or cell within a sample. Detection of the nullomer can further comprise identifying the nullomer sequence excised from the cell by amplification through PCR or a non-amplification event such as those disclosed herein.
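The complementarity requirement described above (an RNA segment of about 10 to about 35 nucleotides complementary to a nullomer) can be sketched as follows. This is a minimal illustration only: the function names and the 12-nucleotide example target are hypothetical, the target is not drawn from Table 1, and the sketch enumerates complementary segments without modeling scaffold sequence, PAM placement, or off-target screening.

```python
# Watson-Crick pairing from a DNA base to the complementary RNA base.
DNA_TO_RNA_COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A"}

def complementary_rna(dna_segment):
    """Return the RNA strand (5'->3') complementary to a DNA segment (5'->3')."""
    return "".join(DNA_TO_RNA_COMPLEMENT[b] for b in reversed(dna_segment.upper()))

def candidate_segments(target, min_len=10, max_len=35):
    """List (start, length, rna) for every window of the target in the length range."""
    segments = []
    for length in range(min_len, min(max_len, len(target)) + 1):
        for start in range(len(target) - length + 1):
            window = target[start:start + length]
            segments.append((start, length, complementary_rna(window)))
    return segments

# Made-up 12-nucleotide target for illustration only.
target = "ACGTTAGGCGAA"
segments = candidate_segments(target)
print(len(segments), segments[0])
```

The default bounds of 10 and 35 mirror the "about 10 to about 35 nucleotides" range in the text; they are parameters so that other ranges recited in the disclosure can be substituted.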
In certain embodiments, labels, dyes, or labeled probes and/or primers are used to detect amplified or unamplified nullomers. The skilled artisan will recognize which detection methods are appropriate based on the sensitivity of the detection method and the abundance of the target. Depending on the sensitivity of the detection method and the abundance of the target, amplification may or may not be required prior to detection. One skilled in the art will recognize the detection methods where nullomer amplification is preferred.
A probe or primer may include standard (A, T or U, G and C) bases, or modified bases. Modified bases include, but are not limited to, the AEGIS bases (from Eragen Biosciences), which have been described, e.g., in U.S. Pat. Nos. 5,432,272, 5,965,364, and 6,001,983. In certain aspects, bases are joined by a natural phosphodiester bond or a different chemical linkage. Different chemical linkages include, but are not limited to, a peptide bond or a Locked Nucleic Acid (LNA) linkage, which is described, e.g., in U.S. Pat. No. 7,060,809.
In a further aspect, oligonucleotide probes or primers present in an amplification reaction are suitable for monitoring the amount of amplification product produced as a function of time. In certain aspects, probes having different single stranded versus double stranded character are used to detect the nucleic acid. Probes include, but are not limited to, the 5′-exonuclease assay (e.g., TAQMAN) probes (see U.S. Pat. No. 5,538,848), stem-loop molecular beacons (see, e.g., U.S. Pat. Nos. 6,103,476 and 5,925,517), stemless or linear beacons (see, e.g., WO 9921881, U.S. Pat. Nos. 6,485,901 and 6,649,349), peptide nucleic acid (PNA) Molecular Beacons (see, e.g., U.S. Pat. Nos. 6,355,421 and 6,593,091), linear PNA beacons (see, e.g. U.S. Pat. No. 6,329,144), non-FRET probes (see, e.g., U.S. Pat. No. 6,150,097), Sunrise™/Amplifluor™ probes (see, e.g., U.S. Pat. No. 6,548,250), stem-loop and duplex SCORPION probes (see, e.g., U.S. Pat. No. 6,589,743), bulge loop probes (see, e.g., U.S. Pat. No. 6,590,091), pseudo knot probes (see, e.g., U.S. Pat. No. 6,548,250), cyclicons (see, e.g., U.S. Pat. No. 6,383,752), MGB Eclipse™ probe (Epoch Biosciences), hairpin probes (see, e.g., U.S. Pat. No. 6,596,490), PNA light-up probes, antiprimer quench probes (Li et al., Clin. Chem. 53:624-633 (2006)), self-assembled nanoparticle probes, and ferrocene-modified probes described, for example, in U.S. Pat. No. 6,485,901.
In certain embodiments, one or more of the primers in an amplification reaction can include a label. In yet further embodiments, different probes or primers comprise detectable labels that are distinguishable from one another. In some embodiments, a nucleic acid, such as the probe or primer, may be labeled with two or more distinguishable labels.
In some aspects, a label is attached to one or more probes and has one or more of the following properties: (i) provides a detectable signal; (ii) interacts with a second label to modify the detectable signal provided by the second label, e.g., FRET (Fluorescence Resonance Energy Transfer); (iii) stabilizes hybridization, e.g., duplex formation; and (iv) provides a member of a binding complex or affinity set, e.g., affinity, antibody-antigen, ionic complexes, hapten-ligand (e.g., biotin-avidin). In still other aspects, use of labels can be accomplished using any one of a large number of known techniques employing known labels, linkages, linking groups, reagents, reaction conditions, and analysis and purification methods.
Nullomers can be detected by direct or indirect methods. In a direct detection method, one or more nullomers are detected by a detectable label that is linked to a nucleic acid molecule. In such methods, the nullomers may be labeled prior to binding to the probe. Therefore, binding is detected by screening for the labeled nullomer that is bound to the probe. The probe is optionally linked to a bead in the reaction volume.
In certain embodiments, nucleic acids are detected by direct binding with a labeled probe, and the probe is subsequently detected. In some embodiments, the nucleic acids, such as amplified nullomers, are detected using FlexMAP Microspheres (Luminex) conjugated with probes to capture the desired nucleic acids. Some methods may involve detection with polynucleotide probes modified with fluorescent labels or branched DNA (bDNA) detection, for example.
In some embodiments, biomarker expression is determined using a PCR-based assay comprising specific primers and/or probes for each biomarker. As used herein, the term “probe” refers to any molecule that is capable of selectively binding a specifically intended target biomolecule. In some embodiments, as used herein, the term “probe” refers to any molecule that may bind or associate, indirectly or directly, covalently or non-covalently, to any of the substrates and/or reaction products and/or proteases disclosed herein and whose association or binding is detectable using the methods disclosed herein. In some embodiments, the term “probe” refers to any molecule comprising a nucleic acid sequence that is complementary to any of the nucleic acid sequences disclosed in TABLE 1 or one comprising at least about 70%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity to any of the nucleic acid sequences disclosed in TABLE 1. In some embodiments, the term “probe” refers to any molecule comprising a nucleic acid sequence that is complementary to a fragment of any of the nucleic acid sequences disclosed in TABLE 1 or one comprising at least about 70%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity to a fragment of any of the nucleic acid sequences disclosed in TABLE 1. In some embodiments, the term “probe” refers to a sgRNA molecule comprising a nucleic acid sequence that is complementary to a fragment of any of the nucleic acid sequences disclosed in TABLE 1 or one comprising at least about 70%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity to a fragment of any of the nucleic acid sequences disclosed in TABLE 1. In some embodiments, the probe is a fluorogenic probe, an antibody or an absorbance-based probe.
If an absorbance-based probe is used, the chromophore pNA (para-nitroaniline) may be used as a probe for detection and/or quantification of a target nucleic acid sequence disclosed herein. In some embodiments, the probe may comprise a nucleic acid sequence labeled with a fluorogenic molecule or a substrate that when exposed to an enzyme becomes fluorogenic and the nucleic acid sequence is complementary to any of the nucleic acid sequences disclosed in TABLE 1 or one comprising at least about 70%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity to any of the nucleic acid sequences disclosed in TABLE 1. Probes can be synthesized by one of skill in the art using known techniques, or derived from biological preparations. Probes may include but are not limited to, RNA, DNA, proteins, peptides, aptamers, antibodies, and organic molecules. The term “primer” or “probe” encompasses oligonucleotides that have a specific sequence or oligoribonucleotides that have a specific sequence. In some embodiments, the probes are from about 5 to about 20 nucleotides in length and are complementary to the nucleic acid sequences in TABLE 1 and comprise at least about 70%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or about 100% sequence identity to any one or combination of nucleic acid sequences complementary to those provided in TABLE 1. In some embodiments, the probes are from about 5 to about 20 nucleotides in length and are complementary to the nucleic acid sequences in TABLE 1 and comprise at least about 70%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or about 100% sequence identity to any one or combination of nucleic acid sequences complementary to those provided in TABLE 7.
In some embodiments, the probes are from about 5 to about 20 nucleotides in length and are complementary to the nucleic acid sequences in TABLE 1 and comprise at least about 70%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or about 100% sequence identity to any one or combination of nucleic acid sequences complementary to those provided in TABLE 8.
The target molecule could be any one or a combination of nucleic acid sequences identified in TABLE 1. In some embodiments, the target molecule is a nucleic acid sequence comprising at least about 70%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or about 99% sequence identity to any one or combination of nucleic acid sequences provided in TABLE 1. In some embodiments, the target molecule is any amplified fragment of any one or combination of nucleic acid sequences identified in TABLE 1, and/or any one or combination of nucleic acid sequences comprising at least about 70%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or about 99% sequence identity to any one or combination of nucleic acid sequences in TABLE 1.
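The percent sequence identity thresholds recited above can be illustrated with a minimal sketch. It assumes two sequences of equal length that are already aligned without gaps, which is a simplification: in practice identity is usually computed from an alignment produced by a dedicated tool. The function names and example sequences are hypothetical and are not taken from TABLE 1.

```python
def percent_identity(seq_a, seq_b):
    """Percent of positions that match between two equal-length, ungapped sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)

def meets_identity(seq_a, seq_b, threshold_percent=70.0):
    """True when the identity is at or above the chosen threshold."""
    return percent_identity(seq_a, seq_b) >= threshold_percent

# Made-up 10-nucleotide sequences differing at one position.
reference = "ACGTACGTAC"
candidate = "ACGTACGTAA"
print(percent_identity(reference, candidate))
print(meets_identity(reference, candidate, 95.0))
```

With one mismatch in ten positions the candidate is 90% identical to the reference, so it satisfies a 70% or 90% threshold but not a 95% threshold.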
In other embodiments, nucleic acids are detected by indirect detection methods. For example, a biotinylated probe may be combined with a streptavidin-conjugated dye to detect the bound nucleic acid. The streptavidin molecule binds a biotin label on amplified nullomer, and the bound nullomer is detected by detecting the dye molecule attached to the streptavidin molecule. In some embodiments, the streptavidin-conjugated dye molecule comprises PHYCOLINK Streptavidin R-Phycoerythrin (PROzyme). Other conjugated dye molecules are known to persons skilled in the art.
Labels include, but are not limited to, light-emitting, light-scattering, and light-absorbing compounds which generate or quench a detectable fluorescent, chemiluminescent, or bioluminescent signal (see, e.g., Kricka, L., Nonisotopic DNA Probe Techniques, Academic Press, San Diego (1992) and Garman A., Non-Radioactive Labeling, Academic Press (1997)). A dual labeled fluorescent probe that includes a reporter fluorophore and a quencher fluorophore is used in some embodiments. It will be appreciated that pairs of fluorophores are chosen that have distinct emission spectra so that they can be easily distinguished.
In certain embodiments, labels are hybridization-stabilizing moieties which serve to enhance, stabilize, or influence hybridization of duplexes, e.g., intercalators and intercalating dyes (including, but not limited to, ethidium bromide and SYBR-Green), minor-groove binders, and cross-linking functional groups (see, e.g., Blackburn et al., eds. “DNA and RNA Structure” in Nucleic Acids in Chemistry and Biology (1996)).
In other embodiments, methods relying on hybridization and/or ligation to quantify nullomers may be used, including oligonucleotide ligation (OLA) methods and methods that allow a distinguishable probe that hybridizes to the target nucleic acid sequence to be separated from an unbound probe. As an example, HARP-like probes, as disclosed in U.S. Publication No. 2006/0078894 may be used to measure the quantity of nullomers. In such methods, after hybridization between a probe and the targeted nucleic acid, the probe is modified to distinguish the hybridized probe from the unhybridized probe. Thereafter, the probe may be amplified and/or detected. In general, a probe inactivation region comprises a subset of nucleotides within the target hybridization region of the probe. To reduce or prevent amplification or detection of a HARP probe that is not hybridized to its target nucleic acid, and thus allow detection of the target nucleic acid, a post-hybridization probe inactivation step is carried out using an agent which is able to distinguish between a HARP probe that is hybridized to its targeted nucleic acid sequence and the corresponding unhybridized HARP probe. The agent is able to inactivate or modify the unhybridized HARP probe such that it cannot be amplified. A probe ligation reaction may also be used to quantify nullomers. In a Multiplex Ligation-dependent Probe Amplification (MLPA) technique (Schouten et al., Nucleic Acids Research 30:e57 (2002)), pairs of probes which hybridize immediately adjacent to each other on the target nucleic acid are ligated to each other driven by the presence of the target nucleic acid. In some aspects, MLPA probes have flanking PCR primer binding sites. MLPA probes are specifically amplified when ligated, thus allowing for detection and quantification of nullomer biomarkers.
The nullomers described herein can be used individually or in combination in diagnostic tests to assess the type of cancer, tissue of origin, and status or stage of the cancer in a subject. Cancer status or stage includes the presence or absence of the cancer. Cancer status or stage may also include monitoring the course of the cancer, for example, monitoring disease progression. Based on the cancer status or stage of a subject, additional procedures may be indicated, including, for example, additional diagnostic tests or therapeutic procedures.
The power of a diagnostic test to correctly predict disease status is commonly measured in terms of the accuracy of the assay, the sensitivity of the assay, the specificity of the assay, or the “Area Under a Curve” (AUC), for example, the area under a Receiver Operating Characteristic (ROC) curve. As used herein, accuracy is a measure of the fraction of correctly classified samples. Accuracy may be calculated as the total number of correctly classified samples divided by the total number of samples, e.g., in a test population. Sensitivity is a measure of the “true positives” that are predicted by a test to be positive, and may be calculated as the number of correctly identified cancer samples divided by the total number of cancer samples. Specificity is a measure of the “true negatives” that are predicted by a test to be negative, and may be calculated as the number of correctly identified normal samples divided by the total number of normal samples. AUC is a measure of the area under a Receiver Operating Characteristic curve, which is a plot of sensitivity vs. the false positive rate (1-specificity). The greater the AUC, the more powerful the predictive value of the test. Other useful measures of the utility of a test include the “positive predictive value,” which is the percentage of subjects testing positive who are actual positives, and the “negative predictive value,” which is the percentage of subjects testing negative who are actual negatives. In some embodiments, the level of one or more nullomers in samples obtained from subjects having different cancer statuses show a statistically significant difference of at least about 0.05 (p=0.05) relative to normal subjects, as determined relative to a suitable control. In some embodiments, the level of one or more nullomers in samples obtained from subjects having different cancer statuses show a statistically significant difference of at least about 0.01 (p=0.01) relative to normal subjects, as determined relative to a suitable control.
In some embodiments, the level of one or more nullomers in samples obtained from subjects having different cancer statuses show a statistically significant difference of at least about 0.005 (p=0.005) relative to normal subjects, as determined relative to a suitable control. In some embodiments, the level of one or more nullomers in samples obtained from subjects having different cancer statuses show a statistically significant difference of at least about 0.001 (p=0.001) relative to normal subjects, as determined relative to a suitable control.
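The accuracy, sensitivity, specificity and AUC measures discussed in this section can be computed as in the following minimal sketch. The labels and scores are invented for illustration (1 denotes a cancer sample, 0 a normal sample), the 0.45 threshold is arbitrary, and the AUC is computed by the rank-based equivalence in which the AUC equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative sample, with ties counted as one half.

```python
def confusion_counts(labels, predictions):
    """Return (TP, FP, TN, FN) where 1 = cancer and 0 = normal."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    return tp, fp, tn, fn

def accuracy(labels, predictions):
    tp, fp, tn, fn = confusion_counts(labels, predictions)
    return (tp + tn) / (tp + fp + tn + fn)

def sensitivity(labels, predictions):
    tp, _, _, fn = confusion_counts(labels, predictions)
    return tp / (tp + fn)

def specificity(labels, predictions):
    _, fp, tn, _ = confusion_counts(labels, predictions)
    return tn / (tn + fp)

def roc_auc(labels, scores):
    """Rank-based AUC: fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented data for illustration only.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
predictions = [1 if s >= 0.45 else 0 for s in scores]
print(accuracy(labels, predictions), sensitivity(labels, predictions),
      specificity(labels, predictions), roc_auc(labels, scores))
```

Unlike the three threshold-dependent measures, the AUC is computed from the raw scores and therefore does not change when the classification threshold is moved.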
In other embodiments, diagnostic tests that use nullomers described herein individually or in combination show an accuracy of at least about 75%, e.g., an accuracy of at least about 75%, about 80%, about 85%, about 90%, about 95%, about 97%, about 99% or about 100%. In other embodiments, diagnostic tests that use nullomers described herein individually or in combination show a specificity of at least about 75%, e.g., a specificity of at least about 75%, about 80%, about 85%, about 90%, about 95%, about 97%, about 99% or about 100%. In other embodiments, diagnostic tests that use nullomers described herein individually or in combination show a sensitivity of at least about 75%, e.g., a sensitivity of at least about 75%, about 80%, about 85%, about 90%, about 95%, about 97%, about 99% or about 100%. In other embodiments, diagnostic tests that use nullomers described herein individually or in combination show a specificity and sensitivity of at least about 75% each, e.g., a specificity and sensitivity of at least about 75%, about 80%, about 85%, about 90%, about 95%, about 97%, about 99% or about 100% (for example, a specificity of at least about 80% and sensitivity of at least about 80%, or for example, a specificity of at least about 80% and sensitivity of at least about 95%).
Each nullomer listed in TABLE 1 is identified as being associated with certain type(s) of cancer as provided. In some instances, one particular nullomer may be associated with more than one type of cancer. In other instances, one particular nullomer may be associated with only one type of cancer.
Each nullomer listed in TABLE 1 is differentially present in biological samples derived from subjects having certain types of cancers as compared with normal subjects, and thus each is individually useful in facilitating the determination of those types of cancer in a test subject. Such a method involves determining the level of the nullomer in a sample obtained from the subject. Determining the level of the nullomer in a sample may include measuring, detecting, or assaying the level of the nullomer in the sample using any suitable method, for example, the methods set forth herein. Determining the level of the nullomer in a sample may also include examining the results of an assay that measured, detected, or assayed the level of the nullomer in the sample. The method may also involve comparing the level of the nullomer in a sample with a suitable control. A change in the level of the nullomer relative to that in a normal subject as assessed using a suitable control is indicative of the cancer status or stage of the subject. A diagnostic amount of a nullomer that represents an amount of the nullomer above or below which a subject is classified as having a particular cancer status or stage can be used. For example, if the nullomer is upregulated in samples from an individual having cancer as compared to a normal individual, a measured amount above the diagnostic cutoff provides a diagnosis of the type of cancer that individual has. Generally, the nullomers in TABLE 1 and Table 7 are upregulated in cancer samples relative to samples obtained from normal individuals. As is well-understood in the art, adjusting the particular diagnostic cut-off used in an assay allows one to adjust the sensitivity and/or specificity of the diagnostic assay as desired. 
The particular diagnostic cut-off can be determined, for example, by measuring the amount of the nullomer in a statistically significant number of samples from subjects with different cancer statuses, and drawing the cut-off at the desired level of accuracy, sensitivity, and/or specificity. In certain embodiments, the diagnostic cut-off can be determined with the assistance of a classification algorithm, as described elsewhere herein.
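As one non-limiting sketch of how such a cut-off might be drawn, the example below selects the smallest cut-off that achieves a desired specificity among normal samples and then reports the resulting sensitivity among cancer samples. It assumes that the nullomer is elevated in cancer, that a sample is called positive when its level exceeds the cut-off, and that levels are simple per-sample counts; the function names and all values are hypothetical.

```python
import math


def cutoff_for_specificity(normal_levels, target_specificity):
    """Smallest observed level such that at least the target fraction of
    normal samples falls at or below it (those samples are called negative)."""
    ordered = sorted(normal_levels)
    k = math.ceil(target_specificity * len(ordered)) - 1
    return ordered[max(k, 0)]


def sensitivity_at(cancer_levels, cutoff):
    """Fraction of cancer samples whose level exceeds the cut-off."""
    return sum(1 for level in cancer_levels if level > cutoff) / len(cancer_levels)


# Hypothetical nullomer counts per sample (illustrative only).
normal_levels = [0, 0, 1, 0, 2, 1, 0, 0, 3, 1]
cancer_levels = [5, 8, 2, 12, 7, 3, 9, 1]

cutoff = cutoff_for_specificity(normal_levels, 0.9)
sensitivity = sensitivity_at(cancer_levels, cutoff)
```

Raising the target specificity moves the cut-off upward and generally lowers sensitivity, which is the trade-off referred to above.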
Accordingly, methods are provided for diagnosing cancer in a subject, by determining the level of at least one nullomer in a sample from the subject, wherein a difference in the level of the at least one nullomer versus that in a normal subject (as determined relative to a suitable control) is indicative of cancer in the subject. In some embodiments, the at least one nullomer includes one or more nullomers from TABLE 1. In some embodiments, a difference in the level of the at least one nullomer versus that in a normal subject (as determined relative to a suitable control) is indicative of the type(s) of cancer identified as being associated with the detected at least one nullomer in the subject. For example, the disclosed method comprises determining the level of at least one nullomer in a sample from a subject, wherein an increase in the level of the at least one nullomer relative to a control is indicative of cancer in the subject, particularly of the type(s) of cancer identified as being associated with the at least one nullomer detected. In some embodiments, the subject is diagnosed with having breast cancer, pancreatic cancer, esophagus cancer, lymphoid cancer, kidney cancer, ovary cancer, head and neck cancer, lung cancer, stomach cancer, CNS cancer, uterus cancer, skin cancer, colorectal cancer, prostate cancer, bladder cancer, bone and soft tissue cancer, biliary cancer, cervix cancer, thyroid cancer, myeloid cancer, or liver cancer by the disclosed method.
Optionally, the method may further comprise providing a diagnosis that the subject has or does not have cancer based on the level of at least one nullomer in the sample. In addition or alternatively, the method may further comprise correlating a difference in the level or levels of at least one nullomer relative to a suitable control with a diagnosis of cancer in the subject. In some embodiments, such a diagnosis may be provided directly to the subject, or it may be provided to another party involved in the subject's care.
While individual nullomers are useful in diagnostic applications for various types of cancer, as shown herein, a combination of nullomers may provide greater predictive value of cancer status or stage than the nullomers when used alone. Specifically, the detection of a plurality of nullomers can increase the accuracy, sensitivity, and/or specificity of a diagnostic test. The detection of a plurality of nullomers can also assist in narrowing down the type of cancer and/or status or stage thereof in a subject. This is particularly useful when a given nullomer is identified as being associated with more than one type of cancer. For instance, if nullomer A is identified as being associated with cancers X, Y and Z, nullomer B is identified as being associated with cancers X and Y, and nullomer C is identified as being associated with cancers X and Z, by a process of elimination, a detection of the presence of nullomers A, B and C in a subject is indicative that the subject has cancer X. The disclosure thus includes the individual nullomers provided in TABLE 1 and nullomer combinations as set forth herein, and their use in methods and kits described herein. Accordingly, methods are provided for diagnosing cancer in a subject, by determining the level of two or more nullomers in a sample from the subject, wherein a difference in the level of the nullomers versus that in a normal subject (as determined relative to a suitable control) is indicative of cancer in the subject. In some embodiments, the nullomers include one or more of the nullomers provided in TABLE 1. In some embodiments, the type(s) of cancer thus diagnosed is/are the one(s) provided in TABLE 1 as being associated with each individual nullomer provided in TABLE 1.
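The process of elimination described in the A, B, C example can be expressed as a set intersection, as in the following minimal sketch. The mapping `associations` simply restates the hypothetical example from the text; it is not taken from TABLE 1, and the function name `candidate_cancers` is likewise illustrative.

```python
# Hypothetical nullomer-to-cancer associations from the A/B/C example.
associations = {
    "A": {"X", "Y", "Z"},
    "B": {"X", "Y"},
    "C": {"X", "Z"},
}


def candidate_cancers(detected, associations):
    """Cancer types consistent with every detected nullomer."""
    sets = [associations[name] for name in detected]
    return set.intersection(*sets) if sets else set()


result = candidate_cancers(["A", "B", "C"], associations)  # {"X"}
```

Detecting only nullomers A and B would leave both X and Y as candidates, which is why additional nullomers can narrow the diagnosis.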
Also provided is a method of diagnosing cancer in a subject by determining the levels of two or more nullomers in a sample from the subject, comparing the levels of the two or more nullomers in the sample to a set of data representing levels of the nullomers present in normal subjects and subjects having a particular type of cancer, and diagnosing the subject as having or not having that particular type of cancer based on the comparison. In such a method, the set of data serves as a suitable control or reference standard for comparison with the sample from the subject.
Comparison of the sample from the subject with the set of data may be assisted by a classification algorithm, which computes whether or not a statistically significant difference exists between the collective levels of the two or more nullomers in the sample, and the levels of the same nullomers present in normal subjects or subjects having cancer.
In some embodiments, data that are generated using samples such as “known samples” can then be used to “train” a classification model. A “known sample” is a sample that has been pre-classified, e.g., classified as being derived from a normal subject or from a subject having a particular type of cancer. The data that are derived from the spectra and are used to form the classification model can be referred to as a “training data set.” Once trained, the classification model can recognize patterns in data derived from spectra generated using unknown samples. The classification model can then be used to classify the unknown samples into classes. This can be useful, for example, in predicting whether or not a particular biological sample is associated with a certain biological condition (e.g., diseased versus non-diseased).
In some embodiments, data for the training data set that is used to form the classification model can be obtained directly from quantitative PCR (for example, Ct values obtained using the double delta Ct method), or from high-throughput expression profiling, such as microarray analysis (for example, total counts or normalized counts from a nullomer expression assay).
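For reference, the double delta Ct calculation is commonly written as a fold change of 2 raised to the power of minus ΔΔCt, where ΔCt is the target Ct minus a reference Ct and ΔΔCt is the sample ΔCt minus the control ΔCt. The sketch below assumes approximately 100% amplification efficiency and the availability of a reference amplicon; the function name and Ct values are hypothetical.

```python
def fold_change_ddct(ct_target_sample, ct_ref_sample,
                     ct_target_control, ct_ref_control):
    """Relative quantity of the target in the sample versus the control,
    assuming the amount of product doubles every cycle."""
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_sample - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: the target crosses threshold 3 cycles earlier
# in the sample than in the control, relative to the reference.
fold = fold_change_ddct(24.0, 18.0, 27.0, 18.0)  # 8.0
```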
Classification models can be formed using any suitable statistical classification (or “learning”) method that attempts to segregate bodies of data into classes based on objective parameters present in the data. Classification methods may be either supervised or unsupervised. Examples of supervised and unsupervised classification processes are described in Jain, “Statistical Pattern Recognition: A Review,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 1, January 2000, the teachings of which are incorporated by reference in their entirety.
In supervised classification, training data containing examples of known categories are presented to a learning mechanism, which learns one or more sets of relationships that define each of the known classes. New data may then be applied to the learning mechanism, which then classifies the new data using the learned relationships. Examples of supervised classification processes include linear regression processes (e.g., multiple linear regression (MLR), partial least squares (PLS) regression and principal components regression (PCR)), binary decision trees (e.g., recursive partitioning processes such as CART (classification and regression trees)), artificial neural networks such as back propagation networks, discriminant analyses (e.g., Bayesian classifier or Fisher analysis), logistic classifiers, and support vector classifiers (support vector machines).
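As a non-limiting illustration of one of the supervised approaches listed above, the sketch below trains a logistic classifier by stochastic gradient descent on pre-classified samples and then classifies new samples. It is a plain-Python toy under stated assumptions: each sample is a vector of levels for two nullomers, label 1 denotes cancer and 0 denotes normal, and all values, names, and hyperparameters are hypothetical.

```python
import math


def train_logistic(features, labels, learning_rate=0.1, epochs=500):
    """Fit weights and bias of a logistic classifier with per-sample updates."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of cancer
            error = p - y                   # gradient of the log-loss w.r.t. z
            weights = [w - learning_rate * error * xi
                       for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


def predict(weights, bias, x):
    """Return 1 (cancer) if the linear score is positive, else 0 (normal)."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if z > 0 else 0


# Hypothetical "known samples": levels of two nullomers per sample.
training_features = [[0, 1], [1, 0], [0, 0], [1, 1],
                     [5, 4], [6, 7], [4, 6], [7, 5]]
training_labels = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = train_logistic(training_features, training_labels)
call_high = predict(weights, bias, [6, 6])
call_low = predict(weights, bias, [0, 0])
```

In practice a held-out set of known samples, rather than the training set itself, would be used to estimate accuracy.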
In other embodiments, the classification models that are created can be formed using unsupervised learning methods. Unsupervised classification attempts to learn classifications based on similarities in the training data set, without pre-classifying the spectra from which the training data set was derived. Unsupervised learning methods include cluster analyses. A cluster analysis attempts to divide the data into “clusters” or groups that ideally should have members that are very similar to each other, and very dissimilar to members of other clusters. Similarity is measured using a distance metric, which measures the distance between data items, and data items that are closer to each other are clustered together. Clustering techniques include MacQueen's K-means algorithm and Kohonen's Self-Organizing Map algorithm. Learning algorithms asserted for use in classifying biological information are described, for example, in PCT International Publication No. WO 01/31580 (Barnhill et al., “Methods and devices for identifying patterns in biological systems and methods of use thereof”), U.S. application publication No. 2002/0193950 A1 (Gavin et al., “Method for analyzing mass spectra”), U.S. application publication No. 2003/0004402 A1 (Hitt et al., “Process for discriminating between biological states based on hidden patterns from biological data”), and U.S. application publication No. 2003/0055615 A1 (Zhang and Zhang, “Systems and methods for processing biological expression data”). The contents of the foregoing patent applications are incorporated herein by reference in their entireties.
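A minimal sketch of a cluster analysis of the K-means type is given below for illustration. It uses the batch (Lloyd-style) update, in which all points are reassigned before centroids are recomputed, rather than MacQueen's incremental variant, and it uses squared Euclidean distance as the distance metric. The data points, starting centroids, and function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Index of the nearest centroid for each point."""
    return [min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            for p in points]


def kmeans(points, initial_centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(iterations):
        labels = assign(points, centroids)
        for i in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[i] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    return centroids, assign(points, centroids)


# Hypothetical unlabeled samples: levels of two nullomers per sample.
points = [(0, 1), (1, 0), (1, 1), (8, 9), (9, 8), (9, 9)]
centroids, labels = kmeans(points, initial_centroids=[points[0], points[3]])
```

No class labels are supplied; the two groups emerge solely from the distances between samples.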
The classification models can be formed on and used on any suitable digital computer. Suitable digital computers include micro, mini, or large computers using any standard or specialized operating system, such as a Unix, WINDOWS or LINUX based operating system.
The training data set(s) and the classification models can be embodied by computer code that is executed or used by a digital computer. The computer code can be stored on any suitable computer readable media including optical or magnetic disks, sticks, tapes, etc., and can be written in any suitable computer programming language including C, C++, visual basic, etc.
The learning algorithms described herein can be used for developing classification algorithms for nullomers for various types of cancer. The classification algorithms can, in turn, be used in diagnostic tests by providing diagnostic values (e.g., cut-off points) for nullomers used singly or in combination.
The level of nullomers indicative of various types of cancer may be used as a stand-alone diagnostic indicator of cancer in a subject. Optionally, the methods may include the performance of at least one additional test to facilitate the diagnosis of cancer. For example, other tests in addition to determining the level of one or more nullomers in order to facilitate a diagnosis of cancer may be performed. Any other test or combination of tests used in clinical practice to facilitate a diagnosis of cancer may be used in conjunction with the nullomers described herein.
In some embodiments, where a subject is diagnosed with a particular type of cancer by the methods described herein, the disclosure further provides methods of treating the subject identified as having a cancer. Accordingly, in some embodiments, the disclosure relates to a method of treating cancer in a subject, comprising determining the level of at least one nullomer in a sample from the subject, wherein a difference in the level of at least one nullomer versus that in a normal subject as determined relative to a suitable control is indicative of cancer in the subject, and administering a therapeutically effective amount of a cancer therapeutic to the subject. In other embodiments, the disclosure relates to a method of treating a subject having cancer, comprising identifying a subject having cancer in which the level of at least one nullomer in a sample from the subject is different (e.g., increased) versus that in a normal subject as determined relative to a suitable control, and administering a therapeutically effective amount of a cancer therapeutic to the subject.
The term “cancer therapeutic” includes, for example, substances approved by the U.S. Food and Drug Administration for the treatment of cancer. For instance, drugs approved to treat breast cancer include, but are not limited to, Abemaciclib, Abitrexate (Methotrexate), Abraxane (Paclitaxel Albumin-stabilized Nanoparticle Formulation), Ado-Trastuzumab Emtansine, Afinitor (Everolimus), Anastrozole, Aredia (Pamidronate Disodium), Arimidex (Anastrozole), Aromasin (Exemestane), Capecitabine, Clafen (Cyclophosphamide), Cyclophosphamide, Cytoxan (Cyclophosphamide), Docetaxel, Doxorubicin Hydrochloride, Ellence (Epirubicin Hydrochloride), Epirubicin Hydrochloride, Eribulin Mesylate, Everolimus, Exemestane, 5-FU (Fluorouracil Injection), Fareston (Toremifene), Faslodex (Fulvestrant), Femara (Letrozole), Fluorouracil Injection, Folex (Methotrexate), Folex PFS (Methotrexate), Fulvestrant, Gemcitabine Hydrochloride, Gemzar (Gemcitabine Hydrochloride), Goserelin Acetate, Halaven (Eribulin Mesylate), Herceptin (Trastuzumab), Ibrance (Palbociclib), Ixabepilone, Ixempra (Ixabepilone), Kadcyla (Ado-Trastuzumab Emtansine), Kisqali (Ribociclib), Lapatinib Ditosylate, Letrozole, Megestrol Acetate, Methotrexate, Methotrexate LPF (Methotrexate), Mexate (Methotrexate), Mexate-AQ (Methotrexate), Neosar (Cyclophosphamide), Neratinib Maleate, Nerlynx (Neratinib Maleate), Nolvadex (Tamoxifen Citrate), Paclitaxel, Paclitaxel Albumin-stabilized Nanoparticle Formulation, Palbociclib, Pamidronate Disodium, Perjeta (Pertuzumab), Pertuzumab, Ribociclib, Tamoxifen Citrate, Taxol (Paclitaxel), Taxotere (Docetaxel), Thiotepa, Toremifene, Trastuzumab, Tykerb (Lapatinib Ditosylate), Velban (Vinblastine Sulfate), Velsar (Vinblastine Sulfate), Verzenio (Abemaciclib), Vinblastine Sulfate, Xeloda (Capecitabine), and Zoladex (Goserelin Acetate).
The cancer therapeutics may be administered to a subject using a pharmaceutical composition. Suitable pharmaceutical compositions comprise a pharmaceutically effective amount of a cancer therapeutic (or a pharmaceutically acceptable salt or ester thereof), and optionally comprise a pharmaceutically acceptable carrier. In certain embodiments, these compositions optionally further comprise one or more additional therapeutic agents.
As used herein, the term “pharmaceutically acceptable salt” refers to those salts which are, within the scope of sound medical judgment, suitable for use in contact with the tissues of humans and lower animals without undue toxicity, irritation, allergic response and the like, and are commensurate with a reasonable benefit/risk ratio. Pharmaceutically acceptable salts of amines, carboxylic acids, and other types of compounds are well known in the art. For example, S. M. Berge, et al. describe pharmaceutically acceptable salts in detail in J. Pharmaceutical Sciences, 66: 1-19 (1977), incorporated herein by reference. The salts can be prepared in situ during the final isolation and purification of the compounds, or separately by reacting a free base or free acid function with a suitable reagent. For example, a free base function can be reacted with a suitable acid. Furthermore, where the compounds carry an acidic moiety, suitable pharmaceutically acceptable salts thereof may include metal salts such as alkali metal salts, e.g., sodium or potassium salts, and alkaline earth metal salts, e.g., calcium or magnesium salts. In some embodiments, the cancer therapeutic is a pharmaceutically acceptable salt.
The term “pharmaceutically acceptable ester,” as used herein, refers to esters that hydrolyze in vivo and include those that break down readily in the human body to leave the parent compound or a salt thereof. Suitable ester groups include, for example, those derived from pharmaceutically acceptable aliphatic carboxylic acids, particularly alkanoic, alkenoic, cycloalkanoic and alkanedioic acids, in which each alkyl or alkenyl moiety advantageously has not more than 6 carbon atoms. In some embodiments, the cancer therapeutic is a pharmaceutically acceptable ester.
As described above, the pharmaceutical compositions may additionally comprise a pharmaceutically acceptable carrier. The term “pharmaceutically acceptable carrier” includes any and all solvents, diluents, or other liquid vehicles, dispersion or suspension aids, surface active agents, isotonic agents, thickening or emulsifying agents, preservatives, solid binders, lubricants and the like, suitable for preparing the particular dosage form desired. Remington's Pharmaceutical Sciences, Sixteenth Edition, E. W. Martin (Mack Publishing Co., Easton, Pa., 1980) discloses various carriers used in formulating pharmaceutical compositions and known techniques for the preparation thereof. Some examples of materials which can serve as pharmaceutically acceptable carriers include, but are not limited to, sugars such as lactose, glucose and sucrose; starches such as corn starch and potato starch; cellulose and its derivatives such as sodium carboxymethyl cellulose, ethyl cellulose and cellulose acetate; powdered tragacanth; malt; gelatine; talc; excipients such as cocoa butter and suppository waxes; oils such as peanut oil, cottonseed oil, safflower oil, sesame oil, olive oil, corn oil and soybean oil; glycols such as propylene glycol; esters such as ethyl oleate and ethyl laurate; agar; buffering agents such as magnesium hydroxide and aluminum hydroxide; alginic acid; pyrogen-free water; isotonic saline; Ringer's solution; ethyl alcohol; and phosphate buffer solutions. Other non-toxic compatible lubricants such as sodium lauryl sulfate and magnesium stearate, as well as coloring agents, releasing agents, coating agents, sweetening, flavoring and perfuming agents, preservatives and antioxidants can also be present in the composition, according to the judgment of the formulator.
Compositions for use in the present disclosure may be formulated to have any concentration of the cancer therapeutic desired. In some embodiments, the composition is formulated such that it comprises a therapeutically effective amount of the cancer therapeutic.
The disclosure generally relates to a method of diagnosing a subject with a benign, pre-malignant, or malignant hyperproliferative cell comprising: detecting the presence, absence, and/or quantity of at least one nullomer in a sample. In some embodiments, the step of detecting comprises exposing a sample from a subject (e.g., a human subject) to one or a plurality of probes, each probe capable of binding one or a plurality of nullomers in the sample. In some embodiments, the probe is a labeled nucleic acid molecule (DNA, RNA or hybrid thereof) that comprises at least about 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the complement of any nucleic acid sequences of TABLE 1 or TABLE 7. In some embodiments, the probe is a labeled nucleic acid molecule (DNA, RNA or hybrid thereof) that is an RNA sequence comprising at least about 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the complement of any nucleic acid sequences of TABLE 1, where each thymine is replaced with a uracil. In some embodiments, the plurality of probes are one or a combination of labeled nucleic acid sequences that are an RNA complementary to a nucleic acid sequence comprising at least about 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to any nucleic acid sequences of TABLE 1 or TABLE 7. In some embodiments, the plurality of probes are one or a combination of labeled nucleic acid sequences chosen from any nucleic acid sequences of TABLE 7. In some embodiments, the plurality of probes comprise one or a combination of nucleic acid sequences complementary to the nucleic acid sequences chosen from any nucleic acid sequences of TABLE 7.
In some embodiments, the probe is a labeled nucleic acid molecule (DNA, RNA or hybrid thereof) that comprises at least about 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the complement of any nucleic acid sequences of TABLE 7. In some embodiments, the probe is a labeled nucleic acid molecule (DNA, RNA or hybrid thereof) that is an RNA sequence comprising at least about 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the complement of any nucleic acid sequences of TABLE 7, where each thymine is replaced with a uracil. In some embodiments, the plurality of probes are one or a combination of labeled nucleic acid sequences that are an RNA complementary to a nucleic acid sequence comprising at least about 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to any nucleic acid sequences of TABLE 7. In some embodiments, the plurality of probes are one or a combination of labeled nucleic acid sequences chosen from any nucleic acid sequences of TABLE 7. In some embodiments, the plurality of probes comprise one or a combination of nucleic acid sequences complementary to the nucleic acid sequences chosen from any nucleic acid sequences of TABLE 7.
In some embodiments, the probe is a labeled nucleic acid molecule (DNA, RNA or hybrid thereof) that comprises at least about 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the complement of any nucleic acid sequences of TABLE 1.
In some embodiments, the probe is a labeled nucleic acid molecule (DNA, RNA or hybrid thereof) that is an RNA sequence comprising at least about 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the complement of any nucleic acid sequences of TABLE 4, where each thymine is replaced with a uracil. In some embodiments, the plurality of probes are one or a combination of labeled nucleic acid sequences that are an RNA complementary to a nucleic acid sequence comprising at least about 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to any nucleic acid sequences of TABLE 4. In some embodiments, the plurality of probes are one or a combination of labeled nucleic acid sequences chosen from any nucleic acid sequences of TABLE 4. In some embodiments, the plurality of probes comprise one or a combination of nucleic acid sequences complementary to the nucleic acid sequences chosen from any nucleic acid sequences of TABLE 4.
In some embodiments, the probe is a labeled nucleic acid molecule (DNA, RNA or hybrid thereof) that comprises at least about 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the complement of any nucleic acid sequences of TABLE 5. In some embodiments, the probe is a labeled nucleic acid molecule (DNA, RNA or hybrid thereof) that is an RNA sequence comprising at least about 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the complement of any nucleic acid sequences of TABLE 5, where each thymine is replaced with a uracil. In some embodiments, the plurality of probes are one or a combination of labeled nucleic acid sequences that are an RNA complementary to a nucleic acid sequence comprising at least about 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to any nucleic acid sequences of TABLE 5. In some embodiments, the plurality of probes are one or a combination of labeled nucleic acid sequences chosen from any nucleic acid sequences of TABLE 5. In some embodiments, the plurality of probes comprise one or a combination of nucleic acid sequences complementary to the nucleic acid sequences chosen from any nucleic acid sequences of TABLE 5.
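To illustrate the sequence relationships recited in the probe embodiments above (complement, thymine-to-uracil replacement, and percent sequence identity), a minimal sketch follows. It makes simplifying assumptions: the complement is taken as the reverse complement, identity is computed position by position over equal-length, ungapped sequences (alignment-based identity would be used in practice), and the 16-nucleotide target is a made-up sequence, not an entry from any table of the disclosure.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def to_rna(dna):
    """Replace each thymine with a uracil."""
    return dna.replace("T", "U")


def percent_identity(a, b):
    """Percentage of matching positions between equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch compares equal-length sequences only")
    return 100.0 * sum(1 for x, y in zip(a, b) if x == y) / len(a)


target = "ACGTTGCAAGCTTACG"               # hypothetical nullomer sequence
dna_probe = reverse_complement(target)    # "CGTAAGCTTGCAACGT"
rna_probe = to_rna(dna_probe)             # "CGUAAGCUUGCAACGU"

# A probe with a single mismatch relative to the exact complement.
variant_probe = "A" + dna_probe[1:]
identity = percent_identity(variant_probe, dna_probe)  # 93.75
```

Under these assumptions, a 16-nucleotide probe with one mismatch retains 93.75% identity to the complement and therefore falls within the "at least about 90%" tier recited above.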
In any of the disclosed method embodiments, the subject may be a human diagnosed with or suspected as having cancer. In any of the disclosed method embodiments, the step of detecting may be preceded by a step of acquiring a sample from the subject.
In some embodiments, the probe or plurality of probes are one or a plurality of antibodies or antibody fragments comprising a CDR that binds to a nucleic acid molecule (DNA, RNA or hybrid thereof) that comprises at least 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to any nucleic acid sequences of TABLE 1. In some embodiments, the probe or plurality of probes are one or a plurality of antibodies or antibody fragments comprising a CDR that binds to a nucleic acid molecule (DNA, RNA or hybrid thereof) that comprises at least about 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to any nucleic acid sequences of TABLE 1, wherein each of the sequences is modified such that each thymine in the sequence is replaced with a uracil. In some embodiments, the methods further comprise isolating RNA from the sample before exposing the sample to one or a plurality of probes. In some embodiments, the method comprises detecting or quantifying an amount of nullomers in a sample by performing semiquantitative or quantitative PCR or sequencing analysis of the nullomers in a sample. Probes may be immobilized to a solid support such as an ELISA plate, plastic, slide, microarray, silica chip or other surface such that the single-strand nucleotide sequences are exposed to a sample comprising nullomers from a subject. The probes may comprise, in some embodiments, from about 5 to about 100 nucleotides in length and comprise any of the sequences provided in TABLE 1 or any complementary sequence in RNA or DNA form of the sequences set forth in TABLE 1.
In any of the disclosed method embodiments, the step of detecting the presence, absence, and/or quantity of at least one nullomer having at least about 70% sequence identity to one of the nullomers in a sample comprises using a chemiluminescent probe, fluorescent probe, and/or fluorescence microscopy, and calculating the presence or quantity by correlating the signal of the detectable probe to the presence of the nullomer.
In some embodiments, any of the methods disclosed herein further comprise a step of correlating the presence or quantity of one or more nullomers, such as those disclosed in TABLE 1 or any combination thereof, to the likelihood that the subject has cancer. In some embodiments, the disclosure relates to a method of preparing, isolating or assessing a nucleic acid or ribonucleic acid fraction from a subject useful for analyzing a nullomer involved in cancer comprising: (a) extracting DNA or RNA from a substantially cell-free sample of blood plasma or blood serum of a subject to obtain DNA or RNA pools; (b) producing a fraction of the DNA or RNA extracted in (a) by: (i) sequence discrimination of the DNA or RNA; and (ii) selectively removing nullomers by exposing one or a plurality of probes to the nullomers, wherein the nullomers after (b) comprise one or a plurality of nullomers disclosed in TABLE 1; and (c) analyzing the nullomers in the fraction of DNA or RNA produced in (b). In some embodiments, the step of analyzing comprises normalizing the amount of nullomers in the sample as compared to a control amount of nullomers from a control sample and determining whether the subject has cancer by comparing the normalized presence, absence or quantity of nullomers in the sample to the presence, absence or quantity of nullomers in a control sample.
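Where the analysis is performed on sequencing reads, the normalization and comparison described above might be sketched as follows. This is an illustrative assumption rather than the procedure of the disclosure: a read is counted if it contains the nullomer or its reverse complement, counts are normalized to reads per million, and the nullomer sequence and reads shown are invented for the example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def reads_with_nullomer(reads, nullomer):
    """Number of reads containing the nullomer on either strand."""
    rc = reverse_complement(nullomer)
    return sum(1 for read in reads if nullomer in read or rc in read)


def per_million(count, total_reads):
    """Normalize a raw count to reads per million sequenced reads."""
    return 1e6 * count / total_reads


nullomer = "ACGTTGCAAG"  # hypothetical sequence, not from TABLE 1

# Hypothetical reads (illustrative only).
sample_reads = ["TTACGTTGCAAGCC", "GGGGCCCCAAAATT",
                "CCCTTGCAACGTAA", "ATATATATATATAT"]
control_reads = ["GGGGCCCCAAAATT", "ATATATATATATAT",
                 "CCCCGGGGTTTTAA", "TTTTAAAACCCCGG"]

sample_rpm = per_million(reads_with_nullomer(sample_reads, nullomer),
                         len(sample_reads))
control_rpm = per_million(reads_with_nullomer(control_reads, nullomer),
                          len(control_reads))
elevated = sample_rpm > control_rpm
```

Normalizing by the total number of reads allows samples sequenced to different depths to be compared against the same control value.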
The disclosure also provides kits for diagnosing the type of cancer, tissue of origin, and status or stage of the cancer in a subject, which kits are useful for determining the level of one or more nullomers from TABLE 1 (wherein the sequences optionally comprise uracils in place of one, more than one, or all of the disclosed thymines), and combinations thereof. In some embodiments, the one or more nullomers are selected from the nullomers listed in TABLE 1. Kits may include materials and reagents adapted to selectively detect the presence of a nullomer or group of nullomers diagnostic for cancer in a sample of a subject. For example, in some embodiments, the kit may include a reagent that specifically hybridizes to a nullomer. Such a reagent may be a nucleic acid molecule in a form suitable for detecting the nullomer, for example, a probe or a primer. The kit may include reagents useful for performing an assay to detect one or more nullomers, for example, reagents which may be used to detect one or more nullomers in a qPCR reaction. The kit may likewise include a microarray useful for detecting one or more nullomers.
In some embodiments, the kit may contain instructions for suitable operational parameters in the form of a label or product insert. For example, the instructions may include information or directions regarding how to collect a sample, how to determine the level of one or more nullomers in a sample, and/or how to correlate the level of one or more nullomers in a sample with the type of cancer, tissue of origin, and status or stage of the cancer of a subject.
In some embodiments, the kit can contain one or more containers with nullomer samples, to be used as reference standards, suitable controls, or for calibration of an assay to detect the nullomers in a test sample.
2H, 3H, 13C, 14C, 15N, 16O, 17O, 31P, 32P, 35S, 18F, 36Cl, 225Ac, 227Ac, 212Bi, 213Bi, 109Cd, 60Co, 64Cu, 67Cu, 166Dy, 169Er, 152Eu, 154Eu, 153Gd, 198Au, 166Ho, 125I, 131I, 192Ir, 177Lu, 99Mo, 194Os, 103Pd, 195mPt, 32P, 33P, 223Ra, 186Re, 188Re, 105Rh, 145Sm, 153Sm, 47Sc, 75Se, 85Sr, 89Sr, 99mTc, 228Th, 229Th, 170Tm, 117mSn, 188W, 127Xe, 175Yb, 90Y, 91Y
In some methods of treatment disclosed herein, the agent is selected from one or a plurality of agents chosen from Table 3.
The above-described methods can be implemented in any of numerous ways. For example, the embodiments may be implemented using a computer program product (i.e. software), hardware, software or a combination thereof. When implemented in software, the software code can be executed on any suitable processor or collection of processors, whether provided in a single computer or distributed among multiple computers.
Further, it should be appreciated that a computer may be embodied in any of a number of forms, such as a rack-mounted computer, a desktop computer, a laptop computer, or a tablet computer. Additionally, a computer may be embedded in a device not generally regarded as a computer but with suitable processing capabilities, including a Personal Digital Assistant (PDA), a smart phone or any other suitable portable or fixed electronic device.
Also, a computer may have one or more input and output devices. These devices can be used, among other things, to present a user interface. Examples of output devices that can be used to provide a user interface include printers or display screens for visual presentation of output and speakers or other sound generating devices for audible presentation of output. Examples of input devices that can be used for a user interface include keyboards, and pointing devices, such as mice, touch pads, and digitizing tablets. As another example, a computer may receive input information through speech recognition or in other audible format.
Such computers may be interconnected by one or more networks in any suitable form, including a local area network or a wide area network, such as an enterprise network, an intelligent network (IN) or the Internet. Such networks may be based on any suitable technology and may operate according to any suitable protocol and may include wireless networks, wired networks or fiber optic networks.
A computer employed to implement at least a portion of the functionality described herein may include a memory, coupled to one or more processing units (also referred to herein simply as “processors”), one or more communication interfaces, one or more display units, and one or more user input devices. The memory may include any computer-readable media, and may store computer instructions (also referred to herein as “processor-executable instructions”) for implementing the various functionalities described herein. The processing unit(s) may be used to execute the instructions. The communication interface(s) may be coupled to a wired or wireless network, bus, or other communication means and may therefore allow the computer to transmit communications to and/or receive communications from other devices. The display unit(s) may be provided, for example, to allow a user to view various information in connection with execution of the instructions. The user input device(s) may be provided, for example, to allow the user to make manual adjustments, make selections, enter data or various other information, and/or interact in any of a variety of manners with the processor during execution of the instructions.
The various methods or processes outlined herein may be coded as software that is executable on one or more processors that employ any one of a variety of operating systems or platforms. Additionally, such software may be written using any of a number of suitable programming languages and/or programming or scripting tools, and also may be compiled as executable machine language code or intermediate code that is executed on a framework or virtual machine.
In this respect, various inventive concepts may be embodied as a computer readable storage medium (or multiple computer readable storage media) (e.g., a computer memory, one or more floppy discs, compact discs, optical discs, magnetic tapes, flash memories, circuit configurations in Field Programmable Gate Arrays or other semiconductor devices, or other non-transitory medium or tangible computer storage medium) encoded with one or more programs that, when executed on one or more computers or other processors, perform methods that implement the various embodiments of the invention disclosed herein. The computer readable medium or media can be transportable, such that the program or programs stored thereon can be loaded onto one or more different computers or other processors to implement various aspects of the present invention as discussed above. In some embodiments, the system comprises cloud-based software that executes one or all of the steps of each disclosed method instruction.
The terms “program” or “software” are used herein in a generic sense to refer to any type of computer code or set of computer-executable instructions that can be employed to program a computer or other processor to implement various aspects of embodiments as discussed above. Additionally, it should be appreciated that according to one aspect, one or more computer programs that when executed perform methods of the present disclosure need not reside on a single computer or processor, but may be distributed in a modular fashion amongst a number of different computers or processors to implement various aspects of the present invention.
Computer-executable instructions may be in many forms, such as program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Typically, the functionality of the program modules may be combined or distributed as desired in various embodiments.
Also, data structures may be stored in computer-readable media in any suitable form. For simplicity of illustration, data structures may be shown to have fields that are related through location in the data structure. Such relationships may likewise be achieved by assigning storage for the fields with locations in a computer-readable medium that convey relationship between the fields. However, any suitable mechanism may be used to establish a relationship between information in fields of a data structure, including through the use of pointers, tags or other mechanisms that establish relationship between data elements.
Also, the disclosure relates to various embodiments in which one or more methods are performed. The acts performed as part of the method may be ordered in any suitable way. Accordingly, embodiments may be constructed in which acts are performed in an order different than illustrated, which may include performing some acts simultaneously, even though shown as sequential acts in illustrative embodiments.
In some embodiments, the disclosure relates to a system that comprises at least one processor, a program storage, such as memory, for storing program code executable on the processor, and one or more input/output devices and/or interfaces, such as data communication and/or peripheral devices and/or interfaces. In some embodiments, the user device and computer system or systems are communicably connected by a data communication network, such as a Local Area Network (LAN), the Internet, or the like, which may also be connected to a number of other client and/or server computer systems. The user device and client and/or server computer systems may further include appropriate operating system software.
In some embodiments, components and/or units of the devices described herein may be able to interact through one or more communication channels or mediums or links, for example, a shared access medium, a global communication network, the Internet, the World Wide Web, a wired network, a wireless network, a combination of one or more wired networks and/or one or more wireless networks, one or more communication networks, an a-synchronic or asynchronous wireless network, a synchronic wireless network, a managed wireless network, a non-managed wireless network, a burstable wireless network, a non-burstable wireless network, a scheduled wireless network, a non-scheduled wireless network, or the like.
Discussions herein utilizing terms such as, for example, “processing,” “computing,” “calculating,” “determining,” or the like, may refer to operation(s) and/or process(es) of a computer, a computing platform, a computing system, or other electronic computing device, that manipulate and/or transform data represented as physical (e.g., electronic) quantities within the computer's registers and/or memories into other data similarly represented as physical quantities within the computer's registers and/or memories or other information storage medium that may store instructions to perform operations and/or processes.
Some embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment including both hardware and software elements. Some embodiments may be implemented in software, which includes but is not limited to firmware, resident software, microcode, or the like.
Furthermore, some embodiments may take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For example, a computer-usable or computer-readable medium may be or may include any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
In some embodiments, the medium may be or may include an electronic, magnetic, optical, electromagnetic, InfraRed (IR), or semiconductor system (or apparatus or device) or a propagation medium. Some demonstrative examples of a computer-readable medium may include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a Random Access Memory (RAM), a Read-Only Memory (ROM), a rigid magnetic disk, an optical disk, or the like. Some demonstrative examples of optical disks include Compact Disk-Read-Only Memory (CD-ROM), Compact Disk-Read/Write (CD-R/W), DVD, or the like.
In some embodiments, a data processing system suitable for storing and/or executing program code may include at least one processor coupled directly or indirectly to memory elements, for example, through a system bus. The memory elements may include, for example, local memory employed during actual execution of the program code, bulk storage, and cache memories which may provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
In some embodiments, input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) may be coupled to the system either directly or through intervening I/O controllers. In some embodiments, network adapters may be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices, for example, through intervening private or public networks. In some embodiments, modems, cable modems and Ethernet cards are demonstrative examples of types of network adapters. Other suitable components may be used.
Some embodiments may be implemented by software, by hardware, or by any combination of software and/or hardware as may be suitable for specific applications or in accordance with specific design requirements. Some embodiments may include units and/or sub-units, which may be separate of each other or combined together, in whole or in part, and may be implemented using specific, multi-purpose or general processors or controllers. Some embodiments may include buffers, registers, stacks, storage units and/or memory units, for temporary or long-term storage of data or in order to facilitate the operation of particular implementations.
Some embodiments may be implemented, for example, using a machine-readable medium or article which may store an instruction or a set of instructions that, if executed by a machine, cause the machine to perform the method steps and/or operations described herein. Such machine may include, for example, any suitable processing platform, computing platform, computing device, processing device, electronic device, electronic system, computing system, processing system, computer, processor, or the like, and may be implemented using any suitable combination of hardware and/or software. The machine-readable medium or article may include, for example, any suitable type of memory unit, memory device, memory article, memory medium, storage device, storage article, storage medium and/or storage unit; for example, memory, removable or non-removable media, erasable or non-erasable media, writeable or re-writeable media, digital or analog media, hard disk drive, floppy disk, Compact Disk Read Only Memory (CD-ROM), Compact Disk Recordable (CD-R), Compact Disk Re-Writeable (CD-RW), optical disk, magnetic media, various types of Digital Versatile Disks (DVDs), a tape, a cassette, or the like. The instructions may include any suitable type of code, for example, source code, compiled code, interpreted code, executable code, static code, dynamic code, or the like, and may be implemented using any suitable high-level, low-level, object-oriented, visual, compiled and/or interpreted programming language, e.g., C, C++, Java™, BASIC, Pascal, Fortran, Cobol, assembly language, machine code, or the like.
Many of the functional units described in this specification have been labeled as circuits, in order to more particularly emphasize their implementation independence. For example, a circuit may be implemented as a hardware circuit comprising custom very-large-scale integration (VLSI) circuits or gate arrays, off-the-shelf semiconductors such as logic chips, transistors, or other discrete components. A circuit may also be implemented in programmable hardware devices such as field programmable gate arrays, programmable array logic, programmable logic devices or the like.
In some embodiments, the circuits may also be implemented in machine-readable medium for execution by various types of processors. An identified circuit of executable code may, for instance, comprise one or more physical or logical blocks of computer instructions, which may, for instance, be organized as an object, procedure, or function. Nevertheless, the executables of an identified circuit need not be physically located together, but may comprise disparate instructions stored in different locations which, when joined logically together, comprise the circuit and achieve the stated purpose for the circuit. Indeed, a circuit of computer readable program code may be a single instruction, or many instructions, and may even be distributed over several different code segments, among different programs, and across several memory devices. Similarly, operational data may be identified and illustrated herein within circuits, and may be embodied in any suitable form and organized within any suitable type of data structure. The operational data may be collected as a single data set, or may be distributed over different locations including over different storage devices, and may exist, at least partially, merely as electronic signals on a system or network.
The computer readable medium (also referred to herein as machine-readable media or machine-readable content) may be a tangible computer readable storage medium storing the computer readable program code. The computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, holographic, micromechanical, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. As alluded to above, examples of the computer readable storage medium may include but are not limited to a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), an optical storage device, a magnetic storage device, a holographic storage medium, a micromechanical storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, and/or store computer readable program code for use by and/or in connection with an instruction execution system, apparatus, or device.
The computer readable medium may also be a computer readable signal medium. A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electrical, electro-magnetic, magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport computer readable program code for use by or in connection with an instruction execution system, apparatus, or device. As also alluded to above, computer readable program code embodied on a computer readable signal medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, Radio Frequency (RF), or the like, or any suitable combination of the foregoing. In one embodiment, the computer readable medium may comprise a combination of one or more computer readable storage mediums and one or more computer readable signal mediums. For example, computer readable program code may be both propagated as an electro-magnetic signal through a fiber optic cable for execution by a processor and stored on RAM storage device for execution by the processor.
Computer readable program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program code may execute entirely on a user's computer, partly on the user's computer, as a stand-alone computer-readable package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
The program code may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the schematic flowchart diagrams and/or schematic block diagrams block or blocks.
Functions, operations, components and/or features described herein with reference to one or more embodiments, may be combined with, or may be utilized in combination with, one or more other functions, operations, components and/or features described herein with reference to one or more other embodiments, or vice versa.
Other embodiments are described in the following non-limiting Examples. Various publications, including patents, published applications, technical articles and scholarly articles are cited throughout the specification. Each of these cited publications is incorporated by reference herein in its entirety.
Cancer detection using cell-free DNA (cfDNA) has the potential to significantly improve cancer diagnosis and survival. However, cfDNA diagnostics suffer from several deficiencies, including low tumor cfDNA concentration, sensitivity and technical limitations, in particular for cfDNA methylation analyses. Here, we set out to test whether nullomers could be used as a diagnostic tool to detect cancer in general and also specific subtypes. We first analyzed The Cancer Genome Atlas (TCGA; (“The Cancer Genome Atlas Program” 2018)) database, finding recurrent nullomers created by somatic mutations that could be used to detect not only cancer subtypes with higher accuracy than leading methods (Jiao et al. 2020) but also additional cancer features. Further analyses of cfDNA whole-genome sequencing datasets found that these nullomers can also be used to detect cancer subtypes in these data without the need for healthy control samples. Using a targeted nullomer sequencing panel on cfDNA from individuals with prostate cancer and normal controls, we found nullomer enrichment in cases. Finally, functional assays of prostate cancer associated nullomers show that they have a functional effect on both coding and noncoding sequences. Combined, our results show that nullomers can be used as a rapid, sensitive, specific and straightforward cancer diagnostic and can also aid in the identification of gene regulatory mutations associated with cancer.
i. Computational Characterization of Recurrent Nullomers
The GRCh38 reference assembly of the human genome was used throughout the study. Nullomer extraction was performed for kmer lengths up to 17 base pairs using the same algorithm described in Georgakopoulos-Soares et al., 2020. By definition, the reverse complement of a nullomer will also be a nullomer. Throughout this example, when counting nullomers, the reverse complement of nullomer i is also considered unless i is a palindrome. Mutation calling for whole genome sequencing (WGS) tumor samples from 2,575 individuals across 21 tissues (ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium 2020) was performed for substitutions and indels as described in Georgakopoulos-Soares et al., 2019. To filter out common population variants, we obtained variant information from gnomAD v2 (Karczewski et al. 2020) and annotated all nullomers that were generated due to these variants. We then excluded from our subsequent analyses nullomers that were found in the population with a probability (p) of 0.05 or higher. For each substitution, a local window was generated with the mutation introduced in the window sequence; each nullomer was then scanned across the window to check whether any matches were found. Matches were reported and stored. Recurrent nullomers were annotated as those that resulted from substitutions or indels across two or more patients within a cancer type. When possible, the recurrence threshold ri for cancer type i was chosen to yield ~10,000 nullomers from each tissue; otherwise it was set to 2.
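The window scan described above can be sketched in Python as follows. This is a minimal illustration on a toy sequence, not the algorithm of Georgakopoulos-Soares et al.; the function names, the brute-force enumeration (practical only for small k), and the choice of k−1 flanking bases on each side of the substitution are assumptions made for the example.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def observed_kmers(seq, k):
    """Collect every k-mer seen on either strand of seq."""
    seen = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
            seen.add(kmer)
            seen.add(reverse_complement(kmer))
    return seen


def enumerate_nullomers(seq, k):
    """k-mers absent from both strands of seq (brute force, small k only)."""
    all_kmers = {"".join(p) for p in product("ACGT", repeat=k)}
    return all_kmers - observed_kmers(seq, k)


def nullomers_created_by_substitution(ref, pos, alt, nullomers, k):
    """Nullomers that appear when ref[pos] (0-based) is replaced by alt.

    The local window keeps k-1 bases on each side of the substitution,
    so every k-mer in the window overlaps the mutated position.
    """
    start = max(0, pos - k + 1)
    end = min(len(ref), pos + k)
    window = ref[start:pos] + alt + ref[pos + 1:end]
    hits = set()
    for i in range(len(window) - k + 1):
        kmer = window[i:i + k]
        if kmer in nullomers:
            hits.add(kmer)
    return hits


toy_reference = "ACGTACGTAC"
toy_nullomers = enumerate_nullomers(toy_reference, 3)
created = nullomers_created_by_substitution(toy_reference, 4, "T", toy_nullomers, 3)
print(sorted(created))
```

Because observed k-mers are stored for both strands, the resulting nullomer set is closed under reverse complementation, which matches the definition used in the text.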
ii. Classification
The algorithms to replicate our classification analysis are: TrainNullomerClassifier, which trains a predictive model on cancer tissue based on the nullomer profile and RunNullomerClassifier, which runs the trained classifier (from TrainNullomerClassifier) on a set of nullomers (from FindDNANullomersFromReads).
iii. Comparison to validated neoantigens
We downloaded a list of 1,967 validated neoantigens from biopharm.zju.edu.cn/download.neoantigen/iedb_validated.zip. Requiring predicted strong binding and a positive validation left us with 1,700 neoantigens. To evaluate the enrichment of neoantigens corresponding to recurrent nullomers, we assumed a hypergeometric distribution with 1,700 draws from an urn with 188,659 white balls (recurrent nullomers) and 186,067,892 black balls (number of non-recurrent nullomers found).
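The urn model above can be illustrated with a short Python sketch that uses only the standard library. The urn sizes and number of draws are taken from the text; the observed overlap passed to the tail function is a hypothetical placeholder, since the observed count is not stated here, and the function names are assumptions for the example.

```python
from math import exp, lgamma


def log_choose(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def hypergeom_pmf(k, white, black, draws):
    """P(exactly k white balls in `draws` draws without replacement)."""
    return exp(
        log_choose(white, k)
        + log_choose(black, draws - k)
        - log_choose(white + black, draws)
    )


def hypergeom_upper_tail(k_obs, white, black, draws):
    """P(at least k_obs white balls), i.e. the enrichment p-value."""
    upper = min(white, draws)
    return sum(hypergeom_pmf(k, white, black, draws) for k in range(k_obs, upper + 1))


WHITE = 188_659        # recurrent nullomers (from the text)
BLACK = 186_067_892    # non-recurrent nullomers (from the text)
DRAWS = 1_700          # validated neoantigens (from the text)

expected_overlap = DRAWS * WHITE / (WHITE + BLACK)
print(f"expected overlap under the null: {expected_overlap:.2f}")

hypothetical_observed = 10  # placeholder only, not a reported result
print(hypergeom_upper_tail(hypothetical_observed, WHITE, BLACK, DRAWS))
```

Under this null model fewer than two of the 1,700 neoantigens would be expected to correspond to a recurrent nullomer by chance, so any substantially larger overlap yields a small tail probability.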
iv. Nullomer Identification in ctDNA Samples
We developed a Poisson model where the expected number of nullomers of type i is given by C × e × Ni / 3, where C is the coverage, e is the error rate, Ni is the number of loci where a substitution could result in the creation of nullomer i, and the division by 3 accounts for the fact that only one of the three possible substitutions will create the nullomer.
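As a worked illustration of this model, the sketch below computes the expected error-derived count for one nullomer and the Poisson probability of seeing at least a given number of occurrences by sequencing error alone. The numeric inputs and function names are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from math import exp


def expected_error_nullomers(coverage, error_rate, n_loci):
    """Expected occurrences of nullomer i caused by sequencing errors.

    coverage * error_rate * n_loci / 3: only one of the three possible
    substitutions at each locus creates the nullomer.
    """
    return coverage * error_rate * n_loci / 3


def poisson_upper_tail(k_obs, lam):
    """P(X >= k_obs) for X ~ Poisson(lam)."""
    if k_obs <= 0:
        return 1.0
    term = exp(-lam)  # P(X = 0)
    cdf = 0.0
    for k in range(k_obs):
        cdf += term
        term *= lam / (k + 1)  # P(X = k + 1) from P(X = k)
    return max(0.0, 1.0 - cdf)


# Hypothetical inputs: 5x coverage, 0.1% error rate, 600 candidate loci.
lam = expected_error_nullomers(coverage=5, error_rate=0.001, n_loci=600)
p_value = poisson_upper_tail(3, lam)
print(lam, p_value)
```

A nullomer observed more often than the error model predicts (small tail probability) is more likely to reflect a sequence truly present in the sample.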
v. Targeted Sequencing of cfDNA Samples
We designed a set of 4,590 baits (total of 78,102 bp of sequence space) covering all nullomer locations implicated in our prostate cancer classifier. 309 probes were removed from the panel due to synthesis issues, ending with 4,280 probes (93.24% final coverage) of 120 bp in length, evenly spanning the exact location of the nullomer-causing mutation. Custom target-enriched DNA libraries were generated by Twist Bioscience and prepared using Twist modular library preparation kits enabled by KAPA HiFi HotStart ReadyMix PCR Kit (Kapa Bioscience). Twist universal adapters were replaced with IDT's xGen UDI-UMI 96 barcodes system (IDT). cfDNA was extracted as described in Chen et al. (2021). Briefly, whole blood was collected using PAXgene Blood ccfDNA tubes (Qiagen) and the final extraction step was done using the QIAamp Circulating Nucleic Acid Kit (Qiagen). Extracted DNA was stored at −80 °C prior to further analysis. cfDNA was then hybridized and libraries were enriched using Streptavidin C1 beads, and the washed material was amplified via 9-14 cycles of PCR. Targeted sequencing was performed with PE150 reads, dual index (8 bp i5 and 8 bp+9 bp UMI at i7 position) on a HiSeq4000 platform (Illumina).
vi. Promoter Luciferase Assays
Promoter sequences with and without the nullomer were synthetically generated and cloned into the modified Promega promoter assay luciferase vector pGL4.11b (a gift from Dr. Rick Myers, HudsonAlpha) by BioMatik Inc. and Sanger sequence verified. LNCaP cells were plated at an initial density of 2×10^5 cells/well in 24-well tissue culture plates and maintained in RPMI medium, 10% FBS supplemented with L-Glutamine and Penicillin/Streptomycin. Plasmids together with a renilla expressing plasmid, pGL4.74 (Promega), at a ratio of 10:1 luciferase:renilla were transfected using the X-tremeGENE™ HP DNA Transfection Reagent (Roche) using 1:4 ratio of DNA (μg) to reagent (μl). 72 hours post transfection, luciferase and renilla levels were measured using the Dual-Luciferase Reporter Assay System (Promega) following the manufacturer's protocol using a GloMax Explorer Multimode Microplate Reader (Promega). Luciferase activity was normalized to renilla levels and presented as Relative Luciferase Units (RLU). Statistical analysis was performed using Prism version 9.0.2 (GraphPad). All values were reported as means (AVG) and standard errors (SE). p values<0.05 were considered statistically significant.
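The normalization and summary statistics described above amount to simple arithmetic, sketched here in Python. The readings are invented for illustration, the function names are assumptions, and the standard error is taken as the sample standard deviation divided by the square root of the number of replicates.

```python
from math import sqrt
from statistics import mean, stdev


def relative_luciferase_units(firefly, renilla):
    """Normalize each firefly luciferase reading to its renilla reading."""
    if len(firefly) != len(renilla):
        raise ValueError("firefly and renilla readings must be paired")
    return [f / r for f, r in zip(firefly, renilla)]


def mean_and_standard_error(values):
    """Return (mean, standard error of the mean) for replicate values."""
    return mean(values), stdev(values) / sqrt(len(values))


# Invented replicate readings for one construct (illustration only).
firefly_readings = [100.0, 200.0, 300.0]
renilla_readings = [10.0, 10.0, 10.0]

rlu = relative_luciferase_units(firefly_readings, renilla_readings)
avg, se = mean_and_standard_error(rlu)
print(rlu, avg, se)
```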
vii. Software Availability
We generated an easy to use software package that enables performing nullomer cancer analyses from sequence-based datasets. The package is composed of six functions: 1) EnumerateNullomers, which extracts all nullomers of specified kmer lengths in a FASTA sample; 2) ExtractMutationNullomers, which finds all mutations that cause the resurfacing of a list of nullomers; 3) IdentifyRecurrentNullomers, which identifies nullomers that recur in a dataset through mutagenesis; 4) FindAlmostNullomers, which identifies the positions that can create a list of nullomers genome-wide for every possible substitution and single base-pair insertion and deletion; 5) FindNullomerVariants, which removes nullomers that are likely to result from common variants in a user specified variant VCF file; and 6) FindDNANullomersFromReads, which performs the identification of nullomers in raw read samples.
The package can be found at: github.com/Ahituv-lab/Nullomerator and a readthedocs tutorial provides in depth details on how to run the software functions.
i. Annotation of Mutations that Lead to Nullomers
As cancer causes DNA mutations, we investigated if they can result in the resurfacing of nullomers (
Analysis of the most frequent recurring nullomers revealed several previously known cancer-associated mutations (TABLE 4). For example, among the most recurrent coding nullomers are those created by the Gly12Asp, Gly12Val and Gly12Cys missense mutations in KRAS, which are known to account for up to 80% of cancer-associated KRAS mutations and cause KRAS to be constitutively active (Prior, Lewis, and Mattos, 2012; Muñoz-Maldonado, Zimmer, and Medová 2019). Although KRAS has been associated with several cancers, 190/313 (60%) of these mutations are found in pancreatic cancers. Several highly recurrent coding nullomers were also found in other known cancer-associated genes such as TP53, BRAF and PIK3CA. The top recurring nullomer mutation was located in a noncoding region, within the telomerase reverse transcriptase (TERT) promoter, which is known to be associated with numerous cancer types (Vinagre et al., 2013). This mutation, called −124C>T or C228T, is extremely common in numerous cancer types (Heidenreich et al., 2014) and is thought to disrupt a G-quadruplex (Song et al., 2019) and lead to the binding of GABP (Bell et al., 2015), an ETS transcription factor, resulting in increased TERT expression. We found this mutation in 97 patients, with the highest incidence in glioblastoma (51%), fitting with its high prevalence rate and diagnostic use for this cancer type (Powter et al., 2021). There were also several nullomers that are created by different mutations (TABLE 5), e.g., CGACGTTCTGCCCACT, which was found in 74 patients at 32 loci in seven different cancers. Interestingly, some of the frequently recurrent nullomers are created by different mutations, yet they are predominantly found in one cancer. For example, GTTTTTCTCCTAGACC is found 40 times in skin cancer at 31 different loci, while CTGGCAGTGAGCCACG is found 21 times in liver at 18 loci.
ii. Generation of a Cancer Subtype Nullomer Classifier
Based on the observation that most recurrent nullomers are predominantly found in one cancer type, we hypothesized that nullomers can be used to distinguish between cancer types. We filtered nullomers by keeping only those that appeared >=ri times in specific cancer type i. Comparison of the set of recurrent nullomers associated with each cancer type reveals that the overlap is small, as indicated by the Jaccard index which is <0.03, suggesting that each cancer type has a distinct nullomer signature (
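The recurrence filter and the overlap comparison can be sketched as follows. The Jaccard index of two sets is the size of their intersection divided by the size of their union. The cancer-type labels, counts, and function names below are toy values assumed for illustration.

```python
from itertools import combinations


def recurrent_nullomers(patient_counts, threshold):
    """Keep nullomers seen in at least `threshold` patients of one cancer type."""
    return {n for n, count in patient_counts.items() if count >= threshold}


def jaccard_index(a, b):
    """|A intersect B| / |A union B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_jaccard(signatures):
    """Jaccard index for every pair of cancer-type nullomer sets."""
    return {
        (x, y): jaccard_index(signatures[x], signatures[y])
        for x, y in combinations(sorted(signatures), 2)
    }


# Toy per-type patient counts for each nullomer (illustration only).
toy_counts = {
    "skin": {"AAA": 5, "CCC": 2, "GGG": 1},
    "liver": {"CCC": 3, "TTT": 4},
}
toy_signatures = {t: recurrent_nullomers(c, threshold=2) for t, c in toy_counts.items()}
print(pairwise_jaccard(toy_signatures))
```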
To test if these recurrent nullomers can classify tumor samples, we trained a support vector machine classifier to identify tumor type. The classifier takes as input a 21-dimensional vector indicating the number of recurrent nullomers found for each cancer specific set. Evaluation using 10-fold cross-validation revealed that our classifier achieves both high sensitivity and specificity, with an F1 score of 0.92 and an accuracy of 0.99. The performance was better than the deep learning model recently presented by Jiao et al. (2020) and also requires fewer computational resources to train. Moreover, the nullomer based classifier is more intuitive and easier to interpret biologically, as samples are distinguished based on the number of nullomers corresponding to cancer specific sets.
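The input representation described above can be sketched as follows: each sample is reduced to a vector with one entry per cancer type, where the entry counts how many of the sample's nullomers belong to that type's recurrent set. The example uses three toy cancer types rather than 21, and the names are assumptions; the resulting vectors would then be passed to a support vector machine implementation of choice (for instance scikit-learn's SVC), which is not shown here.

```python
def nullomer_feature_vector(sample_nullomers, signature_sets, cancer_types):
    """Count, for each cancer type, the sample nullomers in that type's set.

    `cancer_types` fixes the order of the vector entries so that every
    sample is encoded consistently.
    """
    sample = set(sample_nullomers)
    return [len(sample & signature_sets[t]) for t in cancer_types]


# Toy recurrent-nullomer sets for three cancer types (illustration only).
toy_types = ["liver", "prostate", "skin"]
toy_sets = {
    "liver": {"AAAC", "AAAG"},
    "prostate": {"CCCA", "CCCG", "CCCT"},
    "skin": {"GGGA"},
}

sample = ["CCCA", "CCCT", "GGGA", "TTTT"]
vector = nullomer_feature_vector(sample, toy_sets, toy_types)
print(vector)
```

Because each entry is a simple count against a named set, the largest entries point directly to the cancer types whose signatures the sample matches.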
iii. Nullomers can Distinguish Additional Cancer Features
To test if nullomers could be used to distinguish other cancer features, we analyzed both breast and colorectal cancers. For breast cancer, determining whether a sample is deficient for BRCA1 or BRCA2 is important in treatment decisions (Lee, Moon, and Kim, 2020; Tung and Garber, 2018). We used a dataset of 560 breast cancers (Nik-Zainal et al., 2016), from which we selected 89 samples that were deficient for BRCA1 or BRCA2, and a similar number that were not deficient. We extracted 3,648 recurrent nullomers for the BRCA deficient samples and 1,174 for the non-BRCA deficient ones, and found that the resulting classifier achieves an accuracy of 0.76 and F1 score of 0.78 (
iv. Nullomers are Enriched in ctDNA
We next tested whether nullomers could be used to diagnose cancer in cfDNA. We focused on prostate cancer for the following reasons: 1) the availability of both WGS datasets and cfDNA samples; 2) the number of recurrent nullomers for this subtype (N=X) is in the median range (Y) of all 21 tissues that we analyzed; and 3) the current primary screen for this cancer, which measures levels of the prostate-specific antigen (PSA) in the blood, has high false negative and false positive rates (Barry, 2001), and more accurate screening for minimal residual disease after treatment or surgical interventions is important (Cackowski and Taichman, 2018; Murray, 2018). We first excluded all common variants (p>0.05) that lead to the resurfacing of nullomers that we characterized in the human genome (Georgakopoulos-Soares et al., 2020). We then analyzed WGS from 6 cfDNA samples from prostate cancer patients and 23 controls (Ulz et al., 2019). For each nullomer that we identified in the cfDNA (both cases and controls), we characterized all possible single nucleotide substitutions in the reference genome that could give rise to this nullomer. By intersecting this list of nullomer creating substitutions with known germline variants identified by the gnomAD project (Karczewski et al., 2020), we calculated the probability that each nullomer will be present in an individual. We excluded all nullomers that are found in the population with p>0.05, leaving us with 4,665 recurrent prostate nullomers.
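One way such a presence probability could be computed is sketched below. The formula is an assumption made for illustration and is not stated in the text: it treats the nullomer-creating germline variants as independent, assumes a diploid individual under Hardy-Weinberg proportions, and takes the allele frequencies as given inputs. The function names and toy frequencies are likewise hypothetical.

```python
def presence_probability(allele_frequencies):
    """P(an individual carries at least one nullomer-creating allele).

    Assumes independent variants and two independently drawn alleles
    per variant, so P(absent) is the product of (1 - f)^2.
    """
    p_absent = 1.0
    for f in allele_frequencies:
        p_absent *= (1.0 - f) ** 2
    return 1.0 - p_absent


def drop_common_nullomers(nullomer_to_frequencies, max_probability=0.05):
    """Keep nullomers whose germline presence probability is <= the cutoff."""
    return {
        nullomer
        for nullomer, freqs in nullomer_to_frequencies.items()
        if presence_probability(freqs) <= max_probability
    }


# Toy allele frequencies of the variants that could create each nullomer.
toy_variants = {
    "nullomer_a": [0.01, 0.02],  # presence probability about 0.059, excluded
    "nullomer_b": [0.001],       # presence probability about 0.002, kept
    "nullomer_c": [],            # no known germline source, kept
}
print(drop_common_nullomers(toy_variants))
```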
Another source of nullomers in cfDNA WGS could be sequencing errors. To identify nullomers that were observed due to these technical artifacts, we developed a Poisson model (see Methods). Since sequencing errors are assumed to be distributed uniformly, nullomers arising for this reason will have a profile that differs from nullomers stemming from sequences that are present in the cfDNA sample. To ensure robust detection, and to be able to compare samples of different sequencing depth, we randomly split the reads into chunks of 5× and searched for nullomers in each chunk separately. Nullomers found to be significant in at least two chunks were assumed to not be sequencing errors.
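As an illustration of the chunk-based filtering logic, the following sketch tests each chunk of reads against a Poisson error expectation. The actual model is described in the Methods and may differ; the expected error count `lam_per_chunk`, the significance level `alpha`, and all function names are assumptions made for this example.

```python
import math
import random
from typing import List, Sequence


def poisson_sf(observed: int, lam: float) -> float:
    """P(X >= observed) for X ~ Poisson(lam)."""
    if observed <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(observed))
    return max(0.0, 1.0 - cdf)


def split_into_chunks(reads: Sequence[str], n_chunks: int, seed: int = 0) -> List[List[str]]:
    """Randomly assign reads to `n_chunks` groups of near-equal size."""
    shuffled = list(reads)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_chunks] for i in range(n_chunks)]


def is_real_nullomer(reads: Sequence[str], nullomer: str, lam_per_chunk: float,
                     n_chunks: int, alpha: float = 0.01, min_chunks: int = 2) -> bool:
    """Call a nullomer real if its read support exceeds the Poisson error
    expectation in at least `min_chunks` chunks."""
    significant = 0
    for chunk in split_into_chunks(reads, n_chunks):
        support = sum(nullomer in read for read in chunk)  # reads containing the nullomer
        if poisson_sf(support, lam_per_chunk) < alpha:
            significant += 1
    return significant >= min_chunks
```

For a sample sequenced at total depth D, `n_chunks` would be roughly D divided by 5 so that each chunk corresponds to about 5× coverage.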
After filtering our data for both common variants and sequencing errors, we analyzed whether we observe an enrichment for nullomers in cases versus controls. We found that the mean number of recurrent prostate nullomers detected in the patient samples is 22.3, while in the healthy controls we detect 10.7. The expected number of recurrent prostate nullomers due to germline variants is 10.3, suggesting that our test is highly sensitive. Moreover, the difference between cases and controls is consistent when using more stringent cutoffs for nullomers that could emerge due to germline variants. Taken together, these results demonstrate that our prostate nullomers classifier could serve as a sensitive and specific means of identifying cancer in cfDNA samples.
To experimentally validate that nullomers could be used as a cancer diagnosis tool, we generated a custom nullomer prostate cancer probe panel for cfDNA sequence target enrichment. This panel targets 4,280 regions (385/4,665 were removed due to technical reasons) harboring loci which could result in recurrent prostate cancer associated nullomers, along with 60 bp flanking regions. We extracted cfDNA samples from 6 healthy donors and 7 prostate cancer patients of various stages. Indexed libraries were hybridized to the custom oligo pool and sequencing of the enriched libraries was done in multiplex at high coverage (×10,000-×20,000). Overall, we observed a larger number of nullomers in cases as compared to controls, similarly to our results for the WGS cfDNA. Combined, our results show that nullomers could be used as a straightforward, sensitive and specific cfDNA diagnosis tool.
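The construction of panel target regions can be sketched as follows. This is only an illustration: the coordinate convention (0-based positions, half-open intervals), the function names, and the representation of excluded loci are assumptions, and the technical criteria by which 385 of the 4,665 loci were removed are not reproduced here.

```python
from typing import Iterable, List, Optional, Set, Tuple

Locus = Tuple[str, int]          # (chromosome, 0-based position)
Region = Tuple[str, int, int]    # (chromosome, start, end), half-open


def target_region(chrom: str, pos: int, flank: int = 60,
                  chrom_length: Optional[int] = None) -> Region:
    """Return the interval covering `pos` plus `flank` bp on each side,
    clipped to the chromosome boundaries."""
    start = max(0, pos - flank)
    end = pos + flank + 1
    if chrom_length is not None:
        end = min(end, chrom_length)
    return (chrom, start, end)


def build_panel(loci: Iterable[Locus], excluded: Set[Locus], flank: int = 60) -> List[Region]:
    """Build target regions for all loci that were not excluded."""
    return [target_region(chrom, pos, flank) for chrom, pos in loci
            if (chrom, pos) not in excluded]
```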
v. Nullomers Alter Promoter Activity
Only a small number of mutations in gene regulatory elements that affect gene expression have been found to be associated with cancer (Poulos et al., 2015). As the majority of our cancer-associated nullomers reside in noncoding sequence (99%), we tested whether nullomers could identify cancer-associated gene regulatory mutations that have a functional effect. Of note, our top recurring nullomer mutation was in the TERT promoter (TABLE 4), which is associated with numerous cancers (Vinagre et al., 2013). Focusing on prostate cancer, we selected five nullomers for luciferase reporter assays using the following criteria: i) nullomers that reside in a promoter based on ENCODE annotations (Consortium, Encode Project et al., 2012); and ii) the gene regulated by the promoter is associated with prostate cancer. Our list included nullomers in: 1) a promoter between two divergent genes, RPS2 and the lncRNA gene SNHG9, both of which are overexpressed in prostate cancer (Ohkia et al., 2004); 2) a promoter between two divergent genes, TMEM127 and CIAO1, with the former being downregulated in prostate cancer (Qin et al., 2014); 3) a promoter between two divergent genes, TTC23 and LRRC28, with the former showing aberrant splicing that relates to therapy resistance in prostate cancer cells (Bowler et al., 2018); 4) the promoter of GNA12, a protein that interacts with CXCR5, which positively correlates with prostate cancer progression (El-Haibi et al., 2013); and 5) a promoter between two divergent genes, PRICKLE4 and FRS3, with the latter thought to affect malignant but not benign prostate cells (Valencia et al., 2011). We cloned the promoter sequence with and without the nullomer into a luciferase promoter assay vector and compared their activity in androgen-sensitive human prostate adenocarcinoma cells (LNCaP). For two out of the five assayed promoters, we observed a significant effect on reporter activity (
Cancer is a disease caused by DNA mutations. Here, we show that by analyzing cancer WGS datasets, we can find mutations that lead to the resurfacing of nullomers. Further analyses of the recurrence of these nullomers show that they can be used to classify not only cancer tissue of origin but also additional cancer features, such as the type of breast or colorectal cancer. Analysis of cfDNA WGS datasets finds that nullomers could be used to distinguish patients from controls, which was further validated by testing a sequence enrichment panel on cfDNA extracted from prostate cancer patients and controls. Finally, using experimental assays, we show that nullomers have a functional effect on regulatory sequences.
Our analyses showed that in addition to tissue of origin, nullomers can also be used to detect other cancer features in breast and colorectal cancer. This approach could likely be used to diagnose tumor features in other cancer types. It would also be interesting to test whether nullomers could detect additional cancer characteristics such as chance of recurrence, drug response, mortality and others. As nullomers do not exist in the human genome, they could also be strong candidates for neoantigens. Previous work has shown that minimal absent words, short sequences that are absent from a genome or proteome, could be used to identify phosphorylation sites of high confidence, some of which could be associated with cancer (Koulouras and Frith, 2021). Nullomers were also shown to be effective in identifying unique peptides that are exceedingly distant from human peptides and that potentially could be used as antibodies against Trypanosoma cruzi (Vergni, Gaudio, and Santoni, 2020) or SARS-CoV-2 (Santoni and Vergni, 2020). Analysis of the Immune Epitope Database of validated antigens (Vita et al., 2019) found that 13 of the recurrent coding nullomers can create neoantigens with predicted strong binding levels that were subsequently validated. The expected number of neoantigens with strong binding levels is 1.72 (p-value<1e-8, hypergeometric test), suggesting that missense mutations that also result in nullomers are 7-fold more likely to generate strongly binding neoantigens.
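The enrichment arithmetic above can be reproduced with a short sketch: 13 observed versus 1.72 expected corresponds to a ratio of about 7.6, consistent with the stated 7-fold enrichment. The hypergeometric tail function below is generic; the population sizes behind the reported p-value are not given in the text, so any values passed to it would be hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) when drawing n items without replacement from a population
    of N items, K of which are successes."""
    upper = min(K, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return favorable / comb(N, n)


def expected_successes(N: int, K: int, n: int) -> float:
    """Mean of the hypergeometric distribution."""
    return n * K / N


def fold_enrichment(observed: float, expected: float) -> float:
    """Ratio of observed to expected counts."""
    return observed / expected


# Values taken from the text: 13 observed, 1.72 expected.
ratio = fold_enrichment(13, 1.72)
```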
Nullomers could also be combined with other cancer biomarkers and risk factors to improve the diagnostic positive predictive value. For example, it was recently shown that combining a blood test that detects both protein biomarkers and DNA mutations along with positron emission tomography-computed tomography (PET-CT) could detect multiple cancers (Lennon et al., 2020). Adding specific cancer-associated coding mutations to nullomers in the screening of cfDNA could increase sensitivity and specificity. cfDNA methylation or ChIP-seq diagnostic assays could also improve this. Risk factors such as age, tobacco, alcohol, sun exposure, family history, radiation exposure, body mass index, physical activity and others could also enhance nullomer cancer diagnosis. In summary, adding nullomer-based diagnostics to existing cancer biomarkers and risk factors could improve the power to detect various cancer subtypes.
We used a sequence enrichment based assay to detect nullomers in cfDNA from blood taken from prostate cancer cases and controls. Alternate assays could potentially be used for future rapid diagnosis of cancer via nullomers. These could include the use of CRISPR-based detection tools that utilize Cas12 or Cas13 (Kellner et al., 2019). For example, recent use of Cas13 in a microwell array system allowed the rapid screening of over 4,500 targets for 169 human-associated viruses with high sensitivity and specificity (Ackerman et al., 2020). In addition, because nullomer-based diagnostics potentially do not need large amounts of cfDNA, cfDNA could be collected in a manner less invasive than a blood draw, for example from urine or saliva, which were shown to be viable, albeit reduced, sources of cfDNA (Augustus et al., 2020; Ding et al., 2019).
Nullomers could be used as a novel tool to identify cancer-associated gene regulatory mutations. Amongst the 210 prostate cancer promoter nullomers, we selected five promoters and found that in two of them the nullomer significantly affected promoter activity. The difference in activity was in line with the gene's expression change in prostate cancer, with RPS2-SNHG9 having increased activity, fitting with its overexpression in prostate cancer (Ohkia et al., 2004), and TMEM127-CIAO1 showing abolished activity, in line with its observed downregulation in prostate cancer (Qin et al., 2014). It is important to note that we “hand-selected” these promoters based on their prostate cancer association. Future high-throughput assays, such as massively parallel reporter assays (MPRAs; Inoue and Ahituv, 2015) that can test thousands of sequences and variants for their regulatory activity, could be used to test the effect of nullomers on gene regulation in an unbiased manner.
Our analyses used 2,577 patients with 21 different common cancer types to develop a cancer tissue of origin classifier. Additional genomes from tumor tissues, controls and cfDNA could further improve this classifier. For rare cancer types, obtaining WGS datasets from tumor, matched control and cfDNA would be extremely helpful in allowing our nullomer classifier to detect these cancer types. It would also be interesting to assess how our nullomer classifier performs in cancers with fewer mutations.
In summary, we show that nullomers can provide a powerful tool for cancer diagnosis. As they can easily be detected via sequence or CRISPR-based tools, it should be straightforward to integrate them in current routine cancer diagnostic tests and their use could increase the sensitivity and specificity of these tests. Combining nullomer-based screening with clinical characteristics and additional diagnostic tools/features could increase the positive predictive value of this diagnostic. In addition, as cfDNA could also be isolated from urine and saliva and detection of these sequences does not need a large amount of DNA, nullomer-based diagnosis could be carried out in a non-invasive manner. Our work also suggests that nullomers could be used to highlight cancer-associated gene regulatory mutations which have been difficult to identify. Further high-throughput characterization of these mutations could allow the detection of bona fide cancer-associated functional regulatory mutations that could be used for diagnosis and treatment.
The presence, absence or quantity of one or a plurality of the nullomers disclosed herein can be detected by CRISPR diagnosis. To this end, non-limiting exemplary sgRNA sequences that can be used to detect nullomers in cancer are provided in TABLE 6. As shown in TABLE 6, some of the nullomers are recurrent in several cancers (see NULLOMER INFO). Depending on the Cas protein used, a different recognition pattern for the sgRNAs is required. TABLE 6 exemplifies sgRNAs fitting either the Cas9 (saCas9) or the Cas12 (AsCpf1/LbCpf1 RR) protein.
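To illustrate how the recognition pattern depends on the Cas protein, the following sketch scans a sequence for candidate guide targets adjacent to a PAM. It is not the procedure used to derive TABLE 6. The PAM strings (NNGRRT located 3' of the target for SaCas9, TYCV located 5' of the target for the Cpf1 RR variants), the spacer lengths, and the function names are assumptions made for this example, and only the forward strand is scanned.

```python
import re
from typing import List, Tuple

# IUPAC codes used by the assumed PAM patterns.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "R": "[AG]", "Y": "[CT]", "V": "[ACG]"}


def pam_regex(pam: str) -> str:
    """Translate an IUPAC PAM string into a regular expression fragment."""
    return "".join(IUPAC[c] for c in pam)


def find_guides(seq: str, pam: str, spacer_len: int, pam_side: str) -> List[Tuple[int, str, str]]:
    """Return (spacer_start, spacer, pam_sequence) for each PAM-adjacent site.

    pam_side is "3prime" (PAM follows the spacer, Cas9-like) or
    "5prime" (PAM precedes the spacer, Cas12-like).
    """
    p = pam_regex(pam)
    spacer = rf"[ACGT]{{{spacer_len}}}"
    # A lookahead is used so that overlapping candidate sites are all reported.
    if pam_side == "3prime":
        pattern = rf"(?=({spacer})({p}))"
    elif pam_side == "5prime":
        pattern = rf"(?=({p})({spacer}))"
    else:
        raise ValueError("pam_side must be '3prime' or '5prime'")
    guides = []
    for m in re.finditer(pattern, seq):
        if pam_side == "3prime":
            guides.append((m.start(), m.group(1), m.group(2)))
        else:
            guides.append((m.start() + len(m.group(1)), m.group(2), m.group(1)))
    return guides


def guides_covering(seq: str, nullomer: str, pam: str, spacer_len: int,
                    pam_side: str) -> List[Tuple[int, str, str]]:
    """Keep only guides whose spacer fully contains the nullomer."""
    return [g for g in find_guides(seq, pam, spacer_len, pam_side) if nullomer in g[1]]
```

Because a guide that spans the nullomer would match only the mutated sequence, restricting candidates with `guides_covering` is one plausible way to target the resurfaced sequence specifically.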
The nullomers disclosed herein can distinguish other cancer features, for example, subtype of cancer. Non-limiting examples of nullomers that can distinguish BRCA and non-BRCA breast cancers are provided in TABLE 7 and TABLE 8.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US22/27536 | 5/3/2022 | WO | |

Number | Date | Country | |
---|---|---|---|
63183610 | May 2021 | US | |
63230584 | Aug 2021 | US | |