The present disclosure relates to compositions and methods for cancer diagnosis, research and therapy, including but not limited to, cancer markers. In particular, the present disclosure relates to gene fusions as diagnostic markers and clinical targets for breast cancer.
Breast cancer is the second most common form of cancer among women in the U.S., and the second leading cause of cancer deaths among women. While the 1980s saw a sharp rise in the number of new cases of breast cancer, that number now appears to have stabilized. The drop in the death rate from breast cancer is probably due to the fact that more women are having mammograms. When detected early, the chances for successful treatment of breast cancer are much improved.
Breast cancer, which is highly treatable by surgery, radiation therapy, chemotherapy, and hormonal therapy, is most often curable when detected in early stages. Mammography is the most important screening modality for the early detection of breast cancer. Breast cancer is classified into a variety of sub-types, but only a few of these affect prognosis or selection of therapy. Patient management following initial suspicion of breast cancer generally includes confirmation of the diagnosis, evaluation of stage of disease, and selection of therapy. Diagnosis may be confirmed by aspiration cytology, core needle biopsy with a stereotactic or ultrasound technique for nonpalpable lesions, or incisional or excisional biopsy. At the time the tumor tissue is surgically removed, part of it is processed for determination of ER and PR levels.
Prognosis and selection of therapy are influenced by the age of the patient, stage of the disease, pathologic characteristics of the primary tumor including the presence of tumor necrosis, estrogen-receptor (ER) and progesterone-receptor (PR) levels in the tumor tissue, HER2 overexpression status and measures of proliferative capacity, as well as by menopausal status and general health. Overweight patients may have a poorer prognosis (Bastarrachea et al., Annals of Internal Medicine, 120: 18 [1994]). Prognosis may also vary by race, with blacks, and to a lesser extent Hispanics, having a poorer prognosis than whites (Elledge et al., Journal of the National Cancer Institute 86: 705 [1994]; Edwards et al., Journal of Clinical Oncology 16: 2693 [1998]).
The three major treatments for breast cancer are surgery, radiation, and drug therapy. No treatment fits every patient, and often two or more are required. The choice is determined by many factors, including the age of the patient and her menopausal status, the type of cancer (e.g., ductal vs. lobular), its stage, whether the tumor is hormone-receptive or not, and its level of invasiveness.
Breast cancer treatments are defined as local or systemic. Surgery and radiation are considered local therapies because they directly treat the tumor, breast, lymph nodes, or other specific regions. Drug treatment is called systemic therapy because its effects are widespread. Drug therapies include classic chemotherapy drugs, hormone blocking treatment (e.g., aromatase inhibitors, selective estrogen receptor modulators, and estrogen receptor downregulators), and monoclonal antibody treatment (e.g., against HER2). They may be used separately or, most often, in different combinations.
There is a need for additional diagnostic and treatment options, particularly treatments customized to a patient's tumor.
The present disclosure relates to compositions and methods for cancer diagnosis, research and therapy, including but not limited to, cancer markers. In particular, the present disclosure relates to gene fusions as diagnostic markers and clinical targets for breast cancer.
For example, some embodiments provide a kit for detecting gene fusions associated with cancer in a subject, comprising at least a first gene fusion informative reagent for identification of a gene fusion comprising a 5′ member and a 3′ member, wherein the gene fusion is selected from, for example: a MAST gene fusion (e.g., ZNF700-MAST1, NFIX-MAST1, ARID1A-MAST2, TADA2A-MAST1, or GPBP1L1-MAST2), a NOTCH gene fusion (e.g., SEC16A-NOTCH1, SEC22B-NOTCH2, NOTCH1-GABRR2, NOTCH1-ch9:138722833, NOTCH1-SNHG7, NOTCH2-SEC22B, NOTCH2-ATP1A1, NOTCH2-FBXL20, NOTCH2-MACF1, NOTCH2-MAGI3, NOTCH2-TMEM150C, NOTCH3-VIM), a NOTCH deletion, a FGFR fusion (e.g., FGFR2-ATE1, FGFR2-AFF3, FGFR1-ZNF791, FGFR1-WHSC1L1, FGFR2-CCDC6, FGFR2-CASP7, FGFR1-ERLIN2, FGFR1-GPR124, FGFR1-RHOT1, FGFR1-TACC1, FGFR2-NSMCE4A), an ETV6 fusion (e.g., YTHDF2-ETV6, CIT-ETV6, PEX5-ETV6, BCL2L14-ETV6, ETV6-CD70, ETV6-SYN1), GTF2I-ETV7, CTNNA1-JMJD1B or RB1CC1-JAK1. In some embodiments, the reagent is a probe that specifically hybridizes to the fusion junction of the gene fusion, a pair of primers that amplify a fusion junction of the gene fusion (e.g., a first primer that hybridizes to a 5′ member of the gene fusion and a second primer that hybridizes to a 3′ member of the gene fusion), an antibody that binds to the fusion junction of a gene fusion polypeptide, a sequencing primer that binds to the gene fusion and generates an extension product that spans the fusion junction of the gene fusion, or a pair of probes wherein the first probe hybridizes to a 5′ member of the gene fusion and the second probe hybridizes to a 3′ member of the gene fusion. In some embodiments, the reagent is labeled. In some embodiments, the cancer is breast cancer.
In some embodiments, the present invention further provides a method for identifying cancer (e.g., breast cancer) in a patient comprising: a) contacting a biological sample from a subject with a nucleic acid or polypeptide detection assay comprising at least a first gene fusion informative reagent for identification of a gene fusion comprising a 5′ member and a 3′ member, wherein the gene fusion is selected from, for example: a MAST gene fusion (e.g., ZNF700-MAST1, NFIX-MAST1, ARID1A-MAST2, TADA2A-MAST1, or GPBP1L1-MAST2), a NOTCH gene fusion (e.g., SEC16A-NOTCH1, SEC22B-NOTCH2, NOTCH1-GABRR2, NOTCH1-ch9:138722833, NOTCH1-SNHG7, NOTCH2-SEC22B, NOTCH2-ATP1A1, NOTCH2-FBXL20, NOTCH2-MACF1, NOTCH2-MAGI3, NOTCH2-TMEM150C, NOTCH3-VIM), a NOTCH deletion, a FGFR fusion (e.g., FGFR2-ATE1, FGFR2-AFF3, FGFR1-ZNF791, FGFR1-WHSC1L1, FGFR2-CCDC6, FGFR2-CASP7, FGFR1-ERLIN2, FGFR1-GPR124, FGFR1-RHOT1, FGFR1-TACC1, FGFR2-NSMCE4A), an ETV6 fusion (e.g., YTHDF2-ETV6, CIT-ETV6, PEX5-ETV6, BCL2L14-ETV6, ETV6-CD70, ETV6-SYN1), GTF2I-ETV7, CTNNA1-JMJD1B or RB1CC1-JAK1; and b) identifying cancer (e.g., breast cancer) in said subject when the gene fusion is present in the sample. In some embodiments, the sample is, for example, tissue, blood, plasma, serum, or cells. In some embodiments, the method further comprises the step of determining a treatment course of action based on the presence or absence of the gene fusion in the sample. For example, in some embodiments, the treatment course of action comprises administration of an inhibitor that targets a member of the gene fusion when the gene fusion is present in the sample.
Additional embodiments of the present disclosure are provided in the description and examples below.
a-d shows FGFR gene fusions in breast cancer.
Unless defined otherwise, all terms of art, notations and other scientific terms or terminology used herein have the same meaning as is commonly understood by one of ordinary skill in the art to which this disclosure belongs. Many of the techniques and procedures described or referenced herein are well understood and commonly employed using conventional methodology by those skilled in the art. As appropriate, procedures involving the use of commercially available kits and reagents are generally carried out in accordance with manufacturer defined protocols and/or parameters unless otherwise noted. All patents, applications, published applications and other publications referred to herein are incorporated by reference in their entirety. If a definition set forth in this section is contrary to or otherwise inconsistent with a definition set forth in the patents, applications, published applications, and other publications that are herein incorporated by reference, the definition set forth in this section prevails over the definition that is incorporated herein by reference.
As used herein, “a” or “an” means “at least one” or “one or more.”
As used herein, the term “gene fusion” refers to a chimeric genomic DNA, a chimeric messenger RNA, a truncated protein or a chimeric protein resulting from the fusion of at least a portion of a first gene to at least a portion of a second gene. In some embodiments, gene fusions involve internal deletions of genomic DNA within a single gene (e.g., no second gene is involved in the fusion). The gene fusion need not include entire genes or exons of genes.
As used herein, the term “gene upregulated in cancer” refers to a gene that is expressed (e.g., mRNA or protein expression) at a higher level in cancer (e.g., breast cancer) relative to the level in other tissue. In this context, “other tissue” may refer to, for example, tissues from different organs in the same subject or to normal tissues of the same or different type. In some embodiments, genes upregulated in cancer are expressed at a level between at least 10% to 300% higher than the level of expression in other tissue. For example, genes upregulated in cancer are frequently expressed at a level preferably at least 25%, at least 50%, at least 100%, at least 200%, or at least 300% higher than the level of expression in other tissue.
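By way of non-limiting illustration, the percentage comparison in the foregoing definition can be sketched in Python as follows. The function names and the example expression values are hypothetical and are not part of the disclosure; the sketch assumes that expression levels have already been measured and normalized on a common linear scale.

```python
# Illustrative sketch only: names and values are assumptions, not part of the disclosure.

# Percent thresholds recited in the definition (percent higher than other tissue).
THRESHOLDS = (10, 25, 50, 100, 200, 300)


def percent_higher(cancer_level: float, reference_level: float) -> float:
    """Return how much higher, in percent, the cancer level is than the reference level."""
    if reference_level <= 0:
        raise ValueError("reference_level must be positive")
    return (cancer_level - reference_level) / reference_level * 100.0


def thresholds_met(cancer_level: float, reference_level: float) -> list:
    """List every recited threshold that the measured increase meets or exceeds."""
    pct = percent_higher(cancer_level, reference_level)
    return [t for t in THRESHOLDS if pct >= t]


# Hypothetical normalized expression values: 3.0 in tumor versus 1.0 in other tissue.
print(percent_higher(3.0, 1.0))
print(thresholds_met(3.0, 1.0))
```

In this sketch a gene expressed at three times the reference level is 200% higher, and so satisfies every recited threshold up to and including "at least 200%".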
As used herein, the term “gene upregulated in breast tissue” refers to a gene that is expressed (e.g., mRNA or protein expression) at a higher level in breast tissue relative to the level in other tissue. In some embodiments, genes upregulated in breast tissue are expressed at a level between at least 10% to 300% higher than the level of expression in other tissue. For example, genes upregulated in breast tissue are frequently expressed at a level preferably at least 25%, at least 50%, at least 100%, at least 200%, or at least 300% higher than the level of expression in other tissues. In some embodiments, genes upregulated in breast tissue are exclusively expressed in breast tissue.
As used herein, the term “transcriptional regulatory region” refers to the region of a gene comprising sequences that modulate (e.g., upregulate or downregulate) expression of the gene. In some embodiments, the transcriptional regulatory region of a gene comprises a non-coding upstream sequence of a gene, also called the 5′ untranslated region (5′UTR). In other embodiments, the transcriptional regulatory region contains sequences located within the coding region of a gene or within an intron (e.g., enhancers).
As used herein, the terms “detect”, “detecting” or “detection” may describe either the general act of discovering or discerning or the specific observation of a detectably labeled composition.
As used herein, the term “stage of cancer” refers to a qualitative or quantitative assessment of the level of advancement of a cancer. Criteria used to determine the stage of a cancer include, but are not limited to, the size of the tumor and the extent of metastases (e.g., localized or distant).
As used herein, the term “nucleic acid molecule” refers to any nucleic acid containing molecule, including but not limited to, DNA or RNA. The term encompasses sequences that include any of the known base analogs of DNA and RNA including, but not limited to, 4-acetylcytosine, 8-hydroxy-N-6-methyladenosine, aziridinylcytosine, pseudoisocytosine, 5-(carboxyhydroxylmethyl)uracil, 5-fluorouracil, 5-bromouracil, 5-carboxymethylaminomethyl-2-thiouracil, 5-carboxymethylaminomethyluracil, dihydrouracil, inosine, N6-isopentenyladenine, 1-methyladenine, 1-methylpseudouracil, 1-methylguanine, 1-methylinosine, 2,2-dimethylguanine, 2-methyladenine, 2-methylguanine, 3-methylcytosine, 5-methylcytosine, N6-methyladenine, 7-methylguanine, 5-methylaminomethyluracil, 5-methoxy-aminomethyl-2-thiouracil, beta-D-mannosylqueosine, 5′-methoxycarbonylmethyluracil, 5-methoxyuracil, 2-methylthio-N-6-isopentenyladenine, uracil-5-oxyacetic acid methylester, uracil-5-oxyacetic acid, oxybutoxosine, pseudouracil, queosine, 2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, 4-thiouracil, 5-methyluracil, N-uracil-5-oxyacetic acid methylester, uracil-5-oxyacetic acid, pseudouracil, queosine, 2-thiocytosine, and 2,6-diaminopurine.
The term “gene” refers to a nucleic acid (e.g., DNA) sequence that comprises coding sequences necessary for the production of a polypeptide, precursor, or RNA (e.g., rRNA, tRNA). The polypeptide can be encoded by a full length coding sequence or by any portion of the coding sequence so long as the desired activity or functional properties (e.g., enzymatic activity, ligand binding, signal transduction, immunogenicity, etc.) of the full-length or fragment are retained. The term also encompasses the coding region of a structural gene and the sequences located adjacent to the coding region on both the 5′ and 3′ ends for a distance of about 1 kb or more on either end such that the gene corresponds to the length of the full-length mRNA. Sequences located 5′ of the coding region and present on the mRNA are referred to as 5′ non-translated sequences. Sequences located 3′ or downstream of the coding region and present on the mRNA are referred to as 3′ non-translated sequences. The term “gene” encompasses both cDNA and genomic forms of a gene. A genomic form or clone of a gene contains the coding region interrupted with non-coding sequences termed “introns” or “intervening regions” or “intervening sequences.” Introns are segments of a gene that are transcribed into nuclear RNA (hnRNA); introns may contain regulatory elements such as enhancers. Introns are removed or “spliced out” from the nuclear or primary transcript; introns therefore are absent in the messenger RNA (mRNA) transcript. The mRNA functions during translation to specify the sequence or order of amino acids in a nascent polypeptide.
As used herein, the term “oligonucleotide” refers to a short length of single-stranded polynucleotide chain. Oligonucleotides are typically less than 200 residues long (e.g., between 15 and 100); however, as used herein, the term is also intended to encompass longer polynucleotide chains. Oligonucleotides are often referred to by their length. For example, a 24 residue oligonucleotide is referred to as a “24-mer”. Oligonucleotides can form secondary and tertiary structures by self-hybridizing or by hybridizing to other polynucleotides. Such structures can include, but are not limited to, duplexes, hairpins, cruciforms, bends, and triplexes.
As used herein, the term “probe” refers to an oligonucleotide, whether occurring naturally as in a purified restriction digest or produced synthetically, recombinantly or by PCR amplification, which is capable of hybridizing to at least a portion of another oligonucleotide of interest. A probe may be single-stranded or double-stranded. Probes are useful in the detection, identification and isolation of particular gene sequences. It is contemplated that any probe used in methods of the present disclosure will be labeled with any “reporter molecule,” so that it is detectable in any detection system, including, but not limited to, enzyme (e.g., ELISA, as well as enzyme-based histochemical assays), fluorescent, radioactive, and luminescent systems. It is not intended that the methods or reagents of the present disclosure be limited to any particular detection system or label.
The term “isolated” when used in relation to a nucleic acid, as in “an isolated oligonucleotide” or “isolated polynucleotide” refers to a nucleic acid sequence that is identified and separated from at least one component or contaminant with which it is ordinarily associated in its natural source. An isolated nucleic acid is present in a form or setting that is different from that in which it is found in nature. In contrast, non-isolated nucleic acids are found in the state they exist in nature. For example, a given DNA sequence (e.g., a gene) is found on the host cell chromosome in proximity to neighboring genes; RNA sequences, such as a specific mRNA sequence encoding a specific protein, are found in the cell as a mixture with numerous other mRNAs that encode a multitude of proteins. However, isolated nucleic acid encoding a given protein includes, by way of example, such nucleic acid in cells ordinarily expressing the given protein where the nucleic acid is in a chromosomal location different from that of natural cells, or is otherwise flanked by a different nucleic acid sequence than that found in nature. The isolated nucleic acid, oligonucleotide, or polynucleotide may be present in single-stranded or double-stranded form. When an isolated nucleic acid, oligonucleotide or polynucleotide is to be utilized to express a protein, the nucleic acid, oligonucleotide or polynucleotide often will contain, at a minimum, the sense or coding strand (i.e., the oligonucleotide or polynucleotide may be single-stranded), but may contain both the sense and anti-sense strands (i.e., the oligonucleotide or polynucleotide may be double-stranded).
As used herein, the term “purified” or “to purify” refers to the removal of components (e.g., contaminants) from a sample. For example, antibodies are purified by removal of contaminating non-immunoglobulin proteins; they are also purified by the removal of immunoglobulin that does not bind to the target molecule. The removal of non-immunoglobulin proteins and/or the removal of immunoglobulins that do not bind to the target molecule results in an increase in the percent of target-reactive immunoglobulins in the sample. In another example, recombinant polypeptides are expressed in bacterial host cells and the polypeptides are purified by the removal of host cell proteins; the percent of recombinant polypeptides is thereby increased in the sample.
As used herein, the term “sample” is used in its broadest sense. In one sense, it is meant to include a specimen or culture obtained from any source, as well as biological and environmental samples. Biological samples may be obtained from animals (including humans) and encompass fluids, solids, tissues, and gases. Biological samples include blood products, such as plasma, serum and the like. Such examples are not however to be construed as limiting the sample types applicable to the present invention.
Provided herein are compositions and methods for cancer diagnosis, research and therapy, including but not limited to, cancer markers. In particular, the present disclosure relates to gene fusions as diagnostic markers and clinical targets for breast cancer.
Recurrent gene fusions and translocations have long been associated with hematologic malignancies and rare soft tissue tumors as driving genetic lesions (Delattre, O. et al. Nature 359, 162-5 (1992); Nowell et al., J Natl Cancer Inst 25, 85-109 (1960); Rowley, J. D. Annu Rev Genet 32, 495-519 (1998)). Over the last few years, it has become apparent that these genetic rearrangements are also found in common solid tumors, including a large subset of prostate cancers (Kumar-Sinha et al., Nat Rev Cancer 8, 497-511 (2008); Tomlins, S. A. et al. Science 310, 644-8 (2005)) and smaller subsets of lung cancer, among others (Prensner, J. R. & Chinnaiyan, Curr Opin Genet Dev 19, 82-91 (2009)). A number of these gene fusions are targetable, including BCR-ABL in chronic myelogenous leukemia (Druker, B. J. Translation of the Philadelphia chromosome into therapy for CML. Blood 112, 4808-17 (2008)), ALK gene fusions in non-small cell lung cancer (Perner, S. et al. Neoplasia 10, 298-302 (2008); Soda, M. et al. Nature 448, 561-6 (2007)), RET in papillary thyroid cancer (Grieco, M. et al. Cell 60, 557-63 (1990)), and RAF family fusions in prostate cancer and other solid tumors (Palanisamy, N. et al. Nat Med 16, 793-8 (2010)).
Breast cancer is a heterogeneous disease with several morphologic and molecular subtypes. Experiments conducted during the course of development of embodiments of the present invention identified gene fusions in breast cancer cell lines and tissues. Individual samples often harbored multiple rearrangements, with amplicons being a hot-spot for gene fusion events. Two novel classes of recurrent gene rearrangement in breast cancer involving microtubule associated serine threonine (MAST) kinases and Notch family genes were identified.
Discovery of the genetic aberrations contributing to the development of breast cancer has increased greatly in the past decades, beginning with the discovery of amplification of the HER2 locus in a subset of cases (Slamon, D. J. et al. Science 235, 177-82 (1987)). Breast cancer can be classified into subtypes as estrogen/progesterone receptor positive, HER2 amplification positive, or triple negative, based on expression of these three genes. Triple negative breast carcinoma in particular, lacks detailed molecular characterization (Foulkes et al., N Engl J Med 363, 1938-48 (2010); Sotiriou et al., N Engl J Med 360, 790-800 (2009)). Experiments conducted during the development of embodiments of the present invention identified functional gene fusions involving NOTCH1 and NOTCH2 in estrogen receptor (ER) negative breast carcinomas (Table 1).
The gene fusions in breast cancer involving MAST kinases and the Notch family of transcription factors represent novel classes of functionally recurrent gene fusions with therapeutic implications. MAST kinase and Notch gene rearrangements are mutually exclusive aberrations and, together, may represent up to 8-10% of breast cancers, with a particular enrichment in ER negative disease. MAST1 expression has been associated with resistance to the anti-cancer drug 5-fluorouracil (5-FU) (De Angelis et al., Mol Cancer 5, 20 (2006)). A recent study of genetic variation in mitotic kinases associated with breast cancer risk identified common haplotypes of MAST2 as significantly associated with breast cancer risk (P=0.04) (Wang, X. et al. Breast Cancer Res Treat 119, 453-62 (2009)). Functionally, MAST2 has been linked with the dystrophin/utrophin network of microtubule filaments via the syntrophins. MAST2 has also been shown to act as a scaffolding protein for TRAF6, regulating its activity, including inhibition of NF-κB, and thereby regulating cellular inflammatory responses (Xiong et al., J Biol Chem 279, 43675-83 (2004)). The tumor suppressor phosphatase PTEN has been shown to interact with the PDZ domain of MAST2 and related serine/threonine kinases (Valiente, M. et al. J Biol Chem 280, 28936-43 (2005)), indicating regulatory networks impacted by MAST genes.
The involvement of aberrant Notch gene function in human cancer was first reported as rare gene fusions in T-cell acute lymphoblastic leukemia (T-ALL) (Ellisen, L. W. et al. Cell 66, 649-61 (1991)). Later studies revealed activating point mutations in NOTCH1 in a majority of T-ALL cases (Grabher et al., Nat Rev Cancer 6, 347-59 (2006)); however, mutations of this type have not been found in breast carcinoma.
The target genes of the Notch pathway depend critically on the context of Notch activation (Radtke, F. & Raj, K. Nat Rev Cancer 3, 756-67 (2003)). It has been shown that the phenotypic effects of Notch in mammary epithelial cells vary with dose (Mazzone, M. et al. Proc Natl Acad Sci USA 107, 5012-7 (2010)). Different arrangements of Notch responsive elements in promoters also modulate the effects of Notch activation in a dose dependent manner. The breast carcinoma cell lines investigated herein exhibit dependence on the resulting effects of NOTCH1 activation.
GSIs and other Notch inhibitors, as well as MAST-kinase specific inhibitors or the currently available serine/threonine kinase inhibitors find use in breast cancer therapy (e.g., against cancers expressing the fusions).
The present disclosure identifies recurrent gene fusions indicative of cancer (e.g., breast cancer). In some embodiments, the gene fusions are the result of a chromosomal rearrangement of a first and second gene resulting in a gene fusion. Example gene fusions include, but are not limited to, a MAST gene fusion (e.g., zinc finger protein 700 (ZNF700)-microtubule associated serine/threonine kinase 1 (MAST1), nuclear factor I/X (NFIX)-MAST1, AT rich interactive domain 1A (ARID1A)-microtubule associated serine/threonine kinase 2 (MAST2), transcriptional adaptor 2A (TADA2A)-MAST1, GC-rich promoter binding protein 1-like 1 (GPBP1L1)-MAST2), a NOTCH gene fusion (e.g., SEC16 homolog A (SEC16A)-NOTCH1, SEC22 vesicle trafficking protein homolog B (SEC22B)-NOTCH2, NOTCH1-gamma-aminobutyric acid (GABA) A receptor, rho 2 (GABRR2), NOTCH1-ch9:138722833, NOTCH1-small nucleolar RNA host gene 7 (SNHG7), NOTCH2-SEC22B, NOTCH2-ATPase, Na+/K+ transporting, alpha 1 polypeptide (ATP1A1), NOTCH2-F-box and leucine-rich repeat protein 20 (FBXL20), NOTCH2-microtubule-actin crosslinking factor 1 (MACF1), NOTCH2-membrane associated guanylate kinase, WW and PDZ domain containing 3 (MAGI3), NOTCH2-transmembrane protein 150C (TMEM150C), NOTCH3-vimentin (VIM)), a NOTCH deletion, a FGFR fusion (e.g., fibroblast growth factor receptor 2 (FGFR2)-arginyltransferase 1 (ATE1), FGFR2-AF4/FMR2 family, member 3 (AFF3), FGFR1-zinc finger protein 791 (ZNF791), FGFR1-Wolf-Hirschhorn syndrome candidate 1-like 1 (WHSC1L1), FGFR2-coiled-coil domain containing 6 (CCDC6), FGFR2-caspase 7, apoptosis-related cysteine peptidase (CASP7), FGFR1-ER lipid raft associated 2 (ERLIN2), FGFR1-G protein-coupled receptor 124 (GPR124), FGFR1-ras homolog gene family, member T1 (RHOT1), FGFR1-transforming, acidic coiled-coil containing protein 1 (TACC1), FGFR2-non-SMC element 4 homolog A (NSMCE4A)), an ETV6 fusion (e.g., YTH domain family, member 2 (YTHDF2)-ets variant 6 (ETV6), citron (rho-interacting, serine/threonine kinase 21) (CIT)-ETV6, peroxisomal biogenesis factor 5 (PEX5)-ETV6, BCL2-like 14 (apoptosis facilitator) (BCL2L14)-ETV6, ETV6-CD70, ETV6-synapsin I (SYN1)), general transcription factor IIi (GTF2I)-ETV7, catenin (cadherin-associated protein), alpha 1, 102 kDa (CTNNA1)-jumonji domain containing 1B (JMJD1B) or RB1-inducible coiled-coil 1 (RB1CC1)-Janus kinase 1 (JAK1).
In some embodiments, the 5′ fusion partner is a transcriptional regulatory region of a gene (e.g., ZNF700, NFIX, ARID1A, TADA2A, GPBP1L1, SEC16A, a NOTCH kinase and SEC22B).
In some embodiments, the 3′ fusion partner is a kinase (e.g., a MAST or NOTCH family kinase). In some embodiments, the fusion comprises functional kinase domain(s) of the kinase. In some embodiments, the 3′ fusion partner is, for example, GABRR2, chr9: 138722833, SNHG7 or SEC22B. In some embodiments, gene fusions result in overexpression of the NOTCH or MAST kinase, for example, by the association of a non-native promoter, driving aberrant expression of NOTCH or MAST.
In some embodiments, fusions comprise internal NOTCH fusions (e.g., due to a deletion of NOTCH genomic DNA without a fusion partner).
MAST kinase family genes (MAST1-4, and MAST-like) are characterized by the presence of a serine/threonine kinase domain and a PDZ domain, involved in protein scaffolding and interaction with other proteins (Garland et al., Brain Res 1195, 12-9 (2008)). MAST1 and MAST2 are widely expressed in diverse tissues including brain, heart, liver, lung, kidney, and testis, while MAST3 and MAST4 show more restricted expression in several tissues and MAST-like is predominantly expressed in heart and testis (Garland et al., supra).
The Notch family of signaling molecules is widely conserved in metazoans and is composed of four members in the human genome. Notch signaling between adjoining cells affects diverse functions including differentiation, proliferation, and self-renewal (Bolos et al., Endocr Rev 28, 339-63 (2007)). The pleiotropic effects of Notch pathway activity are particularly context and dosage dependent (Mazzone, M. et al. Proc Natl Acad Sci USA 107, 5012-7 (2010); Radtke et al., Nat Rev Cancer 3, 756-67 (2003)). The canonical Notch pathway is illustrated in
The gene fusion proteins of the present disclosure, including fragments, derivatives and analogs thereof, may be used as immunogens to produce antibodies having use in the diagnostic, screening, research, and therapeutic methods described below. The antibodies may be polyclonal or monoclonal, chimeric, humanized, single chain, Fv or Fab fragments. Various procedures known to those of ordinary skill in the art may be used for the production and labeling of such antibodies and fragments. See, e.g., Burns, ed., Immunochemical Protocols, 3rd ed., Humana Press (2005); Harlow and Lane, Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory (1988); Kozbor et al., Immunology Today 4: 72 (1983); Köhler and Milstein, Nature 256: 495 (1975). Antibodies or fragments exploiting the differences between the truncated or chimeric protein resulting from a gene fusion and their respective native proteins are particularly preferred (e.g., the antibody preferentially binds to the protein expressed by the gene fusion relative to its binding to the protein generated by the non-fusion gene(s)).
The gene fusions described herein may be detectable as DNA, RNA or protein. Initially, the gene fusion is detectable as a chromosomal rearrangement of genomic DNA having a 5′ portion from a first gene and a 3′ portion from a second. Once transcribed, the gene fusion may be detectable as a chimeric mRNA having a 5′ portion from a first gene and a 3′ portion from a second gene or a chimeric mRNA with a deletion of mRNA. Once translated, the gene fusion may be detectable as fusion of a 5′ portion from a first protein and a 3′ portion from a second protein or a truncated version of a first or second protein. The truncated or fusion proteins may differ from their respective native proteins in amino acid sequence, post-translational processing and/or secondary, tertiary or quaternary structure. Such differences, if present, can be used to identify the presence of the gene fusion. Specific methods of detection are described in more detail below.
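By way of non-limiting illustration, detection of a chimeric transcript having a 5′ portion from a first gene and a 3′ portion from a second gene can be sketched in Python as follows. The sequences, function names and flank length are hypothetical and chosen only for illustration; they are not sequences of any gene named herein. Exact string matching is used as a simplified stand-in for hybridization of a junction-spanning probe and does not model mismatches, strand orientation or hybridization conditions.

```python
# Illustrative sketch only: toy sequences and names are assumptions, not part of the disclosure.


def junction_probe(five_prime: str, three_prime: str, flank: int = 8) -> str:
    """Build a junction-spanning probe from the last `flank` bases of the
    5' member and the first `flank` bases of the 3' member."""
    if len(five_prime) < flank or len(three_prime) < flank:
        raise ValueError("each member must supply at least `flank` bases")
    return five_prime[-flank:] + three_prime[:flank]


def spans_junction(read: str, probe: str) -> bool:
    """Return True when the read contains the junction sequence exactly."""
    return probe in read


# Hypothetical 5' and 3' portions of two different genes.
GENE_A_5P = "ATGGCCTTGACCGTAGCTAA"
GENE_B_3P = "CGTTAGGCATCCGATTGACA"

probe = junction_probe(GENE_A_5P, GENE_B_3P)
chimeric_read = GENE_A_5P + GENE_B_3P   # read derived from a fusion transcript
native_read = GENE_A_5P + GENE_A_5P     # read containing no fusion junction

print(spans_junction(chimeric_read, probe))
print(spans_junction(native_read, probe))
```

Because the probe straddles the junction, it matches the chimeric read but neither partner sequence alone, which is the property relied on when a reagent is directed to the fusion junction rather than to either member.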
The present disclosure provides DNA, RNA and protein based diagnostic, prognostic and screening methods that either directly or indirectly detect the gene fusions. The present disclosure also provides compositions and kits for diagnostic and screening purposes.
The diagnostic and screening methods of the present disclosure may be qualitative or quantitative. Quantitative methods may be used, for example, to discriminate between indolent and aggressive cancers via a cutoff or threshold level. Where applicable, qualitative or quantitative methods of embodiments of the disclosure include amplification of a target, a signal or an intermediary (e.g., a universal primer).
An initial assay may confirm the presence of a gene fusion but not identify the specific fusion. A secondary assay may then be performed to determine the identity of the particular fusion, if desired. The second assay may use a different detection technology than the initial assay.
The gene fusions may be detected along with other markers in a multiplex or panel format. Markers are selected for their predictive value alone or in combination with the gene fusions. Exemplary breast cancer markers include, but are not limited to those described in U.S. Pat. No. 5,622,829, U.S. Pat. No. 5,720,937, U.S. Pat. No. 6,294,349, each of which is herein incorporated by reference in its entirety. Markers for other cancers, diseases, infections, and metabolic conditions are also contemplated for inclusion in a multiplex or panel format.
The diagnostic methods of the present disclosure may also be modified with reference to data correlating particular gene fusions with the stage, aggressiveness or progression of the disease or the presence or risk of metastasis. Ultimately, the information provided will assist a physician in choosing the best course of treatment for a particular patient.
A. Sample
Any sample suspected of containing the gene fusions may be tested according to the methods of the present disclosure. By way of non-limiting example, the sample may be tissue (e.g., a breast biopsy sample or a tissue sample obtained by mastectomy), blood, cell secretions or a fraction thereof (e.g., plasma, serum, exosomes, etc.).
The patient sample typically undergoes preliminary processing designed to isolate or enrich the sample for the gene fusion(s) or cells that contain the gene fusion(s). A variety of techniques known to those of ordinary skill in the art may be used for this purpose, including but not limited to: centrifugation; immunocapture; cell lysis; and, nucleic acid target capture (See, e.g., EP Pat. No. 1 409 727, herein incorporated by reference in its entirety).
B. DNA and RNA Detection
The gene fusions of the present disclosure may be detected as chromosomal rearrangements of genomic DNA or chimeric mRNA using a variety of nucleic acid techniques known to those of ordinary skill in the art, including but not limited to: nucleic acid sequencing; nucleic acid hybridization; and, nucleic acid amplification.
1. Sequencing
Illustrative non-limiting examples of nucleic acid sequencing techniques include, but are not limited to, chain terminator (Sanger) sequencing, dye terminator sequencing, and high throughput sequencing methods. The present disclosure is not intended to be limited to any particular methods of sequencing. Those of ordinary skill in the art will recognize that, because RNA is less stable in the cell and more prone to nuclease attack experimentally, RNA is usually reverse transcribed to DNA before sequencing.
Chain terminator sequencing uses sequence-specific termination of a DNA synthesis reaction using modified nucleotide substrates. Extension is initiated at a specific site on the template DNA by using a short radioactive, or otherwise labeled, oligonucleotide primer complementary to the template at that region. The oligonucleotide primer is extended using a DNA polymerase, the four standard deoxynucleotide bases, and a low concentration of one chain terminating nucleotide, most commonly a di-deoxynucleotide. This reaction is repeated in four separate tubes with each of the bases taking turns as the di-deoxynucleotide. Limited incorporation of the chain terminating nucleotide by the DNA polymerase results in a series of related DNA fragments that are terminated only at positions where that particular di-deoxynucleotide is used. For each reaction tube, the fragments are size-separated by electrophoresis in a slab polyacrylamide gel or a capillary tube filled with a viscous polymer. The sequence is determined by reading which lane produces a visualized mark from the labeled primer as the gel is scanned from the top to the bottom.
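By way of non-limiting illustration, the four-reaction readout described above can be sketched in Python. The function names and the toy sequence are assumptions of this sketch; fragment lengths are counted from the end of the primer, and each lane is modeled as simply holding every fragment length at which its di-deoxynucleotide could terminate extension.

```python
def sanger_fragments(new_strand):
    """Simulate four chain-termination reactions.

    For each base, return the lengths of the fragments that end where the
    corresponding di-deoxynucleotide was incorporated.
    """
    lanes = {base: [] for base in "ACGT"}
    for position, base in enumerate(new_strand, start=1):
        lanes[base].append(position)
    return lanes


def read_gel(lanes):
    """Read the sequence by ordering all bands by fragment length, shortest first."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


lanes = sanger_fragments("GATTACA")   # hypothetical extended strand
sequence = read_gel(lanes)
print(lanes)
print(sequence)
```

Ordering the bands by fragment length across the four lanes recovers the order of bases in the newly synthesized strand.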
Dye terminator sequencing alternatively labels the terminators. Complete sequencing can be performed in a single reaction by labeling each of the di-deoxynucleotide chain-terminators with a separate fluorescent dye, which fluoresces at a different wavelength.
A variety of nucleic acid sequencing methods are contemplated for use in the methods of the present disclosure including, for example, chain terminator (Sanger) sequencing, dye terminator sequencing, and high-throughput sequencing methods. Many of these sequencing methods are well known in the art. See, e.g., Sanger et al., Proc. Natl. Acad. Sci. USA 74:5463-5467 (1977); Maxam et al., Proc. Natl. Acad. Sci. USA 74:560-564 (1977); Drmanac, et al., Nat. Biotechnol. 16:54-58 (1998); Kato, Int. J. Clin. Exp. Med. 2:193-202 (2009); Ronaghi et al., Anal. Biochem. 242:84-89 (1996); Margulies et al., Nature 437:376-380 (2005); Ruparel et al., Proc. Natl. Acad. Sci. USA 102:5932-5937 (2005), and Harris et al., Science 320:106-109 (2008); Levene et al., Science 299:682-686 (2003); Korlach et al., Proc. Natl. Acad. Sci. USA 105:1176-1181 (2008); Branton et al., Nat. Biotechnol. 26(10):1146-53 (2008); Eid et al., Science 323:133-138 (2009); each of which is herein incorporated by reference in its entirety.
2. Hybridization
Illustrative non-limiting examples of nucleic acid hybridization techniques include, but are not limited to, in situ hybridization (ISH), microarray, and Southern or Northern blot.
In situ hybridization (ISH) is a type of hybridization that uses a labeled complementary DNA or RNA strand as a probe to localize a specific DNA or RNA sequence in a portion or section of tissue (in situ), or, if the tissue is small enough, the entire tissue (whole mount ISH). DNA ISH can be used to determine the structure of chromosomes. RNA ISH is used to measure and localize mRNAs and other transcripts within tissue sections or whole mounts. Sample cells and tissues are usually treated to fix the target transcripts in place and to increase access of the probe. The probe hybridizes to the target sequence at elevated temperature, and then the excess probe is washed away. The probe that was labeled with radio-, fluorescent- or antigen-labeled bases is localized and quantitated in the tissue using autoradiography, fluorescence microscopy or immunohistochemistry. ISH can also use two or more probes, labeled with radioactivity or the other non-radioactive labels, to simultaneously detect two or more transcripts.
a. FISH
In some embodiments, fusion sequences are detected using fluorescence in situ hybridization (FISH). The preferred FISH assays for methods of embodiments of the present disclosure utilize bacterial artificial chromosomes (BACs). These have been used extensively in the human genome sequencing project (see Nature 409: 953-958 (2001)) and clones containing specific BACs are available through distributors that can be located through many sources, e.g., NCBI. Each BAC clone from the human genome has been given a reference name that unambiguously identifies it. These names can be used to find a corresponding GenBank sequence and to order copies of the clone from a distributor.
b. Microarrays
Different kinds of biological assays are called microarrays including, but not limited to: DNA microarrays (e.g., cDNA microarrays and oligonucleotide microarrays); protein microarrays; tissue microarrays; transfection or cell microarrays; chemical compound microarrays; and, antibody microarrays. A DNA microarray, commonly known as gene chip, DNA chip, or biochip, is a collection of microscopic DNA spots attached to a solid surface (e.g., glass, plastic or silicon chip) forming an array for the purpose of expression profiling or monitoring expression levels for thousands of genes simultaneously. The affixed DNA segments are known as probes, thousands of which can be used in a single DNA microarray. Microarrays can be used to identify disease genes by comparing gene expression in disease and normal cells. Microarrays can be fabricated using a variety of technologies, including but not limited to: printing with fine-pointed pins onto glass slides; photolithography using pre-made masks; photolithography using dynamic micromirror devices; ink-jet printing; or, electrochemistry on microelectrode arrays.
Southern and Northern blotting may be used to detect specific DNA or RNA sequences, respectively. In these techniques DNA or RNA is extracted from a sample, fragmented, electrophoretically separated on a matrix gel, and transferred to a membrane filter. The filter bound DNA or RNA is subject to hybridization with a labeled probe complementary to the sequence of interest. Hybridized probe bound to the filter is detected. A variant of the procedure is the reverse Northern blot, in which the substrate nucleic acid that is affixed to the membrane is a collection of isolated DNA fragments and the probe is RNA extracted from a tissue and labeled.
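By way of non-limiting illustration, the notion of a probe complementary to a sequence of interest can be sketched in Python. The function names and toy sequences are assumptions of this sketch, and perfect Watson-Crick pairing over the full probe length is a simplification; actual hybridization tolerates mismatches depending on stringency.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def probe_binding_sites(target, probe):
    """Return 0-based start positions in target where probe can pair perfectly."""
    site = reverse_complement(probe)
    return [
        i for i in range(len(target) - len(site) + 1)
        if target.startswith(site, i)
    ]


TARGET = "TTGACCGTAAGG"   # hypothetical filter-bound strand
PROBE = "TACGGT"          # hypothetical labeled probe

sites = probe_binding_sites(TARGET, PROBE)
print(sites)
```

A probe that finds no complementary site returns an empty list, which corresponds to no hybridized probe being detected on the filter.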
3. Amplification
Chromosomal rearrangements of genomic DNA and chimeric mRNA may be amplified prior to or simultaneous with detection. Illustrative non-limiting examples of nucleic acid amplification techniques include, but are not limited to, polymerase chain reaction (PCR), reverse transcription polymerase chain reaction (RT-PCR), transcription-mediated amplification (TMA), ligase chain reaction (LCR), strand displacement amplification (SDA), and nucleic acid sequence based amplification (NASBA). Those of ordinary skill in the art will recognize that certain amplification techniques (e.g., PCR) require that RNA be reverse transcribed to DNA prior to amplification (e.g., RT-PCR), whereas other amplification techniques directly amplify RNA (e.g., TMA and NASBA).
The polymerase chain reaction (U.S. Pat. Nos. 4,683,195, 4,683,202, 4,800,159 and 4,965,188, each of which is herein incorporated by reference in its entirety), commonly referred to as PCR, uses multiple cycles of denaturation, annealing of primer pairs to opposite strands, and primer extension to exponentially increase copy numbers of a target nucleic acid sequence. In a variation called RT-PCR, reverse transcriptase (RT) is used to make a complementary DNA (cDNA) from mRNA, and the cDNA is then amplified by PCR to produce multiple copies of DNA. For other various permutations of PCR see, e.g., U.S. Pat. Nos. 4,683,195, 4,683,202 and 4,800,159; Mullis et al., Meth. Enzymol. 155: 335 (1987); and, Murakawa et al., DNA 7: 287 (1988), each of which is herein incorporated by reference in its entirety.
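By way of non-limiting illustration, the exponential increase in copy number over PCR cycles can be sketched in Python. The function name `pcr_copies` and the per-cycle `efficiency` parameter are assumptions of this sketch; an efficiency of 1.0 models ideal doubling in every cycle, which real reactions only approximate.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Copies present after a number of cycles.

    efficiency is the fraction of templates copied per cycle (0 to 1);
    1.0 corresponds to perfect doubling.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


ideal = pcr_copies(10, 30)          # 10 starting copies, 30 ideal cycles
reduced = pcr_copies(10, 30, 0.9)   # same reaction at 90% efficiency
print(ideal, reduced)
```

Even a modest drop in per-cycle efficiency compounds over many cycles, which is one reason quantitative methods calibrate against standards rather than assume ideal doubling.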
Transcription mediated amplification (U.S. Pat. Nos. 5,480,784 and 5,399,491, each of which is herein incorporated by reference in its entirety), commonly referred to as TMA, synthesizes multiple copies of a target nucleic acid sequence autocatalytically under conditions of substantially constant temperature, ionic strength, and pH in which multiple RNA copies of the target sequence autocatalytically generate additional copies. See, e.g., U.S. Pat. Nos. 5,399,491 and 5,824,518, each of which is herein incorporated by reference in its entirety. In a variation described in U.S. Pat. No. 7,374,885 (herein incorporated by reference in its entirety), TMA optionally incorporates the use of blocking moieties, terminating moieties, and other modifying moieties to improve TMA process sensitivity and accuracy.
The ligase chain reaction (Weiss, R., Science 254: 1292 (1991), herein incorporated by reference in its entirety), commonly referred to as LCR, uses two sets of complementary DNA oligonucleotides that hybridize to adjacent regions of the target nucleic acid. The DNA oligonucleotides are covalently linked by a DNA ligase in repeated cycles of thermal denaturation, hybridization and ligation to produce a detectable double-stranded ligated oligonucleotide product.
Strand displacement amplification (Walker, G. et al., Proc. Natl. Acad. Sci. USA 89: 392-396 (1992); U.S. Pat. Nos. 5,270,184 and 5,455,166, each of which is herein incorporated by reference in its entirety), commonly referred to as SDA, uses cycles of annealing pairs of primer sequences to opposite strands of a target sequence, primer extension in the presence of a dNTPαS to produce a duplex hemiphosphorothioated primer extension product, endonuclease-mediated nicking of a hemimodified restriction endonuclease recognition site, and polymerase-mediated primer extension from the 3′ end of the nick to displace an existing strand and produce a strand for the next round of primer annealing, nicking and strand displacement, resulting in geometric amplification of product. Thermophilic SDA (tSDA) uses thermophilic endonucleases and polymerases at higher temperatures in essentially the same method (EP Pat. No. 0 684 315).
Other amplification methods include, for example: nucleic acid sequence based amplification (U.S. Pat. No. 5,130,238, herein incorporated by reference in its entirety), commonly referred to as NASBA; one that uses an RNA replicase to amplify the probe molecule itself (Lizardi et al., BioTechnol. 6: 1197 (1988), herein incorporated by reference in its entirety), commonly referred to as Qβ replicase; a transcription based amplification method (Kwoh et al., Proc. Natl. Acad. Sci. USA 86:1173 (1989)); and, self-sustained sequence replication (Guatelli et al., Proc. Natl. Acad. Sci. USA 87: 1874 (1990), each of which is herein incorporated by reference in its entirety). For further discussion of known amplification methods see Persing, David H., “In Vitro Nucleic Acid Amplification Techniques” in Diagnostic Medical Microbiology: Principles and Applications (Persing et al., Eds.), pp. 51-87 (American Society for Microbiology, Washington, D.C. (1993)).
4. Detection Methods
Non-amplified or amplified gene fusion nucleic acids can be detected by any conventional means. For example, the gene fusions can be detected by hybridization with a detectably labeled probe and measurement of the resulting hybrids. Illustrative non-limiting examples of detection methods are described below.
One illustrative detection method, the Hybridization Protection Assay (HPA), involves hybridizing a chemiluminescent oligonucleotide probe (e.g., an acridinium ester-labeled (AE) probe) to the target sequence, selectively hydrolyzing the chemiluminescent label present on unhybridized probe, and measuring the chemiluminescence produced from the remaining probe in a luminometer. See, e.g., U.S. Pat. No. 5,283,174; Nelson et al., Nonisotopic Probing, Blotting, and Sequencing, ch. 17 (Larry J. Kricka ed., 2d ed. 1995), each of which is herein incorporated by reference in its entirety.
Another illustrative detection method provides for quantitative evaluation of the amplification process in real-time. Evaluation of an amplification process in “real-time” involves determining the amount of amplicon in the reaction mixture either continuously or periodically during the amplification reaction, and using the determined values to calculate the amount of target sequence initially present in the sample. A variety of methods for determining the amount of initial target sequence present in a sample based on real-time amplification are well known in the art. These include methods disclosed in U.S. Pat. Nos. 6,303,305 and 6,541,205, each of which is herein incorporated by reference in its entirety. Another method for determining the quantity of target sequence initially present in a sample, but which is not based on a real-time amplification, is disclosed in U.S. Pat. No. 5,710,029, herein incorporated by reference in its entirety.
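By way of non-limiting illustration, one widely used way of calculating the initial amount of target from real-time data is a standard curve relating threshold cycle (Ct) to the logarithm of known starting quantities. The Python sketch below assumes that approach; it is not asserted to be the method of any patent cited above, and the function names and the dilution-series values are hypothetical.

```python
import math


def fit_standard_curve(standards):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept.

    standards: list of (known_initial_quantity, observed_ct) pairs.
    """
    xs = [math.log10(quantity) for quantity, _ in standards]
    ys = [ct for _, ct in standards]
    n = len(standards)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_initial_quantity(ct, slope, intercept):
    """Invert the standard curve to estimate the starting quantity."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical dilution series: (starting copies, observed Ct).
STANDARDS = [(1e2, 30.0), (1e3, 26.7), (1e4, 23.4), (1e5, 20.1)]

slope, intercept = fit_standard_curve(STANDARDS)
unknown = estimate_initial_quantity(25.0, slope, intercept)
print(slope, intercept, unknown)
```

A sample whose Ct falls between two standards is assigned an initial quantity between theirs, which is the behavior a quantitative cutoff or threshold comparison would rely on.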
Amplification products may be detected in real-time through the use of various self-hybridizing probes, most of which have a stem-loop structure. Such self-hybridizing probes are labeled so that they emit differently detectable signals, depending on whether the probes are in a self-hybridized state or an altered state through hybridization to a target sequence. By way of non-limiting example, “molecular torches” are a type of self-hybridizing probe that includes distinct regions of self-complementarity (referred to as “the target binding domain” and “the target closing domain”) which are connected by a joining region (e.g., non-nucleotide linker) and which hybridize to each other under predetermined hybridization assay conditions. In a preferred embodiment, molecular torches contain single-stranded base regions in the target binding domain that are from 1 to about 20 bases in length and are accessible for hybridization to a target sequence present in an amplification reaction under strand displacement conditions. Under strand displacement conditions, hybridization of the two complementary regions, which may be fully or partially complementary, of the molecular torch is favored, except in the presence of the target sequence, which will bind to the single-stranded region present in the target binding domain and displace all or a portion of the target closing domain. The target binding domain and the target closing domain of a molecular torch include a detectable label or a pair of interacting labels (e.g., luminescent/quencher) positioned so that a different signal is produced when the molecular torch is self-hybridized than when the molecular torch is hybridized to the target sequence, thereby permitting detection of probe:target duplexes in a test sample in the presence of unhybridized molecular torches. 
Molecular torches and a variety of types of interacting label pairs, including fluorescence resonance energy transfer (FRET) labels, are disclosed in, for example U.S. Pat. Nos. 6,534,274 and 5,776,782, each of which is herein incorporated by reference in its entirety.
The interaction between two molecules can also be detected, e.g., using fluorescence energy transfer (FRET) (see, for example, Lakowicz et al., U.S. Pat. No. 5,631,169; Stavrianopoulos et al., U.S. Pat. No. 4,968,103; each of which is herein incorporated by reference). A fluorophore label is selected such that a first donor molecule's emitted fluorescent energy will be absorbed by a fluorescent label on a second, ‘acceptor’ molecule, which in turn is able to fluoresce due to the absorbed energy.
Alternately, the ‘donor’ protein molecule may simply utilize the natural fluorescent energy of tryptophan residues. Labels are chosen that emit different wavelengths of light, such that the ‘acceptor’ molecule label may be differentiated from that of the ‘donor’. Since the efficiency of energy transfer between the labels is related to the distance separating the molecules, the spatial relationship between the molecules can be assessed. In a situation in which binding occurs between the molecules, the fluorescent emission of the ‘acceptor’ molecule label should be maximal. A FRET binding event can be conveniently measured through standard fluorometric detection means well known in the art (e.g., using a fluorimeter).
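By way of non-limiting illustration, the distance dependence of energy transfer noted above is commonly modeled by the Förster relation, E = 1 / (1 + (r / R0)^6), where r is the donor-acceptor separation and R0 is the separation at which transfer is 50% efficient. The Python sketch below assumes that relation; the function name and the 5 nm value used for R0 are hypothetical choices for this sketch.

```python
def fret_efficiency(distance, forster_radius):
    """Energy transfer efficiency under the Förster relation.

    distance and forster_radius must be in the same units (e.g., nanometers).
    """
    if distance < 0 or forster_radius <= 0:
        raise ValueError("distance must be >= 0 and forster_radius > 0")
    return 1.0 / (1.0 + (distance / forster_radius) ** 6)


R0_NM = 5.0  # hypothetical Förster radius for a donor/acceptor pair

for r_nm in (2.5, 5.0, 10.0):
    print(r_nm, round(fret_efficiency(r_nm, R0_NM), 4))
```

Because of the sixth-power dependence, efficiency falls from nearly complete to nearly absent over a narrow range of separations around R0, which is why acceptor emission is a sensitive indicator of binding.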
Another example of a detection probe having self-complementarity is a “molecular beacon.” Molecular beacons include nucleic acid molecules having a target complementary sequence, an affinity pair (or nucleic acid arms) holding the probe in a closed conformation in the absence of a target sequence present in an amplification reaction, and a label pair that interacts when the probe is in a closed conformation. Hybridization of the target sequence and the target complementary sequence separates the members of the affinity pair, thereby shifting the probe to an open conformation. The shift to the open conformation is detectable due to reduced interaction of the label pair, which may be, for example, a fluorophore and a quencher (e.g., DABCYL and EDANS). Molecular beacons are disclosed, for example, in U.S. Pat. Nos. 5,925,517 and 6,150,097, each of which is herein incorporated by reference in its entirety.
Other self-hybridizing probes are well known to those of ordinary skill in the art. By way of non-limiting example, probe binding pairs having interacting labels, such as those disclosed in U.S. Pat. No. 5,928,862 (herein incorporated by reference in its entirety), might be adapted for use in methods of embodiments of the present disclosure. Probe systems used to detect single nucleotide polymorphisms (SNPs) might also be utilized in the present invention. Additional detection systems include “molecular switches,” as disclosed in U.S. Publ. No. 20050042638, herein incorporated by reference in its entirety. Other probes, such as those comprising intercalating dyes and/or fluorochromes, are also useful for detection of amplification products in methods of embodiments of the present disclosure. See, e.g., U.S. Pat. No. 5,814,447 (herein incorporated by reference in its entirety).
C. Protein Detection
The gene fusions of the present disclosure may be detected as truncated or chimeric proteins using a variety of protein techniques known to those of ordinary skill in the art, including but not limited to: protein sequencing and immunoassays.
1. Sequencing
Illustrative non-limiting examples of protein sequencing techniques include, but are not limited to, mass spectrometry and Edman degradation.
Mass spectrometry can, in principle, sequence any size protein. A protein is digested by an endoprotease, and the resulting solution is passed through a high pressure liquid chromatography column. At the end of this column, the solution is sprayed out of a narrow nozzle charged to a high positive potential into the mass spectrometer. The charge on the droplets causes them to fragment until only single ions remain. The peptides are then fragmented and the mass-charge ratios of the fragments measured. The mass spectrum is analyzed by computer and often compared against a database of previously sequenced proteins in order to determine the sequences of the fragments. The process is then repeated with a different digestion enzyme, and the overlaps in sequences are used to construct a sequence for the protein.
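By way of non-limiting illustration, the endoprotease digestion step can be sketched in Python, together with a check for peptides that span a fusion junction. The sketch assumes a trypsin-like rule (cleavage after lysine or arginine unless the next residue is proline); the function names and the toy protein sequences are hypothetical, and the substring comparison against the parent proteins is a simplification.

```python
def digest(protein, cleave_after="KR", blocked_by="P"):
    """Split a protein into peptides using a trypsin-like cleavage rule."""
    peptides = []
    start = 0
    for i in range(len(protein) - 1):
        if protein[i] in cleave_after and protein[i + 1] not in blocked_by:
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def junction_peptides(fusion, parent_a, parent_b):
    """Peptides of the fusion protein found in neither parent sequence."""
    return [p for p in digest(fusion) if p not in parent_a and p not in parent_b]


# Hypothetical toy sequences (not real proteins).
PARENT_A = "MKTAYRLLGK"
PARENT_B = "GSWKAAPRDE"
FUSION = PARENT_A[:5] + PARENT_B[1:]   # first-protein portion joined to second

peptides = digest(FUSION)
spanning = junction_peptides(FUSION, PARENT_A, PARENT_B)
print(peptides)
print(spanning)
```

A peptide present in the digest of the fusion protein but absent from both native proteins is the kind of amino acid sequence difference that can indicate the presence of the gene fusion.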
In the Edman degradation reaction (see, e.g., Edman, Acta Chem. Scand. 4:283-93 (1950)), the peptide to be sequenced is adsorbed onto a solid surface (e.g., a glass fiber coated with polybrene). Though there are various well known modifications to this procedure (including automated modifications), one exemplary method involves the use of the Edman reagent, phenylisothiocyanate (PITC), which is added, together with a mildly basic buffer solution of 12% trimethylamine, to an adsorbed peptide, and which reacts with the amine group of the N-terminal amino acid of the adsorbed peptide. The terminal amino acid derivative can then be selectively detached by the addition of anhydrous acid. The derivative isomerizes to give a substituted phenylthiohydantoin, which can be washed off and identified by chromatography, and the cycle can be repeated. The efficiency of each step is about or over 98%, which allows about 50 amino acids to be reliably determined.
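By way of non-limiting illustration, the effect of per-cycle efficiency on read length can be worked through in Python. The sketch assumes that the fraction of peptide still yielding the correct residue after n cycles is the per-step efficiency raised to the power n; the function name is hypothetical.

```python
def cumulative_yield(step_efficiency, cycles):
    """Fraction of peptide still in phase after the given number of cycles."""
    if not 0.0 <= step_efficiency <= 1.0:
        raise ValueError("step_efficiency must be between 0 and 1")
    return step_efficiency ** cycles


for cycles in (10, 30, 50):
    print(cycles, round(cumulative_yield(0.98, cycles), 3))
```

At 98% efficiency per step, roughly a third of the starting material remains in phase after 50 cycles, which is consistent with about 50 residues being a practical limit.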
2. Immunoassays
Illustrative non-limiting examples of immunoassays include, but are not limited to: immunoprecipitation; Western blot; ELISA; immunohistochemistry; immunocytochemistry; immunochromatography; flow cytometry; and, immuno-PCR. Polyclonal or monoclonal antibodies detectably labeled using various techniques known to those of ordinary skill in the art (e.g., colorimetric, fluorescent, chemiluminescent or radioactive labels) are suitable for use in the immunoassays.
Immunoprecipitation is the technique of precipitating an antigen out of solution using an antibody specific to that antigen. The process can be used to identify proteins or protein complexes present in cell extracts by targeting a specific protein or a protein believed to be in the complex. The complexes are brought out of solution by insoluble antibody-binding proteins isolated initially from bacteria, such as Protein A and Protein G. The antibodies can also be coupled to sepharose beads that can easily be isolated out of solution. After washing, the precipitate can be analyzed using mass spectrometry, Western blotting, or any number of other methods for identifying constituents in the complex.
A Western blot, or immunoblot, is a method to detect protein in a given sample of tissue homogenate or extract. It uses gel electrophoresis to separate denatured proteins by mass. The proteins are then transferred out of the gel and onto a membrane, typically polyvinylidene difluoride (PVDF) or nitrocellulose, where they are probed using antibodies specific to the protein of interest. As a result, researchers can examine the amount of protein in a given sample and compare levels between several groups.
An ELISA, short for Enzyme-Linked ImmunoSorbent Assay, is a biochemical technique to detect the presence of an antibody or an antigen in a sample. It utilizes a minimum of two antibodies, one of which is specific to the antigen and the other of which is coupled to an enzyme. The second antibody will cause a chromogenic or fluorogenic substrate to produce a signal. Variations of ELISA include sandwich ELISA, competitive ELISA, and ELISPOT. Because the ELISA can be performed to evaluate either the presence of antigen or the presence of antibody in a sample, it is a useful tool both for determining serum antibody concentrations and also for detecting the presence of antigen.
Immunohistochemistry and immunocytochemistry refer to the process of localizing proteins in a tissue section or cell, respectively, via the principle of antigens in tissue or cells binding to their respective antibodies. Visualization is enabled by tagging the antibody with color producing or fluorescent tags. Typical examples of color tags include, but are not limited to, horseradish peroxidase and alkaline phosphatase. Typical examples of fluorophore tags include, but are not limited to, fluorescein isothiocyanate (FITC) or phycoerythrin (PE).
Flow cytometry is a technique for counting, examining and optionally sorting microscopic particles or cells suspended in a stream of fluid. It allows simultaneous multiparametric analysis of the physical and/or chemical characteristics of single cells flowing through an optical/electronic detection apparatus. A beam of light (e.g., a laser) of a single frequency or color is directed onto a hydrodynamically focused stream of fluid. A number of detectors are aimed at the point where the stream passes through the light beam; one in line with the light beam (Forward Scatter or FSC) and several perpendicular to it (Side Scatter (SSC) and one or more fluorescent detectors). Each suspended particle passing through the beam scatters the light in some way, and fluorescent chemicals in the particle may be excited into emitting light at a lower frequency than the light source. The combination of scattered and fluorescent light is picked up by the detectors, and by analyzing fluctuations in brightness at each detector, one for each fluorescent emission peak, it is possible to deduce various facts about the physical and chemical structure of each individual particle. FSC correlates with the cell volume and SSC correlates with the density or inner complexity of the particle (e.g., shape of the nucleus, the amount and type of cytoplasmic granules or the membrane roughness).
Immuno-polymerase chain reaction (IPCR) utilizes nucleic acid amplification techniques to increase signal generation in antibody-based immunoassays. Because no protein equivalent of PCR exists, that is, proteins cannot be replicated in the same manner that nucleic acid is replicated during PCR, the only way to increase detection sensitivity is by signal amplification. The target proteins are bound to antibodies which are directly or indirectly conjugated to oligonucleotides. Unbound antibodies are washed away and the remaining bound antibodies have their oligonucleotides amplified. Protein detection occurs via detection of amplified oligonucleotides using standard nucleic acid detection methods, including real-time methods.
D. Data Analysis
In some embodiments, a computer-based analysis program is used to translate the raw data generated by the detection assay (e.g., the presence, absence, or amount of a given gene fusion or other markers) into data of predictive value for a clinician. The clinician can access the predictive data using any suitable means. Thus, in some preferred embodiments, the present disclosure provides the further benefit that the clinician, who may not be specifically trained in genetics or molecular biology, need not understand the raw data. The data can be presented directly to the clinician in its most useful form. The clinician may then be able to immediately utilize the information in order to optimize the care of the subject.
The present disclosure contemplates any method capable of receiving, processing, and transmitting the information to and from laboratories conducting the assays, medical personnel, and subjects. For example, in some embodiments of the present invention, a sample (e.g., a biopsy or a serum or urine sample) is obtained from a subject and submitted to a profiling service (e.g., clinical lab at a medical facility, genomic profiling business, etc.), located in any part of the world (e.g., in a country different than the country where the subject resides or where the information is ultimately used) to generate raw data. Where the sample comprises a tissue or other biological sample, the subject may visit a medical center to have the sample obtained and sent to the profiling center, or subjects may collect the sample themselves (e.g., a urine sample) and directly send it to a profiling center. Where the sample comprises previously determined biological information, the information may be directly sent to the profiling service by the subject (e.g., an information card containing the information may be scanned by a computer and the data transmitted to a computer of the profiling center using an electronic communication system). Once received by the profiling service, the sample is processed and a profile is produced (i.e., expression data), specific for the diagnostic or prognostic information desired for the subject.
The profile data may then be prepared in a format suitable for interpretation by a treating clinician. For example, rather than providing raw expression data, the prepared format may represent a diagnosis or risk assessment (e.g., likelihood of cancer being present) for the subject, along with recommendations for particular treatment options. The data may be displayed to the clinician by any suitable method. For example, in some embodiments, the profiling service generates a report that can be printed for the clinician (e.g., at the point of care) or displayed to the clinician on a computer monitor.
In some embodiments, the information is first analyzed at the point of care or at a regional facility. The raw data is then sent to a central processing facility for further analysis and/or to convert the raw data to information useful for a clinician or patient. The central processing facility provides the advantage of privacy (all data is stored in a central facility with uniform security protocols), speed, and uniformity of data analysis. The central processing facility can then control the fate of the data following treatment of the subject. For example, using an electronic communication system, the central facility can provide data to the clinician, the subject, or researchers.
In some embodiments, the subject is able to directly access the data using the electronic communication system. The subject may choose, for example, further or altered intervention or counseling based on the results. In some embodiments, the data is used for research use. For example, the data may be used to further optimize the inclusion or elimination of markers as useful indicators of a particular condition or stage of disease.
E. In Vivo Imaging
The gene fusions of the present disclosure may also be detected using in vivo imaging techniques, including but not limited to: radionuclide imaging; positron emission tomography (PET); computerized axial tomography, X-ray or magnetic resonance imaging methods, fluorescence detection, and chemiluminescent detection. In some embodiments, in vivo imaging techniques are used to visualize the presence of or expression of cancer markers in an animal (e.g., a human or non-human mammal). For example, in some embodiments, cancer marker mRNA or protein is labeled using a labeled antibody specific for the cancer marker. A specifically bound and labeled antibody can be detected in an individual using an in vivo imaging method, including, but not limited to, radionuclide imaging, positron emission tomography, computerized axial tomography, X-ray or magnetic resonance imaging method, fluorescence detection, and chemiluminescent detection. Methods for generating antibodies to the cancer markers of the present disclosure are described below.
The in vivo imaging methods of the present disclosure are useful in the diagnosis of cancers that express the cancer markers of the present invention (e.g., breast cancer). In vivo imaging is used to visualize the presence of a marker indicative of the cancer. Such techniques allow for diagnosis without the use of an unpleasant biopsy. The in vivo imaging methods of the present disclosure are also useful for providing prognoses to cancer patients. For example, the presence of a marker indicative of cancers likely to metastasize can be detected. The in vivo imaging methods of the present disclosure can further be used to detect metastatic cancers in other parts of the body.
In some embodiments, reagents (e.g., antibodies) specific for the gene fusions of the present disclosure are fluorescently labeled. The labeled antibodies are introduced into a subject (e.g., orally or parenterally). Fluorescently labeled antibodies are detected using any suitable method (e.g., using the apparatus described in U.S. Pat. No. 6,198,107, herein incorporated by reference).
In other embodiments, antibodies are radioactively labeled. The use of antibodies for in vivo diagnosis is well known in the art. Sumerdon et al. (Nucl. Med. Biol. 17:247-254 [1990]) have described an optimized antibody-chelator for the radioimmunoscintigraphic imaging of tumors using Indium-111 as the label. Griffin et al. (J. Clin. Onc. 9:631-640 [1991]) have described the use of this agent in detecting tumors in patients suspected of having recurrent colorectal cancer. The use of similar agents with paramagnetic ions as labels for magnetic resonance imaging is known in the art (Lauffer, Magnetic Resonance in Medicine 22:339-342 [1991]). The label used will depend on the imaging modality chosen. Radioactive labels such as Indium-111, Technetium-99m, or Iodine-131 can be used for planar scans or single photon emission computed tomography (SPECT). Positron emitting labels such as Fluorine-18 can also be used for positron emission tomography (PET). For MRI, paramagnetic ions such as Gadolinium (III) or Manganese (II) can be used.
Radioactive metals with half-lives ranging from 1 hour to 3.5 days are available for conjugation to antibodies, such as scandium-47 (3.5 days), gallium-67 (2.8 days), gallium-68 (68 minutes), technetium-99m (6 hours), and indium-111 (3.2 days). Of these, gallium-67, technetium-99m, and indium-111 are preferable for gamma camera imaging, and gallium-68 is preferable for positron emission tomography.
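The practical consequence of these half-lives can be illustrated with a short calculation. The following is a minimal sketch, assuming simple exponential decay and using only the half-life values quoted above; the function name and the 24-hour example interval are illustrative choices, not part of the disclosure.

```python
# Half-lives in hours, converted from the values quoted in the text.
HALF_LIFE_HOURS = {
    "scandium-47": 3.5 * 24,
    "gallium-67": 2.8 * 24,
    "gallium-68": 68 / 60,
    "technetium-99m": 6.0,
    "indium-111": 3.2 * 24,
}


def fraction_remaining(isotope: str, elapsed_hours: float) -> float:
    """Fraction of the initial activity left after elapsed_hours.

    Assumes simple exponential decay: N(t)/N0 = 0.5 ** (t / half_life).
    """
    half_life = HALF_LIFE_HOURS[isotope]
    return 0.5 ** (elapsed_hours / half_life)


if __name__ == "__main__":
    # Illustrative interval only: activity left 24 h after labeling.
    for isotope in HALF_LIFE_HOURS:
        left = fraction_remaining(isotope, 24.0)
        print(f"{isotope}: {left:.3g} of initial activity after 24 h")
```

Under these assumptions, a label with a short half-life such as gallium-68 retains almost no activity one day after labeling, whereas indium-111 retains most of it, which is one consideration when matching a label to the time the antibody needs to localize.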
A useful method of labeling antibodies with such radiometals is by means of a bifunctional chelating agent, such as diethylenetriaminepentaacetic acid (DTPA), as described, for example, by Khaw et al. (Science 209:295 [1980]) for In-111 and Tc-99m, and by Scheinberg et al. (Science 215:1511 [1982]). Other chelating agents may also be used, but the 1-(p-carboxymethoxybenzyl)EDTA and the carboxycarbonic anhydride of DTPA are advantageous because their use permits conjugation without affecting the antibody's immunoreactivity substantially.
Another method for coupling DTPA to proteins is by use of the cyclic anhydride of DTPA, as described by Hnatowich et al. (Int. J. Appl. Radiat. Isot. 33:327 [1982]) for labeling of albumin with In-111, but which can be adapted for labeling of antibodies. A suitable method of labeling antibodies with Tc-99m which does not use chelation with DTPA is the pretinning method of Crockford et al. (U.S. Pat. No. 4,323,546, herein incorporated by reference).
A preferred method of labeling immunoglobulins with Tc-99m is that described by Wong et al. (Int. J. Appl. Radiat. Isot., 29:251 [1978]) for plasma protein, and recently applied successfully by Wong et al. (J. Nucl. Med., 23:229 [1981]) for labeling antibodies.
In the case of the radiometals conjugated to the specific antibody, it is likewise desirable to introduce as high a proportion of the radiolabel as possible into the antibody molecule without destroying its immunospecificity. A further improvement may be achieved by effecting radiolabeling in the presence of the specific cancer marker of the present invention, to ensure that the antigen binding site on the antibody will be protected. The antigen is separated after labeling.
In still further embodiments, in vivo biophotonic imaging (Xenogen, Alameda, Calif.) is utilized for in vivo imaging. This real-time in vivo imaging utilizes luciferase. The luciferase gene is incorporated into cells, microorganisms, and animals (e.g., as a fusion protein with a gene fusion of the present disclosure). When active, it leads to a reaction that emits light. A CCD camera and software are used to capture the image and analyze it.
F. Compositions & Kits
Any of these compositions, alone or in combination with other compositions of the present disclosure, may be provided in the form of a kit. For example, the single labeled probe and pair of amplification oligonucleotides may be provided in a kit for the amplification and detection of gene fusions of the present invention. Kits may further comprise appropriate controls and/or detection reagents. The probe and antibody compositions of the present disclosure may also be provided in the form of an array.
Compositions for use in the diagnostic methods of the present invention include, but are not limited to, probes, amplification oligonucleotides, and antibodies. Particularly preferred compositions detect a product only when a first gene fuses to a second gene. These compositions include: a single labeled probe comprising a sequence that hybridizes to the junction at which a 5′ portion from a first gene fuses to a 3′ portion from a second gene (i.e., spans the gene fusion junction); a pair of amplification oligonucleotides wherein the first amplification oligonucleotide comprises a sequence that hybridizes to a transcriptional regulatory region of a 5′ portion from a first gene fuses to a 3′ portion from a second gene; an antibody to an amino-terminally truncated protein resulting from a fusion of a first protein to a second gene; or, an antibody to a chimeric protein having an amino-terminal portion from a first gene and a carboxy-terminal portion from a second gene. Other useful compositions, however, include: a pair of labeled probes wherein the first labeled probe comprises a sequence that hybridizes to a transcriptional regulatory region of a first gene and the second labeled probe comprises a sequence that hybridizes to a second gene, probes and primers that span the fusion junction of a fusion generated by an internal deletion and antibodies that bind to amino acid sequences generated by internal deletions.
In some embodiments, the present disclosure provides compositions and methods for determining a treatment course of action in response to a subject's gene fusion status. For example, screening for NOTCH or MAST family kinase fusions is useful in identifying people with cancer who benefit from treatment with NOTCH or MAST kinase inhibitors. Individuals found to have a gene fusion that comprises a NOTCH or MAST family member are then treated with a NOTCH or MAST inhibitor, respectively.
The present disclosure is not limited to a particular NOTCH or MAST pathway inhibitor. NOTCH and MAST kinase inhibitors are known in the art. In some embodiments, inhibitors are antisense oligonucleotides, siRNA, antibodies and small molecules. Exemplary small molecule inhibitors include, but are not limited to, GSIs and other Notch inhibitors, as well as MAST-kinase specific inhibitors or the currently available serine/threonine kinase inhibitors. Examples include, but are not limited to, γ-secretase inhibitors (e.g., IL-X (cbz-IL-CHO), tripeptide γ-secretase inhibitor (z-Leu-Leu-Nle-CHO), dipeptide γ-secretase inhibitor N-[N-(3,5-difluorophenacetyl)-L-alanyl]-S-phenylglycine t-butyl ester (DAPT), and dibenzazepine) and MK0752 (developed by Merck, Whitehouse Station, N.J.).
In other embodiments, FGF fusions are targeted by, for example, R3Mab, Palifermin or Kepivance (Amgen inc).
In some embodiments, the present disclosure provides drug screening assays (e.g., to screen for anticancer drugs). In some embodiments, the screening methods utilize cancer markers described herein. For example, in some embodiments, provided herein are methods of screening for compounds that alter (e.g., decrease) the expression of gene fusions. The compounds or agents may interfere with transcription, by interacting, for example, with the promoter region. The compounds or agents may interfere with mRNA produced from the fusion (e.g., by RNA interference, antisense technologies, etc.). The compounds or agents may interfere with pathways that are upstream or downstream of the biological activity of the fusion. In some embodiments, candidate compounds are antisense or interfering RNA agents (e.g., oligonucleotides) directed against cancer markers. In other embodiments, candidate compounds are antibodies or small molecules that specifically bind to a cancer marker regulator or expression products of the present disclosure and inhibit its biological function.
In one screening method, candidate compounds are evaluated for their ability to alter cancer marker expression by contacting a compound with a cell expressing a cancer marker and then assaying for the effect of the candidate compounds on expression. In some embodiments, the effect of candidate compounds on expression of a cancer marker gene is assayed for by detecting the level of cancer marker mRNA expressed by the cell. mRNA expression can be detected by any suitable method.
In other embodiments, the effect of candidate compounds on expression of cancer marker genes is assayed by measuring the level of polypeptide encoded by the cancer markers. The level of polypeptide expressed can be measured using any suitable method, including but not limited to, those disclosed herein.
Specifically, provided herein are screening methods for identifying modulators, i.e., candidate or test compounds or agents (e.g., proteins, peptides, peptidomimetics, peptoids, small molecules or other drugs) which bind to gene fusions of the present disclosure, have an inhibitory (or stimulatory) effect on, for example, cancer marker expression or cancer marker activity, or have a stimulatory or inhibitory effect on, for example, the expression or activity of a cancer marker substrate. Compounds thus identified can be used to modulate the activity of target gene products (e.g., cancer marker genes) either directly or indirectly in a therapeutic protocol, to elaborate the biological function of the target gene product, or to identify compounds that disrupt normal target gene interactions. Compounds that inhibit the activity or expression of cancer markers are useful in the treatment of proliferative disorders, e.g., cancer, particularly breast cancer.
In one embodiment, the disclosure provides assays for screening candidate or test compounds that are substrates of a cancer marker protein or polypeptide or a biologically active portion thereof. In another embodiment, the disclosure provides assays for screening candidate or test compounds that bind to or modulate the activity of a cancer marker protein or polypeptide or a biologically active portion thereof.
The test compounds of the present disclosure can be obtained using any of the numerous approaches in combinatorial library methods known in the art, including biological libraries; peptoid libraries (libraries of molecules having the functionalities of peptides, but with a novel, non-peptide backbone, which are resistant to enzymatic degradation but which nevertheless remain bioactive; see, e.g., Zuckermann et al., J. Med. Chem. 37: 2678-85 [1994]); spatially addressable parallel solid phase or solution phase libraries; synthetic library methods requiring deconvolution; the ‘one-bead one-compound’ library method; and synthetic library methods using affinity chromatography selection. The biological library and peptoid library approaches are preferred for use with peptide libraries, while the other four approaches are applicable to peptide, non-peptide oligomer or small molecule libraries of compounds (Lam (1997) Anticancer Drug Des. 12:145).
Examples of methods for the synthesis of molecular libraries can be found in the art, for example in: DeWitt et al., Proc. Natl. Acad. Sci. U.S.A. 90:6909 [1993]; Erb et al., Proc. Natl. Acad. Sci. USA 91:11422 [1994]; Zuckermann et al., J. Med. Chem. 37:2678 [1994]; Cho et al., Science 261:1303 [1993]; Carell et al., Angew. Chem. Int. Ed. Engl. 33:2059 [1994]; Carell et al., Angew. Chem. Int. Ed. Engl. 33:2061 [1994]; and Gallop et al., J. Med. Chem. 37:1233 [1994].
Libraries of compounds may be presented in solution (e.g., Houghten, Biotechniques 13:412-421 [1992]), or on beads (Lam, Nature 354:82-84 [1991]), chips (Fodor, Nature 364:555-556 [1993]), bacteria or spores (U.S. Pat. No. 5,223,409; herein incorporated by reference), plasmids (Cull et al., Proc. Natl. Acad. Sci. USA 89:1865-1869 [1992]) or on phage (Scott and Smith, Science 249:386-390 [1990]; Devlin, Science 249:404-406 [1990]; Cwirla et al., Proc. Natl. Acad. Sci. 87:6378-6382 [1990]; Felici, J. Mol. Biol. 222:301 [1991]).
In one embodiment, an assay is a cell-based assay in which a cell that expresses a cancer marker mRNA or protein or biologically active portion thereof is contacted with a test compound, and the ability of the test compound to modulate the cancer marker's activity is determined. Determining the ability of the test compound to modulate cancer marker activity can be accomplished by monitoring, for example, changes in enzymatic activity, destruction of mRNA, or the like.
The present disclosure contemplates the generation of transgenic animals comprising an exogenous cancer marker gene (e.g., gene fusion) of the present disclosure or mutants and variants thereof (e.g., truncations or single nucleotide polymorphisms). In preferred embodiments, the transgenic animal displays an altered phenotype (e.g., increased or decreased presence of markers) as compared to wild-type animals. Methods for analyzing the presence or absence of such phenotypes include but are not limited to, those disclosed herein. In some preferred embodiments, the transgenic animals further display an increased or decreased growth of tumors or evidence of cancer.
The transgenic animals of the present disclosure find use in drug (e.g., cancer therapy) screens. In some embodiments, test compounds (e.g., a drug that is suspected of being useful to treat cancer) and control compounds (e.g., a placebo) are administered to the transgenic animals and the control animals and the effects evaluated.
The transgenic animals can be generated via a variety of methods. In some embodiments, embryonal cells at various developmental stages are used to introduce transgenes for the production of transgenic animals. Different methods are used depending on the stage of development of the embryonal cell. The zygote is the best target for micro-injection. In the mouse, the male pronucleus reaches the size of approximately 20 micrometers in diameter that allows reproducible injection of 1-2 picoliters (pl) of DNA solution. The use of zygotes as a target for gene transfer has a major advantage in that in most cases the injected DNA will be incorporated into the host genome before the first cleavage (Brinster et al., Proc. Natl. Acad. Sci. USA 82:4438-4442 [1985]). As a consequence, all cells of the transgenic non-human animal will carry the incorporated transgene. This will in general also be reflected in the efficient transmission of the transgene to offspring of the founder since 50% of the germ cells will harbor the transgene. U.S. Pat. No. 4,873,191 describes a method for the micro-injection of zygotes; the disclosure of this patent is incorporated herein in its entirety.
In other embodiments, retroviral infection is used to introduce transgenes into a non-human animal. In some embodiments, the retroviral vector is utilized to transfect oocytes by injecting the retroviral vector into the perivitelline space of the oocyte (U.S. Pat. No. 6,080,912, incorporated herein by reference). In other embodiments, the developing non-human embryo can be cultured in vitro to the blastocyst stage. During this time, the blastomeres can be targets for retroviral infection (Jaenisch, Proc. Natl. Acad. Sci. USA 73:1260 [1976]). Efficient infection of the blastomeres is obtained by enzymatic treatment to remove the zona pellucida (Hogan et al., in Manipulating the Mouse Embryo, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. [1986]). The viral vector system used to introduce the transgene is typically a replication-defective retrovirus carrying the transgene (Jahner et al., Proc. Natl. Acad. Sci. USA 82:6927 [1985]). Transfection is easily and efficiently obtained by culturing the blastomeres on a monolayer of virus-producing cells (Stewart et al., EMBO J., 6:383 [1987]). Alternatively, infection can be performed at a later stage. Virus or virus-producing cells can be injected into the blastocoele (Jahner et al., Nature 298:623 [1982]). Most of the founders will be mosaic for the transgene since incorporation occurs only in a subset of cells that form the transgenic animal. Further, the founder may contain various retroviral insertions of the transgene at different positions in the genome that generally will segregate in the offspring. In addition, it is also possible to introduce transgenes into the germline, albeit with low efficiency, by intrauterine retroviral infection of the midgestation embryo (Jahner et al., supra [1982]).
Additional means of using retroviruses or retroviral vectors to create transgenic animals known to the art involve the micro-injection of retroviral particles or mitomycin C-treated cells producing retrovirus into the perivitelline space of fertilized eggs or early embryos (PCT International Application WO 90/08832 [1990], and Haskell and Bowen, Mol. Reprod. Dev., 40:386 [1995]).
In other embodiments, the transgene is introduced into embryonic stem cells and the transfected stem cells are utilized to form an embryo. ES cells are obtained by culturing pre-implantation embryos in vitro under appropriate conditions (Evans et al., Nature 292:154 [1981]; Bradley et al., Nature 309:255 [1984]; Gossler et al., Proc. Natl. Acad. Sci. USA 83:9065 [1986]; and Robertson et al., Nature 322:445 [1986]). Transgenes can be efficiently introduced into the ES cells by DNA transfection by a variety of methods known to the art including calcium phosphate co-precipitation, protoplast or spheroplast fusion, lipofection and DEAE-dextran-mediated transfection. Transgenes may also be introduced into ES cells by retrovirus-mediated transduction or by micro-injection. Such transfected ES cells can thereafter colonize an embryo following their introduction into the blastocoel of a blastocyst-stage embryo and contribute to the germ line of the resulting chimeric animal (for review, see Jaenisch, Science 240:1468 [1988]). Prior to the introduction of transfected ES cells into the blastocoel, the transfected ES cells may be subjected to various selection protocols to enrich for ES cells which have integrated the transgene, assuming that the transgene provides a means for such selection. Alternatively, the polymerase chain reaction may be used to screen for ES cells that have integrated the transgene. This technique obviates the need for growth of the transfected ES cells under appropriate selective conditions prior to transfer into the blastocoel.
In still other embodiments, homologous recombination is utilized to knock-out gene function or create deletion mutants (e.g., truncation mutants). Methods for homologous recombination are described in U.S. Pat. No. 5,614,396, incorporated herein by reference.
The following examples are provided in order to demonstrate and further illustrate certain preferred embodiments and aspects of the present disclosure and are not to be construed as limiting the scope thereof.
Breast cancer cell lines were purchased from the American Type Culture Collection (ATCC) or obtained from individual collections. Cells were grown in specified media supplemented with fetal bovine serum and antibiotics (Invitrogen), or supplements designated for the media (Lonza). This study was approved by the respective Internal Review Boards and breast cancer samples were obtained from the University of Michigan and the Breakthrough Breast Cancer Research Centre, Institute of Cancer Research (London, UK). Table 2 shows the complete list of cell lines and tissue samples used for this study.
Total RNA was extracted from normal and cancer breast cell lines and breast tumor tissues using Trizol reagent (Invitrogen), and further purified on RNeasy columns (QIAGEN) according to the manufacturer's instructions. Five additional human breast cancer total RNAs were purchased from Origene. The quality of RNA was assessed with the Agilent Bioanalyzer 2100 using RNA Nano reagents (Agilent). Two rounds of polyA selection were performed using SeraMag oligo dT magnetic beads (SeraDyn) following the Illumina protocol. Transcriptome libraries from the mRNA fractions were generated following the RNA-SEQ protocol (Illumina) and size selected using 3% NuSieve agarose gels (Lonza) followed by gel extraction using QIAEX II reagents (QIAGEN) with a gel melting temperature of 32° C. Libraries were quantified using the Bioanalyzer 2100 using the DNA 1000 protocol and reagents (Agilent). Each sample was sequenced in a single lane with the Illumina Genome Analyzer II (40-80 nucleotide read length) or with the Illumina HiSeq 2000 (100 nucleotide read length). Number of reads passing filter for each sample is shown in Table 3. Paired-end transcriptome reads passing filter were mapped to the human reference genome (hg18) and UCSC genes, allowing up to two mismatches, with Illumina ELAND software (Efficient Alignment of Nucleotide Databases). Sequence alignments were subsequently processed to nominate gene fusions using the method described earlier (Maher, C. A. et al. Nature 458, 97-101 (2009); Maher, C. A. et al. Proc Natl Acad Sci USA 106, 12353-8 (2009)). In brief, paired end reads were processed to identify any that either contained or spanned a fusion junction. Encompassing paired reads refer to those in which each read aligns to an independent transcript, thereby encompassing the fusion junction. Spanning mate pairs refer to those in which one sequence read aligns to a gene and its paired-end spans the fusion junction. 
Both categories undergo a series of filtering steps to remove false positives before being merged together to generate the final chimera nominations.
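The distinction between encompassing and spanning read pairs described above can be sketched in code. This is only an illustration under simplifying assumptions and is not the published nomination pipeline: the `Mate` record, its fields, the gene names, and the `classify_pair` function are hypothetical, alignment is assumed to have been done already, and none of the false-positive filtering steps are modeled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Mate:
    """One read of a pair, reduced to what it aligned to (hypothetical record)."""
    gene: Optional[str] = None                  # gene the whole read aligns to
    junction: Optional[Tuple[str, str]] = None  # (gene, gene) if the read itself crosses a junction


def classify_pair(mate1: Mate, mate2: Mate) -> str:
    """Label a read pair as 'spanning', 'encompassing', or 'other'."""
    # Spanning: one mate crosses the fusion junction and its partner
    # aligns to one of the two genes joined at that junction.
    for crossing, anchor in ((mate1, mate2), (mate2, mate1)):
        if crossing.junction is not None and anchor.gene in crossing.junction:
            return "spanning"
    # Encompassing: each mate aligns to a different gene, so the
    # junction lies in the unsequenced part of the fragment.
    if mate1.gene and mate2.gene and mate1.gene != mate2.gene:
        return "encompassing"
    return "other"


if __name__ == "__main__":
    # Placeholder gene names, not results from the study.
    print(classify_pair(Mate(gene="GENE_A"), Mate(gene="GENE_B")))
    print(classify_pair(Mate(gene="GENE_A"), Mate(junction=("GENE_A", "GENE_B"))))
```

In a real analysis the two categories would each pass through filtering before being merged, as the text states; the sketch stops at the classification step.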
Following RNA integrity analysis using the Agilent BioAnalyzer 2100 protocol, 74 individual breast carcinomas were placed in two pools. The first pool consisted of 200 ng each of 35 RNAs with RIN values between 3 and 5 and the second pool consisted of 39 RNAs with RIN values between 5.1 and 7.5. The pooled RNAs were depleted of rRNAs using RiboMinus reagents and protocols (Invitrogen). The rRNA depleted pools were converted to Illumina RNA-SEQ paired-end libraries following the standard protocol with the omission of the poly A selection. Following size selection of 250 to 350 bp fragments on agarose gels, the DNA was recovered using the QIAQuick method (QIAGEN) and amplified for 8 cycles using Illumina PE 1.0 and PE 2.0 primers and amplification conditions. After purification by the Ampure XP method (Agencourt), the concentration was determined using a Nanodrop spectrophotometer. Capture probes were generated for exons 2-10 of MAST1 and MAST2. Primer pairs generating PCR products between 105 and 140 bp were designed and a sequence encoding the T7 RNA polymerase promoter was added to the 5′ end of the forward primer in each pair. The primers are shown in Table 6. Ten cycles of PCR amplification using 10 ng of cDNA plasmids for each gene were performed using HotStar polymerase reagents (QIAGEN). Biotinylated RNA probes were synthesized by in vitro transcription reactions using the T7 Maxiscript protocol (Ambion). Reactions were performed using 0.5 mM ATP, 0.5 mM CTP, 0.5 mM GTP, 0.3 mM UTP, and 0.2 mM biotin-16-UTP. After synthesis at 37° C. for 1 hr, the reactions were digested with DNase I and RNA was purified using the RNAClean method (Agencourt). Each biotinylated RNA probe was adjusted to a concentration of 100 ng/μl and pooled. Pooled probes were hybridized to 2 μg of the previously generated paired-end libraries using conditions and reagents of the SureSelect system (Agilent).
Following hybridization for 48 hr, fragments were captured using Dynal M280 streptavidin magnetic beads, washed and eluted using SureSelect protocols. The captured library was reamplified for 14 cycles using Illumina primers and conditions, purified using Ampure XP reagents and submitted for sequencing.
Breast cancer cell line DNAs (ATCC) were labeled and hybridized to Agilent 244K chips using the manufacturer's protocol. Arrays were scanned with an Agilent Microarray Scanner and data were extracted and analyzed with CGH Analytics software.
To detect the genomic rearrangements of the NOTCH1 gene in HCC1599 and HCC2218 cells, mate-pair genomic libraries with a 4-4.5 kb insert size were prepared and sequenced. In brief, genomic DNA was isolated from the two cell lines and fragmented by a HydroShear device (Genomic Solutions) to a peak size of 4-5 kb. Mate pair libraries were prepared according to the manufacturer's instructions (Illumina). The libraries were sequenced with the Illumina HiSeq 2000 system.
To validate the fusion gene transcripts detected by next-generation sequencing, total RNA was isolated from the index cell lines, control cell lines, and breast tissues. Quantitative RT-PCR assays using SYBR Green Master Mix (Applied Biosystems) were carried out with the StepOne Real-Time PCR System (Applied Biosystems). Relative mRNA levels of each chimera shown were normalized to the expression of the housekeeping gene GAPDH. All the oligonucleotide primers were obtained from Integrated DNA Technologies (IDT) and the sequences are listed in Table 6. To detect the genomic fusion junction between NOTCH2 and SEC22B genes in HCC1187 cells, primers were designed flanking the predicted fusion position and PCR reactions were carried out to amplify the fusion fragments. PCR products were purified from agarose gels using the QIAEX II system (QIAGEN) and sequenced by Sanger sequencing methods at the University of Michigan Sequencing Core.
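Normalization of a chimera's qRT-PCR signal to GAPDH can be illustrated as follows. The text does not state the calculation used, so this sketch assumes the common comparative threshold-cycle approach with product doubling every cycle (roughly 100% amplification efficiency); the function name and the Ct values in the example are hypothetical.

```python
def relative_expression(ct_target: float, ct_gapdh: float) -> float:
    """Target level relative to GAPDH from threshold cycles (Ct).

    Assumes product doubles each cycle, so a target that crosses the
    threshold delta_ct cycles after GAPDH is 2 ** -delta_ct as abundant.
    """
    delta_ct = ct_target - ct_gapdh
    return 2.0 ** (-delta_ct)


if __name__ == "__main__":
    # Made-up Ct values for illustration only.
    print(relative_expression(ct_target=25.0, ct_gapdh=20.0))
```

If primer efficiencies differ appreciably from 100%, an efficiency-corrected calculation would be needed instead of the fixed base of 2 assumed here.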
For MAST2 fusion protein detection, cell pellets were sonicated in NP40 lysis buffer (50 mM Tris-HCl, 1% NP40, pH 7.4, Sigma), complete protease inhibitor mixture (Roche) and phosphatase inhibitor (EMD bioscience). Immunoblot analysis for MAST2 was carried out using MAST2 antibody from Novus Biologicals. Human β-actin antibody (Sigma) was used as a loading control. For NOTCH1 protein detection, cells were lysed in RIPA buffer containing protease inhibitor cocktail (Pierce). Proteins were separated by SDS-PAGE, transferred to nitrocellulose membranes and probed with antibodies recognizing total NOTCH1 (Cell Signaling), γ-secretase-cleaved NOTCH1 (NICD, Cell Signaling), or β-actin (Santa Cruz). The signal was detected by chemiluminescence using Immun-Star Western C reagents (Bio-Rad).
Immunoblot analyses for pAKT, total AKT, pERK, total ERK, and PTEN were performed after supplement starvation of TERT-HME1 cells for 3 h. Note that, upon supplement starvation, pERK could not be resolved as two distinct bands of p42/p44. MDA-MB-468 cells were treated with fusion-specific siRNAs for 2 days and serum starved for 6 hours before probing for the signaling molecules. All the above antibodies were purchased from Cell Signaling. Additional immunoblot screening of signaling molecules was performed at Kinexus, using lysates prepared as previously described.
The ZNF700-MAST1 fusion ORF from BrCa00001 was cloned into pENTR-D-TOPO Entry vector (Invitrogen) following the manufacturer's instructions. Sequence confirmed entry clones in correct orientation were recombined into Gateway pcDNA-DEST40 mammalian expression vector (Invitrogen) by LR Clonase II enzyme reaction. Plasmids with C-terminus V5 tags were generated and tested for protein expression by transfection in HEK293 cells. A full-length expression construct of MAST2 with DDK tag was obtained from Origene.
Each of the five MAST fusion alleles was cloned with an amino terminal FLAG epitope tag into the lentiviral vector pCDH510-B (SABiosciences). Lentivirus was produced by cotransfecting each of the MAST vectors with the ViraPower packaging mix (Invitrogen) into 293T cells using FuGene HD transfection reagent (Roche). Twelve hours post-transfection, the media was changed. Thirty-six hours post-transfection, the viral supernatants were harvested, centrifuged at 5000×g for 30 minutes and then filtered through a 0.45 micron Steriflip filter unit (Millipore). TERT-HME1 cells at 30% confluence were infected at an MOI of 20 with the addition of polybrene at 8 μg/ml. Forty-eight hours post-infection, the cells were split and placed into selective media containing 5 μg/ml puromycin. Pools of resistant cells were obtained and analyzed for expression of the MAST fusion constructs by western blot analysis with monoclonal anti-FLAG antibody (Sigma-Aldrich). Stable pools of TERT-HME1 cells expressing the NOTCH fusion alleles, as well as a control NOTCH1 intracellular domain, were generated using the same procedures as above, with the exception that the NOTCH fusion alleles were cloned into pCDH510-B without an amino terminal FLAG epitope tag.
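The volume of viral supernatant needed to reach a stated multiplicity of infection (MOI) follows from simple arithmetic, sketched below. The text gives only the MOI; the cell number and viral titer in the example are hypothetical placeholders, as is the function name.

```python
def virus_volume_ml(moi: float, n_cells: float, titer_per_ml: float) -> float:
    """Volume of viral stock (ml) delivering `moi` infectious units per cell.

    MOI = (titer * volume) / cells, so volume = MOI * cells / titer.
    """
    if titer_per_ml <= 0:
        raise ValueError("titer must be positive")
    return moi * n_cells / titer_per_ml


if __name__ == "__main__":
    # Hypothetical numbers: 1e5 target cells and a stock of 1e7
    # transducing units per ml, infected at the MOI of 20 from the text.
    print(virus_volume_ml(moi=20, n_cells=1e5, titer_per_ml=1e7), "ml")
```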
HEK293 cells were transfected with the above mentioned constructs using Fugene 6 reagent (Roche). MAST1 protein over-expression was validated by probing with V5 antibody (Sigma). MAST2 over-expression was validated using DDK antibody (Origene). HMEC-TERT cells were transfected using Fugene 6 and polyclonal populations of cells expressing MAST1, MAST2 or empty vector constructs were selected using geneticin. For siRNA knockdown experiments, Smart-pool siRNAs from Thermo were used (J-004633-06, J-004633-07, and J-004633-08). All siRNA transfections were carried out using oligofectamine reagent (Life Sciences) and three days post-transfection the cells were plated for proliferation assays. At the indicated times, cell numbers were measured using a Coulter Counter. Lentiviral particles expressing the MAST2 shRNA (Sigma, TRCN0000001733) were transduced using polybrene, according to the manufacturer's instructions. Polyclonal populations expressing the MAST2 shRNA sequences were selected using 0.5-1 μg/ml puromycin.
Equal numbers of MDA-MB-468 cells, transduced with scrambled or MAST2 shRNA lentivirus particles, were plated and selected using puromycin. After 7-8 days the plates were stained with crystal violet to visualize the number of colonies formed. For quantitation of differential staining, the plates were treated with 10% acetic acid and absorbance was read at 750 nm.
Polyclonal populations of HMEC-TERT over-expressing MAST1, MAST2 or vector control were plated and relative confluence measurements were made at 30 minute intervals using the Incucyte system. Rate of increase in confluence is indicative of increase in cell proliferation. For the wound healing assay, vector control or MAST1 over-expressing cells were plated at high density and 6 hours later, uniform scratch wounds were made using Woundmaker (Incucyte). Relative migration potential of the cells was assessed by confluence measurements at regular time intervals as indicated, over the wound area.
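One way to turn a confluence time series into a single growth-rate figure is a least-squares slope, sketched below. The text does not say how the rate of increase was computed, so the linear fit, the function name, and the example readings are all assumptions made for illustration.

```python
from typing import Sequence


def confluence_rate(times_h: Sequence[float], confluence_pct: Sequence[float]) -> float:
    """Least-squares slope of confluence (%) against time (hours)."""
    n = len(times_h)
    if n < 2 or n != len(confluence_pct):
        raise ValueError("need two or more paired measurements")
    mean_t = sum(times_h) / n
    mean_c = sum(confluence_pct) / n
    # Slope = covariance(time, confluence) / variance(time).
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (c - mean_c) for t, c in zip(times_h, confluence_pct))
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    return sxy / sxx


if __name__ == "__main__":
    # Made-up readings at 30 minute intervals, as in the sampling scheme above.
    times = [0.0, 0.5, 1.0, 1.5]
    confluence = [10.0, 11.0, 12.0, 13.0]
    print(confluence_rate(times, confluence), "% confluence per hour")
```

A straight-line fit is only reasonable over a window where growth is roughly linear; comparing populations over the same window keeps the slopes comparable.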
Chicken chorioallantoic membrane (CAM) assay for tumor growth was carried out as follows. Fertilized eggs were incubated in a humidified incubator at 38° C. for 10 days, and then CAM was dropped by drilling two holes: a small hole through the eggshell into the air sac and a second hole near the allantoic vein that penetrates the eggshell membrane but not the CAM. Subsequently, a cutoff wheel (Dremel) was used to cut a 1 cm2 window encompassing the second hole near the allantoic vein to expose the underlying CAM. When ready, CAM was gently abraded with a sterile cotton swab to provide access to the mesenchyme and 2×106 cells in 50 μl volume were implanted on top. The windows were subsequently sealed and the eggs returned to the incubator. After 7 days extra-embryonic tumors were isolated and weighed. 5-10 eggs per group were used in each experiment.
Four week-old female SCID C.B17 mice were procured from a breeding colony at University of Michigan. MDA-MB-468 cells infected with lentivirus constructs of scrambled or MAST2 shRNA were selected for 3 days using puromycin. Mice were anesthetized using a cocktail of xylazine (80 mg/kg IP) and ketamine (10 mg/kg IP) for chemical restraint. MAST2 shRNA or scrambled shRNA knockdown MDA-MB-468 breast cancer cells (4 million) or cells of the NOTCH1 fusion allele positive HCC1599 breast cancer cell line (5 million) were resuspended in 100 μl of 1×PBA with 20% Matrigel (BD Biosciences) and implanted into right and left abdominal-inguinal mammary fat. Ten mice were included in each group. Two weeks after tumor implantation, HCC1599 xenografted mice were treated with γ-secretase inhibitor (DAPT) dissolved in 5% ethanol in corn oil (IP). Mice in the control group also received 5% ethanol in corn oil as vehicle control. Tumor growth was recorded weekly using digital calipers and tumor volumes were calculated using the formula (π/6)(L×W²), where L=length of tumor and W=width.
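The tumor volume calculation can be sketched as follows, assuming the formula is the common ellipsoid approximation V = (π/6)(L×W²) applied to caliper measurements in millimeters. The function name `tumor_volume` and the example measurements are hypothetical and are not taken from the study.

```python
import math


def tumor_volume(length_mm, width_mm):
    """Ellipsoid approximation of tumor volume in mm^3: (pi/6) * L * W^2.

    Assumption: L is the tumor length and W the width, both measured
    with calipers in millimeters.
    """
    if length_mm < 0 or width_mm < 0:
        raise ValueError("caliper measurements must be non-negative")
    return (math.pi / 6.0) * length_mm * width_mm ** 2


# Hypothetical caliper readings (mm), for illustration only.
example_volume = tumor_volume(10.0, 6.0)
```

For a hypothetical tumor 10 mm long and 6 mm wide this gives 60π, or about 188.5 mm³.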
For cell proliferation assays, cells were seeded into 96-well plates in triplicate and allowed to attach overnight before drug treatment. The γ-secretase inhibitor DAPT (EMD Biosciences) was added to the cultures the next day at concentrations of 0, 0.3, 1, and 3 μM. Relative cell numbers were measured by WST-1 assays at indicated time points following the manufacturer's instructions (Roche).
Breast cancer cells were seeded into 24-well dishes in triplicate and allowed to attach overnight. Cells were then infected with a Notch-reporter construct Lenti-RBPJ-firefly luciferase together with a Lenti-CMV-Renilla luciferase control (SABiosciences/QIAGEN). The two lentiviral stocks were mixed at a ratio of 50 Notch reporters to 1 CMV control and a single mixture was used to infect all recipient cell lines at a MOI of 100. Following incubation for 48 hours, cell lysates were prepared and measured for Notch activity using Promega Dual Luciferase reagents and Passive Lysis Buffer. Firefly luciferase levels were normalized using corresponding Renilla luciferase levels for each cell line. To confirm that Notch pathways are activated in the index cell lines through Notch gene rearrangements, the activated NOTCH1 and NOTCH2 alleles were cloned from HCC1599, HCC2218, and HCC1187 into a pcDNA3.1 vector. These expression constructs, pcDNA3.1-1599-NOTCH1, pcDNA3.1-2218-NOTCH1, and pcDNA3.1-1187-NOTCH2 and positive control NOTCH1-NICD, were individually transfected into 293T cells along with the pGL4-RBPJ-4X reporter plasmid and pTKRenilla luciferase control plasmid. Cells were harvested for luciferase activity assays 24 hours after transfection and assayed as above.
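The normalization step of the dual luciferase assay can be illustrated with a short sketch: each firefly (Notch reporter) reading is divided by the Renilla (control) reading from the same lysate. Expressing the result as a fold change relative to a reference cell line is an added assumption for illustration, as are the function names and the readings, which are not data from the study.

```python
def normalized_activity(firefly, renilla):
    """Firefly luciferase signal normalized to the Renilla control signal."""
    if renilla <= 0:
        raise ValueError("Renilla reading must be positive")
    return firefly / renilla


def fold_over_reference(readings, reference):
    """Normalized activity of each cell line relative to a reference line.

    readings: dict mapping cell line name -> (firefly, renilla) tuple.
    reference: name of the cell line used as the baseline (an assumption;
    the source only describes normalization to Renilla).
    """
    baseline = normalized_activity(*readings[reference])
    if baseline == 0:
        raise ValueError("reference activity must be non-zero")
    return {name: normalized_activity(f, r) / baseline
            for name, (f, r) in readings.items()}


# Hypothetical luminometer readings (arbitrary units).
readings = {
    "fusion_line": (500.0, 100.0),
    "reference_line": (50.0, 100.0),
}
relative = fold_over_reference(readings, "reference_line")
```

Dividing by the Renilla signal corrects for differences in infection or transfection efficiency and in cell number between wells, so that the reporter signal can be compared across cell lines.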
A panel of 41 breast cancer cell lines and 37 breast cancer tissues, along with 8 benign breast epithelial cell lines and 2 benign breast tissues, was sequenced by paired-end sequencing of transcriptome libraries followed by analysis for gene fusions using a previously developed chimera discovery pipeline (Maher, C. A. et al., Nature 458, 97-101 (2009); Maher, C. A. et al. Proc Natl Acad Sci USA 106, 12353-8 (2009)). 42 of the samples were ER (estrogen receptor) positive, 21 exhibited amplified ERBB2, and 26 were classified as triple negative (Tables 2 and 3). Fusion transcript discovery and validation led to the identification of 372 gene fusions, at an average of over four gene fusions per breast cancer sample (Table 4). Gene fusions were identified in all 41 breast cancer cell lines and all but 3 primary tumors. A slightly higher number of gene fusions was detected in the cell lines compared to primary tumors.
A closer examination of the chromosomal coordinates of the fusion partner genes revealed that a majority of the gene fusions clustered in regions of chromosomal amplifications (
Chromosome 17, which harbors the ERBB2 amplicon and an adjacent amplicon that includes genes such as BCAS3, RPS6KB1, and TMEM49, among others, accounted for a third of all the gene fusions in samples with CGH data (Table 4). Other recurrent loci harboring multiple gene fusions include the BCAS4 amplicon on chr20 and the chr8q amplicon. No single gene fusion from the more than 350 identified here was found to be recurrent in the compendium, although several fusion genes did appear in combination with different fusion partners. For example, three fusions each involving IKZF3 and BCAS3 as 3′ partners were found in three different cell lines, all with different 5′ partners; likewise, TRIM37 was a common 5′ partner in three distinct gene fusions with different 3′ partners. Overall, 24 genes were found to be recurrent fusion partners, often associated with amplicons (Table 4).
In order to focus on potentially tumorigenic ‘driver’ fusions, the gene fusions were prioritized based on the known cancer-associated functions of the component genes, for example whether the 3′ partner was a kinase, oncogene, or tumor suppressor, or whether the genes were known fusion partners in the Mitelman Database of chromosomal aberrations in cancer. In the sample set, 5 cases of fusions of MAST family kinases and 7 cases with fusions of genes in the Notch family were identified. Singleton fusions with open reading frames that could potentially be considered ‘drivers’ included SPRED1-BUB1B (kinase), MYO15B-MAP3K3 (kinase), BCL2L14-ETV6 (ETS transcription factor), MSI2-NEK8 (kinase), and SEC11C-MALT1 (oncogene) among others (Tables 1 and 5). Notch and MAST kinase fusions were mutually exclusive and occurred mostly in ER negative breast carcinoma samples (Table 1 and
Three independent cases of MAST gene fusions were identified by initial transcriptome sequence analyses: ZNF700-MAST1 in breast cancer tissue BrCa00001, NFIX-MAST1 in breast carcinoma BrCa10017, and ARID1A-MAST2 in a triple negative (ER-/PR-/ERBB2-) breast cancer cell line MDA-MB-468 (
Each of the fusions was confirmed by fusion-specific PCR in the respective samples (
Next, the functional aspects of MAST fusion proteins were investigated. The ZNF700-MAST1 fusion transcript encodes a truncated MAST1 protein that retains the kinase (as well as PDZ) domain. The fusion encoded open reading frame from the index sample, breast cancer tissue BrCa00001, was cloned into an expression vector. A commercially available full-length MAST2 expression construct was used to mimic the function of ARID1A-MAST2 over-expression, as this fusion encodes nearly full length MAST2 (along with a 379 amino acid segment from ARID1A). To assess the potential oncogenic functions of MAST genes, epitope tagged truncated MAST1 and full length MAST2 were ectopically over-expressed in the benign breast cell line, HMEC-TERT. Expression of the respective constructs was confirmed using anti-V5 and anti-DDK antibodies (
With the identification of the newer MAST fusions using the pooled transcript capture and sequencing approach and for a more comprehensive analysis of all the MAST fusions identified in the study, MAST1/MAST2 fusions were cloned and expressed in a lentiviral expression system. Consistent with the earlier observations, TERT-HME1 cells overexpressing the five MAST fusions (
To study the role of the endogenous ARID1A-MAST2 fusion in MDA-MB-468 cells, multiple independent MAST2 siRNAs were used to achieve a marked knockdown of the MAST2 fusion (
To characterize the effects of the ARID1A-MAST2 fusion in MDA-MB-468 cells further, shRNA targeting MAST2, which displayed efficient knockdown of ARID1A-MAST2 fusion at both the transcript (
Fusion transcript discovery and validation detected a high frequency of Notch gene rearrangement with 7 rearrangements involving either NOTCH1 or NOTCH2 in the samples tested (Table 1,
All of the Notch family gene rearrangements were found in ER negative breast carcinomas, and all but one in triple negative breast carcinomas. While both 5′ and 3′ fusion transcripts of Notch were identified in breast cancer samples (
To determine whether the observed fusion transcripts were the result of DNA rearrangements, mate-pair genomic library sequencing and long-range genomic PCR were performed to identify DNA breakpoints associated with the gene loci involved in the fusion transcripts (
The Notch fusion transcripts are abundantly expressed and are specific to samples harboring DNA rearrangements. SYBR Green QPCR experiments using primers on either side of each of the transcript fusion junctions detected expression exclusively in the sample harboring the underlying DNA rearrangements (
The predicted open reading frames for the NOTCH1 and NOTCH2 fusion transcripts are illustrated in
It was next evaluated whether the Notch fusion alleles identified above were capable of activating the Notch pathway in the index cases and when introduced into recipient cells. The activity of the Notch pathway in a panel of breast cell lines was measured using a dual luciferase assay following lentiviral delivery of Notch reporter and control vectors into recipient cells. The results presented in
The three index breast cell lines containing the Notch fusions (HCC1599, HCC2218, and HCC1187) exhibit decreased cell-matrix adhesion and grow in suspension, or as weakly adherent clusters, unlike the majority of breast carcinoma cell lines (
Notch fusion alleles provide a target for therapeutic intervention. The three characterized Notch fusions represent two functional classes. The first class, exemplified by the HCC2218 and HCC1599 fusions, produces a protein similar to that produced by the ADAM17/TACE catalyzed S2 cleavage, which occurs during ligand activation of the Notch pathway. The second class, exemplified by the HCC1187 fusion, produces a protein similar to the NICD produced after cleavage at S3 by γ-secretase. The first class requires cleavage at the S3 site by γ-secretase to release NICD, and thus would be expected to be sensitive to γ-secretase inhibitors (GSIs). The second class would be unaffected by GSIs, as the fusion generates an ORF similar to NICD. To test this, stable Notch reporter cell lines were established from each of the three Notch fusion positive carcinoma lines by infection with a lentivirus carrying the Notch responsive promoter driving firefly luciferase. Each of the three cell lines was treated with the γ-secretase inhibitor DAPT, and luciferase activity was measured in cell lysates 24 hours later.
Treatment with the γ-secretase inhibitor DAPT repressed Notch target gene expression in a rapid manner. Expression levels of the Notch target genes CCND1, MYC, and HEY1 were monitored over a 24-hour treatment time course in the cell lines harboring Notch fusions dependent on γ-secretase processing (
aThe ER/PR positivity and ERBB2 overexpression status are derived from RNA sequencing data presented in this study.
bDr. Jorge Reis-Filho, The Breakthrough Breast Cancer Research Centre, Institute of Cancer Research, London, UK.
cDr. Stephen Ethier, Karmanos Cancer Institute, Detroit, MI.
dDr. Michael Kinch, Basic Medical Science, Purdue University.
aThe ER/PR positivity and ERBB2 overexpression status are from clinical diagnosis.
bDr. Jorge Reis-Filho, The Breakthrough Breast Cancer Research Centre, Institute of Cancer Research, London, UK.
Experiments were conducted to identify additional fusions in breast cancer. Experiments identified an FGFR fusion in breast cancer and functionally recurrent fusions of ETV6 in breast cancer.
Table 9 shows FGFR3 fusions in a variety of cancers.
Fibroblast growth factors (FGFs) (FGF1-10 and 16-23) are mitogenic signaling molecules that have roles in angiogenesis, wound healing, cell migration, neural outgrowth and embryonic development. FGFs bind heparan sulfate glycosaminoglycans (HSGAGs), which facilitates dimerization (activation) of FGF receptors (FGFRs). FGFRs are transmembrane catalytic receptors that have intracellular tyrosine kinase activity.
Overexpression of fibroblast growth factor receptor 3 (FGFR3) has been shown to drive oncogenesis in a subset of patients with multiple myeloma. FGFR3 is an oncogenic driver of bladder cancer, indicating that FGFR3 has important roles in the oncogenesis of other epithelial cancers.
Table 10 shows ETV6 fusions in breast cancer.
Additional breast cancer gene fusions include, but are not limited to, CTNNA1-JMJD1B and RB1CC1-JAK1.
Table 11 and
Although a variety of embodiments have been described in connection with the present disclosure, it should be understood that the claimed invention should not be unduly limited to such specific embodiments. Indeed, various modifications and variations of the described compositions and methods of the invention will be apparent to those of ordinary skill in the art and are intended to be within the scope of the following claims.
This application claims priority to U.S. Provisional Application No. 61/539,737, filed Sep. 27, 2011, which is herein incorporated by reference in its entirety.
This invention was made with government support under W81XWH-08-1-0110 and W81XWH-09-2-0014 awarded by The Army Medical Research and Materiel Command and CA111275 and CA046952 awarded by the National Institutes of Health. The government has certain rights in the invention.
Number | Date | Country
---|---|---
61539737 | Sep 2011 | US