The present invention relates to a method to identify alternatively spliced variants enriched in cancer specimens.
The transformation of a normal cell into a malignant cell results, among other things, in the uncontrolled proliferation of the progeny cells, which exhibit immature, undifferentiated morphology, exaggerated survival and proangiogenic properties and expression, overexpression or constitutive activation of oncogenes not normally expressed in this form by normal, mature cells.
Nearly all cancers are caused by abnormalities in the genetic material of the transformed cells. These abnormalities may be due to the effects of carcinogens, such as tobacco smoke, radiation, chemicals, or infectious agents. Other cancer-promoting genetic abnormalities may be randomly acquired through errors in DNA replication, or may be inherited and thus present in all cells from birth. Complex interactions between carcinogens and the host genome may explain why only some individuals develop cancer after exposure to a known carcinogen. New aspects of the genetics of cancer pathogenesis, such as DNA methylation and microRNAs, are increasingly being recognized as important.
One example of cancer is epithelial ovarian cancer which is the second most common gynecological cancer and the deadliest amongst gynecological pelvic malignancies. Early symptoms of ovarian cancer are often mild and unspecific, making this disease difficult to detect. In most cases, at the time of diagnosis, cancer cells have already disseminated throughout the peritoneal cavity. In fact, over 70% of patients are diagnosed with late stage disease and only a minority survive over 5 years post-diagnosis. Early detection offers a 90% 5-year survival rate. The inability to detect ovarian cancer at an early stage and its propensity for peritoneal metastasis are largely responsible for these low survival rates.
Currently, there are no reliable methods for detecting early stages of epithelial ovarian cancer. Blood level of CA125 tumour antigen is employed as a predictor of clinical recurrence of ovarian cancers, and to monitor response to anticancer therapy (Yang et al., 1994, Zhonghua Fu Chan Ke Za Zhi, 29: 147-149). The CA125 serum marker combined with transvaginal ultrasonography is the current clinical test offered for screening for early stages of ovarian cancer in high risk populations (Nikolic et al., 2006, Bosn J Basic Med Sci 6: 3-6). However, neither of these modalities, individually or combined, has proven reliable (Nikolic et al., 2006, Bosn J Basic Med Sci 6: 3-6), and there is an urgent need to develop new screening tests to detect epithelial ovarian cancer at an early stage.
Epithelial ovarian tumours are heterogeneous and include many different histopathological subtypes: serous, endometrioid, mucinous, clear cell, undifferentiated or mixed. The serous type is the most frequent and the second most lethal. Recent studies have focused on differences in molecular profiling of gene expression patterns to uncover diagnostic and prognostic markers as well as new therapeutic targets in a variety of cancers. Although promising results have been reported in some cancers, the genes that are differentially expressed between normal and cancer cells seem to vary between individual microarray studies, reflecting either a variability in methods and in the choice of model systems or a heterogeneity in selected tissues (Kopper & Timar, 2005, Pathol Oncol Res, 11: 197-203).
Another example of cancer is breast cancer. Breast cancer is the fifth most common cause of cancer death (after lung cancer, stomach cancer, liver cancer and colon cancer). Among women worldwide, breast cancer is the most common cause of cancer death. There are numerous ways breast cancer is classified. Like most cancers, breast cancer can be divided into groups based on the tissue of origin, e.g. epithelial (carcinoma) versus stromal (sarcoma). The vast majority of breast cancers arise from epithelial tissue, i.e. they are carcinomas.
Breast cancer is diagnosed by the examination of surgically removed breast tissue. A number of procedures can obtain tissue or cells prior to definitive treatment for histological or cytological examination. Such procedures include fine-needle aspiration, nipple aspirates, ductal lavage, core needle biopsy, and local surgical excision. These diagnostic steps, when coupled with radiographic imaging, are usually accurate in diagnosing a breast lesion as cancer. Occasionally, pre-surgical procedures such as fine-needle aspiration may not yield enough tissue to make a diagnosis, or may miss the cancer entirely. Imaging tests are sometimes used to detect metastasis and include chest X-ray, bone scan, CAT scan, MRI, and PET scanning. While imaging studies are useful in determining the presence of metastatic disease, they are not in and of themselves diagnostic of cancer. Only microscopic evaluation of a biopsy specimen can yield a cancer diagnosis. CA 15.3 (carbohydrate antigen 15.3, epithelial mucin) is a tumor marker determined in blood which can be used to follow disease activity over time after definitive treatment. Blood tumor marker testing is not routinely performed for the screening of breast cancer, and has poor performance characteristics for this purpose.
Thus, detection of many cancers still relies on detection of an abnormal mass in the organ of interest. In many cases, a tumor is detected only after a malignancy is advanced and may have metastasized to other organs. For example, breast cancer is typically detected by obtaining a biopsy from a lump detected by a mammogram or by physical examination of the breast. Also, although measurement of prostate-specific antigen (PSA) has significantly improved the detection of prostate cancer, confirmation of prostate cancer typically requires detection of an abnormal morphology or texture of the prostate. Thus, there is a need for methods for earlier detection of cancer. Such new methods could, for example, replace or complement the existing ones, reducing the margins of uncertainty and expanding the basis for medical decision making.
It would be highly desirable to be provided with novel biomarkers for the early detection, prognosis and clinical management of cancers. Sensitive and specific tests that can diagnose different stages of cancer would greatly improve patient survival rates by facilitating early diagnosis and tailored therapies. It would also be highly desirable to be provided with new screening tests to detect cancer at an early stage.
The present invention relates to a method to identify alternatively spliced variants enriched in cancer specimens.
In another embodiment, the cancer that can be detected in these cancer specimens is selected from the group consisting of breast cancer, glioma, large intestinal cancer, lung cancer, small cell lung cancer, stomach cancer, liver cancer, blood cancer, bone cancer, pancreatic cancer, skin cancer, head or neck cancer, cutaneous or intraocular melanoma, uterine sarcoma, ovarian cancer, rectal or colorectal cancer, anal cancer, colon cancer, fallopian tube carcinoma, endometrial carcinoma, cervical cancer, vulval cancer, squamous cell carcinoma, vaginal carcinoma, Hodgkin's disease, non-Hodgkin's lymphoma, esophageal cancer, small intestine cancer, endocrine cancer, thyroid cancer, parathyroid cancer, adrenal cancer, soft tissue tumor, urethral cancer, penile cancer, prostate cancer, chronic or acute leukemia, lymphocytic lymphoma, bladder cancer, kidney cancer, ureter cancer, renal cell carcinoma, renal pelvic carcinoma, CNS tumor, glioma, astrocytoma, glioblastoma multiforme, primary CNS lymphoma, bone marrow tumor, brain stem nerve gliomas, pituitary adenoma, uveal melanoma, testicular cancer, oral cancer, pharyngeal cancer, pediatric neoplasms, leukemia, neuroblastoma, retinoblastoma, glioma, rhabdomyoblastoma and sarcoma.
In one aspect, a method for prognosis of cancer in a subject by detecting a signature of splicing events comprising the steps of obtaining a nucleic acid sample from said subject, and determining whether the nucleic acid sample contains a signature specific to a cancer is disclosed.
There is also provided a method for profiling cancer in a subject by detecting a signature of splicing events comprising the steps of obtaining a nucleic acid sample from said subject, and determining whether the nucleic acid sample contains a signature specific to a cancer.
In a preferred embodiment, the signature comprises at least 1 splicing variant.
Further, the method disclosed herein also can comprise an initial step of designing at least two independent PCR primer pairs for each predicted exon-exon junction from a transcript map of a gene affected in a cancer.
In addition, the method disclosed herein can comprise a step of PCR amplifying the nucleic acid sample with the PCR primer pairs to obtain amplicons.
Further, the method disclosed herein can comprise the step of measuring the size and sequence of said amplicons.
Also in accordance with the present invention, there is disclosed a method for identifying a signature specific of a cancer, said signature consisting of at least one specific splicing event or a specific combination of splicing events, said method comprising the steps of designing at least two independent PCR primer pairs for each predicted exon-exon junction from a transcript map of a gene affected in cancer; reverse transcribing a template from RNA from a sample of cancer tissue and a sample from normal tissue; amplifying amplicons of said gene by PCR with the PCR primer pairs using the template reverse transcribed from the cancer tissue and the normal tissue; and determining the size and sequence of said amplicons; wherein the presence of said at least one alternative splicing event corresponds to the signature of the cancer.
Further, the method disclosed herein can comprise the step of performing a comparative analysis of amplicons obtained from the template reverse transcribed from the cancer tissue and the normal tissue.
Furthermore, the method disclosed herein can comprise the step of identifying the presence of at least one alternative splicing event in the gene.
In a preferred embodiment, the PCR primer pairs are designed to amplify amplicons ranging from 100 to 700 base pairs.
In another embodiment, the step of amplifying is carried out in a liquid handling system linked to a thermocycler.
Furthermore, the method disclosed herein can comprise the step of selecting amplicons with a difference of at least 10 percentage points between the mean Ψs for normal and cancer tissue and with a maximum standard deviation of the Ψs for each tissue type of at most 26%.
In accordance with the present invention, there is also provided a diagnostic kit for detecting a signature of ovarian cancer in a patient comprising PCR primer pairs for predicted exon-exon junctions of at least one splicing variant; and a set of instructions for using said primers to generate and detect a signature specific of a cancer, said signature consisting of at least one splicing variant or a specific combination of splicing variants.
In addition, the kit disclosed herein can also comprise a transcript map.
In another embodiment, the splicing variants occur in genes selected from the group consisting of AFF3, AGR3, APP, AXIN1, BMP4, BTC, C11orf17, CADM1, CCNE1, CHEK2, DNMT3B, FANCA, FANCL, FGFR1, FGFR2, FGFR4, FN1-EDA, FN1-EDB, FN1-IIICS, GATA3, GNB3, GPR137, HMGA1, HSC, IGSF4, KITLG, LGALS9, MCL1, NRG1, NUP98, PAXIP1, PLD1, POLI, POLM, PSAP, PTK2, PTPN13, RAD52, SHMT1, SLIT2, SRP19, STIM1, SYK, SYNE2, TOPBP1, TSSC4, TUBA4A and UTRN.
In another embodiment, the splicing variants occur in genes selected from the group consisting of ADAM15, BCAS1, C11ORF4, CCL4, CTNNA1, DDR1, DRF1, DSC3, ECGF1, ECT2, FN1, F3, H63, HMGA1, HMMR, INSR, LIG3, LIG4, NOTCH3, PACE4, POLB, PTPRB, RSN, RUNX2, SHC1, TLK1 and TNFRSF5.
In a preferred embodiment, the splicing events are selected from the group consisting of an alternative 3′ splicing, an alternative 5′ splicing, an alternative 3′ and 5′ splicing, a cassette exon and alternative 5′ or 3′ splicing, a multiple cassette exon splicing, a mutually exclusive exon splicing and a cassette exon splicing. Further, said splicing events are alternative cassette exon events.
In another embodiment, the splicing variant is:
Having thus generally described the nature of the invention, reference will now be made to the accompanying drawings, showing by way of illustration, preferred embodiments thereof, and in which:
In accordance with the present invention, there is provided a method to identify alternatively spliced variants enriched in cancer specimens.
The term “cancer” includes but is not limited to, breast cancer, large intestinal cancer, lung cancer, small cell lung cancer, stomach cancer, liver cancer, blood cancer, bone cancer, pancreatic cancer, skin cancer, head or neck cancer, cutaneous or intraocular melanoma, uterine sarcoma, ovarian cancer, rectal or colorectal cancer, anal cancer, colon cancer (generally considered the same entity as colorectal and large intestinal cancer), fallopian tube carcinoma, endometrial carcinoma, cervical cancer, vulval cancer, squamous cell carcinoma, vaginal carcinoma, Hodgkin's disease, non-Hodgkin's lymphoma, esophageal cancer, small intestine cancer, endocrine cancer, thyroid cancer, parathyroid cancer, adrenal cancer, soft tissue tumor, urethral cancer, penile cancer, prostate cancer, chronic or acute leukemia, lymphocytic lymphoma, bladder cancer, kidney cancer, ureter cancer, renal cell carcinoma, renal pelvic carcinoma, CNS tumor, glioma, astrocytoma, glioblastoma multiforme, primary CNS lymphoma, bone marrow tumor, brain stem nerve gliomas, pituitary adenoma, uveal melanoma (also known as intraocular melanoma), testicular cancer, oral cancer, pharyngeal cancer or a combination thereof. In an embodiment, the cancer is a brain tumor, e.g. glioma. In another embodiment, the cancer expresses the HER-2 or the HER-3 oncoprotein. The term “cancer” also includes pediatric cancers, including pediatric neoplasms, including leukemia, neuroblastoma, retinoblastoma, glioma, rhabdomyoblastoma, sarcoma and other malignancies.
Accordingly, there is provided a method of identifying new markers characteristic of a signature for a cancer.
Alternative splicing of pre-mRNA is a post-transcriptional process that allows the production of distinct mRNAs from a single gene with the potential to expand protein structure and diversity. Alternative splicing can also introduce or remove regulatory elements to affect mRNA translation, localization or stability. More than 70% of human genes may undergo alternative splicing with many genes capable of producing dozens and even hundreds of different isoforms.
In multicellular organisms, alternative splicing is a process that is tightly regulated during development and in different tissues. Inherited and acquired changes in pre-mRNA splicing patterns have been associated with several human diseases including cancer (Venables, 2006, Bioessays, 28: 378-386). Some of these changes can arise from mutations at either the splice sites or within proximal splicing enhancer or silencer elements (Pagani & Baralle, 2004, Nat Rev Genet, 5: 389-396). In other cases, variations in the expression of trans-acting splicing factors have been observed (Brinkman, 2004, Clin Biochem, 37: 584-594). A direct effect on splice site use resulting in the production of cancer-specific splice isoforms has been observed in a few cases (Karni et al., 2007, Nat Struct Mol Biol, 14: 185-193). Cancer-specific alterations in splice site selection can affect genes controlling cellular proliferation (e.g., FGFR2, p53, MDM2, FHIT and BRCA1), cellular invasion (e.g., CD44, Ron), angiogenesis (e.g., VEGF), apoptosis (e.g., Fas, Bcl-x and caspase-2) and multidrug resistance (e.g., MRP-1).
Following initial computational efforts designed to exploit collections of expressed sequence tag (EST) databases, there has been an increase in high-throughput experimental approaches to identify changes in splicing events under a variety of conditions. Oligonucleotide-based microarray technologies have been introduced to identify global alternative splicing events and to examine changes in the alternative splicing of a large collection of events. These approaches are useful for monitoring the expression of known splice variants. However, they are not designed to discover novel splice sites, nor do they provide information on the combinatorial patterns of exon inclusion/skipping in the same gene. Furthermore, the lack of standardized analysis and normalization can compromise the interpretation of the results.
Arrays made from alternative splice junction probes have been used to detect splicing changes in Hodgkin Lymphoma (Relogio et al., 2005, J Biol Chem, 280: 4779-4784) and breast cancer cell lines and xenografts (Li et al., 2006, Cancer Res, 66: 1990-1999). A related medium-throughput technique has been used to show that alternative splicing analysis can complement the power of gene expression analysis of prostate tumours (Li et al., 2006, Cancer Res, 66: 4079-4088; Zhang et al., 2006, BMC Bioinformatics, 7: 202). So far, 30 different genes have been shown to be alternatively spliced in a cancer-specific manner (Venables, 2004, Cancer Res, 64: 7647-7654). In ovarian tumours specifically, three genes have been reported to be regulated at the level of splicing (He et al., 2007, Oncogene, advance online publication, Feb. 19, 2007; Sigalas et al., 1996, Nat Med 2: 912-917).
Thus, in the present invention, there is disclosed a layered and integrated system for splicing isoform annotation platform (LISA). LISA relies on automated RT-PCR technology that generates tissue-specific annotation of alternative splicing events. The bioinformatics infrastructure supporting the annotation effort helps assess the potential functional impact of individual alternative splicing events and allows adaptable visualization of large sets of validated results.
The LISA is used in a preferred embodiment to identify alternatively spliced variants enriched in cancer specimens. There is reported herein a set of highly significant and biologically relevant splicing differences that make up a strong signature for cancer samples.
The signature of a cancer sample consists in the presence of alternatively spliced variants in the sample. More specifically, the signature is composed of at least one gene, which discriminates between a cancerous tissue and normal tissue with about 90% accuracy. More preferably, the signature is composed of at least 2, 3, 4 or 5 variants, or markers, disclosed for example in Table 1 herein below, more preferably at least 6, 7, 8, 9, or 10, preferably 15, 20, 25, 30, 35, 40, 45, more preferably 48 variants. Thus, the combination of more than one alternatively spliced variant can also be used as a signature.
In order to identify such alternatively spliced variants, there is disclosed a method using the layered and integrated system described herein. As a first step, a map of splicing events is generated. A list of genes potentially involved in cancer, for example in ovarian or breast cancer, is first obtained by screening databases. Then, the exon structure of each gene is determined. All splice sites are identified, generating the splicing map. The following step consists in designing PCR primers and PCR reactions to cover all putative exon-exon junctions identified on the splicing map. Next, the RNA isolated from samples from “normal tissues” (without cancer) and from samples positive for a specific cancer is reverse transcribed in bulk. The DNA obtained from the reverse transcription is then used as template for the PCR reactions conceived previously, also using the PCR primers designed previously. Once PCR amplicons are obtained, the amplicons from normal tissues are compared to those obtained from cancer tissues. Splicing events are thus identified following this comparison. The method described herein allows identification of splicing events which will be part of a cancer signature and will thus allow prognosing or profiling the presence of the target cancer in a patient by identifying the presence of this signature, i.e. the presence of one or more alternative splicing events occurring in cancer samples and not occurring in normal samples.
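The splicing-map step described above can be illustrated with a minimal sketch. It assumes a hypothetical representation in which each known transcript of a gene is an ordered list of exon labels; the function name and the toy gene are illustrative only and are not part of the LISA software.

```python
def exon_exon_junctions(transcripts):
    """Return the sorted set of exon-exon junctions seen in any transcript.

    `transcripts` is a hypothetical input: each transcript is an ordered
    list of exon labels, e.g. ["E1", "E2", "E3"].
    """
    junctions = set()
    for exons in transcripts:
        # Each pair of consecutive exons in a transcript is one junction.
        for upstream, downstream in zip(exons, exons[1:]):
            junctions.add((upstream, downstream))
    return sorted(junctions)


# Toy gene with a cassette exon (E2) that is skipped in the second transcript.
toy_transcripts = [
    ["E1", "E2", "E3"],
    ["E1", "E3"],
]
junctions = exon_exon_junctions(toy_transcripts)
```

In this toy case the junction list contains both the inclusion junctions (E1-E2, E2-E3) and the skipping junction (E1-E3); each such junction is a candidate target for primer design.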
The LISA uses RT-PCR to provide a systematic and comprehensive coverage of alternative splicing events. As illustrated in
Contrary to current approaches that identify tissue-specific variations in alternative splicing profiles relying heavily on microarray analysis to produce large quantities of data that must further be validated by RT-PCR, a preferred embodiment is directed to a method that directly inspects hundreds of genes by RT-PCR without recourse to cumbersome slab gel methods. LISA effectively fills a gap between large-scale microarray studies and individual gene investigations, providing an alternative to array-based expression profiling.
As disclosed herein below, the LISA was used to provide high quality comprehensive annotation of alternative splicing for 600 genes in 46 different tissues. The analysis required nearly 100 000 RT-PCR reactions that were carried out in less than eight weeks.
One major advantage of LISA is the associated in silico filtering modules that can combine alternative splicing data with queries on sequence or coding information, such as Pfam domains, putative RNA secondary structure, and single nucleotide polymorphisms.
Furthermore, the encompassed method herein further comprises an initial step of verifying the tissue-specific representation of expression data in such databases. Depending on the result of this assessment, the coverage of each gene could be modified accordingly. Poorly represented tissues would benefit from a complete annotation strategy, as employed here, whereas for well represented tissues, the design module could be modified to focus only on EST-supported alternative splicing events. This would allow gene analysis to be performed with a limited number of PCR reactions, enabling the screening of many more genes or tissue specimens with the same total number of reactions.
In one example of a screen presented here, with only 600 genes, 48 splicing events not previously detected in ovarian cancers were identified.
The majority (>80%) of the identified cancer-specific alternative splicing events are exon cassettes that extend the coding portions of genes (see Table 1). For example, the short DNMT3B isoform is lacking part of the catalytic DNA methyltransferase domain, including the TRD loop previously shown to be important for cytosine recognition, and is therefore inactive. Another example where alternative splicing affects function concerns the growth factor KITLG. In this case, the skipped exon encodes a metalloprotease cleavage site that determines whether KITLG will be membrane-bound or secreted. The transmembrane form is more active in promoting cell-cell adhesion, cell proliferation and survival by inducing more persistent tyrosine kinase activation than the secreted isoform. The overall preferential enrichment of in-frame alternative cassette exons within functional domains of ovarian cancer-associated genes suggests that alternative splicing of these genes contributes to ovarian tumour biology.
1ASE: alternative splicing event;
2EOC: epithelial ovarian cancer.
The present invention will be more readily understood by referring to the following examples which are given to illustrate the invention rather than to limit its scope.
The gene list was obtained by a keyword search for “ovarian cancer” in NCBI Gene database. The search was performed in January 2006, and was limited to human genes with “known” RefSeq status (Nucleic Acids Res 2005 Jan. 1; 33(1): D501-D504). The 233 genes generated from this search were cross referenced with the AceView database and 182 genes showing evidence of alternative splicing were selected for this study. The exon structure of each gene was determined using AceView as a source for cDNAs and multi-exon ESTs (Thierry-Mieg & Thierry-Mieg, 2006, Genome Biol, 7 Suppl 1, S12, 1-14). The LISA automatically identifies all splice sites and generates a splicing map, as shown for the neogenin homolog 1 (NEO1) gene in
Generally, one forward and one reverse primer are designed for each predicted exon-exon junction, as shown in
Normal and serous epithelial ovarian cancer tissue samples were obtained as frozen specimens from the Cancer Research Network of the FRSQ. Histopathology, grade and stage were assigned according to the International Federation of Gynecology and Obstetrics (FIGO) criteria. Only chemotherapy-naïve tumor samples were used in the study. RNA extraction from 50 mg tissue samples was done using TRIZOL® Reagent according to the manufacturer's protocol, using a PowerMax™ homogenizing system equipped with a 10 mm saw tooth blade (VWR International). To retain maximum yield of RNA, DNase treatment was not performed. Extracted RNA was isopropanol precipitated, then resuspended in pure water and stored at −80° C. RNA concentration was quantified on an Agilent 2100 BioAnalyzer (Agilent Technologies). Typical total RNA yields of 1 to 66 μg per 50 mg specimen were obtained.
Ovarian tissues were classified as normal or cancerous according to the expression profile of the genes KRT18, KRT7, VIM, CDH1 and TERT relative to GAPDH as measured by QPCR using PCR primers flanking dual fluorescent probes (see Table 3). Eight normal and eight tumour RNA samples showing expression data closest to the median for each gene's expression level for the normal or tumour tissue type were selected and combined in equal amounts to formulate 2 normal and 2 tumour pools of 4 samples each. Tissue quality control was established using real-time PCR amplification of genes with known cancer- or tissue type-specific expression profiles, including the epithelial cell markers KRT7, KRT18 and CDH1, the stromal marker vimentin, and the tumour cell content indicator hTERT (Table 3). KRT7, KRT18 and CDH1 were shown to be upregulated in high grade serous ovarian cancer (Chu & Weiss, 2002, Mod Pathol, 15: 6-10; Ouellet et al., 2005, Oncogene, 24: 4672-4687; Sun et al., 2007, Eur J Obstet Gynecol Reprod Biol, 130: 249-257).
1Dual labelled fluorescent probe, with 5′-FAM, 3′-TAMRA.
Tissues that failed the quality control were considered to have low tumour tissue content or to reflect a different or aberrant tumour subset, and were not considered further in the study. For the quality control, tissues (normal and cancerous) are first classified based on histopathological assessment. Moreover, since the portion of tissue used for subsequent analysis may be from a different region of the tumour than that examined by pathologists, the quality of the tissue to be used must be assessed, following classification by pathologists, by comparing expression levels of the 5 genes with the median expression levels for all tissues of a given type (normal or tumour) as called by histopathological assessment. Normal versus tumour tissues have different expression patterns for these 5 genes.
Reverse transcription was performed on 2 μg total RNA samples in the presence of RNAse inhibitor according to the manufacturers' protocols. Reactions were primed with both (dT)21 and random hexamers at final concentrations of 1 μM and 0.9 μM respectively. The integrity of the cDNA was assessed by SYBR® Green based quantitative PCR, performed on three housekeeping genes: MRPL19, PUM1 and GAPDH using primers illustrated in Table 4.
Ct (quantitative PCR cycle threshold) values for these genes, typically in the range of 14-25, depending on the gene, were used to verify the integrity of each cDNA sample. These Ct values are determined using standard SYBR green QPCR methods on an Eppendorf Mastercycler thermocycler. Following QPCR, the samples were analyzed by capillary electrophoresis to ensure that only one amplicon of the expected size was obtained.
PCR reactions were performed on 20 ng cDNA in 10 μl final volume containing 0.2 mM each dNTP, 1.5 mM MgCl2, 0.6 μM each primer and 0.2 units of Taq DNA polymerase. An initial incubation of 2 minutes at 95° C. was followed by 35 cycles at 94° C. 30 s, 55° C. 30 s, and 72° C. 60 s. The amplification was completed by a 2 minute incubation at 72° C.
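As a quick arithmetic check on the cycling programme above, the following sketch sums the programmed hold times. The variable and function names are illustrative, and the calculation ignores the time the thermocycler spends ramping between temperatures, so the actual run time would be longer.

```python
# Hold times taken from the protocol above, in seconds.
INITIAL_DENATURATION_S = 2 * 60   # 2 min at 95 °C
CYCLE_STEPS_S = (30, 30, 60)      # 94 °C, 55 °C, 72 °C
N_CYCLES = 35
FINAL_EXTENSION_S = 2 * 60        # 2 min at 72 °C


def programmed_hold_time_s(initial_s, cycle_steps_s, n_cycles, final_s):
    """Sum of programmed hold times; ramping between temperatures is ignored."""
    return initial_s + n_cycles * sum(cycle_steps_s) + final_s


total_s = programmed_hold_time_s(
    INITIAL_DENATURATION_S, CYCLE_STEPS_S, N_CYCLES, FINAL_EXTENSION_S
)
total_min = total_s / 60  # 4440 s, i.e. 74 min of programmed holds
```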
RNA quantification and integrity analysis were performed on an Agilent bioanalyzer (Agilent, Santa Clara, Calif.), using the manufacturer's software. Analysis of the DNA amplification reactions was performed on Caliper LabChip® 90 instruments (Caliper LifeSciences, Hopkinton, Mass.), and amplicon sizing and relative quantification were performed by the manufacturer's software, prior to being uploaded to the LISA database.
The LISA was built around the LAMP solution stack of software programs (Linux operating system, Apache web server, MySQL database management server and Perl and Python programming languages). In addition, several peripheral Perl and Python modules for experimental design, analysis, and display of results interact with the LISA. Statistical t-tests and unsupervised clustering were performed using the R package.
The capillary electrophoresis instrument software (Caliper LifeSciences, Hopkinton, Mass.) provides size and concentration data for the detected peaks of each PCR reaction. These data are uploaded to the LISA database and compared with expected amplicon sizes for that experiment. Using the experimentally determined amplicon sizing data, a signal detection protocol assigns detected amplicons to expected sizes. Gene sequence, primer sequence, single nucleotide polymorphism sites and protein coding data are associated to each element of experimental data stored in the database.
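The source does not specify how the signal detection protocol matches peaks to expected sizes. The sketch below shows one plausible approach, nearest-size matching within a relative tolerance; the function name, the data layout and the 10% tolerance are all assumptions made for illustration, not details of the LISA implementation.

```python
def assign_peaks(detected_sizes, expected_sizes, rel_tolerance=0.10):
    """Assign each detected peak size (bp) to the nearest expected size.

    rel_tolerance is an assumed parameter (not from the source): a peak is
    assigned only if it lies within this fraction of the expected size.
    Returns (assigned, unassigned), where `assigned` maps each expected
    size to the list of detected sizes matched to it.
    """
    assigned = {size: [] for size in expected_sizes}
    unassigned = []
    for peak in detected_sizes:
        nearest = min(expected_sizes, key=lambda size: abs(size - peak), default=None)
        if nearest is not None and abs(nearest - peak) <= rel_tolerance * nearest:
            assigned[nearest].append(peak)
        else:
            # Peaks that match no expected amplicon are kept for later review.
            unassigned.append(peak)
    return assigned, unassigned


assigned, unassigned = assign_peaks(
    detected_sizes=[152, 305, 480],
    expected_sizes=[150, 300],
)
```

Here the 152 bp and 305 bp peaks are matched to the expected 150 bp and 300 bp amplicons, while the 480 bp peak falls outside the tolerance and is left unassigned.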
For each PCR reaction covering an AS event, the concentration data from all RNA sources under consideration were used to determine the most prevalent assigned amplicon. For each RNA source, the ratio of the concentration of this amplicon to the total assigned amplicon concentrations measured is calculated and is expressed as a percentage, termed the percent splicing index (PSI or Ψ). Ψ values for each reaction are used to compare alternative splicing profiles between RNA sources. The Ψ values for different RNA sources are used in statistical t-tests, and the resulting p-values are used in the screening process to determine cancer specificity. Reaction sets with Bonferroni-corrected p-values of less than 0.0002 were considered statistically significant hits.
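The Ψ calculation described above can be sketched as follows. The data layout and function name are hypothetical, and taking "most prevalent" to mean the amplicon with the highest summed concentration across all RNA sources is an assumed reading of the text.

```python
def percent_splicing_index(concentrations_by_source):
    """Compute Ψ for each RNA source for one PCR reaction.

    concentrations_by_source: {source: {amplicon_size_bp: concentration}},
    restricted to assigned amplicons.
    Returns (reference_amplicon, {source: Ψ in percent}).
    """
    # Most prevalent assigned amplicon across all RNA sources (summed
    # concentration is an assumed definition of "most prevalent").
    totals = {}
    for amplicons in concentrations_by_source.values():
        for size, conc in amplicons.items():
            totals[size] = totals.get(size, 0.0) + conc
    reference = max(totals, key=totals.get)

    psi = {}
    for source, amplicons in concentrations_by_source.items():
        total = sum(amplicons.values())
        # Ψ is left undefined (None) when nothing was amplified from a source.
        psi[source] = 100.0 * amplicons.get(reference, 0.0) / total if total else None
    return reference, psi


# Invented concentrations for a reaction yielding 300 bp and 150 bp amplicons.
reference, psi = percent_splicing_index({
    "normal_pool": {300: 8.0, 150: 2.0},
    "tumour_pool": {300: 3.0, 150: 3.0},
})
```

With these invented values the 300 bp amplicon is the reference, giving Ψ = 80% for the normal pool and Ψ = 50% for the tumour pool, a shift of 30 percentage points.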
The designed sets of PCR experiments are passed to the automated platform together with associated experimental conditions such as the RNA source, and PCR reaction conditions. Once the PCR amplification and capillary electrophoresis are completed, an electropherogram is generated that reflects the amplification pattern, as shown in
In addition to the tissue specific representation shown in
To identify cancer associated splicing events, a LISA based screening pipeline was constructed for genes associated with ovarian cancer (
The discovery screen was carried out using two pools of RNA extracted from high-grade (grades 2 and 3; grades are a standard clinical classification of tumours that takes into account the size and invasive status of the tumour) serous epithelial ovarian cancer specimens and two pools of RNA extracted from unmatched normal ovaries (same age group and no prior chemotherapy). Each pool contained an equivalent mix of four independent tissues. Normal ovarian tissues were selected from women undergoing oophorectomy for reasons other than ovarian cancer, and the normality of the ovaries was confirmed by standard pathology tissue examination. Ovaries with benign tumours or cysts were excluded. Most of the donors were postmenopausal women of the age group when most serous tumours develop. Screening of the two cancer and two normal pools identified 104 cancer-associated splicing events in 98 different genes. Alternative splicing events identified in the discovery screen were re-examined on individual RNA samples extracted from 25 serous epithelial ovarian cancers (grades 2 and 3) and 21 normal ovary samples. Overlapping PCR reactions using RNA from each tissue were carried out as in the discovery screen, confirming the association of 48 alternative splicing events in 45 genes with ovarian cancer tissues. The other 56 events identified in the discovery phase were found to be associated only with a subclass of cancer tissues and thus may represent either differences between individuals or cancer subtypes. Table 5 gives the gene name and the percent of cancer tissue samples lying outside the range of the normal samples (defined as the percent of identification in Table 5). For example, 50% means half of the cancer tissue samples did not overlap with any of the normal tissue samples. The second column gives the p-values that characterize statistical separation between the cancer and normal populations.
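The percent of identification defined above can be sketched as a short calculation. The function name and the Ψ values are invented for illustration, and treating "the range of the normal samples" as the closed interval between the lowest and highest normal Ψ is an assumption.

```python
def percent_identification(cancer_psi, normal_psi):
    """Percent of cancer Ψ values outside the [min, max] range of normal Ψ values."""
    low, high = min(normal_psi), max(normal_psi)
    outside = sum(1 for value in cancer_psi if value < low or value > high)
    return 100.0 * outside / len(cancer_psi)


# Invented Ψ values: two of the four cancer samples fall below the normal range.
pct = percent_identification(
    cancer_psi=[20.0, 35.0, 62.0, 70.0],
    normal_psi=[60.0, 65.0, 75.0, 80.0],
)
```

In this toy case the result is 50%, matching the interpretation given in the text: half of the cancer samples do not overlap with any normal sample.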
This indicates that of the 600 genes associated with ovarian cancer, breast cancer and/or DNA damage/repair tested, 45 (7.5%) harboured at least one ovarian cancer-specific splicing event.
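By way of non-limiting illustration, the percent of identification reported in Table 5 may be computed as sketched below. The function name and the sample values are hypothetical, and it is assumed here that each sample is summarized by a single Ψ value and that "outside the range" means strictly below the lowest or strictly above the highest normal value.

```python
def percent_identification(normal_psi, cancer_psi):
    """Percent of cancer samples whose value lies outside the normal range.

    Assumption: the normal range is [min(normal_psi), max(normal_psi)]
    and a cancer sample counts only if it falls strictly outside it.
    """
    lowest, highest = min(normal_psi), max(normal_psi)
    outside = sum(1 for value in cancer_psi if value < lowest or value > highest)
    return 100.0 * outside / len(cancer_psi)


# Hypothetical values for one splicing event (not taken from Table 5).
normal = [10.0, 15.0, 20.0]
cancer = [5.0, 12.0, 30.0, 18.0]
result = percent_identification(normal, cancer)
```

In this made-up example two of the four cancer samples (5.0 and 30.0) fall outside the normal range of 10.0 to 20.0, which corresponds to the 50% case discussed above.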
Following the discovery screen, candidate reactions covering AS events were selected for the validation screen using the Ψ values for the 4 pools. Reactions showing a difference of at least 10 percentage points between the mean Ψs for normal and tumour pools, and a standard deviation of the Ψs not exceeding 26% for either tissue type, were selected. Following the validation screen, Ψ values were used in a t-test for significant differences between normal and tumour tissue samples. Reaction sets with Bonferroni-corrected p-values < 0.0002 were selected (see Table 5).
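The two selection steps may be sketched as follows, again without limitation. The function names are hypothetical, the sample standard deviation is assumed (the text does not say whether a sample or population estimate was used), and the t-test itself is not reproduced: raw p-values are taken as given and only the Bonferroni correction and the 0.0002 cut-off are shown.

```python
from statistics import mean, stdev

MIN_DELTA_PSI = 10.0    # percentage points between mean normal and tumour Psi
MAX_SD = 26.0           # largest standard deviation allowed per tissue type
P_VALUE_CUTOFF = 0.0002


def passes_discovery_filter(normal_psi, tumour_psi):
    """Apply the pool-level criteria to one reaction (Psi values in percent)."""
    delta = abs(mean(tumour_psi) - mean(normal_psi))
    worst_sd = max(stdev(normal_psi), stdev(tumour_psi))
    return delta >= MIN_DELTA_PSI and worst_sd <= MAX_SD


def bonferroni(raw_p_values):
    """Multiply each raw p-value by the number of tests, capped at 1."""
    n_tests = len(raw_p_values)
    return [min(1.0, p * n_tests) for p in raw_p_values]


def select_validated(raw_p_values):
    """Indices of reactions whose corrected p-value is below the cut-off."""
    corrected = bonferroni(raw_p_values)
    return [i for i, p in enumerate(corrected) if p < P_VALUE_CUTOFF]
```

For instance, with hypothetical pool values, `passes_discovery_filter([20, 24], [60, 70])` holds because the means differ by 43 points and both standard deviations are small, whereas `passes_discovery_filter([10, 60], [70, 80])` fails on the standard deviation of the normal pools.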
Graphical displays were generated with Perl-based analysis modules. The modules analyze the transcript map and capillary electrophoresis data obtained for each experiment, and apply RNA source-based unsupervised clustering of the results prior to generating the displays.
The entire discovery screen annotation dataset was queried to identify unassigned amplicons that were present in more than one pool sample and that satisfied one of the following conditions: i) amplicon detected in normal pools only, ii) amplicon detected in tumour pools only, iii) amplicon detected in both normal and tumour pools, but at a concentration at least twice as high in one pool type as in the other. Candidate amplicons identified by this in silico database query were purified by agarose gel electrophoresis and sequenced.
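The query conditions may be expressed, by way of example only, as the predicate below. The data layout is hypothetical: each amplicon is represented by its measured concentration in each normal pool and each tumour pool, with zero meaning "not detected". Comparing the mean concentration of the pools in which the amplicon was detected is likewise an assumption, since the text does not state how concentrations were aggregated per pool type.

```python
from statistics import mean


def is_candidate_amplicon(normal_conc, tumour_conc):
    """Return True if an unassigned amplicon meets the query conditions."""
    normal_hits = [c for c in normal_conc if c > 0]
    tumour_hits = [c for c in tumour_conc if c > 0]

    # The amplicon must be present in more than one pool sample.
    if len(normal_hits) + len(tumour_hits) < 2:
        return False

    # Conditions i) and ii): detected in one pool type only.
    if not normal_hits or not tumour_hits:
        return True

    # Condition iii): detected in both types, at least a two-fold difference.
    normal_mean, tumour_mean = mean(normal_hits), mean(tumour_hits)
    return max(normal_mean, tumour_mean) >= 2 * min(normal_mean, tumour_mean)
```

Under these assumptions, an amplicon seen at similar levels in all four pools is not retained, while one seen only in the two tumour pools is.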
LISA-based analysis of alternative splicing automatically detects all types of alternative splicing with the exception of alternative intron inclusion, which requires special attention (
When unpredicted amplification products were obtained, they were automatically classified as “unassigned” and stored for subsequent analysis. Sequencing of eight such products appearing in RNA samples from at least two different ovarian tissues revealed that only one represented a truly novel splicing event. The others were derived from the amplification of unrelated sequences or of unspliced DNA fragments. As shown in
A new splice site was found in the middle of a previously unspliced exon leading to the generation of two alternative splicing isoforms.
To evaluate the potential of the alternative splicing events identified by LISA as diagnostic markers for ovarian cancer, their individual and collective capacity to accurately differentiate between normal and cancer tissues was evaluated. The variance in the splicing pattern of each of the 48 identified ovarian cancer-associated splicing events (Table 1) was calculated as percent splicing index (PSI, Ψ) and was used to classify the 46 individual samples and 4 pools of normal and tumour tissues based on splicing similarity (
Similarly to the methodology that is described hereinabove, the LISA based screening was used to identify a signature of diagnostic markers for breast cancer. The approach used differed in that only putative alternative splicing events, as opposed to all exon-exon junctions, were targeted by PCR primer pairs. This reduced the average number of PCR reactions per gene from 54 used for the ovarian tissue screen to 5 for the breast tissue screen of 600 genes.
Consequently, the LISA methodology was effective in identifying specific signatures for two unrelated types of cancer (ovarian cancer and breast cancer).
While the invention has been described in connection with specific embodiments thereof, it will be understood that it is capable of further modifications and this application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains and as may be applied to the essential features hereinbefore set forth, and as follows in the scope of the appended claims.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/CA08/01477 | 8/15/2008 | WO | 00 | 2/12/2010 |
Number | Date | Country | |
---|---|---|---|
60935489 | Aug 2007 | US | |
60988213 | Nov 2007 | US |