BREAST CANCER SPLICE VARIANTS

Information

  • Patent Application
  • Publication Number
    20230374608
  • Date Filed
    April 18, 2023
  • Date Published
    November 23, 2023
Abstract
Provided herein, in some embodiments, are methods, compositions, and systems for identifying alternatively spliced tumor-specific exon inclusion and exclusion events that can be used for survival prognosis.
Description
REFERENCE TO AN ELECTRONIC SEQUENCE LISTING

The contents of the electronic sequence listing (J022770014US03-SEQ-HJD.xml; Size: 235,625 bytes; and Date of Creation: Apr. 18, 2023) are herein incorporated by reference in their entirety.


BACKGROUND

Breast cancer survival rates indicate the proportion of people with the same type and stage of breast cancer who are still alive a certain amount of time (e.g., 5 years) after diagnosis. The extensive heterogeneity of breast cancer, however, complicates a precise assessment of prognosis, which makes therapeutic decisions difficult and, in some cases, leads to inappropriate treatment.


SUMMARY

Provided herein, in some aspects, is a molecular profiling platform that may be used, for example, to identify exon splicing events (e.g., exon inclusion or exon exclusion) that are specific to breast cancer and can be used for survival prognosis. Alternative splicing is a biological phenomenon that increases protein diversity. In one type of alternative splicing, referred to as “exon skipping,” an exon is either retained in the mature transcript (exon inclusion) or spliced out together with its flanking introns so that it is “skipped” (exon exclusion), depending on cellular conditions. Exon skipping events are regulated by RNA-binding proteins (RBPs) and the spliceosome complex. A common metric for evaluating the extent of exon skipping is percent spliced in (PSI or Ψ), which represents the percentage of transcripts that include a specific exon or splice site.
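As an illustration of the PSI metric, the sketch below computes Ψ for a single cassette exon from splice-junction read counts. It assumes one common junction-count formulation, which may differ from the one used to generate the data described herein: the two inclusion junctions (upstream exon to target exon, and target exon to downstream exon) are averaged so that they are comparable with the single skipping junction. The function name and arguments are hypothetical, and the result is a fraction (multiply by 100 for a percentage).

```python
from typing import Optional


def percent_spliced_in(inc_upstream: int, inc_downstream: int, skip: int) -> Optional[float]:
    """Estimate PSI (Ψ) for one cassette exon from junction read counts.

    Hypothetical helper: inclusion evidence comes from the two junctions that
    join the target exon to its neighbours; exclusion evidence comes from the
    single junction that joins the neighbours directly (skipping the exon).
    Returns a value in [0, 1], or None when no informative reads are present.
    """
    if min(inc_upstream, inc_downstream, skip) < 0:
        raise ValueError("read counts must be non-negative")
    # Average the two inclusion junctions so an included transcript is not
    # counted twice relative to the single skipping junction.
    inclusion = (inc_upstream + inc_downstream) / 2
    total = inclusion + skip
    if total == 0:
        return None  # exon not covered by any junction read; PSI is undefined
    return inclusion / total


# Example: 30 + 34 inclusion-junction reads and 32 skipping-junction reads.
print(percent_spliced_in(30, 34, 32))  # 0.5
```

Averaging the two inclusion junctions is one way to correct for the fact that an included transcript can contribute reads at two junctions while a skipped transcript contributes at only one.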


Prior approaches to analyzing cancer tissue samples analyzed a group of normal (non-cancerous) samples and a group of cancer samples (samples known to be cancerous) separately to generate two distributions. Data in the non-overlapping parts of the two distributions would then be analyzed to assess the differences between the two groups of samples. Because biological data are heterogeneous and alternative splicing can occur for reasons unrelated to cancer (e.g., exon skipping occurs naturally in healthy, non-cancerous tissue), the conventional “two-distribution” approach is not well suited to identifying exon skipping events that are predictive of cancer.


The present disclosure provides, in some aspects, methods that combine the data (e.g., PSI values) determined for normal and cancer tissue samples and analyze the combined input using a probabilistic model (a Gaussian mixture model, GMM) to identify subpopulations (clusters) within the overall population that can be further analyzed to assess whether they are cancer-specific. Some of the data described herein are based on an analysis of ~9,300 normal and tumor samples from The Cancer Genome Atlas (TCGA), which identified ~67,000 exon skipping events. From these data, a subset of exon splicing events (e.g., exon inclusion or exon exclusion) specific to breast cancer was identified.
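The pooled-clustering idea can be sketched with a one-dimensional Gaussian mixture fitted by expectation-maximization (EM) to the combined normal and tumor PSI values of a single splicing event, followed by a tally of the tumor fraction in each cluster. This is an illustrative sketch only, not the disclosed implementation: the number of components is fixed by the caller (the text here does not say how it was chosen; a criterion such as BIC is one common option), initial means are spread evenly across the observed PSI range, and the function names, the variance floor, and the toy data are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def _log_normal_pdf(x: float, mean: float, var: float) -> float:
    """Log-density of a univariate normal distribution."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def fit_gmm_1d(values: Sequence[float], k: int, iters: int = 200,
               min_var: float = 1e-4) -> Tuple[List[float], List[float], List[float]]:
    """Fit a k-component 1-D Gaussian mixture by EM.

    Returns (weights, means, variances) with components sorted by mean, so
    cluster 0 has the lowest PSI and cluster k-1 the highest.
    """
    n = len(values)
    lo, hi = min(values), max(values)
    # Spread the initial means evenly across the observed PSI range.
    means = [lo + (j + 0.5) * (hi - lo) / k for j in range(k)]
    mu = sum(values) / n
    variances = [max(sum((x - mu) ** 2 for x in values) / n, min_var)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior responsibility of each component (log-sum-exp for stability).
        resp = []
        for x in values:
            logs = [math.log(w) + _log_normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            top = max(logs)
            exps = [math.exp(lv - top) for lv in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: update weights, means and variances (variance floored at min_var).
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            variances[j] = max(sum(r[j] * (x - means[j]) ** 2
                                   for r, x in zip(resp, values)) / nj, min_var)
    order = sorted(range(k), key=lambda j: means[j])
    return ([weights[j] for j in order], [means[j] for j in order],
            [variances[j] for j in order])


def assign_clusters(values, weights, means, variances) -> List[int]:
    """Hard-assign each sample to its most probable component."""
    labels = []
    for x in values:
        scores = [math.log(w) + _log_normal_pdf(x, m, v)
                  for w, m, v in zip(weights, means, variances)]
        labels.append(max(range(len(scores)), key=lambda j: scores[j]))
    return labels


def tumor_fraction_by_cluster(labels, is_tumor, k) -> List[float]:
    """Fraction of tumor samples in each cluster (NaN for an empty cluster)."""
    out = []
    for j in range(k):
        members = [t for lab, t in zip(labels, is_tumor) if lab == j]
        out.append(sum(members) / len(members) if members else float("nan"))
    return out


# Toy PSI values for one event (not TCGA data): normals and most tumors have
# low inclusion, while a tumor subgroup includes the exon.
normals = [0.05, 0.08, 0.10, 0.12, 0.15] * 4
tumors = [0.06, 0.09, 0.11, 0.13] * 5 + [0.75, 0.80, 0.82, 0.85, 0.90] * 2
psi = normals + tumors
is_tumor = [0] * len(normals) + [1] * len(tumors)

weights, means, variances = fit_gmm_1d(psi, k=2)
labels = assign_clusters(psi, weights, means, variances)
fractions = tumor_fraction_by_cluster(labels, is_tumor, k=2)
print([round(m, 3) for m in means], fractions)
```

In this toy example the high-PSI cluster contains only tumor samples, which is the kind of tumor-specific subpopulation that a two-distribution comparison of all tumors against all normals would dilute.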


In some aspects, the present disclosure provides a method comprising assaying nucleic acids of a sample for the presence or absence of a target exon comprising a nucleotide sequence of any one of SEQ ID NOS: 22-24, 26-36, 38-40, 73-75, 77-79, 82-100, 102-104. In some embodiments, the target exon comprises a nucleotide sequence of any one of SEQ ID NOS: 27, 98, 102, or 104.


In other aspects, the present disclosure provides a method comprising assaying nucleic acids of a sample for the presence or absence of at least 2 target exons, wherein each target exon comprises a nucleotide sequence of any one of SEQ ID NOS: 23, 27, 35, 85, 88, 89, 98, 101, 102, or 104. In some embodiments, each target exon comprises a nucleotide sequence of any one of SEQ ID NOS: 27, 98, 101, 102, or 104.


In yet other aspects, the present disclosure provides a method comprising assaying nucleic acids of a sample for the presence or absence of at least 3 target exons, wherein each target exon comprises a nucleotide sequence of any one of SEQ ID NOS: 21, 23, 27, 30, 31, 32, 35, 36, 39, 85, 87-89, 91, 94, 98, or 101-104.


In still further aspects, the present disclosure provides a method comprising assaying nucleic acids of a sample for the presence or absence of at least 8 different target exons, wherein each target exon comprises a nucleotide sequence of any one of SEQ ID NOs: 21-40 or 73-104.
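The SEQ ID NO lists in the aspects above mix single numbers with inclusive ranges. A small helper that expands such a list into a set makes it straightforward to compare the lists, for example to confirm that a narrower embodiment is contained in its parent list, or to count the sequences in SEQ ID NOs: 21-40 and 73-104 (52, the same number as the exon splicing events in FIG. 1B). The helper name and its parsing rules are assumptions for illustration.

```python
from typing import Set


def expand_seq_ids(spec: str) -> Set[int]:
    """Expand a list such as "22-24, 26-36, or 102" into a set of SEQ ID NOs.

    Hypothetical helper: accepts comma-separated integers and inclusive
    ranges, and tolerates an "or" before the last item.
    """
    ids: Set[int] = set()
    for part in spec.replace(" or ", ",").split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(p) for p in part.split("-"))
            ids.update(range(start, end + 1))
        else:
            ids.add(int(part))
    return ids


single_exon = expand_seq_ids("22-24, 26-36, 38-40, 73-75, 77-79, 82-100, 102-104")
preferred = expand_seq_ids("27, 98, 102, or 104")
two_exon = expand_seq_ids("23, 27, 35, 85, 88, 89, 98, 101, 102, or 104")
panel = expand_seq_ids("21-40, 73-104")

print(preferred <= single_exon)         # True
print(two_exon <= panel, len(panel))    # True 52
print(sorted(two_exon - single_exon))   # [101]
```

As the last line shows, SEQ ID NO: 101 appears in the at-least-2 exon list but not in the single-exon list.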


In some embodiments, the sample is a breast tissue sample. For example, the sample may be obtained from a subject suspected of having, at risk of, or diagnosed with breast cancer. In some embodiments, the subject is a female subject.


In some embodiments, the nucleic acids comprise messenger ribonucleic acid (mRNA), or complementary deoxyribonucleic acid (cDNA) synthesized from mRNA obtained from the sample.


In some embodiments, the methods further comprise detecting the presence of a target exon comprising a nucleotide sequence of any one of SEQ ID NOs: 24, 28, 31, 33, and/or 38 or the absence of a target exon comprising a nucleotide sequence of any one of SEQ ID NOs: 82, 87 and/or 91, and assigning a favorable survival prognosis to the sample. In some embodiments, the methods further comprise detecting the presence of a target exon comprising a nucleotide sequence of any one of SEQ ID NOs: 21-23, 25-27, 29, 30, 32, and/or 34-40 or the absence of a target exon comprising a nucleotide sequence of any one of SEQ ID NOs: 73-81, 83-86, 88-90, and/or 92-104, and assigning an unfavorable survival prognosis to the sample.
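The decision rule in the preceding paragraph can be expressed as a lookup over detected and undetected target exons. The sketch below is illustrative only: it assumes each target exon has already been called present or absent (for example, by thresholding PSI), and it returns every prognosis label whose condition is met rather than a single call, because the paragraph does not specify how to combine conflicting evidence from multiple exons. Read literally, the ranges above also place SEQ ID NO: 38 in both the favorable-presence list and the unfavorable-presence range 34-40, so its presence alone triggers both labels here. All names are hypothetical.

```python
from typing import Dict, Set, Tuple, Union


def _ids(*items: Union[int, Tuple[int, int]]) -> Set[int]:
    """Expand single SEQ ID NOs and inclusive (start, end) ranges into a set."""
    out: Set[int] = set()
    for item in items:
        if isinstance(item, tuple):
            out.update(range(item[0], item[1] + 1))
        else:
            out.add(item)
    return out


# Lists transcribed from the paragraph above.
FAVORABLE_IF_PRESENT = _ids(24, 28, 31, 33, 38)
FAVORABLE_IF_ABSENT = _ids(82, 87, 91)
UNFAVORABLE_IF_PRESENT = _ids((21, 23), (25, 27), 29, 30, 32, (34, 40))
UNFAVORABLE_IF_ABSENT = _ids((73, 81), (83, 86), (88, 90), (92, 104))


def prognosis_labels(calls: Dict[int, bool]) -> Set[str]:
    """Return every prognosis label triggered by present (True) / absent (False) calls.

    `calls` maps a target-exon SEQ ID NO to whether the exon was detected.
    """
    labels: Set[str] = set()
    for seq_id, present in calls.items():
        if (present and seq_id in FAVORABLE_IF_PRESENT) or (
                not present and seq_id in FAVORABLE_IF_ABSENT):
            labels.add("favorable")
        if (present and seq_id in UNFAVORABLE_IF_PRESENT) or (
                not present and seq_id in UNFAVORABLE_IF_ABSENT):
            labels.add("unfavorable")
    return labels


print(prognosis_labels({24: True, 82: False}))  # {'favorable'}
print(prognosis_labels({27: True}))             # {'unfavorable'}
print(sorted(prognosis_labels({38: True})))     # ['favorable', 'unfavorable']
```

Together, the presence lists cover SEQ ID NOs: 21-40 and the absence lists cover SEQ ID NOs: 73-104, so every sequence in the 52-event set maps to at least one label.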


Also provided herein are complementary deoxyribonucleic acids (cDNAs) comprising a nucleotide sequence of any one of SEQ ID NOs: 1-20 or 105-136. In some embodiments, the cDNAs comprise a nucleotide sequence of any one of SEQ ID NOs: 22-24, 27-34, 36, 38, or 40. Compositions comprising the cDNAs are also contemplated herein. In some embodiments, the compositions further comprise a probe or pair of primers that binds the cDNA. Some compositions of the present disclosure comprise (a) a messenger ribonucleic acid (mRNA) comprising a nucleotide sequence of any one of SEQ ID NOs: 1-20 or 105-136 and (b) a probe or a pair of primers that binds a nucleotide sequence of any one of SEQ ID NOs: 1-20 or 105-136. In some embodiments, the probe or primer comprises a detectable label.
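To make “a probe or primer that binds” concrete, a simple idealized model is perfect Watson-Crick complementarity: an oligonucleotide can hybridize to a single-stranded target if its reverse complement occurs in the target sequence. The sketch below checks only that condition; it ignores mismatches, melting temperature, secondary structure and specificity, all of which matter in real probe and primer design. The function names are hypothetical and the sequences in the example are made up rather than taken from the sequence listing.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A<->T, C<->G)."""
    return seq.translate(_COMPLEMENT)[::-1]


def binds_perfectly(oligo: str, target: str) -> bool:
    """True if the oligo is perfectly complementary to a stretch of the target strand."""
    return reverse_complement(oligo).upper() in target.upper()


# Made-up target and oligos for illustration (not from the sequence listing).
target = "ATGGCCAAGTTCGATCGGAAT"
probe = reverse_complement("AAGTTCGATCGG")      # complementary to part of the target
print(binds_perfectly(probe, target))           # True
print(binds_perfectly("AAGTTCGATCGG", target))  # False: same-sense, not complementary
```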


Further provided herein are kits comprising a molecule that can detect the presence or absence of a target exon comprising a nucleotide sequence of any one of SEQ ID NOS: 22-24, 26-36, 38-40, 73-75, 77-79, 82-100, 102-104, and a detection reagent selected from buffers, salts, polymerases, and deoxyribonucleotide triphosphates (dNTPs). In some embodiments, the molecule comprises a probe or primer that binds a nucleic acid comprising a nucleotide sequence of any one of SEQ ID NOS: 22-24, 26-36, 38-40, 73-75, 77-79, 82-100, 102-104.


Also provided herein are kits comprising: (a) molecules that can detect the presence or absence of at least 2 target exons, wherein each target exon comprises a nucleotide sequence of any one of SEQ ID NOS: 23, 27, 35, 85, 88, 89, 98, 101, 102, or 104, (b) molecules that can detect the presence or absence of at least 3 target exons, wherein each target exon comprises a nucleotide sequence of any one of SEQ ID NOS: 21, 23, 27, 30, 31, 32, 35, 36, 39, 85, 87-89, 91, 94, 98, or 101-104, or (c) molecules that can detect the presence or absence of at least 8 different target exons, wherein each target exon comprises a nucleotide sequence of any one of SEQ ID NOs: 21-40 or 73-104, and a detection reagent selected from buffers, salts, polymerases, and deoxyribonucleotide triphosphates (dNTPs). In some embodiments, at least one of the probes and/or primers comprises a detectable label.


BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1A: Alternative splicing leads to target exon inclusion or exon exclusion in cancer patients when compared to normal tissues. FIG. 1B: Frequency of exon splicing events (e.g., exon inclusion and exon exclusion) in TCGA patients. In total, 20 exon inclusion events and 32 exon exclusion events that are breast cancer-specific and associated with survival were detected using the novel Gaussian mixture modeling (GMM) clustering approach. The table indicates the presence or absence of the 52 exon splicing events (rows) across 824 breast cancer patients in TCGA (columns). Exon splicing events are ordered by frequency. Events associated with unfavorable and favorable prognosis are indicated.



FIG. 2A: Frequency (%) of detection of each of the 52 exon splicing events in the TCGA cohort with survival information (n=824). FIG. 2B: Type of exon splicing biomarker detected in patients using the 52-event splicing biomarker panel.
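The per-event detection frequencies (FIG. 2A), the frequency ordering of rows (FIG. 1B), and a per-patient biomarker type (FIG. 2B) can all be derived from a binary event-by-patient matrix. The sketch assumes such a matrix (1 = event detected in that patient) and an inclusion/exclusion label for each event; the category names used for the per-patient type ("inclusion only", "exclusion only", "both", "none") are assumptions, since the caption does not list them. The data are a toy example, not TCGA values.

```python
from collections import Counter
from typing import Dict, List, Tuple


def event_frequencies(matrix: Dict[str, List[int]]) -> List[Tuple[str, float]]:
    """Percent of patients carrying each event, sorted from most to least frequent."""
    freqs = [(event, 100.0 * sum(row) / len(row)) for event, row in matrix.items()]
    return sorted(freqs, key=lambda item: item[1], reverse=True)


def patient_types(matrix: Dict[str, List[int]], kind: Dict[str, str]) -> Counter:
    """Count patients by which kinds of splicing event ('inclusion'/'exclusion') they carry."""
    n_patients = len(next(iter(matrix.values())))
    counts: Counter = Counter()
    for p in range(n_patients):
        kinds = {kind[event] for event, row in matrix.items() if row[p]}
        if kinds == {"inclusion", "exclusion"}:
            counts["both"] += 1
        elif kinds:
            counts[f"{kinds.pop()} only"] += 1
        else:
            counts["none"] += 1
    return counts


# Toy matrix: 3 events x 4 patients (hypothetical event names).
matrix = {"event_A": [1, 0, 1, 1], "event_B": [0, 0, 1, 0], "event_C": [1, 0, 0, 0]}
kind = {"event_A": "inclusion", "event_B": "exclusion", "event_C": "inclusion"}
print(event_frequencies(matrix))  # [('event_A', 75.0), ('event_B', 25.0), ('event_C', 25.0)]
print(patient_types(matrix, kind))
```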



FIG. 3A: GMM analysis of mixed normal and breast cancer samples for the splicing event 1446 (CCDC115 gene). The GMM analysis showed 4 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 3B: Frequency (%) of tumor and normal samples across the 4 clusters identified for the splicing event 1446 (CCDC115 gene). Cluster 4 is composed mostly of breast cancer samples. FIG. 3C: Exon levels (PSI) for tumor specific cluster C4 and normal tissues in TCGA. This analysis indicates that the target exon (also referred to herein as an “alternative exon”) is expressed in 97 breast cancer patients in cluster C4, while very low or absent in normal tissues. FIG. 3D: Survival analysis of breast cancer patients in cluster C4 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C4 (expressing the target exon) have a worse overall survival (shorter survival time, days).
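The survival comparison in FIG. 3D, and in the corresponding panels of the following figures, contrasts patients in the tumor-specific cluster with the remaining breast cancer patients. The captions do not state the statistical method; a Kaplan-Meier estimate of overall survival for each group, typically paired with a log-rank test, is one standard way to produce such curves. The sketch below implements only the Kaplan-Meier product-limit estimator on toy data (time in days; event = 1 for death, 0 for censored); the function name and the data are hypothetical.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit survival estimate; returns (time, S(t)) at each death time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Group all observations tied at time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            # S(t) is multiplied by the conditional probability of surviving t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after time t.
        at_risk -= deaths + censored
    return curve


# Toy data (days, event flag); not TCGA values.
cluster_times, cluster_events = [200, 400, 400, 900, 1500], [1, 1, 0, 1, 0]
rest_times, rest_events = [300, 1200, 2000, 2500, 3000], [0, 1, 0, 0, 1]
print(kaplan_meier(cluster_times, cluster_events))
print(kaplan_meier(rest_times, rest_events))
```

In this toy example the cluster curve falls earlier and lower than the curve for the remaining patients, the pattern the captions describe as worse overall survival.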



FIG. 4A: GMM analysis of mixed normal and breast cancer samples for the splicing event 13343 (ENAH gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon PSI (Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 4B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 13343 (ENAH gene). Cluster 3 is composed mostly of breast cancer samples. FIG. 4C: Exon splicing levels (PSI) for tumor specific cluster C3 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 41 breast cancer patients in cluster C3, while very low or absent in normal tissues. FIG. 4D: Survival analysis of breast cancer patients in cluster C3 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C3 (expressing the target exon) have a worse overall survival (shorter survival time, days).



FIG. 5A: GMM analysis of mixed normal and breast cancer samples for the splicing event 15088 (POLI gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon PSI (Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 5B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 15088 (POLI gene). Cluster 3 is composed mostly of breast cancer samples. FIG. 5C: Exon splicing levels (PSI) for tumor specific cluster C3 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 100 breast cancer patients in cluster C3, while very low or absent in normal tissues. FIG. 5D: Survival analysis of breast cancer patients in cluster C3 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C3 (expressing the exon) have a worse overall survival (shorter survival time, days).



FIG. 6A: GMM analysis of mixed normal and breast cancer samples for the splicing event 16864 (PLXNB1 gene). The GMM analysis showed 4 distinct clusters (subpopulations). The x-axis indicates the exon PSI (Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 6B: Frequency (%) of tumor and normal samples across the 4 clusters identified for the splicing event 16864 (PLXNB1 gene). Cluster 4 is composed mostly of breast cancer samples. FIG. 6C: Exon splicing levels (PSI) for tumor specific cluster C4 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 74 breast cancer patients in cluster C4, while very low or absent in normal tissues. FIG. 6D: Survival analysis of breast cancer patients in cluster C4 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C4 (expressing the target exon) have a better overall survival (longer survival time, days).



FIG. 7A: GMM analysis of mixed normal and breast cancer samples for the splicing event 21181 (SH3GLB1 gene). The GMM analysis showed 4 distinct clusters (subpopulations). The x-axis indicates the exon PSI (Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 7B: Frequency (%) of tumor and normal samples across the 4 clusters identified for the splicing event 21181 (SH3GLB1 gene). Cluster 4 is composed mostly of breast cancer samples. FIG. 7C: Exon splicing levels (PSI) for tumor specific cluster C4 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 57 breast cancer patients in cluster C4, while very low or absent in normal tissues. FIG. 7D: Survival analysis of breast cancer patients in cluster C4 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C4 (expressing the target exon) have a worse overall survival (shorter survival time, days).



FIG. 8A: GMM analysis of mixed normal and breast cancer samples for the splicing event 34793 (TCF25 gene). The GMM analysis showed 4 distinct clusters (subpopulations). The x-axis indicates the exon PSI (Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 8B: Frequency (%) of tumor and normal samples across the 4 clusters identified for the splicing event 34793 (TCF25 gene). Cluster 4 is composed mostly of breast cancer samples. FIG. 8C: Exon splicing levels (PSI) for tumor specific cluster C4 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 32 breast cancer patients in cluster C4, while very low or absent in normal tissues. FIG. 8D: Survival analysis of breast cancer patients in cluster C4 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C4 (expressing the target exon) have a worse overall survival (shorter survival time, days).



FIG. 9A: GMM analysis of mixed normal and breast cancer samples for the splicing event 42420 (PRR5-ARHGAP8 gene). The GMM analysis showed 4 distinct clusters (subpopulations). The x-axis indicates the exon PSI (Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 9B: Frequency (%) of tumor and normal samples across the 4 clusters identified for the splicing event 42420 (PRR5-ARHGAP8 gene). Cluster 3 is composed mostly of breast cancer samples. FIG. 9C: Exon splicing levels (PSI) for tumor specific cluster C3 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 265 breast cancer patients in cluster C3, while very low or absent in normal tissues. FIG. 9D: Survival analysis of breast cancer patients in cluster C3 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C3 (expressing the target exon) have a worse overall survival (shorter survival time, days).



FIG. 10A: GMM analysis of mixed normal and breast cancer samples for the splicing event 4322 (WDR45B gene). The GMM analysis showed 4 distinct clusters (subpopulations). The x-axis indicates the exon PSI (Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 10B: Frequency (%) of tumor and normal samples across the 4 clusters identified for the splicing event 4322 (WDR45B gene). Cluster 4 is composed mostly of breast cancer samples. FIG. 10C: Exon splicing levels (PSI) for tumor specific cluster C4 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 39 breast cancer patients in cluster C4, while very low or absent in normal tissues. FIG. 10D: Survival analysis of breast cancer patients in cluster C4 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C4 (expressing the target exon) have a better overall survival (longer survival time, days).



FIG. 11A: GMM analysis of mixed normal and breast cancer samples for the splicing event 44438 (VPS29 gene). The GMM analysis showed 4 distinct clusters (subpopulations). The x-axis indicates the exon PSI (Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 11B: Frequency (%) of tumor and normal samples across the 4 clusters identified for the splicing event 44438 (VPS29 gene). Cluster 4 is composed mostly of breast cancer samples. FIG. 11C: Exon splicing levels (PSI) for tumor specific cluster C4 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 54 breast cancer patients in cluster C4, while very low or absent in normal tissues. FIG. 11D: Survival analysis of breast cancer patients in cluster C4 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C4 (expressing the target exon) have a worse overall survival (shorter survival time, days).



FIG. 12A: GMM analysis of mixed normal and breast cancer samples for the splicing event 48175 (E4F1 gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon PSI (Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 12B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 48175 (E4F1 gene). Cluster 3 is composed mostly of breast cancer samples. FIG. 12C: Exon splicing levels (PSI) for tumor specific cluster C3 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 60 breast cancer patients in cluster C3, while very low or absent in normal tissues. FIG. 12D: Survival analysis of breast cancer patients in cluster C3 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C3 (expressing the target exon) have a worse overall survival (shorter survival time, days).



FIG. 13A: GMM analysis of mixed normal and breast cancer samples for the splicing event 49765 (TEN1-CDK3 gene). The GMM analysis showed 4 distinct clusters (subpopulations). The x-axis indicates the exon PSI (Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 13B: Frequency (%) of tumor and normal samples across the 4 clusters identified for the splicing event 49765 (TEN1-CDK3 gene). Cluster 4 is composed mostly of breast cancer samples. FIG. 13C: Exon splicing levels (PSI) for tumor specific cluster C4 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 58 breast cancer patients in cluster C4, while very low or absent in normal tissues. FIG. 13D: Survival analysis of breast cancer patients in cluster C4 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C4 (expressing the target exon) have a better overall survival (longer survival time, days).



FIG. 14A: GMM analysis of mixed normal and breast cancer samples for the splicing event 5134 (PLEKHA6 gene). The GMM analysis showed 4 distinct clusters (subpopulations). The x-axis indicates the exon PSI (Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 14B: Frequency (%) of tumor and normal samples across the 4 clusters identified for the splicing event 5134 (PLEKHA6 gene). Cluster 4 is composed mostly of breast cancer samples. FIG. 14C: Exon splicing levels (PSI) for tumor specific cluster C4 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 70 breast cancer patients in cluster C4, while very low or absent in normal tissues. FIG. 14D: Survival analysis of breast cancer patients in cluster C4 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C4 (expressing the target exon) have a worse overall survival (shorter survival time, days).



FIG. 15A: GMM analysis of mixed normal and breast cancer samples for the splicing event 56552 (GNAZ gene). The GMM analysis showed 4 distinct clusters (subpopulations). The x-axis indicates the exon PSI (Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 15B: Frequency (%) of tumor and normal samples across the 4 clusters identified for the splicing event 56552 (GNAZ gene). Cluster 4 is composed mostly of breast cancer samples. FIG. 15C: Exon splicing levels (PSI) for tumor specific cluster C4 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 33 breast cancer patients in cluster C4, while very low or absent in normal tissues. FIG. 15D: Survival analysis of breast cancer patients in cluster C4 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C4 (expressing the target exon) have a better overall survival (longer survival time, days).



FIG. 16A: GMM analysis of mixed normal and breast cancer samples for the splicing event 5696 (TTC3 gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon PSI (Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 16B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 5696 (TTC3 gene). Cluster 3 is composed mostly of breast cancer samples. FIG. 16C: Exon splicing levels (PSI) for tumor specific cluster C3 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 31 breast cancer patients in cluster C3, while very low or absent in normal tissues. FIG. 16D: Survival analysis of breast cancer patients in cluster C3 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C3 (expressing the target exon) have a worse overall survival (shorter survival time, days).



FIG. 17A: GMM analysis of mixed normal and breast cancer samples for the splicing event 57139 (RNF8 gene). The GMM analysis showed 2 distinct clusters (subpopulations). The x-axis indicates the exon PSI (Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 17B: Frequency (%) of tumor and normal samples across the 2 clusters identified for the splicing event 57139 (RNF8 gene). Cluster 2 is composed mostly of breast cancer samples. FIG. 17C: Exon splicing levels (PSI) for tumor specific cluster C2 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 80 breast cancer patients in cluster C2, while very low or absent in normal tissues. FIG. 17D: Survival analysis of breast cancer patients in cluster C2 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C2 (expressing the target exon) have a worse overall survival (shorter survival time, days).



FIG. 18A: GMM analysis of mixed normal and breast cancer samples for the splicing event 57874 (ZDHHC13 gene). The GMM analysis showed 2 distinct clusters (subpopulations). The x-axis indicates the exon PSI (Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 18B: Frequency (%) of tumor and normal samples across the 2 clusters identified for the splicing event 57874 (ZDHHC13 gene). Cluster 2 is composed mostly of breast cancer samples. FIG. 18C: Exon splicing levels (PSI) for tumor specific cluster C2 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 58 breast cancer patients in cluster C2, while very low or absent in normal tissues. FIG. 18D: Survival analysis of breast cancer patients in cluster C2 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C2 (expressing the target exon) have a worse overall survival (shorter survival time, days).



FIG. 19A: GMM analysis of mixed normal and breast cancer samples for the splicing event 60615 (SH3GLB2 gene). The GMM analysis showed 2 distinct clusters (subpopulations). The x-axis indicates the exon PSI (Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 19B: Frequency (%) of tumor and normal samples across the 2 clusters identified for the splicing event 60615 (SH3GLB2 gene). Cluster 2 is composed mostly of breast cancer samples. FIG. 19C: Exon splicing levels (PSI) for tumor specific cluster C2 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 37 breast cancer patients in cluster C2, while very low or absent in normal tissues. FIG. 19D: Survival analysis of breast cancer patients in cluster C2 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C2 (expressing the target exon) have a worse overall survival (shorter survival time, days).



FIG. 20A: GMM analysis of mixed normal and breast cancer samples for the splicing event 62560 (ITFG1 gene). The GMM analysis showed 4 distinct clusters (subpopulations). The x-axis indicates the exon PSI (Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 20B: Frequency (%) of tumor and normal samples across the 4 clusters identified for the splicing event 62560 (ITFG1 gene). Cluster 4 is composed mostly of breast cancer samples. FIG. 20C: Exon splicing levels (PSI) for tumor specific cluster C4 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 53 breast cancer patients in cluster C4, while very low or absent in normal tissues. FIG. 20D: Survival analysis of breast cancer patients in cluster C4 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C4 (expressing the target exon) have a better overall survival (longer survival time, days).



FIG. 21A: GMM analysis of mixed normal and breast cancer samples for the splicing event 6785 (SPATS2 gene). The GMM analysis showed 2 distinct clusters (subpopulations). The x-axis indicates the exon PSI (Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 21B: Frequency (%) of tumor and normal samples across the 2 clusters identified for the splicing event 6785 (SPATS2 gene). Cluster 2 is composed mostly of breast cancer samples. FIG. 21C: Exon splicing levels (PSI) for tumor specific cluster C2 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 77 breast cancer patients in cluster C2, while very low or absent in normal tissues. FIG. 21D: Survival analysis of breast cancer patients in cluster C2 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C2 (expressing the target exon) have a worse overall survival (shorter survival time, days).



FIG. 22A: GMM analysis of mixed normal and breast cancer samples for the splicing event 8742 (DHRS11 gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon PSI (Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 22B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 8742 (DHRS11 gene). Cluster 3 is composed mostly of breast cancer samples. FIG. 22C: Exon splicing levels (PSI) for tumor specific cluster C3 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 44 breast cancer patients in cluster C3, while very low or absent in normal tissues. FIG. 22D: Survival analysis of breast cancer patients in cluster C3 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C3 (expressing the target exon) have a worse overall survival (shorter survival time, days).



FIG. 23A: GMM analysis of mixed normal and breast cancer samples for the splicing event 1506 (CENPK gene). The GMM analysis showed 4 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 23B: Frequency (%) of tumor and normal samples across the 4 clusters identified for the splicing event 1506 (CENPK gene). Clusters 1-4 are composed mostly of breast cancer samples. FIG. 23C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 37 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 23D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).



FIG. 24A: GMM analysis of mixed normal and breast cancer samples for the splicing event 2098 (METTL5 gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 24B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 2098 (METTL5 gene). Clusters 1-3 are composed mostly of breast cancer samples. FIG. 24C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 38 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 24D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).



FIG. 25A: GMM analysis of mixed normal and breast cancer samples for the splicing event 2242 (PLA2R1 gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 25B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 2242 (PLA2R1 gene). Clusters 1-3 are composed mostly of breast cancer samples. FIG. 25C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 45 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 25D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).
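Panel B of each figure tabulates how tumor and normal samples are distributed over the GMM clusters. The captions do not say whether these frequencies are normalized per cluster or per sample group. The following minimal sketch assumes within-cluster percentages, which is the quantity behind a statement such as "composed mostly of breast cancer samples." The function names and the 50% cut-off are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def cluster_composition(labels: Sequence[int],
                        is_tumor: Sequence[bool]) -> Dict[int, Dict[str, float]]:
    """Percentage of tumor and normal samples within each cluster."""
    counts: Dict[int, Dict[str, int]] = defaultdict(lambda: {"tumor": 0, "normal": 0})
    for label, tumor in zip(labels, is_tumor):
        counts[label]["tumor" if tumor else "normal"] += 1
    composition = {}
    for label in sorted(counts):
        total = counts[label]["tumor"] + counts[label]["normal"]
        composition[label] = {kind: 100.0 * n / total
                              for kind, n in counts[label].items()}
    return composition


def mostly_tumor(composition: Dict[int, Dict[str, float]],
                 min_tumor_pct: float = 50.0) -> List[int]:
    """Clusters in which tumor samples exceed min_tumor_pct percent (hypothetical cut-off)."""
    return [label for label, pct in composition.items()
            if pct["tumor"] > min_tumor_pct]
```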



FIG. 26A: GMM analysis of mixed normal and breast cancer samples for the splicing event 7106 (RHOH gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 26B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 7106 (RHOH gene). Clusters 1-3 are composed mostly of breast cancer samples. FIG. 26C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 48 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 26D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).
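Panel C singles out a tumor-specific cluster. Its members show high inclusion of the target exon, while normal tissues show very low or absent inclusion. The panel also reports the number of breast cancer patients in that cluster (for example, 48 patients in C1 for RHOH event 7106). The following is a minimal sketch of one way such a cluster might be flagged for an exon inclusion event. Both thresholds and the function name are hypothetical. For an exon exclusion event, the PSI comparison would be reversed.

```python
from statistics import mean
from typing import Dict, Sequence


def tumor_specific_clusters(psi: Sequence[float], labels: Sequence[int],
                            is_tumor: Sequence[bool],
                            max_normal_fraction: float = 0.05,
                            min_psi_gap: float = 0.1) -> Dict[int, int]:
    """Return {cluster: number of tumor samples} for tumor-specific clusters.

    A cluster qualifies when normal samples make up at most
    max_normal_fraction of its members, and its mean PSI exceeds the mean PSI
    of all normal samples by at least min_psi_gap.
    """
    normal_psi = [p for p, t in zip(psi, is_tumor) if not t]
    normal_mean = mean(normal_psi) if normal_psi else 0.0
    selected = {}
    for label in sorted(set(labels)):
        members = [(p, t) for p, t, c in zip(psi, is_tumor, labels) if c == label]
        n_normal = sum(1 for _, t in members if not t)
        cluster_mean = mean(p for p, _ in members)
        if (n_normal / len(members) <= max_normal_fraction
                and cluster_mean - normal_mean >= min_psi_gap):
            selected[label] = len(members) - n_normal
    return selected
```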



FIG. 27A: GMM analysis of mixed normal and breast cancer samples for the splicing event 7108 (RHOH gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 27B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 7108 (RHOH gene). Clusters 1-3 are composed mostly of breast cancer samples. FIG. 27C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 44 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 27D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).
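Panel D compares overall survival (in days) of patients in the tumor-specific cluster against the remaining breast cancer patients. The captions do not name the estimator. Kaplan-Meier curves are the usual choice for this kind of comparison, so the following minimal sketch assumes that estimator applied to right-censored follow-up times. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Kaplan-Meier survival estimate as (time, S(t)) steps.

    events[i] is True if death was observed at times[i], and False if the
    patient was censored (alive at last follow-up).
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = [(0.0, 1.0)]
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored patients leave the risk set
    return curve
```

Calling it separately on the cluster and on the remaining patients yields the two curves compared in each panel D.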



FIG. 28A: GMM analysis of mixed normal and breast cancer samples for the splicing event 9442 (QPRT gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 28B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 9442 (QPRT gene). Clusters 1-2 are composed mostly of breast cancer samples. FIG. 28C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 40 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 28D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).
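A difference between two survival curves, such as the "worse overall survival" reported here, is commonly tested with a log-rank test. The captions do not state which test was used. The following is a minimal sketch of the standard two-group log-rank statistic under that assumption. The p-value comes from a chi-square distribution with one degree of freedom. The function name is hypothetical, and the quadratic loop favors clarity over speed.

```python
import math
from typing import Sequence, Tuple


def log_rank_test(times_a: Sequence[float], events_a: Sequence[bool],
                  times_b: Sequence[float], events_b: Sequence[bool]) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    pooled = ([(t, e, 0) for t, e in zip(times_a, events_a)]
              + [(t, e, 1) for t, e in zip(times_b, events_b)])
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({s for s, e, _ in pooled if e}):
        at_risk = [(s, e, g) for s, e, g in pooled if s >= t]
        n = len(at_risk)
        n_a = sum(1 for _, _, g in at_risk if g == 0)
        d = sum(1 for s, e, _ in at_risk if s == t and e)
        d_a = sum(1 for s, e, g in at_risk if s == t and e and g == 0)
        # Under the null hypothesis, group A's deaths at time t are hypergeometric.
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance if variance > 0 else 0.0
    # Survival function of chi-square with 1 df: P(X > c) = erfc(sqrt(c / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))
```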



FIG. 29A: GMM analysis of mixed normal and breast cancer samples for the splicing event 10439 (IL17RB gene). The GMM analysis showed 2 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 29B: Frequency (%) of tumor and normal samples across the 2 clusters identified for the splicing event 10439 (IL17RB gene). Clusters 1-2 are composed mostly of breast cancer samples. FIG. 29C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 53 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 29D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).



FIG. 30A: GMM analysis of mixed normal and breast cancer samples for the splicing event 11685 (STAU1 gene). The GMM analysis showed 2 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 30B: Frequency (%) of tumor and normal samples across the 2 clusters identified for the splicing event 11685 (STAU1 gene). Clusters 1-2 are composed mostly of breast cancer samples. FIG. 30C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 37 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 30D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).



FIG. 31A: GMM analysis of mixed normal and breast cancer samples for the splicing event 13451 (LYRM1 gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 31B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 13451 (LYRM1 gene). Clusters 1-3 are composed mostly of breast cancer samples. FIG. 31C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 34 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 31D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).



FIG. 32A: GMM analysis of mixed normal and breast cancer samples for the splicing event 14574 (PPARG gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 32B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 14574 (PPARG gene). Clusters 1-3 are composed mostly of breast cancer samples. FIG. 32C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 33 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 32D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a better overall survival (longer survival time, days).



FIG. 33A: GMM analysis of mixed normal and breast cancer samples for the splicing event 16269 (BORCS8-MEF2B gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 33B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 16269 (BORCS8-MEF2B gene). Clusters 1-3 are composed mostly of breast cancer samples. FIG. 33C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 43 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 33D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).



FIG. 34A: GMM analysis of mixed normal and breast cancer samples for the splicing event 16833 (ENOSF1 gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 34B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 16833 (ENOSF1 gene). Clusters 1-3 are composed mostly of breast cancer samples. FIG. 34C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 46 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 34D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).



FIG. 35A: GMM analysis of mixed normal and breast cancer samples for the splicing event 16929 (DHRS4-AS1 gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 35B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 16929 (DHRS4-AS1 gene). Clusters 1-2 are composed mostly of breast cancer samples. FIG. 35C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 83 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 35D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).



FIG. 36A: GMM analysis of mixed normal and breast cancer samples for the splicing event 16943 (NDUFV2 gene). The GMM analysis showed 4 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 36B: Frequency (%) of tumor and normal samples across the 4 clusters identified for the splicing event 16943 (NDUFV2 gene). Clusters 1-4 are composed mostly of breast cancer samples. FIG. 36C: Exon splicing levels (PSI) for tumor specific cluster C3 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 58 breast cancer patients in cluster C3, while very low or absent in normal tissues except bladder. FIG. 36D: Survival analysis of breast cancer patients in cluster C3 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C3 (expressing the target exon) have a worse overall survival (shorter survival time, days).
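Some captions qualify tumor specificity with an exception, such as normal bladder for NDUFV2 event 16943 or uterus, stomach, and breast in other figures. The following is a minimal sketch of a per-tissue screen that flags normal tissue types with appreciable inclusion. It assumes PSI values grouped by normal tissue type. The median statistic, the 0.1 threshold, and the function name are hypothetical choices.

```python
from statistics import median
from typing import Dict, List, Sequence


def tissues_with_inclusion(normal_psi_by_tissue: Dict[str, Sequence[float]],
                           threshold: float = 0.1) -> List[str]:
    """Normal tissue types whose median PSI for the target exon exceeds threshold."""
    return sorted(tissue for tissue, values in normal_psi_by_tissue.items()
                  if values and median(values) > threshold)
```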



FIG. 37A: GMM analysis of mixed normal and breast cancer samples for the splicing event 18745 (FER1L4 gene). The GMM analysis showed 4 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 37B: Frequency (%) of tumor and normal samples across the 4 clusters identified for the splicing event 18745 (FER1L4 gene). Clusters 1-4 are composed mostly of breast cancer samples. FIG. 37C: Exon splicing levels (PSI) for tumor specific cluster C2 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 89 breast cancer patients in cluster C2, while very low or absent in normal tissues. FIG. 37D: Survival analysis of breast cancer patients in cluster C2 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C2 (expressing the target exon) have a better overall survival (longer survival time, days).



FIG. 38A: GMM analysis of mixed normal and breast cancer samples for the splicing event 19824 (PHF14 gene). The GMM analysis showed 2 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 38B: Frequency (%) of tumor and normal samples across the 2 clusters identified for the splicing event 19824 (PHF14 gene). Clusters 1-2 are composed mostly of breast cancer samples. FIG. 38C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 111 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 38D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).



FIG. 39A: GMM analysis of mixed normal and breast cancer samples for the splicing event 19828 (PHF14 gene). The GMM analysis showed 2 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 39B: Frequency (%) of tumor and normal samples across the 2 clusters identified for the splicing event 19828 (PHF14 gene). Clusters 1-2 are composed mostly of breast cancer samples. FIG. 39C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 111 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 39D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).



FIG. 40A: GMM analysis of mixed normal and breast cancer samples for the splicing event 21024 (BCL2L13 gene). The GMM analysis showed 4 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 40B: Frequency (%) of tumor and normal samples across the 4 clusters identified for the splicing event 21024 (BCL2L13 gene). Clusters 1-4 are composed mostly of breast cancer samples. FIG. 40C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 35 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 40D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).



FIG. 41A: GMM analysis of mixed normal and breast cancer samples for the splicing event 22227 (SELENBP1 gene). The GMM analysis showed 2 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 41B: Frequency (%) of tumor and normal samples across the 2 clusters identified for the splicing event 22227 (SELENBP1 gene). Clusters 1-2 are composed mostly of breast cancer samples. FIG. 41C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 86 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 41D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a better overall survival (longer survival time, days).



FIG. 42A: GMM analysis of mixed normal and breast cancer samples for the splicing event 24742 (LINC00630 gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 42B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 24742 (LINC00630 gene). Clusters 1-3 are composed mostly of breast cancer samples. FIG. 42C: Exon splicing levels (PSI) for tumor specific cluster C2 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 38 breast cancer patients in cluster C2, while very low or absent in normal tissues except uterus. FIG. 42D: Survival analysis of breast cancer patients in cluster C2 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C2 (expressing the target exon) have a worse overall survival (shorter survival time, days).



FIG. 43A: GMM analysis of mixed normal and breast cancer samples for the splicing event 27194 (CTBP2 gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 43B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 27194 (CTBP2 gene). Clusters 1-3 are composed mostly of breast cancer samples. FIG. 43C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 33 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 43D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).



FIG. 44A: GMM analysis of mixed normal and breast cancer samples for the splicing event 30244 (SLC52A2 gene). The GMM analysis showed 4 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 44B: Frequency (%) of tumor and normal samples across the 4 clusters identified for the splicing event 30244 (SLC52A2 gene). Clusters 1-3 are composed mostly of breast cancer samples. FIG. 44C: Exon splicing levels (PSI) for tumor specific cluster C3 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 310 breast cancer patients in cluster C3, while very low or absent in normal tissues. FIG. 44D: Survival analysis of breast cancer patients in cluster C3 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C3 (expressing the target exon) have a worse overall survival (shorter survival time, days).



FIG. 45A: GMM analysis of mixed normal and breast cancer samples for the splicing event 33377 (SLC38A1 gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 45B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 33377 (SLC38A1 gene). Clusters 1-3 are composed mostly of breast cancer samples. FIG. 45C: Exon splicing levels (PSI) for tumor specific cluster C2 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 52 breast cancer patients in cluster C2, while very low or absent in normal tissues except stomach. FIG. 45D: Survival analysis of breast cancer patients in cluster C2 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C2 (expressing the target exon) have a worse overall survival (shorter survival time, days).



FIG. 46A: GMM analysis of mixed normal and breast cancer samples for the splicing event 40521 (FAM65A gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 46B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 40521 (FAM65A gene). Clusters 1-3 are composed mostly of breast cancer samples. FIG. 46C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 32 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 46D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).



FIG. 47A: GMM analysis of mixed normal and breast cancer samples for the splicing event 41168 (USP25 gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 47B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 41168 (USP25 gene). Clusters 1-3 are composed mostly of breast cancer samples. FIG. 47C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 31 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 47D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).



FIG. 48A: GMM analysis of mixed normal and breast cancer samples for the splicing event 45885 (HMOX2 gene). The GMM analysis showed 4 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 48B: Frequency (%) of tumor and normal samples across the 4 clusters identified for the splicing event 45885 (HMOX2 gene). Clusters 1-3 are composed mostly of breast cancer samples. FIG. 48C: Exon splicing levels (PSI) for tumor specific cluster C2 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 151 breast cancer patients in cluster C2, while very low or absent in normal tissues. FIG. 48D: Survival analysis of breast cancer patients in cluster C2 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C2 (expressing the target exon) have a worse overall survival (shorter survival time, days).



FIG. 49A: GMM analysis of mixed normal and breast cancer samples for the splicing event 50148 (MKRN2OS gene). The GMM analysis showed 4 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 49B: Frequency (%) of tumor and normal samples across the 4 clusters identified for the splicing event 50148 (MKRN2OS gene). Clusters 1-4 are composed mostly of breast cancer samples. FIG. 49C: Exon splicing levels (PSI) for tumor specific cluster C2 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 40 breast cancer patients in cluster C2, while very low or absent in normal tissues. FIG. 49D: Survival analysis of breast cancer patients in cluster C2 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C2 (expressing the target exon) have a worse overall survival (shorter survival time, days).



FIG. 50A: GMM analysis of mixed normal and breast cancer samples for the splicing event 52249 (ATP8A2P1 gene). The GMM analysis showed 2 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 50B: Frequency (%) of tumor and normal samples across the 2 clusters identified for the splicing event 52249 (ATP8A2P1 gene). Clusters 1-2 are composed mostly of breast cancer samples. FIG. 50C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 33 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 50D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).



FIG. 51A: GMM analysis of mixed normal and breast cancer samples for the splicing event 53188 (HIBCH gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 51B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 53188 (HIBCH gene). Clusters 1-3 are composed mostly of breast cancer samples. FIG. 51C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 129 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 51D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).



FIG. 52A: GMM analysis of mixed normal and breast cancer samples for the splicing event 58853 (SLC35C2 gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 52B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 58853 (SLC35C2 gene). Clusters 1-3 are composed mostly of breast cancer samples. FIG. 52C: Exon splicing levels (PSI) for tumor specific cluster C1 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 40 breast cancer patients in cluster C1, while very low or absent in normal tissues. FIG. 52D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).



FIG. 53A: GMM analysis of mixed normal and breast cancer samples for the splicing event 59314 (TRIM5 gene). The GMM analysis showed 3 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 53B: Frequency (%) of tumor and normal samples across the 3 clusters identified for the splicing event 59314 (TRIM5 gene). Clusters 1-3 are composed mostly of breast cancer samples. FIG. 53C: Exon splicing levels (PSI) for tumor specific cluster C2 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 61 breast cancer patients in cluster C2, while very low or absent in normal tissues. FIG. 53D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).



FIG. 54A: GMM analysis of mixed normal and breast cancer samples for the splicing event 60239 (HSD17B6 gene). The GMM analysis showed 4 distinct clusters (subpopulations). The x-axis indicates the exon percent spliced in (PSI, Ψ) level within samples, and y-axis denotes the number of samples in a normalized density scale. Shading indicates the cluster assignment of each sample. FIG. 54B: Frequency (%) of tumor and normal samples across the 4 clusters identified for the splicing event 60239 (HSD17B6 gene). Clusters 1-4 are composed mostly of breast cancer samples. FIG. 54C: Exon splicing levels (PSI) for tumor specific clusters C2 and C3 and normal tissues in TCGA. This analysis indicates that the target exon is expressed in 130 breast cancer patients in cluster C2 and 214 breast cancer patients in cluster C3, while it is very low or absent in normal tissues except breast. FIG. 54D: Survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA. This analysis indicates that patients in C1 (expressing the target exon) have a worse overall survival (shorter survival time, days).







DETAILED DESCRIPTION

Alternative splicing is a key mechanism of biological diversity in eukaryotes. It allows a single gene to give rise to multiple mRNA isoforms, which can be translated into distinct proteins. The human genome includes more than 20,000 genes, yet more than 95% of multi-exonic pre-mRNAs are alternatively spliced, generating nearly 200,000 isoforms. The alternative splicing isoforms translated into proteins can have distinct or even opposing functions. Alternative splicing is involved in a wide range of biological processes, including immune cell maturation and processing.


Studies examining the cancer transcriptome have enabled unprecedented insight into cancer cell heterogeneity and generated novel classifications. This progress, however, has not yet fully translated into clinical benefit. Isoforms, as well as alterations in alternative splicing, are associated with numerous diseases. They can contribute to cancer malignancy by regulating the expression of oncogenes and tumor suppressors. Aberrant alternative splicing profiles can arise in cancer from mutations at splice sites or splicing-regulatory elements, but can also reflect changes in splicing regulators. Recurrent mutations in the core splicing machinery are found in myeloid leukemia, and sporadic mutations are found in lung and breast cancer. These findings suggest that splicing alterations play a key role in tumorigenesis. Alterations in alternative splicing generate a repertoire of novel isoforms in tumors that, together with fusion molecules, can be viewed as another class of neoantigens.


Provided herein, in some aspects, are methods that comprise assaying a sample for a particular cancer isoform that includes or excludes a particular exon. In some embodiments, a sample is assayed for multiple exon inclusion or exon exclusion isoforms as provided herein. The data provided by the present disclosure demonstrate that at least one of fifty-two different exon inclusion or exon exclusion isoforms can be detected in ~91% of all breast cancer samples tested.
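The ~91% figure is a per-sample detection rate: the fraction of tumor samples in which at least one isoform of the panel is detected. The sketch below shows that calculation on toy calls. The sample IDs and detected SEQ ID NOs are made up for illustration. It assumes that an upstream step has already decided whether each isoform is "detected" (for example, with a PSI or read-count threshold).

```python
from typing import Dict, Iterable, Set

# Exons of SEQ ID NOS: 21-40 and 105-136 (the 52-exon panel described above).
PANEL = frozenset(range(21, 41)) | frozenset(range(105, 137))


def panel_detection_rate(detected_by_sample: Dict[str, Set[int]],
                         panel: Iterable[int] = PANEL) -> float:
    """Fraction of samples in which at least one panel isoform was detected."""
    if not detected_by_sample:
        raise ValueError("no samples supplied")
    panel_set = set(panel)
    positive = sum(1 for hits in detected_by_sample.values() if hits & panel_set)
    return positive / len(detected_by_sample)


# Toy input: hypothetical sample IDs and calls, not data from the disclosure.
calls = {"s1": {24}, "s2": set(), "s3": {107, 999}, "s4": {5}}
print(panel_detection_rate(calls))  # 0.5
```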


Methods of Detection

Some aspects of the present disclosure comprise assaying a sample for (the presence or absence of) a nucleic acid (e.g., an exon inclusion event or an exon exclusion event) comprising a nucleotide sequence (e.g., an exon) of any one of SEQ ID NOS: 21-40 and 105-136. It should be understood that the phrase “assaying a sample for a nucleic acid comprising a nucleotide sequence of SEQ ID NO: X” encompasses assaying a sample for the presence or absence of a nucleic acid that includes the full length nucleotide sequence identified by SEQ ID NO: X (all nucleotides of SEQ ID NO: X); and the phrase also includes assaying a sample for the presence or absence of a nucleic acid that includes a fragment of the nucleotide sequence identified by SEQ ID NO: X. The length of the fragment is not limited and may be, for example, at least 50, at least 60, at least 70, at least 80, at least 90, or at least 100 nucleotides.
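For sequencing reads, the "full length or fragment" wording can be made concrete. Below is a minimal sketch, assuming reads and reference exons are plain A/C/G/T strings. A read qualifies if it contains the whole SEQ ID NO: X sequence, or any contiguous stretch of it at least `min_len` nucleotides long. Here `min_len` is 50, the smallest length listed above. The function name is hypothetical, and mismatches are not tolerated.

```python
def contains_fragment(read: str, reference: str, min_len: int = 50) -> bool:
    """True if `read` contains all of `reference` or a contiguous fragment of it >= min_len nt."""
    read, reference = read.upper(), reference.upper()
    if reference in read:  # full-length match (also covers exons shorter than min_len)
        return True
    if min_len <= 0:
        raise ValueError("min_len must be positive")
    if len(reference) < min_len or len(read) < min_len:
        return False
    # Any shared fragment of length >= min_len contains a shared substring of exactly
    # min_len, so comparing min_len-long k-mers is sufficient.
    ref_kmers = {reference[i:i + min_len] for i in range(len(reference) - min_len + 1)}
    return any(read[i:i + min_len] in ref_kmers for i in range(len(read) - min_len + 1))
```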


In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 21. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 22. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 23. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 24. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 25. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 26. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 27. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 28. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 29. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 30. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 31. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 32. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 33. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 34. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 35. 
In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 36. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 37. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 38. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 39. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 40. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 105. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 106. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 107. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 108. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 109. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 110. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 111. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 112. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 113. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 114. 
In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 115. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 116. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 117. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 118. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 119. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 120. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 121. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 122. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 123. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 124. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 125. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 126. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 127. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 128. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 129. 
In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 130. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 131. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 132. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 133. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 134. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 135. In some embodiments, the methods comprise assaying a sample for a nucleic acid comprising the nucleotide sequence of SEQ ID NO: 136.


In some embodiments, methods of the present disclosure comprise assaying a sample for a (at least one) nucleic acid comprising a nucleotide sequence of any one of SEQ ID NOS: 22-24, 27-34, 36, 38, or 40. In some embodiments, the methods further comprise assaying the sample for a nucleic acid comprising a nucleotide sequence of any one of SEQ ID NOS: 21, 25, 26, 35, 37, or 39.


In some embodiments, methods of the present disclosure comprise assaying the sample for a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 21, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 22, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 23, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 24, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 25, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 26, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 27, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 28, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 29, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 30, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 31, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 32, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 33, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 34, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 35, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 36, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 37, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 38, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 39, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 40, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 105, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 106, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 107, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 108, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 109, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 110, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 111, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 112, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 113, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 114, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 115, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 116, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 117, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 118, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 119, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 120, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 121, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 122, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 123, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 124, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 125, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 126, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 127, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 128, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 129, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 130, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 131, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 132, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 133, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 134, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 135, and a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 136.


In some embodiments, the methods of the present disclosure comprise assaying the sample for 2 (or at least 2) of the 52 exons (selected from exons comprising a nucleotide sequence of any one of SEQ ID NOS: 21-40 and 105-136). In some embodiments, the methods of the present disclosure comprise assaying the sample for 3 (or at least 3) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 4 (or at least 4) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 5 (or at least 5) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 6 (or at least 6) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 7 (or at least 7) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 8 (or at least 8) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 9 (or at least 9) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 10 (or at least 10) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 11 (or at least 11) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 12 (or at least 12) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 13 (or at least 13) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 14 (or at least 14) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 15 (or at least 15) of the 52 exons. 
In some embodiments, the methods of the present disclosure comprise assaying the sample for 16 (or at least 16) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 17 (or at least 17) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 18 (or at least 18) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 19 (or at least 19) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 20 (or at least 20) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 21 (or at least 21) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 22 (or at least 22) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 23 (or at least 23) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 24 (or at least 24) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 25 (or at least 25) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 26 (or at least 26) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 27 (or at least 27) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 28 (or at least 28) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 29 (or at least 29) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 30 (or at least 30) of the 52 exons. 
In some embodiments, the methods of the present disclosure comprise assaying the sample for 31 (or at least 31) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 32 (or at least 32) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 33 (or at least 33) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 34 (or at least 34) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 35 (or at least 35) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 36 (or at least 36) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 37 (or at least 37) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 38 (or at least 38) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 39 (or at least 39) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 40 (or at least 40) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 41 (or at least 41) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 42 (or at least 42) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 43 (or at least 43) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 44 (or at least 44) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 45 (or at least 45) of the 52 exons. 
In some embodiments, the methods of the present disclosure comprise assaying the sample for 46 (or at least 46) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 47 (or at least 47) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 48 (or at least 48) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 49 (or at least 49) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 50 (or at least 50) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for 51 (or at least 51) of the 52 exons. In some embodiments, the methods of the present disclosure comprise assaying the sample for all 52 exons.


It should be understood that a method “comprising assaying the sample for fifty-two (52) exon splicing isoforms (e.g., exon inclusion or exon exclusion), each comprising a different nucleotide sequence of SEQ ID NOS: 21-40 and 105-136” is a method that comprises assaying for all 52 isoforms provided in Table 1, Table 2, and Table 3.


Not every sample will have more than one exon splicing isoform (e.g., exon inclusion or exon exclusion) of the present disclosure. In many embodiments, only one of the exon splicing isoforms of the present disclosure will be detected in a sample. Nonetheless, a sample may be assayed for one or more (e.g., 1 to 52) of the 52 exon splicing isoforms. For example, a single sample may include only the exon splicing isoform comprising the sequence of SEQ ID NO: 1 or SEQ ID NO: 21. To detect that isoform, a sample may be assayed for all 52 of the exon splicing isoforms of Table 1, Table 2, and Table 3, or for a subset of fewer than 52.


It should also be understood that the step of “assaying for an exon splicing isoform(s) (e.g., exon inclusion or exon exclusion)” or “assaying for a nucleic acid” encompasses assaying for mRNA comprising the exon splicing isoform(s) or assaying for complementary DNA (cDNA) comprising the exon splicing isoform(s) (e.g., comprising the sequence of any one of SEQ ID NOS: 21-40 and 105-136). As is known in the art, cDNA is synthesized from mRNA.


Examples of Nucleic Acid Detection Assays

There are many different known methods for assaying a sample for the presence or absence of a particular nucleotide sequence, any of which may be used in accordance with the present disclosure. For example, standard polymerase chain reaction (PCR) methods (e.g., reverse transcription PCR (RT-PCR)) may be performed using mRNA obtained from a sample. In RT-PCR, the RNA template is first converted into a complementary DNA (cDNA) using a reverse transcriptase. The cDNA is then used as a template for exponential amplification using PCR. Thus, kits provided herein may include any one or more reagents used in a PCR such as, for example, primers or probes that bind to a particular nucleic acid comprising an exon splicing event (e.g., exon inclusion or exon exclusion), polymerases, buffers, deoxyribonucleotide triphosphates (dNTPs), and salts.


In some embodiments, an Archer® FusionPlex® assay is used to assay for a nucleotide sequence (e.g., exon). This assay may include using custom-designed probes with Anchored Multiplexed PCR (AMP™) followed by next-generation sequencing (NGS) (e.g., with an Illumina® platform). Thus, kits provided herein may include any one or more reagents used in an Archer® FusionPlex® assay.


In other embodiments, targeted sequencing using long-read sequencing technology (e.g., PacBio®, built on Single Molecule, Real-Time (SMRT) Sequencing technology) is used to assay for a nucleotide sequence (e.g., exon). Thus, kits provided herein may include any one or more reagents used in a long-read sequencing technology.


In other embodiments, Droplet Digital™ PCR (ddPCR™) (BioRad®) is used to assay for a nucleotide sequence (e.g., exon). For example, combinations of primers and probes may be designed to detect selected exon splicing isoforms in single-cell suspensions or in cells isolated from frozen tumor tissues, e.g., using Laser Capture Microdissection. More than one isoform may be detected in a single cell, for example. Thus, kits provided herein may include any one or more reagents used in a Droplet Digital™ PCR (ddPCR™) assay.
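Droplet counts from ddPCR are usually converted to concentrations with a Poisson correction, because a single positive droplet can hold more than one template copy. The sketch below assumes a droplet volume of 0.85 nL, which is commonly cited for Bio-Rad's QX200 system; use the volume for the instrument you actually run. One possible use is to run an inclusion-specific assay and an exclusion-specific assay side by side and compare their concentrations with `isoform_fraction`. That pairing is hypothetical, not a protocol from the disclosure.

```python
import math


def ddpcr_copies_per_ul(positive_droplets: int, total_droplets: int,
                        droplet_volume_nl: float = 0.85) -> float:
    """Estimate target copies per microliter of reaction from droplet counts.

    Applies the Poisson correction lambda = -ln(1 - p), where p is the fraction
    of positive droplets, to get mean copies per droplet.
    """
    if total_droplets <= 0 or not 0 <= positive_droplets <= total_droplets:
        raise ValueError("invalid droplet counts")
    if positive_droplets == total_droplets:
        raise ValueError("all droplets positive; dilute and rerun")
    lam = -math.log(1 - positive_droplets / total_droplets)  # mean copies per droplet
    return lam / (droplet_volume_nl * 1e-3)  # convert nL to uL


def isoform_fraction(inclusion_copies: float, exclusion_copies: float) -> float:
    """Share of inclusion-isoform copies among both isoforms (a PSI-like ratio)."""
    total = inclusion_copies + exclusion_copies
    if total <= 0:
        raise ValueError("no copies detected for either isoform")
    return inclusion_copies / total


print(ddpcr_copies_per_ul(5000, 15000))  # roughly 477 copies/uL at 0.85 nL per droplet
```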


In yet other embodiments, ViewRNA™ In Situ Hybridization (ISH) (Thermo Fisher Scientific) may be used to assay for a nucleotide sequence (e.g., exon). For example, splice junction probes may be designed to enable specific detection of the exon splicing isoforms of the present disclosure in tissue sections (e.g., breast cancer tissue sections) through Fluorescent In Situ Hybridization (FISH). More than one isoform may be detected in the same cell, for example. Thus, kits provided herein may include any one or more reagents used in an ISH assay.


In still other embodiments, nCounter® technology (nanoString™) is used to assay for a nucleotide sequence (e.g., exon). For example, the nCounter® Analysis System uses digital barcode technology for direct multiplexed measurement of analytes and offers high levels of precision and sensitivity (<1 copy per cell). The technology uses molecular “barcodes” and single-molecule imaging for the direct hybridization and detection of hundreds of unique transcripts in a single reaction. Each color-coded barcode is attached to a single target-specific probe corresponding to an analyte (e.g., exon) of interest. Combined with invariant controls, the probes form a multiplexed CodeSet. Thus, kits provided herein may include any one or more reagents used in an nCounter® assay or other nanoString™ nucleic acid detection assay.


Other nucleic acid detection methods may be used.


Probes

Some aspects of the present disclosure comprise assaying a sample for the presence or absence of a nucleic acid (e.g., an exon inclusion event) comprising a nucleotide sequence of any one of SEQ ID NOS: 1-20. Each of these includes an exon inclusion event (any one of SEQ ID NOS: 21-40) as well as a sequence directly upstream from and a sequence directly downstream from that event. Other aspects of the present disclosure comprise assaying a sample for the presence or absence of a nucleic acid (e.g., an exon exclusion event) comprising a nucleotide sequence of any one of SEQ ID NOS: 41-72. Each of these includes an exon exclusion event (any one of SEQ ID NOS: 105-136) as well as a sequence directly upstream from and a sequence directly downstream from that event.


A probe is a synthetic (non-naturally-occurring) nucleic acid that is wholly or partially complementary to, and thus binds to, a nucleic acid of interest (e.g., a nucleic acid comprising or comprised within a nucleotide sequence of any one of SEQ ID NOS: 1-20, 21-40, 41-72, or 105-136). In some embodiments, a probe comprises DNA. In some embodiments, a probe comprises RNA. In some embodiments, a probe comprises DNA and RNA. It should be understood that the term “probe” encompasses “primer,” which, as is known in the art, is a synthetic nucleic acid (e.g., DNA) used as a starting point for nucleic acid (e.g., DNA) synthesis. The length of a probe may vary, depending on the nucleic acid detection assay being used. For example, a probe may have a length of at least 15, at least 18, at least 20, at least 25, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, or at least 100 nucleotides. In some embodiments, a probe has a length of 15 to 30 nucleotides, 15 to 50 nucleotides, or 15 to 100 nucleotides. Depending on the application, a probe may be longer than 100 nucleotides.
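Whether a probe is wholly or partially complementary to a target can be checked in silico. Below is a minimal sketch that assumes ungapped pairing and ignores duplex stability. It slides the probe's reverse complement along the target and reports the best fraction of matching bases, so a wholly complementary probe scores 1.0. U is treated as T, so DNA and RNA probes are handled the same way. The function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTUN", "TGCAAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement in the DNA alphabet (U is paired like T)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def best_complementarity(probe: str, target: str) -> float:
    """Fraction of probe bases that pair with `target` at the best ungapped offset.

    1.0 means the probe is wholly complementary to some stretch of the target;
    lower values indicate partial complementarity. Gaps and melting temperature
    are not modeled in this sketch.
    """
    site = reverse_complement(probe)  # the target stretch the probe would pair with, 5'->3'
    target = target.upper().replace("U", "T")
    if not site or len(site) > len(target):
        return 0.0
    best = max(
        sum(a == b for a, b in zip(site, target[i:i + len(site)]))
        for i in range(len(target) - len(site) + 1)
    )
    return best / len(site)
```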


In some embodiments, one or more probes are designed to bind directly to an exon (e.g., exon inclusion event or exon exclusion event) of any one of SEQ ID NOS: 21-40 and 105-136. The probe may bind, for example, to a 5′ region, a central region, or a 3′ region of an exon.


In some embodiments, one or more probes are designed to bind to a nucleotide sequence directly upstream (5′) from an exon of any one of SEQ ID NOS: 21-40 and 105-136. In other embodiments, one or more probes are designed to bind to a nucleotide sequence directly downstream (3′) from an exon of any one of SEQ ID NOS: 21-40 and 105-136. In some embodiments, a first probe (e.g., primer) of a pair of probes is designed to bind to a nucleotide sequence directly upstream (5′) from an exon of any one of SEQ ID NOS: 21-40 and 105-136. A second probe (e.g., primer) of the pair is designed to bind to a nucleotide sequence directly downstream (3′) from the same exon, so that the pair of probes flanks the exon.
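A primer pair that flanks an exon amplifies both isoforms. The two products differ in length by exactly the length of the exon, so the inclusion and exclusion isoforms can be distinguished by product size. The sketch below computes the expected product lengths, assuming exact primer matches. The exon and flanking sequences are invented for illustration and are not taken from the sequence listing.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def amplicon_length(template: str, forward: str, reverse: str) -> Optional[int]:
    """Product length for a forward primer (sense) and reverse primer (antisense).

    Returns None if either primer site is absent from `template`.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    rev_site = reverse_complement(reverse)  # where the reverse primer anneals, on the sense strand
    end = template.find(rev_site, start + len(forward))
    if end == -1:
        return None
    return end + len(rev_site) - start


# Hypothetical upstream exon, cassette exon and downstream exon.
up, exon, down = "ATGGCGTACCTGAAGTTCGA", "GGTACAGTTGCAACTGGATCCAAG", "CTTGGTCAGTACGATCGTAG"
fwd = up[:15]                      # binds upstream of the exon
rev = reverse_complement(down[5:])  # binds downstream of the exon
print(amplicon_length(up + exon + down, fwd, rev))  # 64 (inclusion isoform)
print(amplicon_length(up + down, fwd, rev))         # 40 (exon skipped)
```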


In some embodiments, one or more probes are designed to bind to an exon junction. An exon junction comprises either of the following: (a) a nucleotide sequence that includes a 5′ region of an exon (e.g., of any one of SEQ ID NOS: 21-40 and 105-136) together with the nucleotide sequence directly upstream from that 5′ region; or (b) a nucleotide sequence that includes a 3′ region of an exon (e.g., of any one of SEQ ID NOS: 21-40 and 105-136) together with the nucleotide sequence directly downstream from that 3′ region. Table 6 provides examples of cDNA sequences that include exon inclusion events (underlined) as well as sequences directly upstream from and downstream from the exon inclusion event. Any one or more probes may be designed to bind to any region of a nucleotide sequence of Table 6 (SEQ ID NOS: 1-20), e.g., for the purpose of detecting (e.g., amplifying or labeling) the nucleotide sequence in a sample. Table 7 provides examples of cDNA sequences that include exon exclusion events (underlined) as well as sequences directly upstream from and downstream from the exon exclusion event. Any one or more probes may be designed to bind to any region of a nucleotide sequence of Table 7 (SEQ ID NOS: 41-72), e.g., for the purpose of detecting (e.g., amplifying or labeling) the nucleotide sequence in a sample.
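A junction probe takes a few nucleotides from each side of a splice junction, so it matches only transcripts in which those two segments are joined. The sketch below builds such probes. It assumes equal-length arms of 10 nt per side, which gives a 20-nt probe within the 15 to 100 nt range described above. It also assumes an antisense orientation so that the probe hybridizes to mRNA. The arm length, the function name and the sequences are all hypothetical. A probe joining the upstream segment to the exon reports inclusion. A probe joining the upstream and downstream segments directly reports a transcript that lacks the exon.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def junction_probe(left: str, right: str, arm: int = 10, antisense: bool = True) -> str:
    """Probe spanning the junction where `left` is spliced directly to `right`.

    Takes `arm` nt from the 3' end of `left` and `arm` nt from the 5' end of `right`.
    """
    if arm <= 0 or len(left) < arm or len(right) < arm:
        raise ValueError("each segment must be at least `arm` nt long")
    junction = left[-arm:].upper() + right[:arm].upper()
    return reverse_complement(junction) if antisense else junction


# Hypothetical segments of one transcript model.
upstream, exon, downstream = "GATTACAGGCTTCAGGATCC", "GTAAGCTTGGCCATAGCTAC", "CAGGTGACCTTAGGCAATCG"
print(junction_probe(upstream, exon))        # inclusion isoform, 5' junction of the exon
print(junction_probe(exon, downstream))      # inclusion isoform, 3' junction of the exon
print(junction_probe(upstream, downstream))  # isoform in which the exon is skipped
```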


Tissue Samples

In some embodiments, the mRNA is obtained from a biological sample. Biological samples include tissue samples or fluid samples. Non-limiting examples of tissue samples include blood samples and breast tissue samples. Non-limiting examples of fluid samples include cerebrospinal fluid (CSF) samples and urine samples.


In some embodiments, the mRNA is obtained from a breast tissue sample. The breast tissue sample, in some embodiments, is obtained from a female subject (e.g., human female subject), although it may alternatively be obtained from a male subject (e.g., human male subject).


In some embodiments, the sample is obtained from a subject diagnosed with a cancer, such as breast cancer. For example, the subject may have, may be at risk of having, or may be suspected of having a cancer of a breast duct, breast lobule, or breast tissue in between the duct and lobule. Non-limiting examples of breast cancer that may be sampled include ductal carcinoma in situ, invasive ductal carcinoma, tubular carcinoma of the breast, medullary carcinoma of the breast, mucinous carcinoma of the breast, papillary carcinoma of the breast, cribriform carcinoma of the breast, invasive lobular carcinoma, inflammatory breast cancer, Paget's disease of the nipple, Phyllodes tumors of the breast, metastatic breast cancer, and triple negative breast cancer (TNBC).


Applications

Methods of the present disclosure, in some embodiments, comprise assigning a favorable prognosis or unfavorable prognosis to a cancer patient based on the presence, in the sample, of a nucleic acid (e.g., an exon inclusion event or an exon exclusion event) comprising a nucleotide sequence (e.g., an exon) of any one of SEQ ID NOS: 21-40 and 105-136. Thus, in some embodiments, methods herein comprise obtaining a sample from a subject, assaying the sample for a nucleic acid comprising a nucleotide sequence of any one of SEQ ID NOS: 21-40 and 105-136, and assigning a favorable prognosis or unfavorable prognosis to the sample (e.g., a breast tissue sample) or patient (see, e.g., Table 4 or Table 5). In some embodiments, a nucleic acid comprising a nucleotide sequence of any one of SEQ ID NOS: 21-40 or 105-136 is detected in the sample obtained from the patient.


In some embodiments, a favorable prognosis is assigned to the sample when a nucleic acid comprising a nucleotide sequence of any one of SEQ ID NOS: 24, 28, 31, 33, 38, 114, 119, or 123 is detected. In some embodiments, a favorable prognosis is an at least 70% probability of surviving at least 2000 days. In some embodiments, a favorable prognosis is an at least 75% probability of surviving at least 2000 days. In some embodiments, a favorable prognosis is an at least 70% probability of surviving at least 4000 days. In some embodiments, a favorable prognosis is an at least 75% probability of surviving at least 4000 days.


In other embodiments, an unfavorable prognosis is assigned to the sample when a nucleic acid comprising a nucleotide sequence of any one of SEQ ID NOS: 21-23, 25-27, 29, 30, 32, 34-37, 39, 40, 105-113, 115-118, 120-122, or 124-136 is detected. In some embodiments, an unfavorable prognosis is an at least 75% probability of surviving less than 2000 days.
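The assignment rule above can be sketched as a lookup over the exon SEQ ID NOS detected in a sample. This is a minimal illustration, not the disclosed method: the favorable set is copied from the preceding paragraphs, every other panel exon (SEQ ID NOS: 21-40 and 105-136) is assumed to be unfavorable, which matches the Prognosis column of Table 1, and the "mixed" label for samples carrying both kinds of event is borrowed from the patient grouping used in Example 1. The name `assign_prognosis` is hypothetical.

```python
# Hypothetical helper; SEQ ID NO sets follow the prognosis rules described above.
FAVORABLE = {24, 28, 31, 33, 38, 114, 119, 123}
PANEL = set(range(21, 41)) | set(range(105, 137))  # SEQ ID NOS: 21-40 and 105-136
UNFAVORABLE = PANEL - FAVORABLE  # assumption: all non-favorable panel exons


def assign_prognosis(detected_seq_ids):
    """Map the exon SEQ ID NOS detected in a sample to a prognosis label."""
    detected = set(detected_seq_ids) & PANEL  # ignore IDs outside the panel
    has_favorable = bool(detected & FAVORABLE)
    has_unfavorable = bool(detected & UNFAVORABLE)
    if has_favorable and has_unfavorable:
        return "mixed"
    if has_favorable:
        return "favorable"
    if has_unfavorable:
        return "unfavorable"
    return "no prediction"


print(assign_prognosis([24]))       # favorable
print(assign_prognosis([21, 105]))  # unfavorable
```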


Additional Embodiments





    • 1. A complementary deoxyribonucleic acid (cDNA) comprising a nucleotide sequence of any one of SEQ ID NOS: 22-24, 27-34, 36, 38, or 40.

    • 2. A composition comprising the cDNA of paragraph 1.

    • 3. A composition comprising at least two cDNAs of paragraph 1.

    • 4. The composition of paragraph 2 or 3 further comprising a cDNA comprising a nucleotide sequence of any one of SEQ ID NOS: 21, 25, 26, 35, 37, or 39.

    • 5. The composition of paragraph 2 or 4 comprising a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 21, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 22, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 23, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 24, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 25, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 26, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 27, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 28, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 29, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 30, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 31, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 32, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 33, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 34, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 35, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 36, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 37, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 38, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 39, and a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 40.

    • 6. The composition of paragraph 2 further comprising a probe that binds to the cDNA, or a pair of primers that bind to the cDNA.

    • 7. The composition of any one of paragraphs 2-6, wherein the cDNA is synthesized from messenger ribonucleic acid (mRNA) obtained from a tissue sample, optionally a breast tissue sample.

    • 8. The composition of paragraph 7, wherein the breast tissue sample is obtained from a female subject.

    • 9. The composition of paragraph 7 or 8, wherein the sample is obtained from a subject diagnosed with a cancer.

    • 10. The composition of paragraph 7 or 8, wherein the sample is obtained from a subject at risk of having a cancer or suspected of having a cancer.

    • 11. A method comprising assaying a sample for a nucleic acid comprising a nucleotide sequence of any one of SEQ ID NOS: 22-24, 27-34, 36, 38, or 40.

    • 12. The method of paragraph 11 further comprising assaying the sample for a nucleic acid comprising a nucleotide sequence of any one of SEQ ID NOS: 21, 25, 26, 35, 37, or 39.

    • 13. The method of paragraph 11 comprising assaying the sample for a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 21, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 22, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 23, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 24, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 25, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 26, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 27, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 28, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 29, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 30, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 31, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 32, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 33, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 34, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 35, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 36, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 37, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 38, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 39, and a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 40.

    • 14. The method of any one of paragraphs 11-13, wherein the nucleic acid is a messenger ribonucleic acid (mRNA), optionally obtained from a breast tissue sample.

    • 15. The method of any one of paragraphs 11-13, wherein the nucleic acid is a complementary deoxyribonucleic acid (cDNA) synthesized from mRNA obtained from a breast tissue sample.

    • 16. The method of paragraph 14 or 15, wherein the breast tissue sample is obtained from a female subject.

    • 17. The method of any one of paragraphs 14-16, wherein the breast tissue sample is obtained from a subject diagnosed with a cancer.

    • 18. The method of any one of paragraphs 14-16, wherein the breast tissue sample is obtained from a subject at risk of having a cancer or suspected of having a cancer.

    • 19. The method of any one of paragraphs 11-18 further comprising detecting a nucleic acid comprising a nucleotide sequence of any one of SEQ ID NOS: 21-40.

    • 20. The method of any one of paragraphs 11-19, wherein the nucleic acid is a mRNA.

    • 21. The method of any one of paragraphs 11-19, wherein the nucleic acid is a cDNA.

    • 22. The method of any one of paragraphs 19-21 further comprising assigning to the subject from whom the sample was obtained a favorable prognosis or an unfavorable prognosis.

    • 23. The method of paragraph 22, wherein a favorable prognosis is assigned to the subject from whom the sample was obtained if a nucleic acid comprising a nucleotide sequence of any one of SEQ ID NOS: 24, 28, 31, 33, or 38 is detected.

    • 24. The method of paragraph 22, wherein an unfavorable prognosis is assigned to the subject from whom the sample was obtained if a nucleic acid comprising a nucleotide sequence of any one of SEQ ID NOS: 21-23, 25-27, 29, 30, 32, 34-37, 39, or 40 is detected.

    • 25. A method comprising:

    • obtaining a sample from a subject;

    • assaying the sample for a nucleic acid comprising a nucleotide sequence of any one of SEQ ID NOS: 21-40; and

    • assigning a favorable prognosis or unfavorable prognosis to the subject.

    • 26. The method of paragraph 25 further comprising detecting in the sample a nucleic acid comprising a nucleotide sequence of any one of SEQ ID NOS: 21-40.

    • 27. The method of paragraph 26, wherein the sample is a breast tissue sample.

    • 28. The method of any one of paragraphs 25-27, wherein the assaying step comprises assaying the sample for a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 21, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 22, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 23, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 24, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 25, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 26, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 27, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 28, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 29, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 30, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 31, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 32, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 33, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 34, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 35, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 36, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 37, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 38, a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 39, and a nucleic acid comprising a nucleotide sequence of SEQ ID NO: 40.

    • 29. The method of any one of paragraphs 25-27, wherein a favorable prognosis is assigned to the subject from whom the sample was obtained if a nucleic acid comprising a nucleotide sequence of any one of SEQ ID NOS: 24, 28, 31, 33, or 38 is detected.

    • 30. The method of any one of paragraphs 25-27, wherein an unfavorable prognosis is assigned to the subject from whom the sample was obtained if a nucleic acid comprising a nucleotide sequence of any one of SEQ ID NOS: 21-23, 25-27, 29, 30, 32, 34-37, 39, or 40 is detected.

    • 31. A kit comprising: a probe comprising a nucleotide sequence complementary to a nucleotide sequence of any one of SEQ ID NOS: 1-20; and at least one reagent for detecting a nucleic acid selected from buffers, salts, polymerases, and deoxyribonucleotide triphosphates (dNTPs).

    • 32. A kit comprising:

    • a probe comprising a nucleotide sequence complementary to a nucleotide sequence of SEQ ID NO: 1, a probe comprising a nucleotide sequence complementary to a nucleotide sequence of SEQ ID NO: 2, a probe comprising a nucleotide sequence complementary to a nucleotide sequence of SEQ ID NO: 3, a probe comprising a nucleotide sequence complementary to a nucleotide sequence of SEQ ID NO: 4, a probe comprising a nucleotide sequence complementary to a nucleotide sequence of SEQ ID NO: 5, a probe comprising a nucleotide sequence complementary to a nucleotide sequence of SEQ ID NO: 6, a probe comprising a nucleotide sequence complementary to a nucleotide sequence of SEQ ID NO: 7, a probe comprising a nucleotide sequence complementary to a nucleotide sequence of SEQ ID NO: 8, a probe comprising a nucleotide sequence complementary to a nucleotide sequence of SEQ ID NO: 9, a probe comprising a nucleotide sequence complementary to a nucleotide sequence of SEQ ID NO: 10, a probe comprising a nucleotide sequence complementary to a nucleotide sequence of SEQ ID NO: 11, a probe comprising a nucleotide sequence complementary to a nucleotide sequence of SEQ ID NO: 12, a probe comprising a nucleotide sequence complementary to a nucleotide sequence of SEQ ID NO: 13, a probe comprising a nucleotide sequence complementary to a nucleotide sequence of SEQ ID NO: 14, a probe comprising a nucleotide sequence complementary to a nucleotide sequence of SEQ ID NO: 15, a probe comprising a nucleotide sequence complementary to a nucleotide sequence of SEQ ID NO: 16, a probe comprising a nucleotide sequence complementary to a nucleotide sequence of SEQ ID NO: 17, a probe comprising a nucleotide sequence complementary to a nucleotide sequence of SEQ ID NO: 18, a probe comprising a nucleotide sequence complementary to a nucleotide sequence of SEQ ID NO: 19, and a probe comprising a nucleotide sequence complementary to a nucleotide sequence of SEQ ID NO: 20.

    • 33. A kit comprising:

    • a probe comprising a nucleotide sequence complementary to a nucleotide sequence of SEQ ID NO: 21, a probe comprising a nucleotide sequence complementary to a nucleotide sequence of SEQ ID NO: 22, a probe comprising a nucleotide sequence complementary to a nucleotide sequence of SEQ ID NO: 23, a probe comprising a nucleotide sequence complementary to a nucleotide sequence of SEQ ID NO: 24, a probe comprising a nucleotide sequence complementary to a nucleotide sequence of SEQ ID NO: 25, a probe comprising a nucleotide sequence complementary to a nucleotide sequence of SEQ ID NO: 26, a probe comprising a nucleotide sequence complementary to a nucleotide sequence of SEQ ID NO: 27, a probe comprising a nucleotide sequence complementary to a nucleotide sequence of SEQ ID NO: 28, a probe comprising a nucleotide sequence complementary to a nucleotide sequence of SEQ ID NO: 29, a probe comprising a nucleotide sequence complementary to a nucleotide sequence of SEQ ID NO: 30, a probe comprising a nucleotide sequence complementary to a nucleotide sequence of SEQ ID NO: 31, a probe comprising a nucleotide sequence complementary to a nucleotide sequence of SEQ ID NO: 32, a probe comprising a nucleotide sequence complementary to a nucleotide sequence of SEQ ID NO: 33, a probe comprising a nucleotide sequence complementary to a nucleotide sequence of SEQ ID NO: 34, a probe comprising a nucleotide sequence complementary to a nucleotide sequence of SEQ ID NO: 35, a probe comprising a nucleotide sequence complementary to a nucleotide sequence of SEQ ID NO: 36, a probe comprising a nucleotide sequence complementary to a nucleotide sequence of SEQ ID NO: 37, a probe comprising a nucleotide sequence complementary to a nucleotide sequence of SEQ ID NO: 38, a probe comprising a nucleotide sequence complementary to a nucleotide sequence of SEQ ID NO: 39, and a probe comprising a nucleotide sequence complementary to a nucleotide sequence of SEQ ID NO: 40.

    • 34. The kit of paragraph 32 or 33, wherein the kit further comprises at least one reagent for detecting a nucleic acid selected from buffers, salts, polymerases, and deoxyribonucleotide triphosphates (dNTPs).
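The kits above recite probes comprising a nucleotide sequence complementary to a target SEQ ID NO. For a probe that hybridizes antiparallel to a DNA target, the complementary sequence written 5'-to-3' is the reverse complement of the target. The sketch below computes it, assuming full-length, perfectly matched complementarity to an ACGT-only target; it does not address probe length, melting temperature, or specificity, none of which the kits above specify. The name `complementary_probe` is hypothetical.

```python
# Hypothetical helper for deriving a complementary probe sequence.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complementary_probe(target):
    """Return the reverse complement (written 5'->3') of a DNA target sequence."""
    seq = "".join(target.split()).upper()  # drop whitespace and line breaks
    invalid = set(seq) - set("ACGT")
    if invalid:
        raise ValueError(f"unexpected characters: {sorted(invalid)}")
    # Complement each base, then reverse so the probe also reads 5'->3'.
    return seq.translate(_COMPLEMENT)[::-1]


print(complementary_probe("GCCTGCAG"))  # CTGCAGGC
```

Because multi-line sequence listings can be passed in directly, whitespace is stripped before validation.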





EXAMPLES
Example 1

Alternative splicing is a biological phenomenon that increases transcript and protein diversity. In one type of alternative splicing, referred to as “exon skipping,” exons are either spliced “in” or spliced “out” of the transcript based on cellular conditions (FIG. 55).


Due to alternative splicing, different transcript isoforms (exon configurations) of the same gene might be expressed in tumor and normal samples. Therefore, even though a gene is expressed in both tumor and normal tissues, transcripts might harbor an exon configuration that is distinctive to cancer.


A conventional approach for identification of cancer biomarkers is based on gene expression. Researchers aim to detect whether a gene is specifically expressed in tumors using microarrays or RNA sequencing. We took a splicing-based approach rather than a gene-based approach to identify cancer biomarkers.


Methods

To identify splicing biomarkers in cancer, we took the following steps: (i) transcript sequencing, (ii) TCGA analysis, and (iii) clustering analysis using a novel methodology to identify splicing-based biomarkers.


Sequencing: Long-read sequencing was performed using PacBio® Single Molecule, Real-Time (SMRT) sequencing technology. This technology is capable of sequencing full-length cDNA transcripts without the need for cDNA fragmentation, and therefore can be used to directly infer the connectivity of exons in transcripts without the need for computational reconstruction. We used this technology to sequence transcripts in 81 cancer and tumor samples. We obtained 298K transcripts corresponding to ~14K genes, yielding a median of 8 30 isoforms per gene. This represents a ~2-fold increase over the public human reference transcriptome (Gencode version 25) for that set of genes. This set of transcripts is called the PacBio® Transcriptome.


Data Analysis Step 1, TCGA analysis: Quantification of exon skipping events in a large cohort of breast cancer patients available from TCGA, using the PacBio® Transcriptome as background. The aim of this step is to compute percent spliced-in (PSI) for exons undergoing alternative splicing. This step was performed using the rMATS software. rMATS identified 67,255 skipping events in the PacBio® Transcriptome and computed the PSI levels for each of those exons across all samples (n=1,748, including 1,111 breast cancer tumors and 637 normal samples). Given the size of the TCGA sequencing data, this step was performed on the ISB Cancer Genomics Cloud (Google Cloud) platform.
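PSI for a skipping event is the fraction of transcripts that include the target exon. The sketch below assumes the length-normalized junction-count form that rMATS describes for its inclusion level, in which each read count is divided by the effective length of its isoform before the ratio is taken; the 10-read coverage minimum mirrors the criterion listed in the Results below. It illustrates the quantity only and is not a substitute for running rMATS; the function and parameter names are hypothetical.

```python
def psi_from_junction_counts(inc_reads, skip_reads, inc_len, skip_len, min_reads=10):
    """Length-normalized percent spliced-in (PSI) for one exon in one sample.

    inc_reads / skip_reads: reads supporting the exon-inclusion / exon-skipping isoform.
    inc_len / skip_len: effective lengths used to normalize each read count.
    Returns None when total coverage is below min_reads.
    """
    if inc_reads + skip_reads < min_reads:
        return None
    inc = inc_reads / inc_len
    skip = skip_reads / skip_len
    if inc + skip == 0:
        return None
    return inc / (inc + skip)


# 30 inclusion reads over an effective length of 2 vs. 10 skipping reads over 1
print(psi_from_junction_counts(30, 10, 2, 1))  # 0.6
```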


Data Analysis Step 2, Clustering: Application of a methodology of the present disclosure, called ts3 (Tumor Specific Splice Site Detection), to find exons that are included (i.e., spliced in) or excluded (i.e., spliced out) only in cancer (FIG. 55). This is accomplished using a clustering approach based on Gaussian mixture modeling (GMM).
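Step 2 clusters PSI values with a Gaussian mixture model. The ts3 implementation itself is not given here, so the following is only a minimal sketch of the underlying technique: a one-dimensional GMM fitted by expectation-maximization in pure Python, with evenly spaced initial means and a fixed number of iterations (both assumptions). A production analysis would more likely rely on a library implementation such as scikit-learn's `GaussianMixture`; the name `fit_gmm_1d` is hypothetical.

```python
import math


def fit_gmm_1d(values, k, n_iter=200, var_floor=1e-6):
    """Fit a one-dimensional Gaussian mixture model by expectation-maximization.

    Returns (weights, means, variances, labels); labels[i] is the component
    with the highest posterior probability for values[i].
    """
    n = len(values)
    lo, hi = min(values), max(values)
    # Deterministic start: means spread evenly across the observed range.
    means = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    grand_mean = sum(values) / n
    start_var = max(var_floor, sum((v - grand_mean) ** 2 for v in values) / n)
    variances = [start_var] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space to avoid underflow.
        resp = []
        for v in values:
            logs = [math.log(w) - 0.5 * math.log(2 * math.pi * s) - (v - m) ** 2 / (2 * s)
                    for w, m, s in zip(weights, means, variances)]
            top = max(logs)
            probs = [math.exp(x - top) for x in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: update mixing weights, means, and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * v for r, v in zip(resp, values)) / nj
            variances[j] = max(var_floor,
                               sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, values)) / nj)
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, means, variances, labels
```

Each sample receives the label of its most probable component; in Example 2, four components were obtained for each of the splicing events shown.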


Results

We applied our methodology based on Gaussian mixture modeling to identify exon splicing events specific to breast cancer patients from the TCGA cohort. As a result, we identified 20 exon inclusion events (spliced “in” exons) that are specifically expressed in cancer and have prognostic power. These exon inclusion events have the following properties:

    • Target exon has increased PSI levels (expression) compared to normal tissues (PSItumor−PSInormal>10%),
    • Target exon is low or absent in normal tissues (PSInormal<5%),
    • Splicing event is reliably detected in at least 30 breast cancer patients (coverage of at least 10 RNA-Seq reads in each patient),
    • Patients harboring these exon inclusion events have favorable or unfavorable survival prognosis (p<0.05, logrank test).


We also identified 32 exon exclusion events (spliced “out” exons) that are specific to breast cancer and have prognostic power. These exon exclusion events have the following properties:

    • Target exon has decreased PSI levels (expression) compared to normal tissues (PSItumor−PSInormal<−10%),
    • Target exon is high in normal tissues (PSInormal>95%),
    • Splicing event is reliably detected in at least 30 breast cancer patients (coverage of at least 10 RNA-Seq reads in each patient),
    • Patients harboring these exon exclusion events have favorable or unfavorable survival prognosis (p<0.05, logrank test).
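The two lists of criteria above can be expressed as a single filter. This is a minimal sketch, assuming PSI values are fractions in [0, 1], that `psi_tumor` and `psi_normal` are summary values (for example, means) for the tumor subgroup carrying the event and for the normal samples, and reading "decreased PSI" for exclusion events as a tumor-minus-normal difference below −10%. The function and parameter names are hypothetical.

```python
def passes_event_filter(event_type, psi_tumor, psi_normal, n_covered_patients,
                        logrank_p, min_patients=30, alpha=0.05):
    """Apply the Example 1 selection criteria to one splicing event.

    event_type: "inclusion" or "exclusion"
    psi_tumor, psi_normal: summary PSI (fraction in [0, 1]) of the tumor subgroup
        carrying the event and of the normal samples
    n_covered_patients: breast cancer patients with >= 10 reads for the event
    logrank_p: log-rank p-value comparing carriers with the other patients
    """
    delta = psi_tumor - psi_normal
    if event_type == "inclusion":
        psi_ok = delta > 0.10 and psi_normal < 0.05
    elif event_type == "exclusion":
        psi_ok = delta < -0.10 and psi_normal > 0.95
    else:
        raise ValueError(f"unknown event_type: {event_type!r}")
    return psi_ok and n_covered_patients >= min_patients and logrank_p < alpha
```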


Because they are specific to cancer, these exon events are referred to as “exon inclusion biomarkers” or “exon exclusion biomarkers.”


The exon splicing sequences were identified using long read SMRT PacBio® sequencing (see, e.g., Rhoads A et al. Genomics Proteomics Bioinformatics 2015; 13: 278-289, and Huddleston J et al. Genome Research 2014; 24: 688-696).


We found 2 types of exon splicing biomarkers, with favorable and unfavorable prognosis. Table 1 indicates that 15 exon inclusion events have unfavorable prognosis (worse outcome, lower survival time), and 5 exon inclusion events have favorable prognosis (better outcome, increased survival time). Table 2 indicates that 29 exon exclusion events have unfavorable prognosis, and 3 exon exclusion events have favorable prognosis.









TABLE 1

Exon inclusion biomarkers associated with breast cancer survival

Splicing Event ID   Gene            Prognosis     Exon SEQ ID NO:
1446                CCDC115         Unfavorable   21
4322                WDR45B          Favorable     28
5134                PLEKHA6         Unfavorable   32
5696                TTC3            Unfavorable   34
6785                SPATS2          Unfavorable   39
8742                DHRS11          Unfavorable   40
13343               ENAH            Unfavorable   22
15088               POLI            Unfavorable   23
16864               PLXNB1          Favorable     24
21181               SH3GLB1         Unfavorable   25
34793               TCF25           Unfavorable   26
42420               PRR5-ARHGAP8    Unfavorable   27
44438               VPS29           Unfavorable   29
48175               E4F1            Unfavorable   30
49765               TEN1-CDK3       Favorable     31
56552               GNAZ            Favorable     33
57139               RNF8            Unfavorable   35
57874               ZDHHC13         Unfavorable   36
60615               SH3GLB2         Unfavorable   37
62560               ITFG1           Favorable     38

TABLE 2

Exon exclusion biomarkers associated with breast cancer survival

Splicing Event ID   Gene            Prognosis     Exon SEQ ID NO:
1506                CENPK           Unfavorable   73
2098                METTL5          Unfavorable   74
2242                PLA2R1          Unfavorable   75
7106                RHOH            Unfavorable   76
7108                RHOH            Unfavorable   77
9442                QPRT            Unfavorable   78
10439               IL17RB          Unfavorable   79
11685               STAU1           Unfavorable   80
13451               LYRM1           Unfavorable   81
14574               PPARG           Favorable     82
16269               BORCS8-MEF2B    Unfavorable   83
16833               ENOSF1          Unfavorable   84
16929               DHRS4-AS1       Unfavorable   85
16943               NDUFV2          Unfavorable   86
18745               FER1L4          Favorable     87
19824               PHF14           Unfavorable   88
19828               PHF14           Unfavorable   89
21024               BCL2L13         Unfavorable   90
22227               SELENBP1        Favorable     91
24742               LINC00630       Unfavorable   92
27194               CTBP2           Unfavorable   93
30244               SLC52A2         Unfavorable   94
33377               SLC38A1         Unfavorable   95
40521               FAM65A          Unfavorable   96
41168               USP25           Unfavorable   97
45885               HMOX2           Unfavorable   98
50148               MKRN2OS         Unfavorable   99
52249               ATP8A2P1        Unfavorable   100
53188               HIBCH           Unfavorable   101
58853               SLC35C2         Unfavorable   102
59314               TRIM5           Unfavorable   103
60239               HSD17B6         Unfavorable   104

FIG. 1 shows the detection of the 52 exon inclusion or exon exclusion biomarkers in The Cancer Genome Atlas (TCGA) patients. Inclusion biomarkers are depicted in white, and exclusion biomarkers are depicted in black. Biomarkers with favorable prognosis are denoted “1”, while biomarkers with unfavorable prognosis are denoted “0”. These biomarkers are detected in 2-33% of patients. For instance, the splicing event 42420 affecting the PRR5-ARHGAP8 gene is present in 22% of patients, while the biomarker 15088 (POLI) is present in 9% of patients. Overall, 91.5% of patients have at least one biomarker (754 out of 824 patients).



FIG. 2A shows that 8.5% of patients (70 patients) have no exon splicing biomarker predictive of survival, 13.6% (112 patients) have exactly one, and 77.9% (642 patients) have more than one.


In terms of exon biomarker detection, breast cancer TCGA patients can be divided into four groups: (i) unfavorable biomarkers only (60.9% or 502 patients), (ii) favorable biomarkers only (2.9% or 24 patients), (iii) mixed unfavorable and favorable biomarkers (27.7% or 228 patients), and (iv) no detected biomarkers (8.5% or 70 patients) (FIG. 2B).
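The four groups above, together with the one-event versus more-than-one-event split used in Table 3, amount to a per-patient classification followed by a tally. A minimal sketch, assuming each patient is represented by the list of prognosis labels ("favorable" or "unfavorable") of the biomarkers detected in that patient; the function names are hypothetical.

```python
from collections import Counter

VALID = {"favorable", "unfavorable"}


def classify_patient(event_prognoses):
    """Return (outcome, event-count bucket) for one patient's detected events."""
    kinds = set(event_prognoses)
    if not kinds <= VALID:
        raise ValueError(f"unexpected labels: {sorted(kinds - VALID)}")
    n = len(event_prognoses)
    if n == 0:
        return ("No prediction", "0 events")
    if kinds == VALID:  # both favorable and unfavorable events detected
        return ("Mixed", ">1 event")
    outcome = "Favorable" if kinds == {"favorable"} else "Unfavorable"
    return (outcome, "1 event" if n == 1 else ">1 event")


def summarize(cohort):
    """Count patients per group and give each group's percent of the cohort."""
    counts = Counter(classify_patient(events) for events in cohort)
    total = len(cohort)
    return {group: (c, round(100 * c / total, 1)) for group, c in counts.items()}
```

Percentages are rounded to one decimal place, as in Table 3.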


Therefore, while it is common to detect more than one biomarker in a patient, we observed that patients tend to have the same type of exon splicing biomarker (all unfavorable or all favorable). Additional work is ongoing to devise a strategy to utilize these exon biomarkers in the clinical


Example Application: Use of 52-Exon Splicing Biomarker Panel for Prognosis

We classified patients into different groups based on the outcome (unfavorable, favorable, mixed, no prediction) and the number of exon splicing biomarkers detected (exactly one event, more than one event). The classification is shown in Table 3. For instance, an unfavorable prognosis was assigned to 11.9% of patients on the basis of exactly one event.









TABLE 3

Exon Splicing Biomarker Outcome

Prediction outcome   Number of exon splicing biomarkers   Number of patients   Percent of total
Unfavorable          1 event                              98                   11.9%
Unfavorable          >1 event                             404                  49.0%
Favorable            1 event                              14                    1.7%
Favorable            >1 event                             10                    1.2%
Mixed                >1 event                             228                  27.7%
No prediction        0 events                             70                    8.5%

TABLE 4

Genomic Location of Exon Inclusion Biomarkers

Splicing event ID  Gene          Chr    Strand  Target exon          Upstream exon        Downstream exon      RefSeq*  Gencode v.28*
13343              ENAH          chr1   −       225595208-225595329  225567249-225567414  225600208-225600362  No       No
1446               CCDC115       chr2   −       130339560-130339701  130338250-130339232  130340908-130341039  Yes      Yes
15088              POLI          chr18  +       54272095-54272242    54271360-54271485    54273926-54274090    No       No
16864              PLXNB1        chr3   −       48413458-48413537    48413069-48413169    48413670-48413818    No       No
21181              SH3GLB1       chr1   +       86728403-86728489    86724313-86724405    86734602-86734691    Yes      Yes
34793              TCF25         chr16  +       89878461-89878627    89873578-89873859    89883351-89883512    No       Yes
42420              PRR5-ARHGAP8  chr22  +       44809006-44811304    44808307-44808438    44814672-44814758    No       No
4322               WDR45B        chr17  −       82625587-82625762    82625389-82625483    82627204-82627291    No       No
44438              VPS29         chr12  −       110498820-110499546  110496012-110496203  110502049-110502108  No       No
48175              E4F1          chr16  +       2226229-2226317      2223591-2223770      2228372-2228523      No       No
49765              TEN1-CDK3     chr17  +       75985173-75985288    75979275-75979511    75986187-75986284    No       No
5134               PLEKHA6       chr1   −       204271248-204271374  204268208-204268312  204273626-204273740  No       No
56552              GNAZ          chr22  +       23122192-23122702    23095706-23096418    23123087-23125026    No       No
5696               TTC3          chr21  +       37075936-37076066    37073269-37073364    37108392-37108446    No       No
57139              RNF8          chr6   +       37359183-37359342    37354012-37354275    37360446-37360574    No       Yes
57874              ZDHHC13       chr11  +       19124904-19125180    19117150-19117276    19142978-19143123    No       No
60615              SH3GLB2       chr9   −       129009453-129009467  129009106-129009346  129009771-129009871  Yes      Yes
62560              ITFG1         chr16  −       47450354-47450453    47428804-47428898    47451396-47451470    No       No
6785               SPATS2        chr12  +       49441730-49441816    49371228-49371290    49460770-49461037    No       Yes
8742               DHRS11        chr17  +       36593449-36593616    36591903-36592156    36594971-36595180    No       No

Human genome build hg38

*Yes: there exists a transcript harboring 3 exons (target, upstream and downstream), as well as a transcript harboring 2 exons (upstream and downstream), reported in the database.
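Each row of Table 4 records a target exon and its two flanking exons as "start-end" intervals on hg38. A small parsing sketch can check that the target exon lies between its flanking exons. In the rows shown, the "upstream" exon appears to be the one with the lower genomic coordinate on both strands, so the check below simply compares coordinates; whether the intervals are 0- or 1-based is not stated and does not affect this comparison. Function names are hypothetical.

```python
def parse_interval(text):
    """Parse a 'start-end' coordinate string into a pair of integers."""
    start, end = (int(part) for part in text.split("-"))
    if end < start:
        raise ValueError(f"end precedes start in {text!r}")
    return start, end


def target_between_flanks(target, upstream, downstream):
    """True if the target exon lies strictly between its upstream and downstream exons."""
    t_start, t_end = parse_interval(target)
    _, u_end = parse_interval(upstream)
    d_start, _ = parse_interval(downstream)
    return u_end < t_start and t_end < d_start


# Splicing event 13343 (ENAH) from Table 4
print(target_between_flanks("225595208-225595329", "225567249-225567414",
                            "225600208-225600362"))  # True
```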













TABLE 5

Genomic Location of Exon Exclusion Biomarkers

Splicing event ID  Gene          Chr    Strand  Target exon          Upstream exon        Downstream exon      RefSeq*  Gencode v.28*
1506               CENPK         chr5   −       65528919-65529017    65528452-65528578    65529117-65529199    No       Yes
2098               METTL5        chr2   −       169815477-169815528  169811764-169812506  169819561-169819643  No       No
2242               PLA2R1        chr2   −       159955698-159955828  159955199-159955346  159956510-159956627  No       No
7106               RHOH          chr4   +       40197101-40197300    40193489-40193812    40242714-40242834    Yes      Yes
7108               RHOH          chr4   +       40197121-40197300    40193545-40193812    40242714-40242834    No       No
9442               QPRT          chr16  +       29695172-29695199    29694664-29695096    29696996-29697127    No       No
10439              IL17RB        chr3   +       53855294-53855341    53852871-53852997    53856844-53856986    No       No
11685              STAU1         chr20  −       49174195-49174269    49153933-49154071    49188116-49188357    Yes      Yes
13451              LYRM1         chr16  +       20915556-20915714    20902486-20902717    20920122-20920214    Yes      Yes
14574              PPARG         chr3   +       12416704-12417154    12405882-12406081    12433898-12434577    No       Yes
16269              BORCS8-MEF2B  chr19  −       19180686-19180761    19150682-19150764    19182573-19182683    No       Yes
16833              ENOSF1        chr18  −       691204-691276        690549-690631        693882-693908        No       No
16929              DHRS4-AS1     chr14  −       23953774-23954033    23940393-23941158    23954748-23955082    No       No
16943              NDUFV2        chr18  +       9115528-9115902      9103092-9103433      9117838-9117903      No       No
18745              FER1L4        chr20  −       35560163-35560364    35559341-35559627    35560540-35560638    No       No
19824              PHF14         chr7   +       11061791-11061852    11051612-11051780    11061964-11063404    No       No
19828              PHF14         chr7   +       11061791-11061851    11051612-11051780    11061964-11062085    No       No
21024              BCL2L13       chr22  +       17696141-17696210    17683214-17683321    17726677-17729133    No       No
22227              SELENBP1      chr1   −       151369004-151369189  151368199-151368319  151369713-151369769  No       No
24742              LINC00630     chrX   +       102816992-102817082  102770352-102770420  102825993-102826169  No       No
27194              CTBP2         chr10  −       125133512-125133612  125038997-125039155  125162581-125162780  No       No
30244              SLC52A2       chr8   +       144357251-144357602  144354661-144354690  144359184-144359423  No       No
33377              SLC38A1       chr12  −       46196725-46196871    46194651-46196276    46197720-46197817    No       No
40521              FAM65A        chr16  +       67544956-67545117    67544695-67544830    67545376-67545534    No       No
41168              USP25         chr21  +       15777904-15778027    15766002-15766141    15791502-15791664    No       No
45885              HMOX2         chr16  +       4483637-4483754      4474771-4474847      4505484-4505610      No       No
50148              MKRN2OS       chr3   −       12543180-12543229    12541860-12542022    12545247-12545524    No       No
52249              ATP8A2P1      chr10  +       37248118-37248396    37242758-37242847    37261864-37261925    No       No
53188              HIBCH         chr2   −       190208880-190208913  190204635-190205232  190212956-190213075  Yes      Yes
58853              SLC35C2       chr20  −       46355802-46355865    46355073-46355241    46356574-46356637    No       No
59314              TRIM5         chr11  −       5709135-5709255      5679761-5680238      5937401-5937505      No       No
60239              HSD17B6       chr12  +       56763198-56763414    56752180-56752318    56773834-56774165    No       Yes

Human genome build hg38

*Yes: there exists a transcript harboring 3 exons (target, upstream and downstream), as well as a transcript harboring 2 exons (upstream and downstream), reported in the database.

Example 2

In this example, we analyzed the splicing events listed in Table 4 and Table 5 (see FIGS. 3A-54D). The expression (expressed as PSI) of these target exons varies substantially across cancer and normal samples (see, e.g., FIG. 3A, varying from 0 (0% inclusion) to 0.3 (30% inclusion)).


Visual inspection of the data suggests the existence of a subpopulation of samples in which the target exon is included, or “spliced-in”. This subpopulation (classification “4” samples in FIG. 3A) was formally detected using a clustering methodology based on Gaussian mixture modeling (GMM). The GMM analysis of splicing event 1446 (CCDC115) generated 4 subpopulations of samples (clusters).


However, only one of the clusters (i.e., C4 of FIGS. 3A and 3B) qualifies as a tumor-specific cluster, because it has the following properties:

    • more than 90% of the samples in cluster C4 are tumor samples (see FIG. 3B);
    • cluster C4 has a >10% increase in expression (PSI) compared to normal (PSItumor−PSInormal>10%), see FIG. 3C; and
    • the exon inclusion event has very low or absent expression in normal tissues (PSInormal<5%), see FIG. 3C.
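The three properties above can be checked for every cluster produced by the GMM. A minimal sketch, assuming that the first property refers to the share of tumor samples within the cluster, that the normal baseline is the mean PSI over all normal samples, and that the cluster PSI is the mean over the cluster's tumor samples; these summary choices, and the function name, are assumptions rather than details given in the text.

```python
from statistics import mean


def tumor_specific_clusters(labels, is_tumor, psi, min_tumor_frac=0.90,
                            min_delta=0.10, max_normal_psi=0.05):
    """Return the cluster labels that satisfy the exon-inclusion criteria above.

    labels: cluster label for each sample (for example, from a GMM fit)
    is_tumor: True for tumor samples, False for normal samples
    psi: PSI for each sample, as a fraction in [0, 1]
    Requires at least one normal sample, which sets the normal baseline.
    """
    baseline = mean(p for p, tumor in zip(psi, is_tumor) if not tumor)
    selected = []
    for cluster in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == cluster]
        tumor_members = [i for i in members if is_tumor[i]]
        if not tumor_members:
            continue  # a cluster without tumor samples cannot be tumor specific
        tumor_frac = len(tumor_members) / len(members)
        cluster_psi = mean(psi[i] for i in tumor_members)
        if (tumor_frac > min_tumor_frac
                and cluster_psi - baseline > min_delta
                and baseline < max_normal_psi):
            selected.append(cluster)
    return selected
```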


Cluster C4 contains 97 breast cancer patients out of the 824 analyzed, which means that the exon inclusion event was detected in ~12% of TCGA breast cancer patients. Moreover, survival analysis of breast cancer patients in cluster C4 versus the remaining breast cancer patients in TCGA indicates that patients in C4 (expressing the target exon) have worse overall survival (FIG. 3D). Therefore, the exon inclusion event 1446 (CCDC115) (i) is specific to breast cancer, (ii) is detected in a subpopulation of breast cancer patients, and (iii) is associated with unfavorable overall survival.
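Comparing the survival of cluster C4 with that of the remaining patients, like the p<0.05 logrank criterion in Example 1, relies on a two-group log-rank test. The following textbook sketch computes the log-rank chi-square statistic and its one-degree-of-freedom p-value in pure Python; it illustrates the standard test and is not the authors' analysis code, and the function name is hypothetical. Established implementations exist, for example `logrank_test` in the `lifelines.statistics` module of the lifelines package.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times_*: follow-up time for each patient (e.g., days)
    events_*: 1 if death was observed at that time, 0 if the patient was censored
    Returns (chi_square, p_value) for a chi-square distribution with 1 df.
    """
    data = ([(t, e, "a") for t, e in zip(times_a, events_a)] +
            [(t, e, "b") for t, e in zip(times_b, events_b)])
    observed_minus_expected = 0.0  # deaths in group a, observed minus expected
    variance = 0.0
    for t in sorted({t for t, e, _ in data if e}):
        at_risk = [g for tt, _, g in data if tt >= t]
        n, n_a = len(at_risk), at_risk.count("a")
        deaths = [g for tt, e, g in data if tt == t and e]
        d, d_a = len(deaths), deaths.count("a")
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi_square = observed_minus_expected ** 2 / variance
    # Upper tail of chi-square(1): P(X > x) = erfc(sqrt(x / 2))
    return chi_square, math.erfc(math.sqrt(chi_square / 2))
```

The nested scans keep the code short but are quadratic in cohort size; for cohorts of a few hundred patients, as here, that is not a concern.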


Furthermore, the expression (expressed as PSI) of a different target exon varies substantially across cancer and normal samples (see, e.g., FIG. 23A, varying from 0 (0% inclusion) to 1.0 (100% inclusion)).


Visual inspection of the data suggests the existence of a subpopulation of samples in which the target exon is excluded, or “spliced-out”. This subpopulation (classification “4” samples in FIG. 23A) was formally detected using a clustering methodology based on Gaussian mixture modeling (GMM). The GMM analysis of splicing event 1506 (CENPK) generated 4 subpopulations of samples (clusters).


However, only two of the clusters (i.e., C1 and C3 of FIGS. 23A and 23B) qualify as tumor-specific clusters, because they have the following properties:

    • more than 90% of the samples in clusters C1 and C3 are tumor samples (see FIG. 23B);
    • cluster C1 has a >10% decrease in expression (PSI) compared to normal (PSItumor−PSInormal<−10%), see FIG. 23C; and
    • the exon exclusion event is very low or absent in normal tissues, i.e., the target exon is highly included in normal tissues (PSInormal>95%), see FIG. 23C.


Cluster C1 contains 37 breast cancer patients out of the 824 analyzed, which means that the exon exclusion event was detected in ~4% of TCGA breast cancer patients. Moreover, survival analysis of breast cancer patients in cluster C1 versus the remaining breast cancer patients in TCGA indicates that patients in C1 (in whom the target exon is spliced out) have worse overall survival (FIG. 23D). Therefore, the exon exclusion event 1506 (CENPK) (i) is specific to breast cancer, (ii) is detected in a subpopulation of breast cancer patients, and (iii) is associated with unfavorable overall survival.









TABLE 6
Exon Inclusion Event Sequences
Splicing Event ID | Gene Name | SEQ ID NO: | cDNA Sequence

1446 | CCDC115 | SEQ ID NO: 1 (the underlined exon inclusion sequence is SEQ ID NO: 21)
GCCTGCAGCTGGCCGCAGACATAGCCAGCCTCC
AGAACCGCATTGACTGGGGTCGAAGCCAGCTCC
GGGGACTCCAAGAGAAACTCAAGCAGCTGGAGC
CTGGGGCTGCCTGACATGCGCGCAAAGAGGCAG
GGCAGCGAGCACAGCTGTTCTCCGACATGGCTA
CGTGATCTCAGGCCTTCTTCCTTCACAATTAGCT
CTTGCCCCTACCCCACGCCAGCTAATGCCCCTTC
TGTGTCCCTGCTCTGCATGTTTCCATTTTCCTTAG
GTGTGAAGTTTGAAGAGGCAAACAGTAATTTTG
AAAGCCACTACTTTGAAACCATTCTAAGGCCTG
AGTTCCCATAGGACACACTCACATAGGCAGGTA
CACGTTAGTCAACAATTGGAACTGCCTCTTGGAT
CACTCAGCTGTGCTTTCATGGCTGGATGATGGAA
CACTGTGCGAAGAGAGATGGGGGCCAGGAAGTA
GCGCTTCATGCTTAGTACATCCTCCAAATTGTCT
TTGCTGGAGGAGAAAACCGTACTCAGCCAAAAG
ATCAGGACAATATGACTTGAGTCCACAAGGACA
CAAACACCTGAGTAGCTGGGCAGCCCTTGGCAG
GGTCTAAGCCAGGAAGTAAAAATGATCTGGCCT
AGATATTTAAGGGAACTCTAGGAAGAGGCCTAG
GTTTTTAAAATCCTGTCTCTTTGTCTTACCATAAG
AGGCTGAGCCTCTCTTCATTTTTTTGAAGGGCCA
CTTGTGTTTTCTGTTCTGGGAACTTCATTCATTTT
TCTACTGGGTTGTTGATCTTTGCAGTAATTTCTA
GGAGCTGTTTATGTTTGGAGGTAATTGGTCCTTT
GTCCATATATATGAGATGTAAGTCTTATTTTCCA
GTTTATCTTTTTGCTTATTTTTTTTGACTTTTTATT
GTAAAATAAAACATCAAACTGCACAGAACAGTT
GAATAGCTTAATGAATAACTACAGTAAAAGCTA
TGGTAACCCCCTGCTGCTGAACAGGAGGCCGA
AGACGAGAGCTGCCCGGAGGACTGGGCAGCA
GCTGTTCCAGCAGAGACATCAGCAAAAGCCA
TCTAGAGGTGGATCCAGAGTGTGGACTAACA
GAGAAAAGAAGTGGAGGGAGAGCAG
GTCTGC
GGAGGCGCAAGGGCCCCACTAAGACCCCAGAAC
CGGAGTCCTCTGAGGCCCCTCAGGACCCCCTGA
ACTGGTTTGGAATCCTAGTTCCTCACAGTCTACG
TCAGGCTCAAGCAAGCTTCCGGGATG

13343 | ENAH | SEQ ID NO: 2 (the underlined exon inclusion sequence is SEQ ID NO: 22)
TGAACAGAGTATCTGTCAGGCAAGAGCTGCTGT
GATGGTTTATGATGATGCCAATAAGAAGTGGGT
GCCAGCTGGTGGCTCAACTGGATTCAGCAGAGT
TCATATCTATCACCATACAGGCAACAACACATTC
AGAGTGGTGGGCAGGAAGATTCAGGACCATCAG
ACAGAGTCTCGCTCTGTTGCCCAGGCTAGAG
TGCAATGGCGTAATCTCAGCTCACTGCAACCT
CCGCCTCCCGTGTTCAAGCGATTCTCCTGCCT
CAGCCTCCTGAGTAGCTGGGATCACAG
ACAG
AGTCTGACTGTTGCCCAGGCTGGAGTGCAATGG
CACCAACATGGCTCACTGCAACCTTGACCTCCTG
GGCTCAAGTGATCCTCCCGGCCTCCGTCTCCCGA
ATAGCGGTCTTACTCATTTTCTACGTGTGTGTTG
AGTGCACCATTTGAGA

15088 | POLI | SEQ ID NO: 3 (the underlined exon inclusion sequence is SEQ ID NO: 23)
GAGTTCATGATCAAGTGTTGCCCACACCAAATG
CTTCATCCAGAGTCATAGTACATGTGGATCTGGA
TTGCTTTTATGCACAAGTAGAAATGATCTCAAAT
CCAGAGCTAAAAGACAAACCTTTAGGAAAGATT
CCTCTTTTAGTGTAAGCATAAAGAACATTTTT
GGTTCACTTGCTGCTACCCTCTTGTGCCCACT
TTGGCTTAATAAATCCCAATCCAGCCTAGCTG
ATTTACTGAAGAACAAAGGGATGACTAGTTTT
TGCTACGCCAAG
GGGTTCAACAGAAATATTTGG
TGGTTACCTGCAACTATGAAGCTAGGAAACTTG
GAGTTAAGAAACTTATGAATGTCAGAGATGCAA
AAGAAAAGTGTCCACAGTTGGTATTAGTTAATG
GAGAAGACCTGACCCGCTACAGAGAAATGTCTT
ATAAGGTTACAG

16864 | PLXNB1 | SEQ ID NO: 4 (the underlined exon inclusion sequence is SEQ ID NO: 24)
GAGGAAGAGCAAGCAGGCCCTGAGGGACTATA
AGAAGGTTCAGATCCAGCTGGAGAATCTGGAGA
GCAGTGTGCGGGACCGCTGCAAGAAGGAATTCA
CAGGCCAAGTGGTCTCTGTTCAACAACTCAGC
TTTGCCACTGTGGCACAAAGGCAGCCAGGGA
CGACATGGAAACACATGAAA
GTGCAGATGGGG
AACTTGCGCTTCTCCCTGGGTCACGTGCAGTATG
ACGGCGAGAGCCCTGGGGCTTTTCCTGTGGCAG
CCCAGGTGGGCTTGGGGGTGGGCACCTCTCTTCT
GGCTCTGGGTGTCATCATCATTGTCCTCATGTAC
AG

21181 | SH3GLB1 | SEQ ID NO: 5 (the underlined exon inclusion sequence is SEQ ID NO: 25)
AAAGAAAGGAAACTATTGCAAAATAAGAGACTG
GATTTGGATGCTGCAAAAACGAGACTAAAAAAG
GCAAAAGCTGCAGAAACTAGAAATTCACAACTA
AACTCAGCTCGCCTTGAAGGAGATAACATTAT
GGTAAATTTCTCTTACATGCTCAACTTCCTGC
ATGTAAAATGGCTGAAG
TCTGAACAGGAATTA
AGAATAACTCAAAGTGAATTTGATCGTCAAGCA
GAGATTACCAGACTTCTGCTAGAGGGAATCAGC
AGTACACAT

34793 | TCF25 | SEQ ID NO: 6 (the underlined exon inclusion sequence is SEQ ID NO: 26)
ACCCCGCGCGAAGAGTGCGCAGGCGCGCCGACA
GCCGAGTTTTCTGCGCTTCCTTCTCCCTCTCTCCA
GACGTCGTGGTCGTTCGGTCCTATGTCGCGCCGG
GCCCTCCGGAGGCTGAGGGGGGAACAGCGCGGC
CAGGAGCCCCTCGGGCCCGGCGCCTTGCATTTCG
ATCTCCGTGATGACGATGACGCGGAAGAAGAAG
GGCCCAAGCGGGAGCTTGGTGTCCGGCGTCCCG
GGGGCGCAGGGAAGGAGGGCGTCCGAGTCAAC
AACCGCTTCGAGCTGGAAAAATGGACATTTTCC
TCTCCCCCTAAAAAAAGATAAAACTCCTTCCT
GGTTATTAACTGAAATGCTGATCGAGCTTTAT
CCTAAAGAAGATCAGTCGTGGACAAGAACCT
TGTGAAATGTTCCCCATTTGAGACCCTAAAAC
TAATGAAAATCACAGCTTTTGG
ATAAACATTG
ACGATCTTGAGGATGACCCTGTGGTGAACGGGG
AGAGGTCTGGCTGTGCGCTCACAGACGCTGTGG
CACCAGGGAACAAAGGAAGGGGTCAGCGTGGA
AACACAGAGAGCAAGACGGATGGAGATGACAC
CGAGACAGTGCCCTCAGAGCAG

42420 | PRR5-ARHGAP8 | SEQ ID NO: 7 (the underlined exon inclusion sequence is SEQ ID NO: 27)
GTATTTGAAGTACACACTGGACCAATACGTTGA
GAACGATTATACCATCGTCTATTTCCACTACGGG
CTGAACAGCCGGAACAAGCCTTCCCTGGGCTGG
CTCCAGAGCGCATACAAGGAGTTCGATAGGAAA
GACGGGGATCTCACTATGTGGCCCAGGCTGG
TCTCGAACTCCAAGCTCAAGCGATCCTCCCAC
CTCAGCCTCCCAAAGTACTGGGATTACAGGC
AGGAGCCACCATGCCAAGCCAACACTCTTGTT
CTTAAAGGGCCAGACAGTCAGCATTTTAGCTT
TGCAGGCCTGTTGCTCTATTGCAACAACTCTG
CTGGACTGTGTTCCAGTAAAACATTATGGACG
CTGAAATGTGAATTTCATGTCATTTTCACGTG
TCATGAAATATTCTTCTGTTTTTTTTTTTCAAC
CACTTAAAAACATAAAAAGCCATTTTTAGCTT
GCAGCCTGTACCAAAGCAGGAAGCAGGCTAG
GTTCATCCTGCCTGCCCATTCTCCCACCCCTG
GTCCAGTGAATTACTGGCAAAGAAACAACTG
CATGACCGTTTCTTCACTAAAGCCTCTTCTTG
CTTTCACAGCCCTTTACAGTCTGCAAGGGGCA
TTCTGATGCCTCTTGTTGGTGAGATGGCAGCC
TCATTTTACAGATGAGGACATAGGCCCCAGG
GAGCAAGTGACTTACCCGTGGTCACTCAGCTT
GTGTGTGGTAGGGCAGGATCCCACCCCAGGC
CCCCGCCTCCCTCTCCCACCCAACGCTACTCA
CCGCTTGGCCATGGCCTGGAGCCGGCAGACT
TTTCCTGAGGGACGTCCGGCCTAATAATCAAC
TTGGCAATATATCTGGCTCGTAGACTGCGGC
GATGGGCGTTGATGTGGATATCCTAGATTCCT
CTGGGTTTTCCTTCTTCAAAGTCCTTTCAAAC
CTGTAACAGAAATCTGCTTCACAGATATCTGA
GTCAGTGGGACAGTGGAAGGCAGTGCCTGAA
TGTCCCAGAAGTCCTCCCTCCAGTTGCCTTTT
GGGTCCTGCTGTCATTATCAATAGGACCTTCG
GAGGGACTTCTTGGTTCCCCATCCTATGTCTT
AGGGAAAGAATTGTTGCTGTATTTTGCAGTCA
TTTACTGGGCACCTGTATAAGCTGGAGATGG
CCTAGCCCCAGCGCATGTCCTCCTCCAGGAA
GGCTTCCTGGGTTGTCCTGGGAGAATCAATA
GCCCCTTCCCTGCAGCCTCACTGTGCCTAAGC
AGACACCAATCCTAGCTAGCACTTAGGGGTTT
GTGAACAGGTCTGCCTCCTGCACTAGGCTGT
GATCCCGGACCTGTCTCTGCATCCCTTGCAGG
TGGGAAAGGATCTGCATATGGCAGCCTTTTTT
TTTTTTTTTTTTTTTTTGAGACAGAGTCTCATT
CTATTGCCTGGGCTGGAGCACAGTGGCGAGA
TCTCGGCTCACCACAACCTCCACCTCCCAGGT
TCAAGTGATTCTCCTGCCTCAGCCTCCTGAGT
ACCTGGGACTACAGGCGTGAGCCACCATGCC
CGGCTAATTTTTGTATTTTTAGTAGAGACGGG
GTTTCACTATGTTGGCCAGGCTGGTCTTGAAC
TCCTGACCTCGTGATCCGCCTGCCTTGGCCTC
CCAAAGTGCCGGGATTACAGGCGTGAGCCAC
TGTGCCCAGCCGGCAGGCTTTTATTAAGCGTT
AGATGGGAGGATAGAGGAGTGAAGTGGTACT
GGCAGGAAGTACCAAGGTTCCAGCTGGCGTA
ATCAGGAAGGCTGCATGGAGGAAGCAGCCTT
TGAGCTGCCTGTGGAGTGGTGGGCAGGGTGT
TGTGAAGTGGCAATCACTGGATTTTGCTTCTG
GTACGAGGTGTGGCCAGATGCAAGAAAGAGC
AGGGTGGACTTTGGTGCAATTGGTGGGGGTC
TGGTCTGTAGGGTTCCCGTGGGGAGCCGTGG
AGGGAGGCAGCAAAGGAGGGAGGGGCACAG
AGGATGCTGGACTGTGTTTAAGAGGCAGCAG
GGAGCCATGGCAGGTGCTTGAGGAGAAGCGA
GTGATGTGTTTAAAGCAGCCCTTTCAGGAGG
CTCAGGCTCACAGCAGGATGTGCACAGTAGC
CCTGTCTTGAGCTAAAGCAGATGAAGGTTTTG
CCCTCTGCACTTCCCCACGTGAGAAACGAAG
ATGCACCCGCAGATTCCTTGAGGCAGCTCCC
CCACTTCTCAGTTGCCAGAAATCAGCCCAGAG
AAACAAACCCGTAATCAGCCCAGGGTGCTTTC
CCTTCCCTTTCTCGAGGGGGCTGCTGGTTCGC
ACATAAGGAGTGGGTCACTCCCGCTTGGGAG
AAAGCAGCAGAATTCCTTCACAGCCAGGTAA
GATGTGCCAGTGGTCGATGGATGAAATCTAG
CCGGGGAGTTGGAATCTGTGTTGCCAGCAGT
GACCTGTGAGCAGTGACAAAGCCAAAG
GTAC
AAGAAGAACTTGAAGGCCCTCTACGTGGTGCAC
CCCACCAGCTTCATCAAGGTCCTGTGGAACATCT
TGAAGCCCCTCATCAG

4322 | WDR45B | SEQ ID NO: 8 (the underlined exon inclusion sequence is SEQ ID NO: 28)
AATTGTGGTGGTTTTGGACTCCATGATTAAGGTG
TTCACATTCACACACAATCCCCATCAGTTGCACG
TCTTCGAAACCTGCTATAACCCCAAAGATGGAG
TGTTTGATGATGTCTCTCTGAACCTCAGAGAC
GTCTCTTAGGCTGACCTTCACCCAGGCGAGA
AGCACTCCCTCAGCAGAGCCAGCCCACGTGC
ACTCGCCGAGCTCCAGGCCTGGCGCTGGCTA
CCTGCCTCCAGAGCTTTTTCTTCAGGAACACT
CCTTTTCTGTGTG
TAATGATCTGGGATGACCTG
AAGAAGAAGACTGTTATTGAAATAGAATTTTCT
ACAGAAGTCAAGGCAGTCAAGCTGCGGCGAGAT
AG

44438 | VPS29 | SEQ ID NO: 9 (the underlined exon inclusion sequence is SEQ ID NO: 29)
TTGGTGTTGGTATTAGGAGATCTGCACATCCCAC
ACCGGTGCAACAGTTTGCCAGCTAAATTCAAAA
AACTCCTGGTGCCAGGAAAAATTCAGCACATTC
TCTGCACAGGAAACCTTTGCACCAAAGAGAGTT
ATGACTATCTCAAGACTCTGGCTGGTGATGTTCA
TATTGTGAGAGGAGACTTCGATGAGGCTGGGCA
CAGAGTAAGTTTCTTCACTTAGCTCCTACTAA
CAGTGGTGGTTGGGTGGCTGTTTACTGACTG
GATTTCTTACCCTTTTAAGGTCTGTTGAAAGG
AAGTAACCGAATTCCCATGCTTTGATTGGGTT
GGCTCTTTATTTTAATTTAATAAGACTGCCAT
TTCCAGGATCTTTTGCTTTCTTAAAGGACTCT
ATCATCTATGTCTATCCCGATTTGTCAAAGTG
TGGAATTTGGGCGGGAACATGTTTCAAAGTAT
GACACGTGTTATGTAACACTATTTCCCCATAA
CTTTGTCATCAGCAGGAAACCAGAGGATTCTG
TCCTAGTAAGGATCCCTACTAATTTGAAATGA
TTGTGTGGTCATTCATACAGTTATATCTTTAG
ACTGCTAATAGTCTTGAGTCTTGGAGATAATC
CACAGTACTTTATAGAATTAGGTCATCAATCA
TTATAAAGTACCATGTCTTACTAATGTTCTTT
CTGGTACATTCAGATTGAACAGCTCATTCATT
ATTAGTACCAAACATTTCAACCTGTTGTAGAC
ATATACCCTTTTATGAGTTTGGGGTGGTGGTT
GTTGTTGTTGTTCTTCTTCTTCTTTTAAATATA
GAAATCTATTATTTTTACCTTTTTCTCAAAGCA
AGATTCCCATACTAACTATGTACTTCAATCCA
TATCAGAAGGAATCCCCCTCTAAAATGAAGAT
TGTTCTATATCCAG
GAGCCTGAGGAAGAGGGC
GGCGACGGTGGTGGTGACTGAGCGGAGCCCGGT
GACAGGATG

48175 | E4F1 | SEQ ID NO: 10 (the underlined exon inclusion sequence is SEQ ID NO: 30)
ATCTTCCTGCGGCGCGTTGCGACATGGAGGGCG
CGATGGCAGTGCGGGTGACGGCCGCTCATACGG
CAGAAGCCCAGGCCGAAGCCGGGCGGGAAGCG
GGCGAGGGTGCAGTTGCGGCGGTGGCGGCGGCC
TTGGCCCCCAGCGGCTTCCTCGGCCTCCCGGCGC
CCTTCAGCGAGGAAGCTTGGAGAAGGGCAGTG
CCCTCATGGCGAGGAGTCCCTTTAGAGGTTG
CTGGGCCTGCTTGTGGCCTTGTCTGGTGTGA
AATGGGCTGG
ATGAGGACGATGTGCACAGATG
CGGCCGCTGCCAGGCAGAGTTCACCGCCTTGGA
GGATTTTGTTCAGCACAAGATTCAGAAGGCCTG
CCAGCGGGCCCCTCCGGAGGCCCTGCCTGCCAC
CCCTGCCACCACAGCGTTGCTGGGCCAGGAG

49765 | TEN1-CDK3 | SEQ ID NO: 11 (the underlined exon inclusion sequence is SEQ ID NO: 31)
GGGGCGATGTCCGCGTCGTGGCTGGGGCCGGTC
GCGGGGCAGACTAATCCCCTGCTCCTGGCCAGG
GGAGGCTCCCGAGCGGATCCTCGGGAAAGGGGC
TCCGAAGGTCAAGAAACTGCCCTGCTGGGCGTC
CGGGGAGTGGGAAAATAAAGCACTTTTTGTATC
CCGCCCCTCCCCCGTCACGTGACCACGCGAGGC
GGAAAGAAGAAATCCGAGGACCGGCGACGCCT
AGAACAGGGTCTTACTCTATTGCCGAGGCTAC
AGTATAGTGGTGTGATCATAGCTCACTGCAGC
TTCAACCTCCTGTGGTGGTGATCCTCCTGCCT
CAGCCTCCTAAGTTGCTGGGACTACAG
GAGCC
CATGATGCTGCCCAAACCTGGGACCTATTACCTC
CCCTGGGAGGTTAGTGCAGGCCAAGTTCCTGAT
GGGAGCACGCTGAGAACATTTGGCAG

5134 | PLEKHA6 | SEQ ID NO: 12 (the underlined exon inclusion sequence is SEQ ID NO: 32)
GCAACTCGCACAGCCCGCAAAGCCGTCGCCTTT
GGCAAGCGCTCACACTCCATGAAGCGGAACCCC
AATGCACCTGTCACCAAGGCGGGCTGGCTCTTC
AAACAGTTGCTGAGTGCTTGTTATGGCTGGAT
ACCTTGCTGGCTCTGGTGATAAAGAGATGAA
AAAGACAAAAGTTCCTCCCTGCAAAGAGCTCA
TGGTGCAATGGAAGAGATAGAAAGCTGCATT
GTGACAG
ATCGACCTTGGACATGTCCAATAAAA
CAGGTGGGAAACGCCCGGCTACCACCAACAGTG
ACATACCCAACCACAACATGGTGTCCGAGGTCC
CTCCAGAGCGGCCCAGCGTCCGG

56552 | GNAZ | SEQ ID NO: 13 (the underlined exon inclusion sequence is SEQ ID NO: 33)
GGCAAAGCTCAGAGGAAAAAGAAGCAGCCCGG
CGGTCCCGGAGAATTGACCGCCACCTGCGCTCA
GAGAGCCAGCGGCAACGCCGCGAAATCAAGCTG
CTCCTGCTGGGCACCAGCAACTCAGGCAAGAGC
ACCATCGTCAAACAGATGAAGATCATCCACAGC
GGCGGCTTCAACCTGGAGGCCTGCAAGGAGTAC
AAGCCCCTCATCATCTACAATGCCATCGACTCGC
TGACCCGCATCATCCGGGCCCTGGCCGCCCTCAG
GATCGACTTCCACAACCCCGACCGCGCCTACGA
CGCTGTGCAGCTCTTTGCGCTGACGGGCCCCGCT
GAGAGCAAGGGCGAGATCACACCCGAGCTGCTG
GGTGTCATGCGACGGCTCTGGGCCGACCCAGGG
GCACAGGCCTGCTTCAGCCGCTCCAGCGAGTAC
CACCTGGAGGACAACGCGGCCTACTACCTGAAC
GACCTGGAGCGCATCGCCGCAGCTGACTATATC
CCCACTGTCGAGGACATCCTGCGCTCCCGGGAC
ATGACCACGGGCATTGTGGAGAACAAGTTCACC
TTCAAGGAGCTCACCTTCAAGATGGTGGACGTG
GGGGGGCAGAGGTCAGAGCGCAAAAAGTGGAT
CCACTGCTTCGAGGGCGTCACAGCCATCATCTTC
TGTGTGGAGCTCAGCGGCTACGACCTGAAACTC
TACGAGGATAACCAGACAGGAAGTGGTGAACT
GGGGAGTCAGACAAGAGCATCATGCTTCTTA
AAAGCCCAGACCCCTGGCTATAACACATCGA
AGATTCTCAGAAGAGAATTGAGGAGCGGACA
GGCGCCACACTCCGTTGTGGTCACTGCCTCTT
CCTGGCCCACCACACTCCTGTCCTCTGCATGT
ACTGAGAGCTCTGTCCAGGATGCCAGGGTCC
TGCCTCGGCAGAGAGGCGGTGCCAGATGCCC
CACAGCAGCTGGTGGGAGTGCCCACAGCTGG
AGGGCAGGGGAGGAGCCTGGCCTCTGGCTGG
TGTTTCCTTCCCAGCTCTCAAGAACTGGAGAC
TTTGGTTACAGAAGTGAAGGCTGCTCCCTCAC
AGACTTCCTAGTGTCCGATGGTACCACATGGA
AGGATCAGAGTTTTGAAGGACTGGGCCAGAA
CCCAGATAGGGCACAAGGCTGCCAGCGCCTG
CATTGAGGGAGCTATGATGTGACGGGGGCTC
CTGCAGAAGATGGCCTTCCTTGTACAG
AGTCG
GATGGCAGAGAGCTTGCGCCTCTTTGACTCCATC
TGCAACAACAACTGGTTCATCAACACCTCACTCA
TCCTCTTCCTGAACAAGAAGGACCTGCTGGCAG
AGAAGATCCGCCGCATCCCGCTCACCATCTGCTT
TCCCGAGTACAAGGGCCAGAACACGTACGAGGA
GGCCGCTGTCTACATCCAGCGGCAGTTTGAAGA
CCTGAACCGCAACAAGGAGACCAAGGAGATCTA
CTCCCACTTCACCTGCGCCACCGACACCAGTAAC
ATCCAGTTTGTCTTCGACGCGGTGACAGACGTCA
TCATACAGAACAATCTCAAGTACATTGGCCTTTG
CTGAGGAGCTGGGCCCGGGGCCCGCCTGCCTAT
GGTGAAACCCACGGGGTGTCATGCCCCAACGCG
TGCTAGAGAGGCCCAATCCAGGGGCAGAAAACA
GGGGGCCTAAAGAATGTCCCCCACCCCTTGGCC
TCTGCCTCCTTGGCCCCACATTTCTGCAAACATA
AATATTTACGGATAGATTGCTAGGTAGATAGAC
ACACACACATGCACACACACACATCTGGAGATG
GCAAAATCCTCTAAAATGTCGAGGTCTCTTGAA
GACTTGAGAAGCTGTCACAAGGTCACTACAAGC
CCAACCTGCCCCTTCACTTTGCCTTCCTGAGTTG
GCCCCACTCCACTTGGGGGTCTGCATTGGATTGT
TAGGGATAGGCAGCAGGGCTGAGGCAAGGTAG
GCCAACTGCACCCCTGTCGCCTGGAGGAGGGCC
AGCTCGCTGCCCGAGCTCTGGCCTAGGGACCTTG
CCGCTGACCAAGAGGGAGGACCAGTGCAGGGTC
TGTGCACCTTCCCTGCTGGCCTGCACACAGCTGC
TCAGCACCACTTTCATTCTGGACCTGGGACCTTA
GGAGCCGGGTGACAGCACTAACCAGACCTCCAG
CCACTCACAGCTCTTTTTAAAAAACAGCTTCAAA
ATATGCAGCAAAAACCAATACAACAAAACGAGT
GGCACGATTTATTICAAACTAGGCCAGCTGGGA
TTCCAGCTTTTCTTCTACTAGTCTGATGTTTTATA
AATCAAAACCTGGTTTTCCTTCTCTGACATTTTTT
TTTTGTTTTGTTTTTTGGTTTTTTTTTTTTTTTGGC
CAAATCTCGTGGTGTTTCGCAGAAAAAAATCCA
GAAAATTTCAAATGCAGTTGAGTATTCTTTTTTA
AATGCAGATTTTCAAAACATATTTTTTTTCAGGT
GGTCTTTTTTGTGTCTGGCTTGCTGAGTGTAAAA
GTTGTTATCTGGACGATCTGTCTCTCTGCTCCAA
AGAAATTTTGGAGTGAGTGGCAGTCCTGCGCCA
GCCTCGCGGGACACGTGTTGTACATAAGCCTCTG
CAGTGTCCTCTTGTTAATGGTGGGGTTTTCTGCT
TTGTTTTTATTTAAGAAAATAAACACGACATATT
TAAAGAAGGTTCTTTCACCTGGGAGCAAATGAA
CAATAGCTAAGTGTCTTGGTATTTAAAGAGTAA
ATTATTTGTGGCTTTGCTGAGTGAAGGAAGGGG
AGCAAGGGGTGGTGCCCCTGGTCCCAGCATGCC
CCGCGCCTGAGACTGGCTGGAAATGCTCTGACT
CCTGTGAAGGCACAGCCAGCGTTGTGGCCTGAG
GGAGGCCCTGCTGGGACCCTGATCTGGGCCTTCC
TGTCCCAGGGCCTATGGGCAACTGCGTTGAAAG
GACGTTCGCCAAGGGCCGTGTGTAAATACGAAC
TGCGCCATGGAGAGGAGAGGCACTGCCGGAGCC
CTTGCCAGATCTCCCTCCCTCTCTCCGTGCAGTA
GCTGTGTGTCCGAGGTCAGTGTGCGGAATCACA
GCCAAGGACGTGAAGAGATGTACGGGGGAAAG
AGAAGCTGGGGATTGGATGAAAGTCAAAGGTTG
TCTACTTTAAGAAAATAAAATACCCTG

5696 | TTC3 | SEQ ID NO: 14 (the underlined exon inclusion sequence is SEQ ID NO: 34)
CCGTCGGCTGACGTGGAGGGCCGGAGGTGGCGG
CGGCGGCGGCGGCGGCTGCTGCTGCTGCTGCCC
GCGTCCGAGGCTCGCGGGCGGCGGGCCCGGTAT
TTGATAAATTCAAAATATATGTAAAACATATG
CAAGCTGTATAGCAGAACAATAAAATGAACAC
CTATGAATTCACCACTCAATCCAATAATCAAA
ATGACCAGTATTGAATGTGCTTACTTCCAGAG
AAATGCACTCGGTGATGGAAAGAGAGCCACTAT
TCTGAAGAACACTTGGCCAAAG

57139 | RNF8 | SEQ ID NO: 15 (the underlined exon inclusion sequence is SEQ ID NO: 35)
GGCGAGCGGAGCCTGCTTTCGCAGCGATCGCGA
GCGTGTGGCGATTGCTTCTGTCTGTTATTTAGAT
ATGGAAGCTGAGGGGATGCACAGAGGCAGCCA
GAACCTAGGTCAGGGTCTCGCTCGGTGCTGACC
GCCCCCGGGGTCGAGTAGGCGATGGGGGAGCCC
GGCTTCTTCGTCACAGGAGACCGCGCCGGTGGC
CGGAGCTGGTGCCTGCGGCGGGTGGGGATGAGC
GCCGGGTGGCTGCTGCTGGAAGATGGGTGCGAG
GGTTGTTATGAACTAGACTGGTCCAACAGGA
AAGTATGATAGATGTGAACTGGGGCTTCTTTT
CAACCTTTTCCGGAAGCTCTCAAGCTGTTCTT
GTGGATAAGACAGAGAATATGTACTCCAATG
CAAAGACTTTTGGTTGAATTATAACTGGCTGA
AG
GTGACTGTAGGACGAGGATTTGGTGTCACAT
ACCAACTGGTATCAAAAATCTGCCCCCTGATGAT
TTCTCGAAACCACTGTGTTTTGAAGCAGAATCCT
GAGGGCCAATGGACAATTATGGACAACAAG

57874 | ZDHHC13 | SEQ ID NO: 16 (the underlined exon inclusion sequence is SEQ ID NO: 36)
CCAGCAGGAAGTGGGAGAAGAGGCGACCCAAG
GCGGGCTGGCGGGCTGGCGGCAGTCGCTACTTG
CCTAGTAGCCTCAGCCGCTGTGGGCTCCTGGGG
AGATGGAGGGGCCGGGGCTGGGCTCGCAGCCTT
GACTTGAGCCCTGGAAATAAGCATCAGTGCA
GACGAGTGCTCTATGAGAAGCTATCTAGTTAA
AGCTCAAGGAGCCACAAAGGGATTTCCTGGC
AGCACAGTCACCAGAAACACTGAGGGAGAAC
TCTCTGAACAGAGGAATTGTGACCCCAAGAC
AGTAGTTTTTAGACGTGACACCAAAAGCACAA
TCCATAAAAGAACAAATTGATAAATTGGACTT
TTTTAAAATTTAAAACTTCTGCTCTATGAAAC
AGACTTTTAAGAGATGGGAAG
TGCAGGAATCA
CAGCCATGGCCCCCACCCTCCAGGATTTGGTCGA
TATGGCATCTGTGCACATGAAAACAAAGAACTT
GCCAATGCAAGAGAAGCTCTTCCTCTTATAGAG
GACTCTAGTAACTGTGACATTGTCAAAGCTACTC
A

60615 | SH3GLB2 | SEQ ID NO: 17 (the underlined exon inclusion sequence is SEQ ID NO: 37)
ATTTCCCGGCACCTTCGTGGGCACCACAGAGCCC
GCCTCCCCACCCCTGAGCAGCACCTCACCCACCA
CTGCTGCGGCCACTATGCCTGTGGTGCCCTCTGT
GGCCAGCCTGGCCCCTCCGGGGGAGGCCTCGCT
CTGCCTGGAAGAGGTGGCCCCCCCTGCCAGTGG
GACCCGCAAAGCTCGGGTGCTCTATGACTACGA
GGCAGCCGACAGCAGTGAGCTGGCCCTGCTGGC
TGATGAGCTCCCAGGGTGCCATGTGAACCACC
TGCGCTGCCTCCACGAGTTCGTCAAGTCTCAGAC
AACCTACTACGCACAGTGCTACCGCCACATGCT
GGACTTGCAGAAGCAGCTGGGCAG

62560 | ITFG1 | SEQ ID NO: 18 (the underlined exon inclusion sequence is SEQ ID NO: 38)
GAATTTATCATGGCATCCAGCATTGACCACTACA
AGTAAAATGCGAATTCCACATTCTCATGCATTTA
TTGATCTGACTGAAGATTTTACAGCAGCCATAC
CACCCTGAACGCGCCCCATCTCTTCTGATCTC
GGAAGCTAACCAAGGTCAGACCTGGTTAGTG
CTTGGATGGGAGATCACCTATTACTTTTTCT
T
TTCAATGGTGATCTAATTCCTGATATTTTTGGTA
TCACAAATGAATCCAACCAGCCACAGATACTAT
TAGGAGG

6785 | SPATS2 | SEQ ID NO: 19 (the underlined exon inclusion sequence is SEQ ID NO: 39)
CTGCTGGCTACCAATATTCTACTTTCTGTCTCTAT
GAATGTGACTACCCTGGTTACCTCATATTTATTT
GCAGTGACTTAAAATTTGGAGGCAAATTTTCC
TTAAGAGGATATCAAGTTCCAGTATCTTCAGA
TGTTGATAAGCCGTTAG
AATCTCCCTGGAAAA
GGAGACATGAATGTCTGCAATGATACTTCCTGA
CAAGAAGTTGATACAAGAAAAGGAAAGGAGAT
TAACAGCTAGTGAGCAGAATTTCGAACAGCAGG
ATTTCGTATTTTTTGCTTCCAACTGCACACTTCCG
TTGCCCACTTTTAAATCAGAGATACCTACACTCA
AAACCCAGACAAGGCAAAAGGATACTTTTCTTG
TATATTTTTTGAGATCGAAGAAACGACAATGTCC
AGGAAACAGAACCAGAAGG

8742 | DHRS11 | SEQ ID NO: 20 (the underlined exon inclusion sequence is SEQ ID NO: 40)
GATCGGACCCAAGCAGGTCGGCGGCGGCGGCAG
GAGAGCGGCCGGGCGTCAGCTCCTCGACCCCCG
TGTCGGGCTAGTCCAGCGAGGCGGACGGGCGGC
GTGGGCCCATGGCCAGGCCCGGCATGGAGCGGT
GGCGCGACCGGCTGGCGCTGGTGACGGGGGCCT
CGGGGGGCATCGGCGCGGCCGTGGCCCGGGCCC
TGGTCCAGCAGGGACTGAAGGTGGTGGGCTGCG
CCCGCACTGTGGGCAACATCGAGGAATTTTGAG
TCTAGAGGAGGAAGCGGGAAGATGTACACCA
GGGGAGGGGAAAGCTGCAGTCTTCCTTGCCC
ACAGTCTGCTTTGATTGATTCAGTCATTGATG
TTAAAGCAGAATTTGGGTTCTAGCTTCCTACA
GAGAAAACTCCTGTTTCCTGAAGTGATCAAAT
GAGCTGGCTGCTGAATGTAAGAGTGCAGGCTAC
CCCGGGACTTTGATCCCCTACAGATGTGACCTAT
CAAATGAAGAGGACATCCTCTCCATGTTCTCAGC
TATCCGTTCTCAGCACAGCGGTGTAGACATCTGC
ATCAACAATGCTGGCTTGGCCCGGCCTGACACC
CTGCTCTCAGGCAGCACCAGTGGTTGGAAGGAC
ATGTTCAAT

















TABLE 7
Exon Exclusion Event Sequences
Splicing Event ID | Gene Name | SEQ ID NO: | cDNA Sequence

1506 | CENPK | SEQ ID NO: 41 (the underlined exon exclusion sequence is SEQ ID NO: 73; the sequence without the underlined exon exclusion sequence is SEQ ID NO: 105)
AATCTTTAATGAACTGAAAACTAAAATGCTTAA
TATAAAAGAATATAAGGAGAAACTCTTGAGTAC
CTTGGGCGAGTTTCTAGAAGACCATTTTCCTCTG
CCTGATAGAAGTGTTAAAAAGAAAAAGGGAAC
AACGGTGGTTGGATGAACAGCAACAGATAAT
GGAATCTCTTAATGTACTACACAGTGAATTG
AAAAATAAGGTTGAAACATTTTCTGAATCAA
G
TTCCAAAAGCTGAGACAAGATCTTGAAATGGT
ACTGTCCACTAAGGAGTCAAAGAATGAAAAGTT
AAAGGAAGACTTAGAAAG

2098 | METTL5 | SEQ ID NO: 42 (the underlined exon exclusion sequence is SEQ ID NO: 74; the sequence without the underlined exon exclusion sequence is SEQ ID NO: 106)
AACTTCGATATGACCTGCCAGCATCATACAAGTT
TCACAAAAAGAAATCAGTAAGTCTCTTGATTTTG
GCTGGTCTACATTCGGTATTGAAAAGCTTTCTGG
GCCGGATGTGGTGGTTCATGCCTGTAATCCCAGC
TACTCGGGAGGCTGAGGCAAGAGAATCGCTTGA
ACTCAGGAGGCAGAGGTTGCAGTGAGCTGAGAT
TGCCCCACTGAACTCCAGCCTGCGCGATAAGAG
TGAGACTCAGTCTCGAAAAAGAAAAAAAAAGGA
AAGCTTTGTGACAAGTAATTATTTCTAGTGTTAC
CAACTTTCCTGTGTAAATATACAAAGCCAGCCTA
GGAGACACCATAAATGGCCTGTGGGAAAGGCCC
ATCGTCAATAGCTAATATTCTAGTTCTTTCCTAA
ATGCTTTGGGTACAAAAAGAAAAAAAAAATCAA
AAACTGTTTTTGCTCTTTTCATATAGTATATATTT
TATTAGTTAGTTTGTACTAATACATTCTCATATTA
CAAAGGCAATTTAATGGAAGAATCTTCCTTTTGA
TATTTGAATCATCTGAAATAACACAAACAGAAC
AATACATTCAAAGAAATCTCATTTGCATAACAA
AAAGACAAGTTAAACAACAAAAAAATTTTTCCT
TTCTCACAGGTGGACATTGAAGTGGACCTAATTC
GGTTTTCCTTTTAAAAGCCCCGCAAACAAAAGTC
GTTTAAAACCTATTTAAAATGAATAAAAAATTGG
TTCATGTTCAAAAGAAAGCTGCAGAATGGAAA
ATCAAGATAGATATTATAGCAG
GGACAGATAT
GGCTTTTCTAAAGACTGCTTTGGAAATGGCAAGA
ACAGCAGTATATTCCTTACACAAATCCTCAACTA
GAGAA

2242 | PLA2R1 | SEQ ID NO: 43 (the underlined exon exclusion sequence is SEQ ID NO: 75; the sequence without the underlined exon exclusion sequence is SEQ ID NO: 107)
ATTCCAAGTCACAATACCACTGAAGTTCAGAAA
CACATTCCTCTCTGTGCCTTACTCTCAAGTAATC
CTAATTTTCATTTCACTGGAAAATGGTATTTTGA
AGACTGTGGAAAGGAAGGCTATGGGTTTGTTTGT
GAAAAAATGCAAGCTTTCATTACTATGAATCTT
TTTGGCCAGACCACCAGTGTGTGGATAGGTTT
ACAAAATGATGATTATGAAACATGGCTAAATG
GAAAGCCTGTGGTATATTCTAACTGGTCTCCA
TTTGATATAATAAAT
TGCCTTCTGCTGAATATCC
CCAAAGACCCAAGCAGTTGGAAGAACTGGACGC
ATGCTCAACATTTCTGTGCTGAAGAAGGGGGGA
CCCTGGTCGCCATTGAAAGTGAGGTGGAGCAAG

7106 | RHOH | SEQ ID NO: 44 (the underlined exon exclusion sequence is SEQ ID NO: 76; the sequence without the underlined exon exclusion sequence is SEQ ID NO: 108)
AGAGAGAGAGAGAGAGAGAGGAGAGGAGGGGC
GGGGTGGGGGAGGAGGGGAGTGGGGAGAGAGA
AAGAGAGAAACACCAAAAAGACATTTTCAAGGA
AGGAAGAAAATTAGATGGCAACCCCCTGTCCCC
TCCCCCTAAGAAAATCCTCTCTGAGATTAAACTG
TGTGAAGATTAGAGGCGTGTAGGTCAGGAGCAG
GAGGAAGCCCAACGCTGGACTGTACCAGATCAT
CTAAAACTGGCAATTCCAGGCACAGAAAACCAG
TTCTTCAGAAGCAGAAGGGTGGTCAGCCAGGGG
GTGAAAGGGACAGGGGTCTCGCAGCCAGCCCAA
CTGTTGTATTTTCAGTTCTTCCAGTGTGAATC
AGTTAATATTCTCGGGAACGAGGGAGAGGTT
GATCCTATGAGGAAATCAACCACAGTGAAAA
GGCTTGGGCCGCTTTTGTTTTCACCTGCTTTT
GTTGAACAAATTTGATTTCCGGAGTCAGTCAT
TTTACTGTCAAGACATTTCTTCGGCATTCTGC
AACAG
TTTCCAACATGGCTAGATCCATCAGAAA
CTGAAGCCGTGGAGAACGCTCTCGGGGCCTTTG
CCACTTCTTGGAGTAGAAGCCGACAGAGAGCTG
TTTGGAAACTTCTCCTTCACACACCAG

7108 | RHOH | SEQ ID NO: 45 (the underlined exon exclusion sequence is SEQ ID NO: 77; the sequence without the underlined exon exclusion sequence is SEQ ID NO: 109)
GAGAGAGAAAGAGAGAAACACCAAAAAGACAT
TTTCAAGGAAGGAAGAAAATTAGATGGCAACCC
CCTGTCCCCTCCCCCTAAGAAAATCCTCTCTGAG
ATTAAACTGTGTGAAGATTAGAGGCGTGTAGGT
CAGGAGCAGGAGGAAGCCCAACGCTGGACTGTA
CCAGATCATCTAAAACTGGCAATTCCAGGCACA
GAAAACCAGTTCTTCAGAAGCAGAAGGGTGGTC
AGCCAGGGGGTGAAAGGGACAGGGGTCTCGCAG
CCAGTTCTTCCAGTGTGAATCAGTTAATATTC
TCGGGAACGAGGGAGAGGTTGATCCTATGAG
GAAATCAACCACAGTGAAAAGGCTTGGGCCG
CTTTTGTTTTCACCTGCTTTTGTTGAACAAATT
TGATTTCCGGAGTCAGTCATTTTACTGTCAAG
ACATTTCTTCGGCATTCTGCAACAG
TTTCCAAC
ATGGCTAGATCCATCAGAAACTGAAGCCGTGGA
GAACGCTCTCGGGGCCTTTGCCACTTCTTGGAGT
AGAAGCCGACAGAGAGCTGTTTGGAAACTTCTC
CTTCACACACCAG

9442 | QPRT | SEQ ID NO: 46 (the underlined exon exclusion sequence is SEQ ID NO: 78; the sequence without the underlined exon exclusion sequence is SEQ ID NO: 110)
GCCTGGCGCTGCTGCTGCCGCCCGTCACCCTGGC
AGCCCTGGTGGACAGCTGGCTCCGAGAGGACTG
CCCAGGGCTCAACTACGCAGCCTTGGTCAGCGG
GGCAGGCCCCTCGCAGGCGGCGCTGTGGGCCAA
ATCCCCTGGGGTACTGGCAGGGCAGCCTTTCTTC
GATGCCATATTTACCCAACTCAACTGCCAAGTCT
CCTGGTTCCTCCCCGAGGGATCGAAGCTGGTGCC
GGTGGCCAGAGTGGCCGAGGTCCGGGGCCCTGC
CCACTGCCTGCTGCTGGGGGAACGGGTGGCCCT
CAACACGCTGGCCCGCTGCAGTGGCATTGCCAG
TGCTGCCGCCGCTGCAGTGGAGGCCGCCAGGGG
GGCCGGCTGGACTGGGCACGTGGCAGGCACGAG
GAAGACCACGCCAGGCTTCCGGCTGGTGGAGAA
TGTGGTGGCCGCCGGTGGCGTGGAGAAG
GCG
GTGCGGGCGGCCAGACAGGCGGCTGACTTCACT
CTGAAGGTGGAAGTGGAATGCAGCAGCCTGCAG
GAGGCCGTGCAGGCAGCTGAGGCTGGTGCCGAC
CTTGTCCTGCTGGACAACTTCAAGCCAGAG

10439 | IL17RB | SEQ ID NO: 47 (the underlined exon exclusion sequence is SEQ ID NO: 79; the sequence without the underlined exon exclusion sequence is SEQ ID NO: 111)
TGGACATTTTCCTACATCGGCTTCCCTGTAGAGC
TGAACACAGTCTATTTCATTGGGGCCCATAATAT
TCCTAATGCAAATATGAATGAAGATGGCCCTTCC
ATGTCTGTGAATTTCACCTCACCAGGCTGCCTA
GACCACATAATGAAATATAAAAAAAAGTGTGT
CAAGGCCG
GAAGCCTGTGGGATCCGAACATCAC
TGCTTGTAAGAAGAATGAGGAGACAGTAGAAGT
GAACTTCACAACCACTCCCCTGGGAAACAGATA
CATGGCTCTTATCCAACACAGCACTATCATCGGG
TTTTCTCAGGTGTTTGAG

11685 | STAU1 | SEQ ID NO: 48 (the underlined exon exclusion sequence is SEQ ID NO: 80; the sequence without the underlined exon exclusion sequence is SEQ ID NO: 112)
AAAGCATAACCCCTACTGTAGAACTAAATGCAC
TGTGCATGAAACTTGGAAAAAAACCAATGTATA
AGCCTGTTGACCCTTACTCTCGGATGCAGTCCAC
CTATAACTACAACATGAGAGGAGGTGCTTATCC
CCCGAGAGTTTATTAACCACTTAACCTCTCAG
AACTGAACAAAGACAACATTGTTCCTGGAACG
CCCTCTTTTTAAAAAAG
GGGCTGCGGGCGCCTG
AGCGGCTCTTCAGCGTTTGCGCCGGCGGCTGCCG
CGTCTCTCTCGGCTCCCGCTTCCTTTGACCGCCTC
CCCCCCCCGGCCCGGCGGCGCCCGCCTCCTCCAC
GGCCACTCCGCCTCTTCCCTCCCTTCGTCCCTTCT
TCCTCTCCCTTTTTTCCTTCTTCCTTCCCCTCCTCG
CCGCCACCGCCCAGGACCGCCGGCCGGGGGACG
AGCTCGGAGCAGCAGCCAG

13451 | LYRM1 | SEQ ID NO: 49 (the underlined exon exclusion sequence is SEQ ID NO: 81; the sequence without the underlined exon exclusion sequence is SEQ ID NO: 113)
AGAGTACCCAGAGAAGGAGAAGCCAGCAAAGG
AGACGACACAGACAAGACCTCAGAGATCAAAGG
AAGAGGCCCCTTAATATCCTGGAATAATGGGAC
CCATCCCCGTAATCAGTGAATCTCATCCACCCGC
TTGCCAGCTTCTACCCGCAGCAAGTAGAAGCTA
AGTCCTGGCTCAAATCTCTTCCCTCCCTCCCTCTC
CCAGCTGTCAGTGCTTTTGGACTTGTGCTCAGAT
GACAACGGCAACACGACAAGAAGTCCTTGGC
CTCTACCGCAGCATTTTCAGGCTTGCGAGGAA
ATGGCAGGCGACATCAGGGCAGATGGAAGAC
ACCATCAAAGAAAAACAGTACATACTAAATGA
AGCCAGAACGCTGTTCCGGAAAAACAAAAAT
C
TCACGGACACAGACCTAATTAAACAGTGTATAG
ATGAATGCACAGCCAGGATTGAAATTGGACTGC
ATTACAAGATTCCTTACCCAAGGCCA

14574 | PPARG | SEQ ID NO: 50 (the underlined exon exclusion sequence is SEQ ID NO: 82; the sequence without the underlined exon exclusion sequence is SEQ ID NO: 114)
CCATCAGGTTTGGGCGGATGCCACAGGCCGAGA
AGGAGAAGCTGTTGGCGGAGATCTCCAGTGATA
TCGACCAGCTGAATCCAGAGTCCGCTGACCTCCG
GGCCCTGGCAAAACATTTGTATGACTCATACATA
AAGTCCTTCCCGCTGACCAAAGCAAAGGCGAGG
GCGATCTTGACAGGAAAGACAACAGACAAATCA
CCATTCGTTATCTATGACATGAATTCCTTAAT
GATGGGAGAAGATAAAATCAAGTTCAAACAC
ATCACCCCCCTGCAGGAGCAGAGCAAAGAGG
TGGCCATCCGCATCTTTCAGGGCTGCCAGTTT
CGCTCCGTGGAGGCTGTGCAGGAGATCACAG
AGTATGCCAAAAGCATTCCTGGTTTTGTAAAT
CTTGACTTGAACGACCAAGTAACTCTCCTCAA
ATATGGAGTCCACGAGATCATTTACACAATGC
TGGCCTCCTTGATGAATAAAGATGGGGTTCTC
ATATCCGAGGGCCAAGGCTTCATGACAAGGG
AGTTTCTAAAGAGCCTGCGAAAGCCTTTTGGT
GACTTTATGGAGCCCAAGTTTGAGTTTGCTGT
GAAGTTCAATGCACTGGAATTAGATGACAGC
GACTTGGCAATATTTATTGCTGTCATTATTCT
CAGTGGAG
ACCGCCCAGGTTTGCTGAATGTGAA
GCCCATTGAAGACATTCAAGACAACCTGCTACA
AGCCCTGGAGCTCCAGCTGAAGCTGAACCACCC
TGAGTCCTCACAGCTGTTTGCCAAGCTGCTCCAG
AAAATGACAGACCTCAGACAGATTGTCACGGAA
CACGTGCAGCTACTGCAGGTGATCAAGAAGACG
GAGACAGACATGAGTCTTCACCCGCTCCTGCAG
GAGATCTACAAGGACTTGTACTAGCAGAGAGTC
CTGAGCCACTGCCAACATTTCCCTTCTTCCAGTT
GCACTATTCTGAGGGAAAATCTGACACCTAAGA
AATTTACTGTGAAAAAGCATTTTAAAAAGAAAA
GGTTTTAGAATATGATCTATTTTATGCATATTGTT
TATAAAGACACATTTACAATTTACTTTTAATATT
AAAAATTACCATATTATGAAATTGCTGATAGTAT
TTGAAGACTGAGTCTTGTGTGTTTCCCACCCTAG
CCCCCAGGCTTTCTTTTTTACCCCTTTTCCTTCTC
CCCTCCCTCCCTCCATCCCTCTCACTCTTCCTCCC
TCCCTTCCCTCCTTTCCTTCTTCCTTTATTTTTCTT
TTCTTTCTTAGACATTTTAAAATATGTGAGTGGA
ACTGCTGATACACTTTCATTCTCAGTAAATTAAT
TTTTTACTCAAT

16269 | BORCS8-MEF2B | SEQ ID NO: 51 (the underlined exon exclusion sequence is SEQ ID NO: 83; the sequence without the underlined exon exclusion sequence is SEQ ID NO: 115)
ACAAAGATCATTCCACTCAGCCTGGGACGATGG
GGAGGAAAAAAATCCAGATCTCCCGCATCCTGG
ACCAAAGGAATCGGCAGCCCGGAGGAACCACC
CCCGCCCTCCTCAGCCTGATCCTGGAAGAGA
CTCGGGGCCCCCCAGCCTCCGCCAACCCAG
C
GCCGTGAAGAACCTGGTGGACAGCAGCGTCTAC
TTCCGCAGCGTGGAGGGTCTGCTCAAACAGGCC
ATCAGCATCCGGGACCATATGAATGCCAGTGCC
CAGGGCCACAG

16833 | ENOSF1 | SEQ ID NO: 52 (the underlined exon exclusion sequence is SEQ ID NO: 84; the sequence without the underlined exon exclusion sequence is SEQ ID NO: 116)
AGAAGCAAATGCTGGCACAAGGATACCCTGCTT
ACACGACATCGTGCGCCTGGCTGGGGTACTCAG
ATGACACGTTGAAGCAGGATCCCAGGATGCTG
GTATCCTGCATAGATTTCAGGTACATCACTGA
TGTCCTGACTGAGGAGGATGCCCTAG
CCTGTC
TGGAAGTTACTTGTGGACATG

16929 | DHRS4-AS1 | SEQ ID NO: 53 (the underlined exon exclusion sequence is SEQ ID NO: 85; the sequence without the underlined exon exclusion sequence is SEQ ID NO: 117)
GTGCCACTTCGGATAAACCCTTTGGACTCCTAAC
TCCAATCAGGTGTCTGCTTTGTTGAGGACTCACA
GACACAGTCTCCTTTCTTCAAGATCTTTACAATG
CAAGACCTCACTAACACACAGGGATGGTCTCCC
AGAGGGTCTGTGCTGTTCCTTCACTCAGAACATC
AAGATGCACTGAAGTAAGGATCCTCTATTCTACA
GTTCCTGCTAGCTGAGCTATTCCATGGGGGCTTC
AGCAGGAAATTCCAAGGTTGGCTTTGACAAGCT
AAGGCCGGCTGGTGGAGCACATCGAGTTCTGGA
GGTTCATGTGTGTTTTCATGAAGATCTGTCTGCC
CGTAGCAGATAAAGAGTTGTTGCCCCACTCCTCC
TGGGGTCTTCTATTTTCCTGGGGGAATTTCTGG
ATTAACTGAACACACACACACACACACACACCC
TTTTGAAGCATCAACAGTAATTCTGAGTTCTTAG
GGACAATGCAGATTAAATCCACAATAAGAAAGA
CAACTATGGCCAGGTGTGGTGGCTCACGCCTGTA
ATCCCAGAACTTTGGGAGGCTGAGGCGGATGGA
TCACCTGAGGTCAGGAGTTAGAGACCAACCTGA
CCAACATGGAGAAACCCCGTTTCTACTAAAAAT
GCAAAATTAGCCGGGCATGGTGGCAGGCGCCTG
TAATCCCAAATACTCGGGAGGCTGAGGCAGGAG
AATCACTTAAACCCGGGAGGCAGAGGTTGCAGT
GAGCCAAGATCGCGCCATTGCACTCCAGCGGCC
AGACTTTGGCAGCGTGTAAGGTCTGAGGACA
GGGGCACCGGAGGCCGAGGATGAGAGGCCA
GTGCCTGTTTCCAGGCAGCCAGGGCCTCAGA
AACTCCGGCCGGAGCACTCACCCGTCGGTGG
AGGCCGTTACCAGGGCCACCTTATTTGCGAG
CGGGTCCCGGCGGGTCATCCCGGAGCTGGCC
ATCCGCACCGAATTCCAAGCCCGGGCACAGA
GGCCTAGCAGCCCCGCCTTGTGCATGGATCA
GACCAGCAA
ACATGGGCCCCGTCCTGGGCCAAA
CGCCGGGCGATGGCGAAGCCGATCCTGTGAGCA
GAAAGAGACAAAGACTGCTAAGGCCTGTGCAGG
GGAAGAGGTCGACAGTATGAGCTCTGAAGTTAA
GACTGCCCGGGTTTGAATTCTGGCTCTTTCTCTA
TATAACCCCTACGTGTGCCTACTATGTGTAAAAC
AGGCTTAATGGCATGGCCATTTTTGGCATTCCTT
TACTTGTTTTTATTATGACCTGGACCACAGCCTC
AGTTCCCAAGAACTGACATCACTTTCTACAGTTC
CCACCATGGGTGACAGGCTTCATCCCCTCTTGGG
ACTGAGAG

16943 | NDUFV2 | SEQ ID NO: 54 (the underlined exon exclusion sequence is SEQ ID NO: 86; the sequence without the underlined exon exclusion sequence is SEQ ID NO: 118)
CGCAGAATCTAGGCCTGCTCTGGCCAGATCAGTT
TCGAAGACCGTCGCTCCGAAGGAGGCACCTCTC
GTTTCAAGCCTAGTGACCTCGATGCTTTTAGGTT
GCAGCATACTGGAGAGCTCTGGCTTGCTTCGTGA
AGGCTTAGGGAGAACTTCATTAGGGCTGGAAAA
GGGTGGCCAATGTTTGATTTACTGCAGTTGTGCT
TTGCATATCGGAAATGCTGGCTAAATAAACGGT
ATCAAACTAACTCTGAAAGAACGGCGCCGCAAA
TAACAGCACCCAATTAAAGAACCACAGGATTTT
AGAGATTAAATGATCTTTTTGAGATCCAAGTACA
TCTCATGGAAAAATACCTAGGTTAGAATTACT
AAATTAAAAAATGGACACTTGGGGCCAGGCG
CAGTGGCTTACGCCTGTAATTCCACCACTTTG
GGGAGCTGAGGCGGGCAGATCACTTGACATC
GAGAGTTCAAGACCAGCCTGACCAACATGGA
GAAACCCCGTCTCTACTAAAAATACAAAAAAT
TATCCAGACGTAGTGGCACATGCCTGTAATCT
CAGCTACTTGGGAGGCTGAGGTAGGAGAATC
GCTTGAACCCGGGAGGCAGAGGTTGTGGTGA
GCCGAGATCATGCCATTGAACTCCAGCCTGG
GCAACAAGAGCGAAACTCCGTCTCCAAAAAA
AAAAAAAGACACTTATTTAGGCTTTCCATATA
TCATG
GGAAGACATGTAAGGAATTTGCATAAGA
CAGTTATGCAAAATGGAGCTGGAGGAGCTTTATT
TGTG

18745
FER1L4
SEQ ID NO: 55
The underlined exon exclusion sequence is SEQ ID NO: 87. The sequence without the underlined exon exclusion sequence is SEQ ID NO: 119.
GATCCCTGGAGTTGCAGCTACCAGACATGGTGC
GTGGGGCCCGGGGCCCCGAGCTCTGCTCTGTGC
AGCTGGCCCGCAATGGGGCCGGGCCGAGGTGCA
ATCTGTTTCGCTGCTGCCGCCGCCTGAGGGGCTG
GTGGCCGGTAGTGAAGCTGAAGGAGGCAGAGGA
CGTGGAGCGGGAGGCGCAGGAGGCTCAGGCTGG
CAAGAAGAAGCGAAAGCAGAGGAGGAGGAAGG
GCCGGCCAGAAGACCTGGAGTTCACAGACATGG
GTGGCAATGTGTACATCCTCACGCTGGGTGAAG
GGGTTGGAGCATGACAAGCAGGAGACAGACG
TTCACTTCAACTCCCTGACTGGGGAGGGGAA
CTTCAATTGGCGCTTTGTGTTCCGCTTTGACT
ACCTGCCCACGGAGCGGGAGGTGAGCGTCCG






GCGCAGGTCTGGACCCTTTGCCCTGGAGGAG









GCGGAGTTCCGGCAGCCTGCAGTGCTGGTCC









TGCAG
CTATGAGCTCAGAGTTGTCATCTGGAAC






ACGGAGGATGTGGTTCTGGACGACGAGAATCCA





CTCACCGGAGAGATGTCGAGTGACATCTATGTG





AAGAG






19824
PHF14
SEQ ID NO: 56
The underlined exon exclusion sequence is SEQ ID NO: 88. The sequence without the underlined exon exclusion sequence is SEQ ID NO: 120.
GCAGTGCTCGGAATGTGACCAGGCAGGGAGCAG
TGACATGGAAGCAGATATGGCCATGGAAACCCT
ACCAGATGGAACCAAACGATCAAGGAGGCAGAT
TAAGGAACCAGTGAAATTTGTTCCACAGGATGT
GCCACCAGAACCCAAGAAGATTCCGATAAGAAA
CACGAGAACCAGAGGACGAAAACGAAGCTTC
GTTCCTGAGGAAGAAAAACATGAGGTTGGAA
TAAG
GAAAGAGTTCCTAGAGAGAGAAGACAAA
GACAGTCTGTGTTGCAAAAGAAGCCCAAGGCTG
AAGATTTAAGAACTGAATGTGCAACTTGCAAGG
GAACTGGAGACAATGAAAATCTTGTCAGGTAAG
TTGGATGCTAAAACCTTGTCTTTAGGGGATGAAA
GTTCTATATTTATTTTCTCATCACAGAAAAAATG




AAAAAACAATTGCAGGATAAGACCTTTCTTAAA





ATATTATATAGTGGAAACAGTACTTTAGAAACA





GATTTCATCCACTTCTTAACCTCTCACACATGGT





TATACTCTGGATTTAAATGTAAATAAGAGTGATA





ATCTGCCTGTTTAACACAGGGAATTATTTTTCTCT





TGACAAGAGAAATTGACAGTGCTCTCTATTTAGA





GGCCATGAAAGTAATTTGATCTAAACACTGTGTA





CTAAGATTATTATGTTTTATGTCAGAAAACAATA





AAGTTACTAAGCTCTGTTAGCATATTCTAAATGT





TTGAAATTTAGAAGCAATGGTGAGAAGACAGAC





TTTTTATTGACAAGAACTTAATTAGCACTTTCTTA





TTGCTTATCAAAACAAATGTGTTAAATGCTTCTC





CCTTACGAAATAAAGAAAGGTGAAAAGATGGCC





TAGGTTGATTTTATTTTTTGTTTTGTCTTTGTTTCT





TTGTTTCGTTTTGGTACTTTATTTTTTTTTAATCA





GACATAATGCTAATCAGAAATCTTAGCTGATGCT





GCACATTGGCTTTTCCCAACGGTCCAGAGGCTGC





TAATTTTAGCGGAAATGAAGACATTGATCAAAG





CTCTGGTGAGATGGGGGAGTGAGTGTGTGAACA





AAAAGAGAGCTAATTTAAAAGAGGCATCAGACT





TTCAAAGGACAGTGTCACAAAAGTTCTTACAGTT





CTTACAGGGACTTTGTAAGGGAATCCATTCTTAT





TTCTTTAAAAAATTGTCTTCTGGTAAAGCCCTGT





TAAATTAACTGAGGACACAGAAATTAAACATTT





CAAAAAGAATAAACATATTGATAAAACAAATAT





ATTAGTGTTGTTGTATGTTTTTAAATACTTACTTC





CAAATGATTTAATCTATTTTGGTCATTAAAATAT





GTCTTAATTTCTCAAAGAAAGGCATGAAGTCTTA





AATTTTATGAGTTTTTTATGCTATCAATGAGAAA





GATAAAGTAAAAATTACAGTAGAAAAAGACAAA





GTCCTTCAACAAAGTTAAGAAAGTTTATAATAAT





TGGCTAATTTTTTTGAGGTAGTTCATGTAGAGTG





TGTTGGGAGCTATCCTGAAGGTTAAGTTTATTAA





AATTTAGGGTAAAGTAGTAAGTAGTTCCAAGTTC





AGGAGATACACCTGAATAATTCTGACCACAGTA





TAAATTTTGCAATATGTCGAAAATGAAATCCCAA





GCATAAGCGTAACATAATGGAGTAAAT






19828
PHF14
SEQ ID NO: 57
The underlined exon exclusion sequence is SEQ ID NO: 89. The sequence without the underlined exon exclusion sequence is SEQ ID NO: 121.
GCAGTGCTCGGAATGTGACCAGGCAGGGAGCAG
TGACATGGAAGCAGATATGGCCATGGAAACCCT
ACCAGATGGAACCAAACGATCAAGGAGGCAGAT
TAAGGAACCAGTGAAATTTGTTCCACAGGATGT
GCCACCAGAACCCAAGAAGATTCCGATAAGAAA
CACGAGAACCAGAGGACGAAAACGAAGCTTC
GTTCCTGAGGAAGAAAAACATGAGGTTGGAA
TAA
GAAAGAGTTCCTAGAGAGAGAAGACAAAG
ACAGTCTGTGTTGCAAAAGAAGCCCAAGGCTGA
AGATTTAAGAACTGAATGTGCAACTTGCAAGGG
AACTGGAGACAATGAAAATCTTGTCAG





21024
BCL2L13
SEQ ID NO: 58
The underlined exon exclusion sequence is SEQ ID NO: 90. The sequence without the underlined exon exclusion sequence is SEQ ID NO: 122.
GGGTTCAACTAGATATAGCTTCACAATCTCTGGA
TCAAGAAATTTTATTAAAAGTTAAAACTGAAATT
GAAGAAGAGCTAAAATCTCTGGACAAAGAAATT
TCTGAAGGCCAGTGACATATCAGGCATTTCGG
GAATGTACACTGGAGACCACAGTTCATGCCA
GCGGCTGGAATAAG
GGCACTGTGTTTAGTCTTG
AGTCAGAGGAGGAGGAATACCCTGGAATCACTG
CAGAAGATAGCAATGACATTTACATCCTGCCCA
GCGACAACTCTGGACAAGTCAGTCCCCCAGAGT
CTCCAACTGTGACCACTTCCTGGCAGTCTGAGAG
CTTACCTGTGTCACTGTCAGCTAGCCAGAGTTGG
CACACAGAAAGCCTGCCAGTGTCACTAGGCCCT
GAGTCCTGGCAGCAGATTGCAATGGATCCTGAA




GAAGTGAAAAGCTTAGACAGCAACGGAGCTGGA





GAGAAGAGTGAGAACAACTCCTCTAATTCTGAC





ATTGTGCACGTGGAGAAAGAAGAGGTGCCCGAG





GGCATGGAAGAGGCTGCTGTGGCTTCTGTGGTCT





TGCCAGCGCGGGAGCTGCAAGAGGCACTTCCTG





AAGCCCCAGCTCCCTTGCTTCCACATATCACTGC





CACCTCCCTGCTGGGGACAAGGGAACCTGACAC





AGAAGTGATCACAGTTGAGAAATCCAGCCCTGC





TACATCTCTGTTTGTAGAACTTGATGAAGAAGAG





GTGAAAGCAGCAACAACTGAACCTACTGAAGTG





GAGGAGGTGGTCCCCGCACTGGAACCCACAGAA





ACGCTGCTGAGTGAGAAGGAGATAAACGCAAGG





GAAGAGAGCCTTGTGGAAGAGCTGTCCCCTGCC





AGCGAGAAGAAGCCCGTGCCGCCGTCTGAGGGC





AAGTCTAGACTGTCCCCCGCCGGTGAGATGAAG





CCCATGCCGCTGTCTGAGGGCAAGTCTATACTGC





TGTTTGGAGGGGCTGCTGCTGTTGCCATCCTGGC





AGTGGCCATCGGGGTAGCCCTGGCTCTGAGAAA





GAAATAGGAGGCTTTTCAGAAGAGAAAGACAGA





AGGATGTAAGGTTGGAGTTGTATTGGCTGGAATT





TGAACCTCCAGCAGCTGTCTGGACATTTGTGGAA





CACTCTGGGATAATTGGGGACTTCTGCTCAACAT





GGCAGTGGCATGTTAGGCATGTTAGGGCTTGAG





GTGGGGCATTCACATTCATCTGACTGTAAATCCC





AAGGGCCTCCGCTCATGCTAAATTGAGAATCTTA





GGGGTAAAGCACCCCCTCCAGGACCGGGTTTCT





CAGCCTTGGCACTAGTGCTGTTCTGACCATTCTC





TGTGTTGGGGCTGTCCTGTGTGTGGTGGGCTCCA





CCCACTAGATGCCAGTGGCACCCCCTCCCAGAG





ATGACAAACGAAAATGTCTCTAGACATTGCCAA





ATGTCCCGTGTGAACATCCCCTATTGAGACCCAC





TGCTTTAGCGAGAGAGGGTTTACTTAGGAAGAA





TTGGGATAGAAATTCCCAGCTGAGAGAACTTAG





CTGTGGGCTCCTCAGCTACTGACTTCTTAGCTCT





TAATCCCCTTAGAATTTCATCTTTCTCGATGAGC





AGGCTCTGCACCCACTCTTTTTTTGCCCCCCGCC





CTCATCCTGGAGTGTGAGGGTGCTCGCCCGTACT





CTCAGCTGCCTCTCAGGGACTGCACTGTTCCTCT





TCACCCCCAGGTTCCTGCTAAGATCCCACGGGCG





AGGGCTTGCTCTGGACTCAGTCTGTCAAGTCCCC





GAAGCTTCCTGCAGCTCCACCTTGTAAAAATGCT





GCCTTTGGGAATCTTCGAAATATGTACACAGAG





AAAATCACATGAAGGAGACCTGGGGTCCCCACT





TGTGAGTGCAACTGCAAGTAACTCTGGCTAGAG





AGACACATGTGTCTTGTGTCAAGGCAGGAGGAT





AACCTGGATGACCTTCTGAGGTCTCTTCAGCCCT





TTTCGCTAGTGGTCACCCACCACCATGGTTACTT





GCCAGCAACATCTCTATTGCTGGATGGTCCCTGT





CTATAACCTTGGGCTAGTATATTTTTTCCAATAT





GGGACCTTAGTCTTACTACTGATGAGTTCTATGG





GTCTCTTGCTAGGGGGTAAGGATTTTTATTCTTG





GGCTTATAGAGCCAGTTAGATCATAATTCTTATG





AAATAGAGAGTGTCCTAAATATCACTGAAATAA





AAAGTAGGAAAAAGAAGCTTGAATTTTAAGACT





GAGGCTGCTCTGCAGATTCTAGTTTGGCTTTCAG





AGTTCAAGAGTGGTGGCATCTTCACCTGAATTCT





TCAATGCCAGGGTAATAAACCAAAATAGTCCTA





ATCAGTATATGCTAGTTGAGCATCGGCATAATTT





TCTTTCCTCTGGCTGATCCCAGCCCTAAAGGAAG





GGTAGACCCGTGTCTTTCCAGCCCTAAAGGAAG





GGTAGACCCGTGTCTTTCCAGCCCTAAAGGAAG





GGCAGACCCGTGTCTTTCCATGCCCGAGGGCCAC





GACGTCACTATGCAGGGCACACGTGGCTTGGTTT





AAAAAGGTCATCTTAGATTTATCTTAGTAAATGT





AATAAATTATTTTTTAGATCTTGAAATTTATAAT





AAAAATACTTTACCTACCCTGATC






22227
SELENBP1
SEQ ID NO: 59
The underlined exon exclusion sequence is SEQ ID NO: 91. The sequence without the underlined exon exclusion sequence is SEQ ID NO: 123.
GTCATTGAGCCCAAGGACATCCATGCCAAGTGC
GAACTGGCCTTTCTCCACACCAGCCACTGCCTGG
CCAGCGGGGAAGTGATGATCAGCTCCCTGGGAG
ACGTCAAGGGCAATGGCAAAGGTCATCCACCG
GCTGCCCATGCCCAACCTGAAGGACGAGCTG
CATCACTCAGGATGGAACACCTGCAGCAGCT
GCTTCGGTGATAGCACCAAGTCGCGCACCAA
GCTGGTGCTGCCCAGTCTCATCTCCTCTCGCA
TCTATGTGGTGGACGTGGGCTCTGAGCCCCG
GGCCCCAAAGCTGCACAAG
CTACGAAATGTGG
GAATTGTGGACCCGGCTACTCCACCCCTCTGGAG
GCCATGAAAG





24742
LINC00630
SEQ ID NO: 60
The underlined exon exclusion sequence is SEQ ID NO: 92. The sequence without the underlined exon exclusion sequence is SEQ ID NO: 124.
GTTGATTCCATACCCTGGCTATTGTGAATAATGC
TGCAGTGAACATGGGAGTACATACATCTGTTTGA
GGAACTCAGAGTGGTTTTCCAGATGGGAATCA
CATTGCTCTCTGTCCCTGAGATCTTGCTGGAG
ACAGGGCTACTCAGTCCCTCTTTGCCAGGTAA
TCTGTTCCAGAAGAAACATGTGTCGTTCTGACTG
AGCCCCTGCCTGTCTGTCACCTTAAGAGCCAGTC
AATTCATATGGTCCCCATATCAAAGTCTCCTGTG
CCCAGAGAGAGGATTTCATTTCAACCATCACCAT
CACCACCATCATCATCATCACCAAGAGATGTTGT
TGA





27194
CTBP2
SEQ ID NO: 61
The underlined exon exclusion sequence is SEQ ID NO: 93. The sequence without the underlined exon exclusion sequence is SEQ ID NO: 125.
GGTTCATAGTGGCGTCATGCACGCAGACTCCTGC
AAGTTCCCCTAAGTTCTTAGAGGACTGCTTTGCC
TTTTGATCTGAGAGTTGCAAAGTTCCATAAAGAA
TGGCCCTTGTGGATAAGCACAAAGTCAAGAGAC
AGCGATTGGACAGAATTTGTGAAGATGGAGAAA
ACAAAGGATTCAGATTGAAGGACTGCTCAGA
CACCCTCCGAAGAGGTGGCCCTGCCTGCGCT
CCTCCTGGCTGCAGAGTACCCCACCAGCGC
G
AGATCCAGGGTTGCCAGAAGACGAGACAACCGT
GATTGCATGTGCGGAGGTTCCTCGATGGAAGCG
CAGCCCGGCGCGCCCCTCAGCTGGCCTGGCCAG
GCCCTATGAAGGTCACGCGAAAACCCTGCTGCG
GGCTTCTTAGCGACCGCATTACGTGGACTAGCGG
GCAAGAAAAGCCTGGTCGGCGCTGCCCTCACAG






30244
SLC52A2
SEQ ID NO: 62
The underlined exon exclusion sequence is SEQ ID NO: 94. The sequence without the underlined exon exclusion sequence is SEQ ID NO: 126.
AGGCGTCTGGCCAGGTGGCGCTCCGGGCAGGCC
TACTTGGGTGTCCCCGCCTCTGATACCTCCCT
GCTGGAGGAAACAGCAGGAAAAGAGAACCAG
GCAGGCAGGCAGACATCCCCACGGAGCAGCG
TTGGGCCCCCAAGGTGCCTGACCCACTTCCTA
GAGTACTGAACAGTCCCAGAGTGTCACAGCT
GATGTGCAGGACAGCCTGGAGCTCTCACCTT
CAACACGGGGTGTACCTGAGACTTCCAGTGG
ATGAGGGTCAGCCTCTGGAGCTGTGAAAACC
TGGGCCGACAGCGGAGGCAGAGCTGCACTAA
TGTTCCCACACGAGTCCTTCCCACCCAACACC
TTGGTGCAGGGAGACGGAAGGAGCCTGGAGC
CAGGG
CTAGAAGAAGTCTTCACTTCCCAGGAGA




GCCAAAGCGTGTCTGGCCCTAGGTGGGAAAAGA





ACTGGCTGTGACCTTTGCCCTGACCTGGAAGGGC





CCAGCCTTGGGCTGAATGGCAGCACCCACGCCC





GCCCGTCCGGTGCTGACCCACCTGCTGGTGGCTC





TCTTCGGCATGGGCTCCTGGGCTGCGGTCAATGG





GATCTGGGTGGAGCTACCTGTGGTGGTCAAAGA





GCTTCCAGAGG






33377
SLC38A1
SEQ ID NO: 63
The underlined exon exclusion sequence is SEQ ID NO: 95. The sequence without the underlined exon exclusion sequence is SEQ ID NO: 127.
CTCTTTCTCTTCCTCCAGTTTCCAGTCCAGCCCTG
TTGGCTCTCAGAATGCATCATCCTTCTCCCTGCA
GCGCTCTCACTGAACATGCTCAAGCGCAAGGAA
CTTATAATCTTGTGTTCTCTGGATTCTGGATTTAG
TAATCTGTATTAGTCTGTTCTCACACTGCTAATA
AAGAAATACCTGAGGTTGCTTCCAAGATAGCCA
AATAGGAACAGCTCTGGTCTGCAGCTCCCAGCA
AGATCGATGTAGAAGATGGGTGATTTCTGCATTT
CCAACTGAGGTACCTGGTTCATCTCACTGGGACT
GGTTGGACAGTGGGTGCAGCCCATGGAAGGTGA
GCTGAAGCAAGGTGGGGCGTCACCTCACCCAGG
AAGCACAAGGGGTCAGGGGATTTACCTTTCCCA
GCCAAGGGAAGCCATGACAGACTGTAACTGGAG




AAACGGTACACTCCTGACCAAATACTGCACTTTT





CCCACAGTCTTAGCAACTGGCAGACCAGGTAAT





ACCCTCCCGTGCCTGGCTCAGTGGGTTCCATGCC





AACGGAGCCTTGCTCACTGCTAGCGCAACAGTCT





AAGATCGACCTGCGACGCTGCAGCTTGATGCAG





GGAGAGGCATCCAACATTGCTGAGGCTTGAGTA





GCTCACAGTGTAAGCAAAGAGGCCCGGAAGCAC





AAGTTGGGCAGAGCTCATCGCTGCTCAGCAGGG





CCTACTGCCTCTATAGATTCCACCTCTGGAGGCA





GGGCATGGCAGAAAAAAACGCAGCAGACAGCTT





TTGCAGACTTAAACGTCCCTGTCTGATGGCTCTA





AAGAGAGCAATGGTTCTCTCAGCATGGCATTCG





AGCTCCAAGAACAGACAGACTGCCTCCCCAAGC





AGGTCCCTGACCCCCATGTAGCTGGACTGGGAA





ACACCTCCCCATCAGGGGCTGAGAGATACCTCA





AACACGTGGGTGCCCCTCTGGGACGAAGCTTCC





AGAGGAAGGATCAGGCAGCAATATTTGCTATTC





TGCAGCCTTTGCTGGTGATACCCAGGCAAACAG





ATTCTGGAGTGGACCTCCAGCAAACTCCAACAA





ACCTGCAGCTGAGGGGTCTGACTGTGGGAAGGA





AAACTAACAAAGAGAAAGCAATAGCATCAACAT





CAACAAAAAGGACATCCACACCAAATCCCCATC





TATAGGTCACCAACATCAAAGACCAAAGGTAGA





TAAAACCACAAAGATGGGGAGAGAAACCAGAG





CAGAAAAGCTGAAAATTCCAAAAAACAAGCACC





TCTTCTCCTCCAAAGGATCGCAGCTCCTTGCCAG





CAAGGGAACAAAACTAGACGGAGAATGAGTTTG





ACAAGTTGACAGAAGTAGGCTTCAGAAGGTTGG





TAATAACAAACTTCTCTGAGCTAAAGGAGCATCT





TCTAACCCATCGCAAAGAGGCTAAAAACTGTGG





AAAAAAAAAAGGTTAGATGAATGGCTAACTAGA





ATAACCAGTGTAGAGAAGACCTCAAATGACCTG





ATGAAGCTGAAACCCACAGCACAAGAACTTCGA





GACTCATGCACAAGCTTCAATAGCCGATTCGATC





AAGTGGAAGAAAGGATATCAGTGATTGAAGATC





AAATTAATGAAATAAAGTGAGAAGAATGTCTG







GTGAAGTTCAAGGGCATCTTGAACGTGGTGC









ACTTGGAGACAGTGAGGGAAGCAGGGGTGAA









GTGGCTGCTACCTGAGTCCCTTCTGGAGCTCC









ATTTTGCTTGGTCTTGGAGAAGGCTTCTCAGC









TGCCCTCCCAGCTAGT
GAGTTACATCTGCTAAC






ATGCTTATTTTCATTCTTCCTTCATCTCTTTATTTA





AAAATCACAGACCAGGATGGAGATAAAGGAACT





CAAAGAATTTGG






40521
FAM65A
SEQ ID NO: 64
The underlined exon exclusion sequence is SEQ ID NO: 96. The sequence without the underlined exon exclusion sequence is SEQ ID NO: 128.
AAACTGGGCACATTTGGGCCCCTGCGCTGCCAG
GAGGCATGGGCCCTGGAGCGGCTGCTGCGGGAA
GCCCGAGTACTGGAGGCAGTATGCGAGTTCAGC
AGGCGGTGGGAGATCCCGGCCAGCTCTGCCCAG
GAAGTGGTGCAGTTCTCGGCCTCTCGGCCTG
GCTTCCTGACCTTCTGGGACCAGTGCACAGA
GAGACTCAGCTGCTTCCTCTGCCCGGTGGAG
CGGGTGCTTCTCACCTTCTGCAACCAGTATGG
TGCCCGCCTCTCCCTGCGCCAGCCAGGCTTG
GCTGAGGCTG
TGTGTGTGAAGTTCCTGGAGGAT
GCCCTGGGGCAGAAGCTGCCCAGAAGGCCCCAG
CCAGGGCCTGGAGAGCAGCTCACAGTCTTCCAG
TTCTGGAGTTTTGTGGAAACCTTGGACAGCCCCA
CCATGGAGGCCTACGTGACTGAGACCGCTGAGG
AGG






41168
USP25
SEQ ID NO: 65
The underlined exon exclusion sequence is SEQ ID NO: 97. The sequence without the underlined exon exclusion sequence is SEQ ID NO: 129.
TAATGGAAACTTGGAATTAGCAGTGGCTTTCCTT
ACTGCGAAGAATGCTAAGACCCCTCAGCAGGAG
GAGACAACTTACTACCAAACAGCACTTCCTGGC
AATGATAGATACATCAGTGTGGGAAGCCAAGCA
GATACAAATGTGATTGATCTCACTGGAGATGA
TAAAGATGATCTTCAGAGAGCAATTGCCTTGA
GTTTGGCCGAATCAAACAGGGCATTCAGGGA
GACTGGAATAACTGATGAGGAACAAGCCATT
AGCAG
AGTTCTTGAAGCCAGCATAGCAGAGAAT
AAAGCATGTTTGAAGAGGACACCTACAGAAGTT
TGGAGGGATTCTCGAAACCCTTATGATAGAAAA
AGACAGGACAAAGCTCCCGTTGGGCTAAAGAAT
GTTGGCAATACTTGTTGGTTTAGTGCTGTTATTC
AG






45885
HMOX2
SEQ ID NO: 66
The underlined exon exclusion sequence is SEQ ID NO: 98. The sequence without the underlined exon exclusion sequence is SEQ ID NO: 130.
AACCGGATGCTACGGGTGATGACTGGGAGGAGG
AGAAAAATTACCTCTTTATCTTGCATGAACATCT
TAATTTTCAGAGTCTTGCTGCGACACCCAGGC
TGGAGTGCAATGGCGCTATCTCGGCTCACTG
CAACCTCCGCTTCCCGGATTCAAGCGATTCTC
CTGCCTCAGCCTCCCGAGTAGGTGGGACTAC
AG
GACCAGAGGAGCGAGAGCAGCAAGAACCAC
ACCCAGCAGCAATGTCAGCGGAAGTGGAAACCT
CAGAGGGGGTAGACGAGTCAGAAAAAAAGAAC
TCTGGGGCCCTAGAAAAGGAGAACCAAATGAG





50148
MKRN2OS
SEQ ID NO: 67
The underlined exon exclusion sequence is SEQ ID NO: 99. The sequence without the underlined exon exclusion sequence is SEQ ID NO: 131.
GGGTTGTGTATAATTACAGTGCACATGGTGTCCA
GCGAGACGGAGAAGGGTGGGAAGAGAGCATAA
GCATCCCATTACTGCAGCCCAACATGTATGGAAT
GATGGAGCAATGGGACAAGTACCTGGAAGACTT
CTCCACCTCGGGGGCCTGGCTGCCTCACAGAGA
GTATGATGGAAGGTCTGATCTTCATGTTGGAA
TAACTAACACAAATG
GTATAATGAGGAAAAGG
AAGTCTCCGGAAACCTCCCCTAGCATTCCAGGA
GGCGAAAGCTATGCACTGCGCAGAGGCTGGGAA
GGCTTTAATTAAATTCAACCACTGTGAGAAATAC
ATCTACAGCTTCAGTGTGCCCCAGTGCTGCCCTC
TCTGCCAGCAGGACCTGGGCTCGAGGAAGCTGG
AGGACGCACCTGTTAGCATCGCTAATCCATTTAC
TAATGGACATCAAGAAAAATGTTCATTCCTCCTC
AGACCAACTCAGGGGACATTTCTTAG






52249
ATP8A2P1
SEQ ID NO: 68
The underlined exon exclusion sequence is SEQ ID NO: 100. The sequence without the underlined exon exclusion sequence is SEQ ID NO: 132.
GTAAACAAATTGCTCCTGTGGAGATGATTGGCAT
CACATGGTGTTTTGAGCTGATACACCCAACACTT
GAGCTCACTGCAACAGTACCAGATTTTCACCGC
TATGCCTCCTTTCACTCTGGGAGTCTTCCAGA
GGTCTTGCACTCGGGAGAGCATGCTCAGGTT
TCCCCAGCTCTACAAAATCACCCAGAATGCCA
AAGACTTCAACACAAGGGTAAATAAGGTTGAT
CTCAGAATTGTCACCTCAAAAAGGCCCTGCCT
TCCACTGTTCAGTTCTGGTCATCTGCCTATGA
GATATCTGAAGCTTGAAAGAGAACACTTGAAA
ATCACTGAGACCGTGACTCCCATCCCAGCACA
CACAGCAAGCCAA
ATACTGTGTTGACCAGTGGT
CATGCCACTGCCTGTTGATTTGTTGAAAATATTG
TTTACACG






53188
HIBCH
SEQ ID NO: 69
The underlined exon exclusion sequence is SEQ ID NO: 101. The sequence without the underlined exon exclusion sequence is SEQ ID NO: 133.
TTTTAATTGATAAAGACCAGAGTCCAAAATGGA
AACCAGCTGATCTAAAAGAAGTTACTGAGGAAG
ATTTGAATAATCACTTTAAGTCTTTGGGAAGCAG
TGATTTGAAATTTTGAGGTGACAGGCTTTTAAGG
TATATTTTGTAGCATGGGTTGGCAATCTACAGCA
TGTGGGCCAAATCCAGCCTGCTGCCTGTTTTTAT
ATACCCTGTAAGCTAAGAATGGTTTCCGCATTTT
TAAATGGTTGGGAAAAGAAATCAAAGACTAATA
ATTCATGACGTGAAAATTATCAGAATTCACAAAT
AAAGCTTTATTGGAACTAGCTATACTCATCTGTT
TATATATTATCTGTGGCTGCTTTGAAATGAGTAG
TTGCAATAGAGATGGTAAAGCCTACAAAGCCTA
ATTATTTACTGTCTGGTTTTTGTCAGAAAAAAGT




TTGTCAATCCTTGTTTTAGAAGATGGAAAAATGT





GAAGATCTTTGGAGATTCTCTTGAGTGGTATATC





TAATTGAAATGGGATCTTCGTTTGGCTTGTATGT





TGATGAAATCAACTTAGGTATACAATATAAAAA





ATAAAGACCCTGAAAATTGTTTTGGAGAGGTCA







TGACTTTCATGAAGGCGTTAGAGCTG
GTAATT






AATAAAATGTCTCCAACATCTCTAAAGATCACAC





TAAGGCAACTCATGGAGGGGTCTTCAAAGACCT





TGCAAGAAGTACTAACTATGGAGTATCGGCTAA





GTCAAGCTTGTATG






58853
SLC35C2
SEQ ID NO: 70
The underlined exon exclusion sequence is SEQ ID NO: 102. The sequence without the underlined exon exclusion sequence is SEQ ID NO: 134.
CGCGCGGCACTGGTCCTGGTGGTCCTCCTCATCG
CCGGGGGTCTCTTCATGTTCACCTACAAGTCCAC
ACAGTTCAACGTGGAGGGCTTCGCCTTGGTGCTG
GGGGCCTCGTTCATCGGTGGCATTCGCTGGACCC
TCACCCAGATGCTCCTGCAGAAGGCTGAACTCG
GACCAAATCCTCAGCTGTCCTCTTCATCTTGA
TCTTCTCTCTGATCTTCAAGCTGGAGGAGCTG
CTCTGGCGACGGCGCTTGACGTGGGCTTGTCCAA
CTGGAGCTTCCTGTATGTCACCGTCTCGCT





59314
TRIM5
SEQ ID NO: 71
The underlined exon exclusion sequence is SEQ ID NO: 103. The sequence without the underlined exon exclusion sequence is SEQ ID NO: 135.
GGATCTGTGAACAAGAGGAACCTCAGCAGCCAG
GACAGGCAGGAGCAGTGGAATAGCTACTATGGC
TTCTGGAATCCTGGTTAATGTAAAGGAGGAGGT
GACCTGCCCCATCTGCCTGGAACTCCTGACACAA
CCCCTGAGCCTGGACTGCGGCCACAGCTTCTGCC
AAGCATGCCTCACTGCAAACCACAAGAAGTCCA
TGCTAGACAAAGGAGAGAGTAGCTGCCCTGTGT
GCCGGATCAGTTACCAGCCTGAGAACATACGGC
CTAATCGGCATGTAGCCAACATAGTGGAGAAGC
TCAGGGAGGTCAAGTTGAGCCCAGAGGGGCAGA
AAGTTGATCATTGTGCACGCCATGGAGAGAAAC
TTCTACTCTTCTGTCAGGAGGACGGGAAGGTCAT
TTGCTGGCTTTGTGAGCGGTCTCAGGAGCACCGT




GGTCACCACACGTTCCTCACAGAGGAGGTTGCC





CGGGAGTACCAAGATCCAGGCAATCTTTCCAG







ACACATCTACTTCCCAGTAATATTTCCCCGAA









GAGAAATATTGGCAGCCGAAGACACCAAAAG









CAGAAAAATCACATGGATTTGAATTCTTAAAT









GTGCAGCAG
GTCTAAGGCCCGCCTGTTCTGTGC






CGTGACCTGTGCTACCGAAGTCATCTGTTGCTGT





AGGGAGGCCAGGGACTCAGCCGATGCCTCAATG





GCCAACTGCAG






60239
HSD17B6
SEQ ID NO: 72
The underlined exon exclusion sequence is SEQ ID NO: 104. The sequence without the underlined exon exclusion sequence is SEQ ID NO: 136.
TCCTCGCCTCCATCACCTCCACCGTAGTTGAGCC
AGCGATAGTACTGAGAGTAGGGAAAGAGCCTCC
GGTAATAAAGTTTAAGCAGCTCGGGCAGCTCGG
TGGGGTCAAACGTCTCCATTGAGCGCGGAACTC
GCCACGTAACAGATCTGATTCTGCAGCTGATC
AAGGATGACACTGGTGAGAACCCTATGAGGG
AGTGAAGCAGCCTGGACTCTTACCACAAGAG
GGAGGTGTTATAAGAGCAATGCAGAGGTTGG
AGTGGGCAGCAGTTGGGGCAGGAGGAAGCCG
ACTGCTGCCTGGTCTGCAAAGAAGTCCTTTCA
AGTCTCTAGGACTGGACTCTTCCTAAGCAAGT
CCG
AGAAGGAAGCACCCTCACTATGTGGCTCTA
CCTGGCGGCCTTCGTGGGCCTGTACTACCTTCTG




CACTGGTACCGGGAGAGGCAGGTGGTGAGCCAC





CTCCAAGACAAGTATGTCTTTATCACGGGCTGTG





ACTCGGGCTTTGGGAACCTGCTGGCCAGACAGC





TGGATGCACGAGGCTTGAGAGTGCTGGCTGCGT





GTCTGACGGAGAAGGGGGCCGAGCAGCTGAGGG





GCCAGACGTCTGACAGGCTGGAGACGGTGACCC





TGGATGTTACCAAGATGGAGAGCATCGCTGCAG





CTACTCAGTGGGTGAAGGAGCATGTGGGGGACA





GAG
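Each entry in the table pairs a full sequence with an underlined exon exclusion segment and with a third SEQ ID NO for the same sequence with that segment removed. The short Python sketch below illustrates that relationship. It makes several assumptions. The exon segment is supplied separately, because the underline marking does not survive in this text. The segment is assumed to occur exactly once, contiguously, in the full sequence. The names `exclude_exon` and `related_seq_ids` are hypothetical. The +32/+64 offset only reflects the numbering seen in the entries of this section (for example, 54, 86 and 118); the source does not state it as a rule.

```python
from __future__ import annotations


def exclude_exon(full_seq: str, exon_seq: str) -> str:
    """Return full_seq with its single occurrence of exon_seq removed.

    Assumes the exon is one contiguous, case-insensitive substring of the
    full sequence; raises ValueError if it is missing or ambiguous.
    """
    full = full_seq.upper()
    exon = exon_seq.upper()
    if not exon:
        raise ValueError("exon sequence is empty")
    start = full.find(exon)
    if start == -1:
        raise ValueError("exon sequence not found in full sequence")
    # A second hit, including an overlapping one, makes the excision ambiguous.
    if full.find(exon, start + 1) != -1:
        raise ValueError("exon sequence occurs more than once")
    return full[:start] + full[start + len(exon):]


def related_seq_ids(full_id: int) -> tuple[int, int]:
    """Hypothetical helper: (exon SEQ ID NO, exon-excluded SEQ ID NO).

    Based only on the +32 / +64 offsets observed in this table
    (e.g. 54 -> 86 -> 118); not a rule stated in the source.
    """
    return full_id + 32, full_id + 64


if __name__ == "__main__":
    # Toy sequences for illustration, not taken from the sequence listing.
    print(exclude_exon("AAACCCGGGTTT", "cccggg"))  # AAATTT
    print(related_seq_ids(54))  # (86, 118)
```

The ambiguity check uses `str.find` starting one position past the first hit, so a self-overlapping repeat is also caught. A count of non-overlapping matches would miss that case.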









All references, patents and patent applications disclosed herein are incorporated by reference with respect to the subject matter for which each is cited, which in some cases may encompass the entirety of the document.


The indefinite articles “a” and “an,” as used herein in the specification and in the claims, unless clearly indicated to the contrary, should be understood to mean “at least one.”


It should also be understood that, unless clearly indicated to the contrary, in any methods claimed herein that include more than one step or act, the order of the steps or acts of the method is not necessarily limited to the order in which the steps or acts of the method are recited.


In the claims, as well as in the specification above, all transitional phrases such as “comprising,” “including,” “carrying,” “having,” “containing,” “involving,” “holding,” “composed of,” and the like are to be understood to be open-ended, i.e., to mean including but not limited to. Only the transitional phrases “consisting of” and “consisting essentially of” shall be closed or semi-closed transitional phrases, respectively, as set forth in the United States Patent and Trademark Office Manual of Patent Examining Procedure, Section 2111.03.


The terms “about” and “substantially” preceding a numerical value mean ±10% of the recited numerical value.
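As a worked illustration of the ±10% definition, the minimal Python check below treats a value as "about" a recited value when the two differ by no more than 10% of the recited value, endpoints included. The function name `is_about` is hypothetical. For a recited value of 100, the check accepts 90 through 110.

```python
def is_about(value: float, recited: float, tolerance: float = 0.10) -> bool:
    """True if value is within +/-tolerance (default 10%) of recited, inclusive."""
    return abs(value - recited) <= tolerance * abs(recited)


if __name__ == "__main__":
    print(is_about(108, 100))  # True
    print(is_about(111, 100))  # False
```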


Where a range of values is provided, each value between the upper and lower ends of the range is specifically contemplated and described herein.

Claims
  • 1. A method comprising assaying nucleic acids of a sample for the presence or absence of: (a) a target exon comprising a nucleotide sequence of any one of SEQ ID NOS: 22-24, 26-36, 38-40, 73-75, 77-79, 82-100, or 102-104; (b) at least 2 target exons, wherein each target exon comprises a nucleotide sequence of any one of SEQ ID NOS: 23, 27, 35, 85, 88, 89, 98, 101, 102, or 104; (c) at least 3 target exons, wherein each target exon comprises a nucleotide sequence of any one of SEQ ID NOS: 21, 23, 27, 30, 31, 32, 35, 36, 39, 85, 87-89, 91, 94, 98, or 101-104; or (d) at least 8 different target exons, wherein each target exon comprises a nucleotide sequence of any one of SEQ ID NOs: 21-40 or 73-104.
  • 2. The method of claim 1, wherein the target exon comprises a nucleotide sequence of any one of SEQ ID NOS: 27, 98, 102, or 104.
  • 3. (canceled)
  • 4. The method of claim 1, wherein each target exon comprises a nucleotide sequence of any one of SEQ ID NOS: 27, 98, 101, 102, or 104.
  • 5.-6. (canceled)
  • 7. The method of claim 1, wherein the sample is a breast tissue sample.
  • 8. The method of claim 7, wherein the sample is obtained from a subject suspected of having, at risk of, or diagnosed with breast cancer.
  • 9. The method of claim 8, wherein the subject is a female subject.
  • 10. The method of claim 1, wherein the nucleic acids comprise messenger ribonucleic acid (mRNA).
  • 11. The method of claim 1, wherein the nucleic acids comprise complementary deoxyribonucleic acid (cDNA) synthesized from mRNA obtained from the sample.
  • 12. The method of claim 1, further comprising detecting the presence of a target exon comprising a nucleotide sequence of any one of SEQ ID NOs: 24, 28, 31, 33, and/or 38 or the absence of a target exon comprising a nucleotide sequence of any one of SEQ ID NOs: 82, 87 and/or 91, and assigning a favorable survival prognosis to the sample.
  • 13. The method of claim 1, further comprising detecting the presence of a target exon comprising a nucleotide sequence of any one of SEQ ID NOs: 21-23, 25-27, 29, 30, 32, and/or 34-40 or the absence of a target exon comprising a nucleotide sequence of any one of SEQ ID NOs: 73-81, 83-86, 88-90, and/or 92-104, and assigning an unfavorable survival prognosis to the sample.
  • 14. A complementary deoxyribonucleic acid (cDNA) comprising a nucleotide sequence of any one of SEQ ID NOs: 1-20 or 105-136.
  • 15. The cDNA of claim 14 comprising a nucleotide sequence of any one of SEQ ID NOs: 22-24, 27-34, 36, 38, or 40.
  • 16. A composition comprising the cDNA of claim 14.
  • 17. The composition of claim 16 further comprising a probe that binds the cDNA or a pair of primers that bind the cDNA.
  • 18. (canceled)
  • 19. A composition comprising (a) a messenger ribonucleic acid (mRNA) comprising a nucleotide sequence of any one of SEQ ID NOs: 1-20 or 105-136 and (b) a probe that binds a nucleotide sequence of any one of SEQ ID NOs: 1-20 or 105-136 or a pair of primers that bind a nucleotide sequence of any one of SEQ ID NOs: 1-20 or 105-136.
  • 20. The composition of claim 17, wherein the probe comprises a detectable label or the primers comprise a detectable label.
  • 21. (canceled)
  • 22. The composition of claim 19, wherein the probe comprises a detectable label or the primers comprise a detectable label.
  • 23. A kit comprising: (a) a molecule that can detect the presence or absence of a target exon comprising a nucleotide sequence of any one of SEQ ID NOS: 22-24, 26-36, 38-40, 73-75, 77-79, 82-100, or 102-104, and a detection reagent selected from buffers, salts, polymerases, and deoxyribonucleotide triphosphates (dNTPs); (b) molecules that can detect the presence or absence of at least 2 target exons, wherein each of the at least 2 target exons comprises a nucleotide sequence of any one of SEQ ID NOs: 23, 27, 35, 85, 88, 89, 98, 101, 102, or 104, and a detection reagent selected from buffers, salts, polymerases, and dNTPs; (c) molecules that can detect the presence or absence of at least 3 target exons, wherein each of the at least 3 target exons comprises a nucleotide sequence of any one of SEQ ID NOS: 21, 23, 27, 30, 31, 32, 35, 36, 39, 85, 87-89, 91, 94, 98, or 101-104, and a detection reagent selected from buffers, salts, polymerases, and dNTPs; or (d) molecules that can detect the presence or absence of at least 8 different target exons, wherein each of the at least 8 target exons comprises a nucleotide sequence of any one of SEQ ID NOs: 21-40 or 73-104, and a detection reagent selected from buffers, salts, polymerases, and dNTPs.
  • 24. The kit of claim 23, wherein the molecule comprises a probe or primer that binds a nucleic acid comprising a nucleotide sequence of any one of SEQ ID NOS: 22-24, 26-36, 38-40, 73-75, 77-79, 82-100, or 102-104.
  • 25-27. (canceled)
  • 28. The kit of claim 24, wherein the probe or primer comprises a detectable label.
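Claims 12 and 13 tie the detected presence or absence of particular target exons to a favorable or an unfavorable survival prognosis. The Python sketch below encodes those SEQ ID NO lists exactly as recited. Its input is assumed to be the set of target exons that were assayed plus the subset detected as present, and any exon assayed but not detected is treated as absent. `classify_prognosis` is an illustrative name, not a method defined in the claims. As literally recited, SEQ ID NO: 38 appears both in the favorable list of claim 12 and in the 34-40 range of claim 13. The sketch therefore reports a conflict in that case rather than choosing one outcome.

```python
from __future__ import annotations

# SEQ ID NO lists transcribed from claims 12 and 13.
FAVORABLE_IF_PRESENT = {24, 28, 31, 33, 38}
FAVORABLE_IF_ABSENT = {82, 87, 91}
UNFAVORABLE_IF_PRESENT = {21, 22, 23, 25, 26, 27, 29, 30, 32, *range(34, 41)}
UNFAVORABLE_IF_ABSENT = {
    *range(73, 82), *range(83, 87), *range(88, 91), *range(92, 105)
}


def classify_prognosis(assayed: set[int], detected: set[int]) -> str:
    """Apply the claim 12 / claim 13 rules to one sample (hypothetical helper).

    assayed:  SEQ ID NOs of the target exons that were tested for.
    detected: the subset of assayed target exons found to be present.
    """
    if not detected <= assayed:
        raise ValueError("detected exons must be a subset of assayed exons")
    absent = assayed - detected
    favorable = bool(detected & FAVORABLE_IF_PRESENT or absent & FAVORABLE_IF_ABSENT)
    unfavorable = bool(detected & UNFAVORABLE_IF_PRESENT or absent & UNFAVORABLE_IF_ABSENT)
    if favorable and unfavorable:
        return "conflicting"
    if favorable:
        return "favorable"
    if unfavorable:
        return "unfavorable"
    return "undetermined"


if __name__ == "__main__":
    print(classify_prognosis({24, 82}, {24}))  # favorable
    print(classify_prognosis({27}, {27}))  # unfavorable
```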
RELATED APPLICATIONS

This application is a divisional of U.S. application Ser. No. 17/253,974, filed Dec. 18, 2020, which is a national stage filing under 35 U.S.C. § 371 of international application number PCT/US2019/039794, filed Jun. 28, 2019, which claims the benefit under 35 U.S.C. § 119(e) of U.S. provisional application No. 62/692,121, filed Jun. 29, 2018, and U.S. provisional application No. 62/818,582, filed Mar. 14, 2019, each of which is incorporated by reference herein in its entirety.

Provisional Applications (2)
Number Date Country
62692121 Jun 2018 US
62818582 Mar 2019 US
Divisions (1)
Number Date Country
Parent 17253974 Dec 2020 US
Child 18302737 US