The present invention relates to the field of molecular biology. In particular, the present invention relates to methods for determining the activity of a promoter.
Gastric cancer (GC) is a major cause of global cancer mortality. Most GCs are adenocarcinomas, and recent exome and whole-genome studies have revealed new GC driver genes and mutational signatures. Besides protein-coding genes, regulatory elements in non-coding genomic regions are also likely contributors to malignancy, as these elements can profoundly influence chromatin structure and gene expression. Few studies have explored, on a genomic scale, the repertoire of regulatory elements somatically altered during gastric carcinogenesis.
Regulatory elements including promoters and enhancers can be identified as regions exhibiting histone modifications (“chromatin marks”). To date, most chromatin mark studies in cancer have used immortalized cell lines, since existing protocols require significant amounts of biological material. However, cancer lines cultured in vitro can display epigenetic patterns distinct from primary tumors, and cell lines may also undergo in vitro adaption, acquiring genetic and epigenetic changes due to extensive passaging. Identification of somatically acquired alterations using cancer cell lines is also difficult, as they often lack matched normal counterparts.
There is therefore a need to provide a method to measure chromatin marks that overcomes, or at least ameliorates, one or more of the disadvantages described above.
In one aspect there is provided a method for determining the activity of at least one promoter in a cancerous biological sample relative to a non-cancerous biological sample, comprising; mapping isolated nucleic acid comprising at least one promoter sequence obtained from said cancerous biological sample against a reference nucleic acid obtained from said non-cancerous biological sample to obtain a read per kilo-base per million (RPKM) value or fragments per kilo-base per million (FPKM) value for said at least one promoter; and determining the differential activity of the at least one promoter sequence in the nucleic acid relative to the activity of the at least one promoter in the reference nucleic acid sequence using said RPKM or FPKM value.
According to another aspect there is provided a method for determining the susceptibility of a subject to cancer, comprising: mapping an isolated nucleic acid comprising at least one promoter obtained from a cancerous biological sample of the subject against a reference nucleic acid obtained from a non-cancerous biological sample to obtain an RPKM or FPKM value for said at least one promoter; and determining the differential activity of the at least one promoter in the nucleic acid relative to the activity of the at least one promoter in the reference nucleic acid using said RPKM or FPKM value, wherein an increased activity of the at least one promoter in the cancerous sample relative to that in the non-cancerous sample is indicative of the susceptibility of the subject to cancer.
In another aspect there is provided a method for determining the presence of at least one promoter associated with cancer in a subject, comprising: mapping isolated nucleic acid comprising at least one promoter obtained from a cancerous biological sample of the subject against a reference nucleic acid obtained from a non-cancerous biological sample to obtain an RPKM or FPKM value for said at least one promoter; and determining the differential activity of the at least one promoter in the nucleic acid relative to the activity of the at least one promoter in the reference nucleic acid using said RPKM or FPKM value, wherein an increased activity of the at least one promoter in the cancerous biological sample obtained from the subject relative to that in the non-cancerous sample is indicative of the presence of a promoter associated with cancer in a subject.
In another aspect there is provided a biomarker for detecting cancer in a subject, the biomarker comprising at least one promoter having increased activity in a cancerous biological sample relative to a normal non-cancerous biological sample, wherein the promoter comprises an increase of SUZ12 binding sites relative to the total promoter population.
In another aspect there is provided a method for determining the presence of at least one promoter associated with cancer in a cancerous biological sample relative to a non-cancerous biological sample, comprising; mapping isolated nucleic acid comprising at least one promoter sequence obtained from said cancerous biological sample against a reference nucleic acid obtained from said non-cancerous biological sample; generating a matrix of sequencing tag counts for said at least one promoter based on said mapping; analysing said matrix of sequencing tag counts; and determining the differential enrichment of the at least one promoter in the nucleic acid relative to the at least one promoter in the reference nucleic acid using the analysis of said matrix of sequencing tag counts, wherein the differential enrichment of the at least one promoter in the cancerous biological sample obtained from the subject relative to that in the non-cancerous sample is indicative of the presence of a promoter associated with cancer in a subject.
In another aspect there is provided a method for determining the activity of at least one promoter in a cancerous biological sample relative to a non-cancerous biological sample, comprising; mapping isolated nucleic acid comprising at least one promoter sequence obtained from said cancerous biological sample against a reference nucleic acid obtained from said non-cancerous biological sample; generating a matrix of sequencing tag counts for said at least one promoter based on said mapping; analysing said matrix of sequencing tag counts; and determining the differential activity of the at least one promoter in the nucleic acid relative to the at least one promoter in the reference nucleic acid using the analysis of said matrix of sequencing tag counts.
The term “antigen binding protein” as used herein refers to antibodies, antibody fragments and other protein constructs, such as domains, which are capable of binding to an antigen.
The term “antibody” is used herein in the broadest sense to refer to molecules with an immunoglobulin-like domain and includes monoclonal, recombinant, polyclonal, chimeric, humanised, bispecific and heteroconjugate antibodies; a single variable domain, a domain antibody, antigen binding fragments, immunologically effective fragments, single chain Fv, diabodies, Tandabs™, etc (for a summary of alternative “antibody” formats see Holliger and Hudson, Nature Biotechnology, 2005, Vol 23, No. 9, 1126-1136).
The phrase “single variable domain” refers to an antigen binding protein variable domain (for example, VH, VHH, VL) that specifically binds an antigen or epitope independently of a different variable region or domain.
A “domain antibody” or “dAb” may be considered the same as a “single variable domain” which is capable of binding to an antigen. A single variable domain may be a human antibody variable domain, but also includes single antibody variable domains from other species such as rodent (for example, as disclosed in WO 00/29004), nurse shark and Camelid VHH dAbs. Camelid VHH are immunoglobulin single variable domain polypeptides that are derived from species including camel, llama, alpaca, dromedary, and guanaco, which produce heavy chain antibodies naturally devoid of light chains. Such VHH domains may be humanised according to standard techniques available in the art, and such domains are considered to be “domain antibodies”. As used herein VH includes camelid VHH domains.
As used herein the term “domain” refers to a folded protein structure which has tertiary structure independent of the rest of the protein. Generally, domains are responsible for discrete functional properties of proteins, and in many cases may be added, removed or transferred to other proteins without loss of function of the remainder of the protein and/or of the domain.
A “single variable domain” is a folded polypeptide domain comprising sequences characteristic of antibody variable domains. It therefore includes complete antibody variable domains and modified variable domains, for example, in which one or more loops have been replaced by sequences which are not characteristic of antibody variable domains, or antibody variable domains which have been truncated or comprise N- or C-terminal extensions, as well as folded fragments of variable domains which retain at least the binding activity and specificity of the full-length domain. A domain can bind an antigen or epitope independently of a different variable region or domain.
An antigen binding fragment may be provided by means of arrangement of one or more CDRs on non-antibody protein scaffolds such as a domain. The domain may be a domain antibody or may be a domain which is a derivative of a scaffold selected from the group consisting of CTLA-4, lipocalin, SpA, an Affibody, an avimer, GroEl, transferrin, GroES and fibronectin/adnectin, which has been subjected to protein engineering in order to obtain binding to an antigen.
An antigen binding fragment or an immunologically effective fragment may comprise partial heavy or light chain variable sequences. Fragments are at least 5, 6, 8 or 10 amino acids in length. Alternatively the fragments are at least 15, at least 20, at least 50, at least 75, or at least 100 amino acids in length.
The term “specifically binds” as used throughout the present specification in relation to antigen binding proteins means that the antigen binding protein binds to an antigen with no or insignificant binding to other (for example, unrelated) proteins. The term however does not exclude the fact that the antigen binding proteins may also be cross-reactive with closely related molecules.
The terms “biological material” or “biological sample” as used herein refers to any material or sample, which includes an analyte as defined herein. Such samples may, for example, include samples derived from or comprising stool, whole blood, serum, plasma, tears, saliva, nasal fluid, sputum, ear fluid, genital fluid, breast fluid, milk, colostrum, placental fluid, amniotic fluid, perspirate, synovial fluid, ascites fluid, cerebrospinal fluid, bile, gastric fluid, aqueous humor, vitreous humor, gastrointestinal fluid, exudate, transudate, pleural fluid, pericardial fluid, semen, upper airway fluid, peritoneal fluid, fluid harvested from a site of an immune response, fluid harvested from a pooled collection site, bronchial lavage, urine, biopsy material, for example from all suitable organs, for example the lung, the muscle, brain, liver, skin, pancreas, stomach and the like, a nucleated cell sample, a fluid associated with a mucosal surface, hair, or skin.
The term “RPKM” as used herein refers to Reads Per Kilobase per Million reads mapped. The term “FPKM” as used herein refers to Fragments Per Kilobase per Million fragments mapped. RPKM and FPKM are units to quantify the abundance of any genomic feature, such as an exon, a transcript or any set of genomic coordinates, as determined by the abundance of sequencing reads aligning to it. The RPKM and FPKM measures normalize the abundance by the length of the genomic unit as well as by the total number of reads (or fragments) mapped in the sample, to facilitate transparent comparison of abundance levels within and between samples.
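By way of illustration only, the RPKM definition above can be written out as a short calculation: the read count for a region is divided by the region length in kilobases and by the total mapped reads in millions. The function name `rpkm` and the example numbers are hypothetical and are not taken from the working examples.

```python
def rpkm(region_reads: int, region_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase per Million mapped reads for one genomic region."""
    if region_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("region length and total mapped reads must be positive")
    length_kb = region_length_bp / 1_000              # region length in kilobases
    mapped_millions = total_mapped_reads / 1_000_000  # library size in millions
    return region_reads / (length_kb * mapped_millions)


# Hypothetical example: 500 reads on a 2,000 bp promoter region,
# in a library with 10 million mapped reads.
example_value = rpkm(500, 2_000, 10_000_000)
```

The same arithmetic applies to FPKM when fragments are counted in place of individual reads.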
The term “matrix of sequencing tag counts” as used herein refers to a matrix of integer values of mapped “sequencing tags”. The matrix may be in the form of a table with rows and columns, wherein the value in a given row (genomic region) and column (tissue sample) of the matrix may indicate how many reads have been mapped to that genomic region, such as a promoter region or a histone modification region, for example the H3K4me3 region. Analogously, the rows of the matrix may also correspond to binding regions (with ChIP-Seq). The aforementioned “sequencing tags” as used herein refer to short DNA fragments isolated from samples which are mapped to a reference genome using an alignment tool (as mentioned in methods disclosed herein).
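As a non-limiting sketch of such a matrix, the following builds one row per genomic region and one column per tissue sample from mapped tag positions. The function name `count_matrix`, the half-open coordinate convention and the toy data are assumptions made for illustration; in practice a dedicated tool such as bedtools would typically perform this counting.

```python
def count_matrix(regions, tags_by_sample):
    """Count mapped tags per region and per sample.

    regions: list of (name, chrom, start, end), half-open [start, end).
    tags_by_sample: {sample_name: [(chrom, position), ...]}.
    Returns (samples, matrix) where matrix[name] lists counts in sample order.
    """
    samples = sorted(tags_by_sample)
    matrix = {name: [0] * len(samples) for name, _, _, _ in regions}
    for col, sample in enumerate(samples):
        for chrom, pos in tags_by_sample[sample]:
            for name, r_chrom, start, end in regions:
                if chrom == r_chrom and start <= pos < end:
                    matrix[name][col] += 1
    return samples, matrix


# Hypothetical regions and tag positions.
regions = [("promoter_A", "chr1", 100, 200), ("promoter_B", "chr1", 500, 600)]
tags = {
    "normal": [("chr1", 150), ("chr1", 550)],
    "tumor": [("chr1", 120), ("chr1", 130), ("chr1", 199), ("chr2", 150)],
}
samples, matrix = count_matrix(regions, tags)
```

The nested loop is kept for readability; it is not intended for genome-scale data.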
The term “bedtools” as used herein and in the context of the working examples refers to a set of published tools that are well known in the art for genomic analysis. For example, the “bedtools” may be used for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. Bedtools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The bedtools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. For example, the “bedtools” may refer to “BEDTools”, whereby the BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. In particular, such “bedtools” can be found at http://bioinformatics.oxfordjournals.org/content/26/6/841.full.
The term “obtained” or “derived from” as used herein is meant to be used inclusively. That is, it is intended to encompass any nucleotide sequence directly isolated from a biological sample or any nucleotide sequence derived from the sample.
The invention illustratively described herein may suitably be practiced in the absence of any element or elements, limitation or limitations, not specifically disclosed herein. Thus, for example, the terms “comprising”, “including”, “containing”, etc. shall be read expansively and without limitation. Additionally, the terms and expressions employed herein have been used as terms of description and not of limitation, and there is no intention in the use of such terms and expressions of excluding any equivalents of the features shown and described or portions thereof, but it is recognized that various modifications are possible within the scope of the invention claimed. Thus, it should be understood that although the present invention has been specifically disclosed by preferred embodiments and optional features, modification and variation of the inventions embodied therein herein disclosed may be resorted to by those skilled in the art, and that such modifications and variations are considered to be within the scope of this invention.
The invention has been described broadly and generically herein. Each of the narrower species and subgeneric groupings falling within the generic disclosure also form part of the invention. This includes the generic description of the invention with a proviso or negative limitation removing any subject matter from the genus, regardless of whether or not the excised material is specifically recited herein.
Other embodiments are within the following claims and non-limiting examples. In addition, where features or aspects of the invention are described in terms of Markush groups, those skilled in the art will recognize that the invention is also thereby described in terms of any individual member or subgroup of members of the Markush group.
The invention will be better understood with reference to the detailed description when considered in conjunction with the non-limiting examples and the accompanying drawings, in which:
promoter and (i) enhancer (H3K4me1) regions. H3K27ac patterns are also shown. Color intensities correspond to normalized RPKM values.
(j) Survival analysis comparing patient groups with GCs exhibiting high and low expression of genes driven by cancer-associated promoters. Kaplan-Meier survival analysis of clusters within the Singaporean cohort (total n=183) with “high” (n=154) and “low” (n=29) enrichment of the target gene signature. The signature is prognostic in this cohort (log-rank p-value: 0.041), with worse prognosis observed for higher enrichment of the signature (H.R. (95% C.I.): 1.78 (1.02-3.13); p=0.044).
(i) Allele bias distribution across samples. Over- and under-representation of SNP tags in tumor tissues are marked in green and blue, respectively. (j) dbSNP sites mapping to altered promoter and enhancer regions. SNP sites are positioned according to their chromosomal position on the x-axis, and their corresponding allele bias levels along the y-axis. SNPs exhibiting allele bias (above blue horizontal line) and which are also predicted to affect protein binding based on RegulomeDB are marked in red. (k) Regulome DB predictions for allele-biased sites mapping to somatically altered regulatory elements.
(l, m) Genome browser view of the KLK1 locus showing RNAseq and H3K4me3 tracks. (m) provides a close-up view of H3K4me3 sequence tags and SNPs. A bias favoring a higher proportion of the known eQTL SNP rs2659104 (G) over the reference allele (A) is observed. (n) Quantitative PCR-pyrosequencing confirms minimal H3K4me3 signal in normal tissue, and rs2659104 SNP-biased proportions of sequence tags over the parental allele in H3K4me3 signals from tumors.
(f) Luciferase reporter assays measuring regulatory activity of wild-type and mutant alleles. DNA containing the mutant allele provides increased transcriptional activity (* p=1.1×10⁻⁴). Experiments were performed in KATO-III GC cells.
In a first aspect the present invention refers to a method for determining the activity of at least one promoter in a cancerous biological sample relative to a non-cancerous biological sample. The method may comprise mapping an isolated nucleic acid comprising at least one promoter sequence obtained from said cancerous biological sample against a reference nucleic acid obtained from said non-cancerous biological sample to obtain a read per kilo-base per million (RPKM) value or fragments per kilo-base per million (FPKM) value for said at least one promoter; and determining the differential activity of the at least one promoter sequence in the nucleic acid relative to the activity of the at least one promoter in the reference nucleic acid sequence using said RPKM or FPKM value.
The cancerous and non-cancerous biological sample described herein may comprise a single cell, multiple cells, fragments of cells, body fluid or tissue. In some embodiments the cancerous and non-cancerous biological sample may be obtained from the same subject or, alternatively, a different subject.
The nucleic acid may be isolated from said cancerous biological sample by immunoprecipitation of chromatin. The nucleic acid may comprise at least one promoter.
The immunoprecipitation of chromatin may be achieved by an antigen binding protein specific for a modified histone protein. The modified histone protein may comprise at least one histone modification selected from the group consisting of H3K4me3, H3K4me1 and H3K27ac.
In some embodiments the antigen binding protein may be an antibody specific to at least one histone modification selected from the group consisting of H3K4me3, H3K4me1 and H3K27ac.
The isolated nucleic acid comprising at least one promoter may be amplified with at least one primer. In some embodiments, the amplified nucleic acid may be used to construct a nucleic acid sequence library with said amplified nucleic acid.
In some embodiments the mapping step comprises calculating the RPKM values based upon the total sequence tags for the at least one promoter in the mapped nucleic acid relative to the reference nucleic acid.
In some embodiments, the mapping step comprises calculating the FPKM values based upon identified transcript sequences associated with the at least one promoter in the mapped nucleic acid relative to the reference nucleic acid.
The step of determining the differential activity of the at least one promoter sequence may comprise determining that the RPKM or FPKM value for the at least one promoter in the nucleic acid obtained from the cancerous biological sample is: i) greater than between a 1 to 20-fold, such as a 1-fold, 2-fold, 3-fold, 4-fold or 5-fold, change in mean RPKM or FPKM value relative to the RPKM or FPKM value of the at least one promoter in the reference nucleic acid obtained from the non-cancerous biological sample; and ii) greater than a 0.1 RPKM or FPKM range relative to the RPKM or FPKM value of the at least one promoter in the reference nucleic acid obtained from the non-cancerous biological sample.
The at least one promoter may comprise an increase of SUZ12 binding sites relative to the total promoter population. In some embodiments, the at least one promoter may be positioned adjacent to a gene associated with cell-type specification, embryonic development or transcription factors.
In another embodiment, the at least one promoter may be positioned adjacent to a gene associated with cancer. The gene may be selected from NKX6-3, SALL4, HOXB9, MET, TNK2, KLK1, FAR2, HOXA11 or HOXA11-AS. The cancer may be gastric cancer.
In another embodiment, the at least one promoter may comprise a cryptic promoter.
There is also provided a method for determining the susceptibility of a subject to cancer. The method comprises: mapping an isolated nucleic acid comprising at least one promoter obtained from a cancerous biological sample of the subject against a reference nucleic acid obtained from a non-cancerous biological sample to obtain an RPKM or FPKM value for said at least one promoter; and determining the differential activity of the at least one promoter in the nucleic acid relative to the activity of the at least one promoter in the reference nucleic acid using said RPKM or FPKM value, wherein an increased activity of the at least one promoter in the cancerous sample relative to that in the non-cancerous sample is indicative of the susceptibility of the subject to cancer.
There is also provided a method for determining the presence of at least one promoter associated with cancer in a subject. The method comprises: mapping isolated nucleic acid comprising at least one promoter obtained from a cancerous biological sample of the subject against a reference nucleic acid obtained from a non-cancerous biological sample to obtain an RPKM or FPKM value for said at least one promoter; and determining the differential activity of the at least one promoter in the nucleic acid relative to the activity of the at least one promoter in the reference nucleic acid using said RPKM or FPKM value, wherein an increased activity of the at least one promoter in the cancerous biological sample obtained from the subject relative to that in the non-cancerous sample is indicative of the presence of a promoter associated with cancer in a subject. In some embodiments, the at least one promoter associated with cancer is present when an RPKM or FPKM value for the at least one promoter in the nucleic acid obtained from the biological sample is: i) greater than between a 1 to 20-fold, such as a 1-fold, 2-fold, 3-fold, 4-fold or 5-fold, change in mean RPKM or FPKM value relative to the RPKM or FPKM value of the at least one promoter in the reference nucleic acid obtained from the non-cancerous biological sample; and ii) greater than a 0.1 RPKM or FPKM range relative to the RPKM or FPKM value of the at least one promoter in the reference nucleic acid obtained from the non-cancerous biological sample.
There is also provided a biomarker for detecting cancer in a subject, the biomarker comprising at least one promoter having increased activity in a cancerous biological sample relative to a normal non-cancerous biological sample, wherein the promoter comprises an increase of SUZ12 binding sites relative to the total promoter population. The at least one promoter may exhibit a low DNA methylation level relative to the total promoter population.
There is also provided a method for determining the presence of at least one promoter associated with cancer in a cancerous biological sample relative to a non-cancerous biological sample, comprising: mapping isolated nucleic acid comprising at least one promoter sequence obtained from said cancerous biological sample against a reference nucleic acid obtained from said non-cancerous biological sample; generating a matrix of sequencing tag counts for said at least one promoter based on said mapping; analysing said matrix of sequencing tag counts; and determining the differential enrichment of the at least one promoter in the nucleic acid relative to the at least one promoter in the reference nucleic acid using the analysis of said matrix of sequencing tag counts, wherein the differential enrichment of the at least one promoter in the cancerous biological sample obtained from the subject relative to that in the non-cancerous sample is indicative of the presence of a promoter associated with cancer in a subject.
There is also provided a method for determining the activity of at least one promoter in a cancerous biological sample relative to a non-cancerous biological sample, comprising; mapping isolated nucleic acid comprising at least one promoter sequence obtained from said cancerous biological sample against a reference nucleic acid obtained from said non-cancerous biological sample; generating a matrix of sequencing tag counts for said at least one promoter based on said mapping; analysing said matrix of sequencing tag counts; and determining the differential activity of the at least one promoter in the nucleic acid relative to the at least one promoter in the reference nucleic acid using the analysis of said matrix of sequencing tag counts.
In one embodiment, the generating step of the above method comprises calculating the matrix based upon a sequence tag count for the at least one promoter in the mapped nucleic acid relative to the reference nucleic acid.
In one embodiment, the analysis step of the above method comprises analyzing the matrix using a DESeq2 algorithm. The DESeq2 algorithm is a genomic analysis tool known in the art for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. DESeq2 enables a quantitative analysis focused on the strength rather than the mere presence of differential expression. In particular, DESeq2 provides methods to test for differential expression by use of negative binomial generalized linear models; the estimates of dispersion and logarithmic fold changes incorporate data-driven prior distributions.
In one embodiment, the at least one promoter may be positioned adjacent to a gene associated with cancer.
In one embodiment, the gene may be RASA3, GRIN2D, TNNI3, SHD, ATP10B, SMTN, MYO15B, C2orf61, LINC00443 or ACHE.
In one embodiment, the differential enrichment is identified based upon a FDR rate of 10% and an absolute fold change of 1.5.
In one embodiment, the differential activity is identified based upon a FDR rate of 10% and an absolute fold change of greater than 1.5.
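The selection criteria above (a false discovery rate of 10% and an absolute fold change greater than 1.5) may be illustrated as follows. This is a minimal sketch and not a reimplementation of DESeq2: it assumes that per-promoter p-values and log2 fold changes have already been produced by an upstream analysis, and it applies a plain Benjamini-Hochberg adjustment. The function names and example values are hypothetical.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_differential(names, log2_fold_changes, pvalues, fdr=0.10, min_fold=1.5):
    """Keep promoters with adjusted p <= fdr and |fold change| > min_fold."""
    padj = benjamini_hochberg(pvalues)
    min_log2 = math.log2(min_fold)
    return [
        name
        for name, lfc, q in zip(names, log2_fold_changes, padj)
        if q <= fdr and abs(lfc) > min_log2
    ]


# Hypothetical upstream results for four promoters.
names = ["P1", "P2", "P3", "P4"]
log2_fc = [2.0, -1.0, 0.3, 1.2]
pvals = [0.001, 0.02, 0.03, 0.5]
selected = select_differential(names, log2_fc, pvals)
```

Here P3 is excluded because its fold change is below the cut-off, and P4 is excluded because its adjusted p-value exceeds 10%.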
Primary patient samples were obtained from the Singhealth tissue repository, and collected with approvals from institutional research ethics review committees and signed patient informed consent.
‘Normal’ (non-malignant) samples used in this study refer to samples harvested from the stomach, from sites distant from the tumour and exhibiting no visible evidence of tumour or intestinal metaplasia/dysplasia upon surgical assessment. Tumor samples were confirmed by cryosectioning to contain >40% tumor cells.
Nano-ChIPseq was performed as previously described, with the addition of a tissue dissociation step. Fresh-frozen cancer and normal tissues were dissected using a razor blade in liquid nitrogen to obtain ~5 mg sized pieces (~5 μl by apparent volume). Tissue pieces were fixed in 1% formaldehyde/TBSE buffer for 10 minutes (min) at room temperature. Fixation was stopped by addition of glycine to a final concentration of 125 mM.
Tissue pieces were washed 3 times with TBSE buffer, and transferred into Lysonator cartridges (SG Microlab Devices, Singapore). Tissues were dissociated following the manufacturer's guidelines (4 kHz for 3 min), and taken directly to the lysis step in the Nano-ChIP assay. Dissociated tissues were lysed in 200 μl of lysis buffer and divided into two 1.5 ml tubes for sonication (6 min) using a Bioruptor (Diagenode). For each tissue, ChIPs were performed using the following antibodies: H3K4me3 (07-473, Millipore); H3K4me1 (ab8895, Abcam); H3K27ac (ab4729, Abcam); H3K36me3 (ab9050, Abcam); H3K27me3 (07-449, Millipore), using the same chromatin preparation.
After recovery of ChIP and input DNA, whole-genome-amplification was performed using the WGA4 kit (Sigma-Aldrich) and BpmI-WGA primers.
Amplified DNA was digested with BpmI (New England Biolabs), ligated to a second BpmI adaptor and digested again to trim the WGA primer regions and semi-random priming ends. 15 ng of amplified DNA was used for each Illumina sequencing library using the ChIPseq kit (Illumina). Each library was sequenced on one lane of HiSeq2000 to obtain either 36- or 101-base single reads.
Sequencing tags were mapped against the human reference genome (hg19) using Burrows-Wheeler Aligner (BWA) software (version 0.7.0) and the “aln” algorithm. The first and last 10 bases of 101-base reads were trimmed to increase SNP call performance. Uniquely mapped tags were used for peak calling by CCAT version 3.0. Peak regions were filtered by a fold-above-input cut-off of 8 for H3K4me3 and H3K27ac, 5 for H3K4me1 and H3K36me3, and 1.5 for H3K27me3 marks. For H3K4me3 and H3K4me1 histone modifications, peak regions from all tissue samples were pooled, and overlapping peak regions were merged to create a total set of peak regions for that modification for promoter and enhancer analysis. Normal input vs cancer input CCAT3 region sets with the same fold cut-off were used to remove potential amplified regions for H3K4me3 and H3K4me1 regions. To quantify peak heights, we analyzed the ChIPseq data using Cufflinks (version 2.0.2). RPKM values were estimated for H3K4me3 and H3K27ac for promoter regions, and for H3K4me1 and H3K27ac for enhancer regions. Batch effects were assessed using principal components analysis (PCA) using the ‘prcomp’ function in R (version 2.15), and adjusted using ComBat3 after log transformation of RPKM values from Cufflinks. 3D-PCA plots were plotted using the ‘rgl’ package in R (version 2.15).
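The pooling and merging of overlapping peak regions described above may be sketched as follows. This is an illustrative example only: the function name `merge_peaks` and the toy coordinates are hypothetical, intervals are assumed to be half-open, and only intervals sharing at least one base are merged. The actual tool used for this step is not specified in the text.

```python
def merge_peaks(peaks):
    """Merge overlapping (chrom, start, end) intervals pooled from all samples.

    Intervals are treated as half-open [start, end); two intervals are merged
    only when they share at least one base on the same chromosome.
    """
    merged = []
    for chrom, start, end in sorted(peaks):
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            # Overlaps the previous merged region: extend its end if needed.
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(region) for region in merged]


# Hypothetical peaks pooled from several tissue samples.
pooled = [
    ("chr1", 100, 200),
    ("chr1", 150, 300),
    ("chr1", 400, 500),
    ("chr2", 100, 200),
    ("chr1", 480, 520),
]
merged_regions = merge_peaks(pooled)
```

Sorting first means each peak only needs to be compared with the most recently merged region.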
Somatically altered promoter and enhancer sets were identified using two methods—a “threshold” method and a linear model approach. The final set of altered elements was generated by combining the results from both methods.
H3K27ac ChIPseq ComBat-adjusted FPKM values for all promoters (H3K4me3-marked) and enhancers (H3K4me1-marked, but not overlapping with H3K4me3 peak regions) were filtered by (i) greater than 2-fold (absolute) change and (ii) greater than 0.5 (absolute) difference in mean values between 5 tumour and 5 normal samples. This was also performed for the H3K4me3 and H3K4me1 ChIPseq data. Altered elements were identified from the union of regions obtained for H3K27ac and H3K4me3 analyses (promoters), or H3K27ac and H3K4me1 (enhancers).
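The “threshold” method above may be sketched as follows, using the stated cut-offs (greater than 2-fold absolute change and greater than 0.5 absolute difference in mean values) and taking the union of regions flagged by two marks. The function names, the toy values, and the handling of a zero or negative smaller mean (treated here as an unbounded fold change) are assumptions made for illustration.

```python
from statistics import mean


def is_altered(tumor, normal, min_fold=2.0, min_diff=0.5):
    """Apply the fold-change and mean-difference cut-offs to one region."""
    t, n = mean(tumor), mean(normal)
    high, low = max(t, n), min(t, n)
    if high - low <= min_diff:
        return False
    if low <= 0:
        # Assumption: a non-positive smaller mean counts as an unbounded fold change.
        return True
    return high / low > min_fold


def altered_regions(signal):
    """signal maps region name -> (tumor values, normal values)."""
    return {region for region, (t, n) in signal.items() if is_altered(t, n)}


# Hypothetical adjusted values for 5 tumour and 5 normal samples per region.
h3k27ac = {
    "P1": ([4, 5, 6, 5, 5], [1, 1, 2, 1, 1]),
    "P2": ([1.0, 1.1, 0.9, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0, 1.0]),
}
h3k4me3 = {
    "P1": ([2, 2, 2, 2, 2], [2, 2, 2, 2, 2]),
    "P3": ([0.2, 0.2, 0.2, 0.2, 0.2], [3, 3, 3, 3, 3]),
}
# Altered promoters are the union of regions flagged by either mark.
altered_promoters = altered_regions(h3k27ac) | altered_regions(h3k4me3)
```

Because the fold change is computed as the larger mean over the smaller, both gained and lost regions are captured by the same test.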
Box plots were plotted for the log-transformed ChIPseq data to assess the normality assumption, prior to applying an empirical Bayes linear model approach4 to obtain differentially altered regions between the tumour and normal samples (
RNAseq libraries were prepared using the Illumina Tru-Seq RNA Sample Preparation v2 protocol, according to the manufacturer's instructions. Briefly, poly-A RNAs were recovered from 1 μg of total RNA using poly-T oligo attached magnetic beads. The recovered poly-A RNA was chemically fragmented and converted to cDNA using SuperScript II and random primers.
The second strand was synthesized using the Second Strand Master Mix provided in the kit followed by purification with AMPure XP beads. The ends of the cDNA were repaired using 3′ to 5′ exonuclease activity. A single adenosine was added to the 3′ end, and the adaptors were attached to the ends of the cDNA using T4 DNA ligase.
The fragments with adapters ligated onto both ends were enriched by PCR. Libraries were validated with an Agilent Bioanalyzer (Agilent Technologies, Palo Alto, Calif.). Libraries were diluted to 11 pM and applied to an Illumina flow cell using the Illumina Cluster Station. Sequencing was performed on an Illumina HiSeq 2000 sequencer at the Duke-NUS Genome Biology Facility, with the paired-end 76-bp read option.
Reads were aligned to the human reference genome using TopHat v1.25. Unmapped reads were then aligned to potential splice junctions that were either: (i) present in Ensembl 60 transcript annotations, or (ii) suggested by “expression islands”, i.e. clusters of reads from transcripts that were not present in the annotations. Transcript abundances by FPKM value were estimated using Cufflinks (version 1.0.0) without using reference transcripts. De novo assembled transcripts from the tumor/normal pairs were filtered against the RefSeq transcript database to identify non-RefSeq annotated regions.
RefSeq transcripts were downloaded from the UCSC browser, and RefSeq annotated TSSs were defined by extending transcript start positions by −/+500 bases. Somatically altered H3K4me3 peak regions were compared against RefSeq TSS regions to determine overlaps. H3K4me3 regions with no overlap with RefSeq TSSs (−/+500 bases) were deemed non-RefSeq promoters (aka cryptic promoters).
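The classification of peak regions into annotated and non-RefSeq (cryptic) promoters can be sketched as follows. This is an illustrative Python example only: it assumes half-open (chromosome, start, end) intervals and a plain list of (chromosome, position) TSS coordinates, and all names are hypothetical.

```python
def tss_windows(tss_positions, flank=500):
    """Extend each TSS by `flank` bases on both sides."""
    return [(chrom, max(0, pos - flank), pos + flank) for chrom, pos in tss_positions]


def overlaps(a, b):
    """True if two half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def split_by_annotation(peaks, tss_positions, flank=500):
    """Split peaks into those overlapping a TSS window and those that do not."""
    windows = tss_windows(tss_positions, flank)
    annotated, cryptic = [], []
    for peak in peaks:
        if any(overlaps(peak, window) for window in windows):
            annotated.append(peak)
        else:
            cryptic.append(peak)
    return annotated, cryptic
```

For genome-scale region sets, an interval tree or a sorted sweep would be preferable to this pairwise comparison.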
De novo assembly of RNAseq reads was performed by Cufflinks (version 1.0.0) without the reference transcript set. Non-RefSeq transcripts were defined by filtering the Cufflinks de novo exon output against the RefSeq exons (minimum 1-base overlap). This non-RefSeq transcript set was intersected against the cancer-associated H3K4me3 regions (minimum 1-base overlap).
Quantitative PCR was performed using a SYBR Green PCR kit (Life technologies, USA). GAPDH was used as a control gene for normalization. All PCR reactions were performed in triplicate.
5′ Rapid Amplification of cDNA Ends (5′ RACE)
5′ RACE was performed using the 5′ RACE System for Rapid Amplification of cDNA Ends (version 2) kit (Invitrogen). 1 μg of total RNA was used for each reverse transcription reaction with the Moloney Murine Leukemia Virus (M-MLV) reverse transcriptase, and gene specific primers for MET Refseq exon 3 (5′ CTTCAGTGCAGGG3′) or NKX6-3 Refseq exon 1 (5′GAAGGTAGGCTCCTC3′).
RNase H and RNase T1 were used to degrade the RNA, followed by the purification of first strand cDNAs with S.N.A.P. columns. Homopolymeric tailing of cDNAs was then used to create abridged anchor primer binding sites. Amplification of first strand cDNAs was performed using SuperTaq Plus Polymerase (Applied Biosystems) for 5′RACE outer PCR with the abridged anchor primer, and gene specific primers for MET exon 3 (5′GGCTCCAGGGTCTTCACCTCCA3′) and NKX6-3 exon 1 (5′CCAGGCTGAGCACCGAGAAGGC3′). Subsequently, 5′RACE inner nested PCR was performed with the abridged universal amplification primer (AUAP), and the gene specific primers for MET exon 3 (same as outer 5′ PCR) and NKX6-3 exon 1 (5′GCTTGCGCAGCAGCAGGCGGAT3′).
Gel electrophoresis was performed, and PCR bands of interest were excised for cloning with a TOPO TA Cloning Kit with pCR 4-TOPO vectors (Invitrogen). A minimum of five independent colonies were isolated, and the purified plasmid DNA was sequenced bi-directionally on an ABI 3730 automated sequencer (Applied Biosystems).
200 GC and 100 matched normal gastric samples profiled on Affymetrix Human Genome U133 Plus 2.0 Genechip arrays were analyzed (GSE15459). Data pre-processing was carried out using the ‘affyPLM’ R package (v 2.15). Outliers were excluded, giving a total of 185 GC and 89 normal samples available for downstream analyses. Differential expression analysis between GC samples was performed using the ‘limma’ R package (v 2.15). Genes with false discovery rates (FDR)<0.05 were considered to be differentially expressed. Genes used for differential expression analysis were those emerging from the GREAT (v 2.02) analysis performed on the list of non-Refseq transcripts from RNAseq analysis. For survival analysis, the GC samples were clustered using a K-medoids approach aimed at finding K that minimizes the silhouette width. To assess correlation of different GC groups with clinico-pathological factors, a mosaic plot was plotted for categorical variables while a linear regression approach was employed for continuous variables. Significance (p<0.05) of the correlation was determined by a Pearson chi-square test or a t-test accordingly. Kaplan-Meier survival analysis was employed with overall survival as the outcome metric. The log-rank test was used to assess the significance of the Kaplan-Meier analysis. Univariate and multivariate analyses were performed using Cox regression.
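For illustration of the Kaplan-Meier analysis mentioned above, the following Python sketch computes the product-limit survival estimate from follow-up times and event indicators (1 for death, 0 for censoring). It is a generic textbook estimator and not the software used in the study; the function name and input layout are assumptions.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored subjects leave the risk set
        at_risk -= removed
    return curve
```

Censored subjects contribute to the risk set up to their last follow-up time but do not lower the survival estimate themselves.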
ENCODE ChIPseq TFBS datasets (Txn Fac ChIP V3-Transcription Factor ChIP-seq Clusters V3, 161 targets, 189 antibodies) were obtained from the UCSC browser. Overlaps against cancer-associated promoters and enhancers (or all promoters and enhancers) were counted for each TF. TF site counts were divided by the base coverage length of each corresponding promoter, enhancer, or total set to calculate the TF site frequency per 10 kb coverage.
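The normalization of transcription factor site counts to a frequency per 10 kb of coverage can be sketched as below. The example assumes non-overlapping, half-open (chromosome, start, end) regions; the function name and the numbers in the usage line are hypothetical.

```python
def tf_site_frequency_per_10kb(site_count, regions):
    """TF binding sites per 10 kb of base coverage of a region set."""
    coverage_bp = sum(end - start for _chrom, start, end in regions)
    if coverage_bp <= 0:
        raise ValueError("region set has no base coverage")
    return site_count * 10_000 / coverage_bp


# Hypothetical example: 12 sites across 5 kb of promoter coverage.
frequency = tf_site_frequency_per_10kb(12, [("chr1", 0, 2000), ("chr2", 100, 3100)])
```

Dividing by base coverage makes the cancer-associated set, which is small, comparable with the set of all promoters or enhancers.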
Illumina Human Methylation450 (HM450) Infinium DNA methylation arrays were used to assay DNA methylation levels between the gastric tumor/normal pairs. Methylation β-values were calculated and background corrected using the methylumi package in R (package version 2.4.0). Normalization was performed using the BMIQ method (wateRmelon package in R).
Probes containing SNPs and repeats were removed. Additionally, probes on the X and Y chromosomes were also removed. Control groups used included all 21,692 promoter regions. For each group (control, gain, and loss), we identified HM450 probes overlapping with the promoter regions (135606, 2268, 963 probes for all, cancer-gained, and cancer-lost respectively). Probes with a detection p-value >0.05 were excluded. Probes that had an average change in DNA methylation, between the tumor and normal pairs, of at least 0.2 (in either direction) were selected and plotted. A two-sample Wilcoxon test was performed.
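A minimal Python sketch of the probe selection described above is given below. It assumes, purely for illustration, that each probe record holds per-sample detection p-values and a list of (tumor, normal) methylation values for each pair, and that a probe is excluded if any sample exceeds the detection threshold; the source does not specify these details, and all names are hypothetical.

```python
from statistics import mean


def select_probes(probes, max_detection_p=0.05, min_delta=0.2):
    """Return {probe_id: mean tumor-minus-normal change} for retained probes."""
    selected = {}
    for probe_id, record in probes.items():
        # Exclude probes with unreliable detection (assumed: in any sample)
        if any(p > max_detection_p for p in record["detection_p"]):
            continue
        deltas = [tumor - normal for tumor, normal in record["pairs"]]
        average_change = mean(deltas)
        # Keep changes of at least min_delta in either direction
        if abs(average_change) >= min_delta:
            selected[probe_id] = average_change
    return selected
```

The sign of the returned value distinguishes probes gaining methylation in tumors from those losing it.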
The sequencing data were pre-processed according to the best practices workflow in Genome Analysis Toolkit (GATK version 2.6). Specifically, samtools was used to remove PCR duplicates. The remaining sequences were corrected for misalignments due to the presence of indels followed by base quality score recalibration. Single nucleotide variants (SNVs) in each GC/normal pair were called using MuTect11. We used SNV attributes reported by MuTect to classify the SNVs as either dbSNP sites or private SNVs. The dbSNP sites have the following criteria: (i) it is a known dbSNP site, (ii) the site is powered to detect a mutation (a.k.a covered site), and (iii) it passes the variant filters implemented in MuTect.
The alternate allele fraction was determined at each site by computing alternate allele frequencies.
Homozygous dbSNP sites showing an average alternate allele fraction greater than 0.9 in a GC/normal pair were excluded. We focused on heterozygous sites showing an alternate allele fraction difference greater than 0.3 in a GC/normal pair (i.e. allele bias).
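The allele fraction calculation and the two cut-offs described above can be sketched in Python as follows. The read counts are supplied as (reference reads, alternate reads) tuples; this layout, the function names and the category labels are assumptions made for illustration.

```python
def alt_allele_fraction(ref_reads, alt_reads):
    """Fraction of reads at a site that carry the alternate allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("site has no coverage")
    return alt_reads / total


def classify_site(tumour_counts, normal_counts, homozygous_cutoff=0.9, bias_cutoff=0.3):
    """Classify a dbSNP site in one GC/normal pair."""
    tumour_fraction = alt_allele_fraction(*tumour_counts)
    normal_fraction = alt_allele_fraction(*normal_counts)
    # Average fraction above the cut-off: treated as homozygous and excluded
    if (tumour_fraction + normal_fraction) / 2 > homozygous_cutoff:
        return "homozygous_excluded"
    # Difference between tumour and normal above the cut-off: allele bias
    if abs(tumour_fraction - normal_fraction) > bias_cutoff:
        return "allele_biased"
    return "non_biased"
```

For example, a site with 45 of 50 alternate reads in the tumour and 25 of 50 in the matched normal differs by 0.4 and is therefore reported as allele biased.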
Allele-biased sites mapping to regions exhibiting cancer-associated chromatin mark alterations were assessed for functional impact using RegulomeDB12. For RegulomeDB hits, we also confirmed by quantitative pyrosequencing a lack of allele bias in input DNA populations. For somatic mutations, the focus was on private (non-dbSNP) SNPs. These “private” SNPs were identified using the following criteria: (i) it is a novel non-dbSNP variant, (ii) the alternate allele fraction in tumor is greater than 0.3 at a covered site, or 0.5 at uncovered sites, (iii) the site coverage has at least 14 reads in GC, (iv) there is no mutant allele at the uncovered site in normal tissue. Besides MuTect, private SNPs were also considered and identified using CLC Genomics Workbench (CLC Bio). Private SNPs mapping to regions exhibiting cancer-associated chromatin mark alterations were regarded as candidate somatic mutations.
Pyrosequencing was performed on a PyroMark Q24 (Qiagen). Results were analyzed with PyroMark software for allele quantification. For ChIP-qPCR-pyrosequencing, PCR primers were used for both real-time PCR quantification of ChIP DNA and allele quantification by pyrosequencing, with WGA-amplified DNAs as a template. Quantification results and allele representations were combined to estimate the fraction of the two alleles in the ChIP signal. Binding site predictions were performed using TFBIND13 (http://tfbind.hgc.jp/).
Luciferase reporter assays were performed using Promega pGL3 (firefly luciferase) and pRLSV40 (Renilla luciferase) plasmids. The FOS gene promoter was amplified by PCR from human genomic DNA with BglII-HindIII linker primers, and ligated into the pGL3-BASIC plasmid. HOXA11-associated fragments (~350 bp) containing either wild-type or mutated alleles were amplified from ChIP-WGA DNA with BglII linker primers, and cloned upstream of the FOS promoter. Insert directions and allele identities were confirmed by Sanger sequencing. KATO-III GC cells were seeded at 1×10⁶ cells per 24-well plate, transfected with the pGL3 reporter or derivatives (100 ng per well) and pRLSV40 (20 ng per well) using Lipofectamine 2000 (Invitrogen). Cells were harvested 42 hours post transfection and lysed in the PLB buffer provided with the Dual-Luciferase Kit (Promega), and luciferase activity was measured. Firefly luciferase readings were divided by Renilla luciferase readings to normalize for transfection efficiency.
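The normalization of firefly readings to Renilla readings may be illustrated with the short Python sketch below. The comparison of a test construct against a control construct across replicate wells is a hypothetical extension added for illustration, and all names and readings are assumptions.

```python
from statistics import mean


def relative_luciferase(firefly, renilla):
    """Firefly reading normalized to the co-transfected Renilla reading."""
    if renilla <= 0:
        raise ValueError("Renilla reading must be positive")
    return firefly / renilla


def fold_over_control(sample_wells, control_wells):
    """Mean normalized activity of a construct relative to a control construct.

    Each argument is a list of (firefly, renilla) readings, one per well.
    """
    sample = mean([relative_luciferase(f, r) for f, r in sample_wells])
    control = mean([relative_luciferase(f, r) for f, r in control_wells])
    return sample / control
```

Because both reporters are measured in the same lysate, the ratio cancels well-to-well differences in transfection efficiency and cell number.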
Nano-ChIPseq has been validated down to the 1,000-cell scale (
Comparison of the chromatin marks revealed that regions of active transcription (H3K36me3) were exclusive to regions of repressive chromatin (H3K27me3) (
To identify somatically altered promoters and enhancers in GC, sequencing tag densities were quantified and compared (read per kilo-base per million tags, RPKM) between GCs and normal tissues (
Gains of new promoters in primary GCs outweighed promoter losses (472 gained vs. 167 lost,
Genes located near cancer-associated promoters were significantly enriched in gene sets related to gastrointestinal neoplasms/digestive system cancers (
To validate these expression patterns, it was confirmed that genes driven by H3K4me3-marked cancer-associated promoters exhibited similar tumor upregulation in an expanded microarray cohort of 185 GCs and 89 normal gastric tissues (p=5.68×10⁻⁶;
When mapped against genomic occupancy data of 161 transcription factors (ENCODE consortium), cancer-associated promoters exhibited a generalized depletion of previously-defined transcription factor binding sites (
To identify single nucleotide variants (SNVs) in the Nano-ChIPseq data, an analytical pipeline was developed based on MuTect, a sensitive mutation/variant identification algorithm. 335,918 unique SNVs were identified in the combined H3K4me3, H3K4me1, H3K27ac and input data. Supporting the accuracy of the variant calling pipeline, 99.8% of the SNVs (335,247) corresponded to known SNPs (dbSNP137). Among the identified dbSNPs, approximately 251,800 were heterozygous in at least one sample.
It was found that heterozygous SNPs mapping to regulatory elements could be divided into non-allele-biased and allele-biased sites. At non-biased sites, Nano-ChIPseq sequence reads exhibited an equal proportion of reference and variant alleles. For example, GC 2000639 exhibited a cancer-associated promoter at the TNK2 gene locus (
It was reasoned that the allele-biased sites in cancer samples might be caused by either loss of heterozygosity (LOH), or active enrichment of particular alleles for chromatin marks (allele-specific regulatory elements). To identify the allele-specific regulatory elements associated with cancer, heterozygous sites exhibiting allele bias (SNP over-representation of >30%;
Besides dbSNPs, also identified were private (non-dbSNP) SNVs overlapping with GC-associated regulatory elements. Four private SNVs were validated as bona-fide somatic mutations, being present in GCs but not normal tissues, occurring in non-coding regions associated with CHD10, HOXA5, FAR2 and HOXA11 (
Fourth, presence of this mutation is predicted to alter transcription factor binding (
Regulatory elements are estimated to occupy 1.5-10% of the human genome, and strongly influence development and disease. However, locating these elements, and defining biological states regulating their activity, remains an important challenge. Here, Nano-ChIPseq was used to perform a first-pass survey of chromatin alterations in primary GCs. In future, Nano-ChIPseq could be expanded to other tumor types and to smaller cell numbers, facilitating analysis of diagnostic biopsies and drug resistant clones. From a translational perspective, our findings also suggest that cryptic promoters, and their associated non-canonical transcripts, could be conceivably exploited as biomarkers for cancer diagnostics.
Primary patient samples were obtained from the Singhealth tissue repository, and collected with approvals from institutional research ethics review committees and signed patient informed consent.
‘Normal’ (non-malignant) samples used in this study refer to samples harvested from the stomach, from sites distant from the tumour and exhibiting no visible evidence of tumour or intestinal metaplasia/dysplasia upon surgical assessment. Tumor samples were confirmed by cryosectioning to contain >40% tumor cells.
Nano-ChIPseq was performed as previously described, with the addition of a tissue dissociation step. Fresh-frozen cancer and normal tissues were dissected using a razor blade in liquid nitrogen to obtain ~5 mg sized pieces (~5 μl by apparent volume). Tissue pieces were fixed in 1% formaldehyde/PBS buffer for 10 minutes (min) at room temperature. Fixation was stopped by addition of glycine to a final concentration of 125 mM.
Tissue pieces were washed 3 times with TBSE buffer, and transferred into Lysonator cartridges (SG Microlab Devices, Singapore). Tissues were dissociated following the manufacturer's guidelines (4 kHz for 6 min), and taken directly to the lysis step in the Nano-ChIP assay. Dissociated tissues were lysed in 200 μl of lysis buffer and divided into two 1.5 ml tubes for sonication (6 min) using a Bioruptor (Diagenode). For each tissue, ChIPs were performed on the same chromatin preparation using the following antibodies: H3K4me3 (07-473, Millipore); H3K4me1 (ab8895, Abcam); H3K27ac (ab4729, Abcam); H3K36me3 (ab9050, Abcam); H3K27me3 (07-449, Millipore).
After recovery of ChIP and input DNA, whole-genome-amplification was performed using the WGA4 kit (Sigma-Aldrich) and BpmI-WGA primers.
Amplified DNA was digested with BpmI (New England Biolabs). 10 ng of amplified DNA was used for each Illumina sequencing library. Libraries were prepared using the E6240 kit (New England Biolabs) and were then multiplexed before sequencing using the E7335 kit (New England Biolabs).
Sequencing tags were mapped against the human reference genome (hg19) using Burrows-Wheeler Aligner software (version 0.7.0) and the ‘aln’ algorithm. A MAPQ filter of 20 was applied to remove low-quality reads, and all PCR duplicates were removed using MarkDup from Picard. Uniquely mapped tags were used for peak calling of histone modifications by CCAT version 3.0, with a fragment size of 200 bp and a sliding window of 500 bp with a moving step of 50 bp. Peak regions were filtered at a False Discovery Rate (FDR) of 5%.
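The effect of the mapping quality filter and duplicate removal can be illustrated with the simplified Python sketch below. It is not a re-implementation of Picard: duplicates are assumed here to be tags sharing chromosome, position and strand, with the first occurrence retained, and the dictionary layout of a tag is hypothetical.

```python
def filter_tags(tags, min_mapq=20):
    """Drop low-quality tags, then keep one tag per (chrom, pos, strand)."""
    seen = set()
    kept = []
    for tag in tags:
        # Remove tags below the mapping quality threshold
        if tag["mapq"] < min_mapq:
            continue
        key = (tag["chrom"], tag["pos"], tag["strand"])
        # Simplified duplicate rule: same start coordinate and strand
        if key in seen:
            continue
        seen.add(key)
        kept.append(tag)
    return kept
```

Removing duplicates before peak calling is particularly relevant for whole-genome-amplified material, where amplification can inflate tag counts at individual positions.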
H3K4me3 and H3K27ac signal intensity plots around the transcription start site (TSS) were generated by calculating the average coverage of each chromatin mark around all annotated TSSs in Refseq. A 6 kb window around each known TSS was divided into bins of 100 bp, and the coverage of H3K4me3 and H3K27ac was calculated for each bin and then averaged across all TSSs.
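The binned averaging around TSSs can be sketched as follows, giving 60 bins of 100 bp for a 6 kb window. The Python example assumes per-base coverage held in a list for each chromosome and ignores transcript strand; both are simplifications made for illustration, and the names are hypothetical.

```python
def tss_profile(coverage_by_chrom, tss_list, half_window=3000, bin_size=100):
    """Average per-bin coverage across all TSS-centred windows."""
    n_bins = (2 * half_window) // bin_size
    sums = [0.0] * n_bins
    used = 0
    for chrom, tss in tss_list:
        coverage = coverage_by_chrom[chrom]
        start = tss - half_window
        # Skip windows that run off either end of the chromosome
        if start < 0 or tss + half_window > len(coverage):
            continue
        for b in range(n_bins):
            lo = start + b * bin_size
            sums[b] += sum(coverage[lo:lo + bin_size]) / bin_size
        used += 1
    return [s / used for s in sums] if used else sums
```

In practice, windows for transcripts on the minus strand would be reversed before averaging so that upstream and downstream bins align.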
All H3K4me3 regions were merged across GC samples and normal samples respectively using bedtools, and overlapping regions (1 bp overlap) were counted as common regions. Regions without any overlaps were termed private regions. To provide a genomic null expectation for overlap between H3K4me3 regions among samples, consensus regions were shuffled over the entire reference genome using shufflebed from bedtools, excluding the ENCODE DAC blacklisted regions and gap regions (a published set of regions from Dunham, I., et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57-74 (2012)). The regions were shuffled 10,000 times and an empirical p value was generated from the overlap distribution.
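The shuffling procedure for obtaining an empirical p value can be sketched in Python as below. The example is deliberately simplified: it uses a single chromosome of fixed length, does not model the excluded blacklist and gap regions, and adds one to the numerator and denominator so that the p value is never zero. These choices, and all names, are assumptions for illustration rather than a description of shufflebed.

```python
import random


def count_overlapping(queries, targets):
    """Number of query intervals overlapping any target by at least 1 bp."""
    return sum(
        any(qs < te and ts < qe for ts, te in targets) for qs, qe in queries
    )


def empirical_p(queries, targets, genome_length, n_shuffles=1000, seed=0):
    """Fraction of shuffles with at least as many overlaps as observed."""
    rng = random.Random(seed)
    observed = count_overlapping(queries, targets)
    as_extreme = 0
    for _ in range(n_shuffles):
        shuffled = []
        for qs, qe in queries:
            length = qe - qs
            # Place each region uniformly at random, preserving its length
            start = rng.randrange(0, genome_length - length + 1)
            shuffled.append((start, start + length))
        if count_overlapping(shuffled, targets) >= observed:
            as_extreme += 1
    return (as_extreme + 1) / (n_shuffles + 1)
```

With 10,000 shuffles, the smallest p value obtainable under this convention is 1/10,001.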
H3K4me3 regions with differential enrichment in gastric cancer vs. normal samples were identified using the DESeq2 algorithm from Bioconductor. Here, enrichment refers to the gain of H3K4me3 in the cancerous sample relative to the normal non-cancerous sample. A matrix of sequencing tag counts was generated by 1) combining all identified promoter regions across all GC and normal samples, i.e. taking the union of H3K4me3 regions identified across replicates, and 2) counting the number of sequencing reads in each resulting promoter region for every sample. Both steps were carried out using bedtools. This matrix served as the input for DESeq2, which fits negative binomial generalized linear models to count data from various sequencing assays, including ChIPseq, to find promoter regions that are statistically different between gastric cancer and normal samples, i.e. somatically altered promoters. Statistically different refers to a False Discovery Rate threshold of 10% (i.e. q value <0.1) together with an absolute fold change greater than 1.5.
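The construction of the count matrix and the final thresholding step can be illustrated with the Python sketch below. The negative binomial model fitting itself is performed by DESeq2 in R and is not reproduced here. The example assumes half-open intervals, tags represented by a single (chromosome, position) coordinate, and a fold change reported on the log2 scale; all function names are hypothetical.

```python
import math


def merge_regions(regions):
    """Union of regions: merge intervals that overlap by at least 1 bp."""
    merged = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(region) for region in merged]


def count_matrix(regions, tags_by_sample):
    """Rows are regions, columns are samples (sorted by name)."""
    samples = sorted(tags_by_sample)
    rows = [
        [
            sum(1 for c, pos in tags_by_sample[s] if c == chrom and start <= pos < end)
            for s in samples
        ]
        for chrom, start, end in regions
    ]
    return samples, rows


def is_significant(log2_fold_change, q_value, max_q=0.1, min_fold=1.5):
    """Apply the q value and absolute fold change cut-offs to one region."""
    return q_value < max_q and abs(log2_fold_change) > math.log2(min_fold)
```

Counting every sample over the same merged region set ensures that each row of the matrix refers to an identical genomic interval in tumor and normal samples.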
Differential regions were identified for both tumor and normal groups as well as individual sample specific peaks.
GENCODE transcripts were downloaded from their ftp site while Refseq transcripts were downloaded from the UCSC browser. Transcription support level information for GENCODE transcripts was downloaded from UCSC ftp site. Annotated TSSs were defined by extending transcript start positions by ±500 bases. Differentially enriched H3K4me3 peak regions were compared against the TSS regions to determine overlaps. De novo assembly of RNAseq reads was performed by Cufflinks-2.2.0.12 and unannotated H3K4me3 differential regions were filtered by overlap with the 1st exons of the de novo assembly with class code ‘j’ or ‘u’.
Reads were aligned to the human reference genome using TopHat 2-2.0.12 using unique mapping. The transcriptome was assembled de-novo using Cufflinks2-2.0.12 and all GC transcript assemblies were merged using ‘cuffmerge-2.2.0’ to get a consensus transcriptome. Raw RNAseq data for TCGA stomach adenocarcinoma was downloaded from the TCGA repository (http://cancergenome.nih.gov/).
Differentially enriched regions were overlapped with CCAT3-called H3K27me3 peaks from 3 samples (1 bp overlap) to determine the presence or absence of H3K27me3.
Using an expanded cohort of 8 primary GCs and matched normal samples, the promoter elements of gastric cancers (GC) were characterized with 2 promoter-associated marks, H3K4me3 and H3K27ac, using Nano-ChIPseq. Peaks were called using CCAT3, which identified an average of 11k H3K4me3 and 34k H3K27ac peaks per sample. 70-80% of the H3K4me3 regions were common among samples in both GC and normal tissue, greater than expected by chance (p<0.001).
Both H3K4me3 and H3K27ac showed a standard bi-modal distribution enriched around transcription start sites (TSS) (
To identify genome-wide somatically altered promoters, the matrix of sequencing tag counts was compared between primary GCs and matched normal gastric tissue assuming a negative binomial distribution (as analyzed by the DESeq2 algorithm). Comparing all 8 GC samples to the pool of normal tissues, 516 robustly somatically altered regions (q<0.1, fold change>1.5) were obtained, with ~60% of them being gained or epigenetically activated in GC (
Clustering of the H3K4me3 signals across samples for the identified regions confirmed a distinct separation (
249 (48%) of the somatically altered regions were additional to the 639 promoters identified in Example 1, which used a 2-fold FPKM-based sequencing tag density comparison. Gains of new promoters in primary GCs again outweighed promoter losses, i.e. 148 (60%) gained vs. 101 lost. Overall, 620 promoter regions were gained in GC (70%) vs. 260 lost in GC.
Using a more comprehensive database of transcripts, the somatically altered regions were overlapped with 1 kb windows around known GENCODE TSS to annotate them. 62% of the somatically altered promoter regions overlapped known transcripts. However, a substantial 38% lay beyond 500 bp of annotated TSSs (
H3K4me3 enrichment at specific loci revealed patterns of alternative promoter usage in GC that influence transcript selection. 553 (63% gained in GC) somatically altered promoter regions overlapped known transcripts. Preferential activation or repression of one transcript over another was observed in multi-transcript genes, such as HNF4A. HNF4A is a well-known transcription factor gene that regulates development of the liver, kidney and intestines. In GC, HNF4A has been reported to be overexpressed, and a recent immunohistochemistry study showed its potential as a marker to distinguish GC tissues from breast cancer tissues.
H3K4me3 enrichment in GC (FC 2.52, q<0.001) was observed at a promoter almost 45 kb downstream of the canonical HNF4A isoform TSS. The canonical promoter, on the other hand, showed equal lysine trimethylation in GC and normal tissues, highlighting a GC-specific usage of the downstream promoter and thus a shorter protein-coding isoform of HNF4A (
Other cancer related genes with instances of such alternate promoter usage were EPCAM (FC 1.64, q<0.001), KRT7 (FC 2.00, q<0.001), AIM1L (FC 1.95, q<0.001), among others. The FC and q value statistics are derived from the DESeq2 analysis, where FC defines the fold change.
Somatically altered promoters also often overlapped only one transcript in genes associated with multiple transcripts, marking the primary promoter and a cancer-specific isoform (
Instances of somatically altered regions overlapping GENCODE transcripts that had very poor transcriptional supporting evidence (tsl 2 or more) were also observed. Such GENCODE transcript annotations, with little or no mRNA support, are often not included in more curated databases such as Refseq.
109 enriched regions overlapping the TSSs of such transcripts were observed, supported by RNA expression in GC, which highlighted the GC-specific usage of these otherwise unsupported isoforms. One such example was MYO15B, a transcribed pseudogene that showed significant gain (FC 2.16, q<0.01) of H3K4me3 in GC at its unsupported isoform promoter, while there was complete absence of H3K4me3 at its canonical isoform (
Additional cryptic promoters were identified marking novel 5′ start sites of GC specific isoforms which were associated with bona fide RNA transcripts. A prominent example was Ras GTPase-activating protein 3 (RASA3), which showed differential H3K4me3 enrichment in GC samples at a promoter region almost 127 kb downstream from the canonical transcript start site forming a much shorter novel isoform being transcribed only in GC tissues. The canonical isoform showed an equal amount of H3K4me3 in both GC and normal tissues. Other examples of such novel 5′ start site isoforms were GRIN2D (FC 2.52, q<0.001), ONECUT3 (FC 2.52, q<0.001), and TNNI3 (FC 2.52, q<0.001).
Alternative promoter usage also changes the protein composition of alternate GC-specific isoforms. Using the genomic sequence of known or de novo assembled isoforms arising from alternate promoters, the presence of protein domains was predicted and the domain composition was compared to that of the canonical isoform to find instances of protein alterations.
Cases where the alternate isoform was supported by RNAseq were then selected for protein composition changes. 10 such high confidence genes were identified that showed protein domain diversity including RASA3 (
The GC-specific shorter isoform lacks the RasGAP domain, which acts as a molecular switch downregulating the activity of Ras. The absence of this domain could lead to an increase in expression of GTP-bound RAS and thus aberrant cellular proliferation.
Also observed was the presence of the H3K27me3 mark in somatically altered regions. In many cases, the H3K27me3 mark was observed in either GC or normal tissues, potentially marking the transition of promoters from a monovalent state to a bivalent poised state or vice versa. For example, TNFSF9, a cytokine involved in tumor necrosis factor binding and shown to be expressed in Epstein-Barr virus (EBV)-associated GC, showed gain of H3K4me3 and also presence of H3K27me3 in GC, while the repressive trimethylation mark was absent in normal tissue. TNFSF9 had low levels of RNAseq expression (FPKM 4.9), concordant with its poised epigenetic state in GC.
The identified somatically altered promoters highlight and confirm widespread alternate promoter usage in GC, as well as elucidating that alternate promoter usage can impact the protein domain composition of resulting proteins, that may be specific to GC.
As such, based upon the above observations and experimental data, using an expanded cohort of 8 primary gastric cancers (GC) in comparison to matched normal samples, the read count matrix based algorithm (DESeq2) was able to identify additional somatically altered promoter regions. The identification of somatically altered promoter regions highlights and confirms widespread alternate promoter usage specific to cancer, as exemplified with respect to gastric cancer, either through preferential alteration of the promoter of one transcript or by alteration of the promoter of the primary transcript in use in multi-transcript genes. Further, additional ‘cryptic promoters’ were identified in the expanded cohort, marking 5′ start sites of non-canonical isoforms.
The above non-canonical isoforms showed changes in the domain composition of the resulting proteins that may be specific to a particular cancer; for example, Table 1 illustrates the proteins that may be specific to GC. These cryptic promoters and associated non-canonical transcripts can be used as biomarkers for targeted therapy and cancer diagnostics. As such, the present invention and methods disclosed herein are advantageous in identifying and providing yet further biomarkers for the possible detection and diagnosis of cancer in a subject.
Number | Date | Country | Kind |
---|---|---|---|
201309689-6 | Dec 2013 | SG | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/SG2014/000625 | 12/30/2014 | WO | 00 |