The present invention relates generally to nucleic acid molecules, the RNA and protein expression profiles of which are indicative of the onset, predisposition to the onset and/or progression of a neoplasm. More particularly, the present invention is directed to nucleic acid molecules, the expression profiles of which are indicative of the onset and/or progression of a large intestine neoplasm, such as an adenoma or an adenocarcinoma. The expression profiles of the present invention are useful in a range of applications including, but not limited to, those relating to the diagnosis and/or monitoring of colorectal neoplasms, such as colorectal adenocarcinomas. Accordingly, in a related aspect the present invention is directed to a method of screening a subject for the onset, predisposition to the onset and/or progression of a neoplasm by screening for modulation in the expression profile of one or more nucleic acid molecule markers.
The Sequence Listing in the ASCII text file, named as 26139ABC_SeqListing.txt of 146 KB, created on November 9, 2017 and submitted to the United States Patent and Trademark Office via EFS-Web, is incorporated herein by reference.
Bibliographic details of the publications referred to by author in this specification are collected alphabetically at the end of the description.
The reference in this specification to any prior publication (or information derived from it), or to any matter which is known, is not, and should not be taken as an acknowledgment or admission or any form of suggestion that that prior publication (or information derived from it) or known matter forms part of the common general knowledge in the field of endeavour to which this specification relates.
Adenomas are benign tumours, or neoplasms, of epithelial origin which are derived from glandular tissue or exhibit clearly defined glandular structures. Some adenomas show recognisable tissue elements, such as fibrous tissue (fibroadenomas) and epithelial structure, while others, such as bronchial adenomas, produce active compounds that might give rise to clinical syndromes.
Adenomas may progress to become an invasive neoplasm and are then termed adenocarcinomas. Accordingly, adenocarcinomas are defined as malignant epithelial tumours arising from glandular structures, which are constituent parts of many organs of the body. The term adenocarcinoma is also applied to tumours showing a glandular growth pattern. These tumours may be sub-classified according to the substances that they produce, for example mucus secreting and serous adenocarcinomas, or to the microscopic arrangement of their cells into patterns, for example papillary and follicular adenocarcinomas. These carcinomas may be solid or cystic (cystadenocarcinomas). Each organ may produce tumours showing a variety of histological types, for example the ovary may produce both mucinous and cystadenocarcinoma.
Adenomas in different organs behave differently. In general, the overall chance of carcinoma being present within an adenoma (i.e. a focus of cancer having developed within a benign lesion) is approximately 5%. However, this is related to the size of the adenoma. For instance, in the large bowel (colon and rectum specifically) occurrence of a cancer within an adenoma is rare in adenomas of less than 1 centimetre. Such a development is estimated at 40 to 50% in adenomas which are greater than 4 centimetres and show certain histopathological change such as villous change, or high grade dysplasia. Adenomas with higher degrees of dysplasia have a higher incidence of carcinoma. In any given colorectal adenoma, the predictors of the presence of cancer now or the future occurrence of cancer in the organ include size (especially greater than 9 mm), degree of change from tubular to villous morphology, presence of high grade dysplasia and the morphological change described as “serrated adenoma”. In any given individual, the additional features of increasing age, familial occurrence of colorectal adenoma or cancer, male gender or multiplicity of adenomas predict a future increased risk for cancer in the organ; these are the so-called risk factors for cancer. Except for the number of adenomas and their size, none of these is objectively defined, and all those other than number and size are subject to observer error and to confusion as to precise definition of the feature in question. Because such factors can be difficult to assess and define, their value as predictors of current or future risk for cancer is imprecise.
Once a sporadic adenoma has developed, the chance of a new adenoma occurring is approximately 30% within 26 months.
Colorectal adenomas represent a class of adenomas which are exhibiting an increasing incidence, particularly in more affluent countries. The causes of adenoma, and of progression to adenocarcinoma, are still the subject of intensive research. To date it has been speculated that in addition to genetic predisposition, environmental factors (such as diet) play a role in the development of this condition. Most studies indicate that the relevant environmental factors relate to high dietary fat, low fibre, low vegetable intake, smoking, obesity, physical inactivity and high refined carbohydrates.
Colonic adenomas are localised areas of dysplastic epithelium which initially involve just one or several crypts and may not protrude from the surface, but with increased growth in size, usually resulting from an imbalance in proliferation and/or apoptosis, they may protrude. Adenomas can be classified in several ways. One is by their gross appearance and the major descriptors include degrees of protrusion: flat, sessile (i.e. protruding but without a distinct stalk) or pedunculated (i.e. having a stalk). Other gross descriptors include actual size in the largest dimension and actual number in the colon/rectum. While small adenomas (less than say 5 or 10 millimetres) exhibit a smooth tan surface, pedunculated and especially larger adenomas tend to have a cobblestone or lobulated red-brown surface. Larger sessile adenomas may exhibit a more delicate villous surface. Another set of descriptors is the histopathological classification; the prime descriptors of clinical value include degree of dysplasia (low or high), whether or not a focus of invasive cancer is present, degree of change from tubular gland formation to villous gland formation (hence classification is tubular, villous or tubulovillous), presence of admixed hyperplastic change and of so-called “serrated” adenomas and their subgroups. Adenomas can be situated at any site in the colon and/or rectum although they tend to be more common in the rectum and distal colon. All of these descriptors, with the exception of number and size, are relatively subjective and subject to interobserver disagreement.
The various descriptive features of adenomas are of value not just to ascertain the neoplastic status of any given adenoma when detected, but also to predict a person's future risk of developing colorectal adenomas or cancer. Those features of an adenoma or number of adenomas in an individual that point to an increased future risk for cancer or recurrence of new adenomas include: size of the largest adenoma (especially 10 mm or larger), degree of villous change (especially at least 25% such change and particularly 100% such change), high grade dysplasia, number (3 or more of any size or histological status) or presence of serrated adenoma features. Only size and number are objective; the remaining features are relatively subjective and subject to interobserver disagreement. These predictors of risk for future neoplasia (hence “risk”) are vital in practice because they are used to determine the need for and frequency of future colonoscopic surveillance. More accurate risk classification might thus reduce workload of colonoscopy, make it more cost-effective and reduce the risk of complications from unnecessary procedures.
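The risk features listed above can be expressed as a simple checklist. The following Python sketch is illustrative only: the class name, field names and function name are hypothetical, and the cut-offs are simply the figures quoted in the preceding paragraph (10 mm or larger, at least 25% villous change, 3 or more adenomas). It reports which features are present and does not attempt to weigh them against one another.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AdenomaFindings:
    """Hypothetical record of colonoscopy/histopathology findings."""
    largest_size_mm: float       # size of the largest adenoma
    villous_fraction: float      # 0.0 to 1.0, proportion showing villous change
    high_grade_dysplasia: bool
    adenoma_count: int
    serrated_features: bool


def risk_features_present(f: AdenomaFindings) -> List[str]:
    """Return the names of the risk features that are present."""
    flags = []
    if f.largest_size_mm >= 10:        # "especially 10 mm or larger"
        flags.append("size")
    if f.villous_fraction >= 0.25:     # "at least 25% such change"
        flags.append("villous change")
    if f.high_grade_dysplasia:
        flags.append("high grade dysplasia")
    if f.adenoma_count >= 3:           # "3 or more of any size"
        flags.append("number")
    if f.serrated_features:
        flags.append("serrated features")
    return flags


if __name__ == "__main__":
    example = AdenomaFindings(12.0, 0.10, False, 3, False)
    print(risk_features_present(example))
```

Only the size and number checks rest on objective measurements; the remaining inputs inherit the interobserver variability noted above.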
Adenomas are generally asymptomatic, therefore rendering difficult their diagnosis and treatment at a stage prior to when they might develop invasive characteristics and so become cancer. It is technically impossible to predict the presence or absence of carcinoma based on the gross appearance of adenomas, although larger adenomas are more likely to show a region of malignant change than are smaller adenomas. Sessile adenomas exhibit a higher incidence of malignancy than pedunculated adenomas of the same size. Some adenomas result in blood loss which might be observed or detectable in the stools; while sometimes visible by eye, it is often, when it occurs, microscopic or “occult”. Larger adenomas tend to bleed more than smaller adenomas. However, since blood in the stool, whether overt or occult, can also be indicative of non-adenomatous conditions, the accurate diagnosis of adenoma is rendered difficult without the application of highly invasive procedures such as colonoscopy combined with tissue acquisition by either removal (i.e. polypectomy) or biopsy and subsequent histopathological analysis.
Accordingly, there is an on-going need to elucidate the causes of adenoma and to develop more informative diagnostic protocols or aids to diagnosis that enable one to direct colonoscopy at people more likely to have adenomas. These adenomas may be high risk, advanced or neither of these. Furthermore, it can be difficult after colonoscopy to be certain that all adenomas have been removed, especially in a person who has had multiple adenomas. An accurate screening test may minimise the need to undertake an early second colonoscopy to ensure that the colon has been cleared of neoplasms. Accordingly, the identification of molecular markers for adenomas would provide means for understanding the cause of adenomas and cancer, improving diagnosis of adenomas including development of useful screening tests, elucidating the histological stage of an adenoma, characterising a patient's future risk for colorectal neoplasia on the basis of the molecular state of an adenoma and facilitating treatment of adenomas.
To date, research has focused on the identification of gene mutations which lead to the development of colorectal neoplasms. In work leading up to the present invention, however, it has been determined that changes in expression profiles of genes which are also expressed in healthy individuals are indicative of the development of neoplasms of the large intestine, such as adenomas and adenocarcinomas. It has been further determined that in relation to neoplasms of the large intestine, diagnosis can be made based on screening for one or more of a panel of these differentially expressed genes. In a related aspect, it has still further been determined that to the extent that neoplastic tissue has been identified either by the method of the invention or by some other method, the present invention provides still further means of characterising that tissue as an adenoma or a cancer. In yet another aspect, it has been determined that a proportion of these genes are characterised by gene expression which occurs in the context of a neoplastic state but not in the context of a non-neoplastic state, thereby facilitating the development of qualitative analyses which do not require a relative analysis to be performed against a non-neoplastic or normal control reference level. Accordingly, the inventors have identified a panel of genes which facilitate the diagnosis of adenocarcinoma and adenoma development and/or the monitoring of conditions characterised by the development of these types of neoplasms.
Throughout this specification and the claims which follow, unless the context requires otherwise, the word “comprise”, and variations such as “comprises” and “comprising”, will be understood to imply the inclusion of a stated integer or step or group of integers or steps but not the exclusion of any other integer or step or group of integers or steps.
As used herein, the term “derived from” shall be taken to indicate that a particular integer or group of integers has originated from the species specified, but has not necessarily been obtained directly from the specified source. Further, as used herein the singular forms of “a”, “an” and “the” include plural referents unless the context clearly dictates otherwise.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The subject specification contains amino acid and nucleotide sequence information prepared using the programme PatentIn Version 3.4, presented herein after the bibliography. Each amino acid and nucleotide sequence is identified in the sequence listing by the numeric indicator <210> followed by the sequence identifier (eg. <210>1, <210>2, etc). The length, type of sequence (amino acid, DNA, etc.) and source organism for each sequence are indicated by information provided in the numeric indicator fields <211>, <212> and <213>, respectively. Amino acid and nucleotide sequences referred to in the specification are identified by the indicator SEQ ID NO: followed by the sequence identifier (eg. SEQ ID NO:1, SEQ ID NO:2, etc). The sequence identifier referred to in the specification correlates to the information provided in numeric indicator field <400> in the sequence listing, which is followed by the sequence identifier (eg. <400>1, <400>2, etc). That is, SEQ ID NO:1 as detailed in the specification correlates to the sequence indicated as <400>1 in the sequence listing.
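By way of illustration, the numeric indicator fields described above can be read programmatically. The following Python sketch assumes a plain-text listing in which each field appears on its own line and the residues follow the <400> line; the function name, the dictionary keys and the sample record are hypothetical and are not taken from the actual Sequence Listing. It handles nucleotide records only, stripping the spacing and position counts that typically accompany the residues.

```python
import re

# Matches a numeric indicator such as "<210> 1" or "<212>DNA".
FIELD = re.compile(r"^<(\d{3})>\s*(.*)$")


def parse_sequence_listing(text):
    """Map each sequence identifier to its length, type, organism and residues."""
    records = {}
    current = None
    in_sequence = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = FIELD.match(line)
        if match:
            code, value = match.group(1), match.group(2).strip()
            in_sequence = False
            if code == "210":            # starts a new record
                current = {"length": None, "type": None,
                           "organism": None, "sequence": ""}
                records[int(value)] = current
            elif current is None:
                continue                  # header fields before the first <210>
            elif code == "211":
                current["length"] = int(value)
            elif code == "212":
                current["type"] = value
            elif code == "213":
                current["organism"] = value
            elif code == "400":          # residues follow on later lines
                in_sequence = True
        elif in_sequence and current is not None:
            # Drop spacing and trailing position counts (assumes nucleotides).
            current["sequence"] += re.sub(r"[\s\d]", "", line)
    return records


# Hypothetical sample record, for illustration only.
SAMPLE = """
<210> 1
<211> 12
<212> DNA
<213> Homo sapiens
<400> 1
acgtacgtac gt                                     12
"""

if __name__ == "__main__":
    for seq_id, rec in parse_sequence_listing(SAMPLE).items():
        print(f"SEQ ID NO:{seq_id}", rec)
```

Keying the result on the <210> identifier mirrors the correspondence between SEQ ID NO:1 in the specification and <400>1 in the listing.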
One aspect of the present invention is directed to a method of screening for the onset or predisposition to the onset of a large intestine neoplasm in an individual, said method comprising measuring the level of expression of one or more genes or transcripts selected from:
in a biological sample from said individual wherein a higher level of expression of the genes or transcripts of group (i) and/or group (ii) relative to control levels is indicative of a neoplastic large intestine cell or a cell predisposed to the onset of a neoplastic state.
In another aspect there is provided a method of screening for the onset or predisposition to the onset of a large intestine neoplasm in an individual, said method comprising measuring the level of expression of one or more genes or transcripts selected from:
In yet another aspect there is provided a method of screening for the onset or predisposition to the onset of a large intestine neoplasm in an individual, said method comprising measuring the level of expression of one or more genes or transcripts selected from:
In still another aspect there is provided a method of screening for the onset or predisposition to the onset of a large intestine neoplasm in an individual, said method comprising measuring the level of expression of one or more genes or transcripts selected from:
Preferably, said control level is a non-neoplastic level.
In still yet another aspect there is provided a method of screening for the onset or predisposition to the onset of a large intestine neoplasm in an individual, said method comprising measuring the level of expression of one or more genes or transcripts selected from:
in a biological sample from said individual wherein a higher level of expression of the genes or transcripts of group (i) and/or group (ii) relative to control levels is indicative of a neoplastic large intestine cell or a cell predisposed to the onset of a neoplastic state.
In a further aspect the present invention is directed to a method of screening for the onset or predisposition to the onset of a large intestine neoplasm in an individual, said method comprising measuring the level of expression of one or more genes or transcripts selected from:
in a biological sample from said individual wherein a higher level of expression of the genes or transcripts of group (i) and/or group (ii) relative to control levels is indicative of an adenoma cell or a cell predisposed to the onset of an adenoma state.
In another further aspect of the present invention there is provided a method of screening for the onset or predisposition to the onset of a large intestine neoplasm in an individual, said method comprising measuring the level of expression of one or more genes or transcripts selected from:
in a biological sample from said individual wherein a higher level of expression of the genes or transcripts of group (i) and/or group (ii) relative to control levels is indicative of a cancer cell or a cell predisposed to the onset of a cancerous state.
In yet another further aspect there is provided a method of screening for the onset or predisposition to the onset of a large intestine neoplasm in an individual, said method comprising measuring the level of expression of one or more genes or transcripts selected from:
in a biological sample from said individual wherein a higher level of expression of the genes or transcripts of group (i) and/or group (ii) relative to background levels is indicative of a neoplastic cell or a cell predisposed to the onset of a neoplastic state.
Yet another aspect of the present invention provides a method of screening for the onset or predisposition to the onset of a large intestine neoplasm in an individual, said method comprising measuring the level of expression of one or more genes or transcripts selected from:
in a biological sample from said individual wherein a higher level of expression of the genes or transcripts of group (i) and/or group (ii) relative to background levels is indicative of an adenoma cell or a cell predisposed to the onset of an adenoma state.
In yet still another aspect there is provided a method of screening for the onset or predisposition to the onset of a large intestine neoplasm in an individual, said method comprising measuring the level of expression of one or more genes or transcripts selected from:
in a biological sample from said individual wherein a higher level of expression of the genes or transcripts of group (i) and/or group (ii) relative to background levels is indicative of a cancer cell or a cell predisposed to the onset of a cancerous state.
In still yet another aspect of the present invention, there is provided a method of characterising a neoplastic cell or cellular population, which cell or cellular population is derived from the large intestine of an individual, said method comprising measuring the level of expression of one or more genes or transcripts selected from:
in said cell or cellular population wherein a higher level of expression of the genes or transcripts of group (i) and/or group (ii) relative to a gastrointestinal cancer cell level is indicative of an adenoma cell or a cell predisposed to the onset of an adenoma state.
In another aspect there is provided a method of characterising a neoplastic cell or cellular population, which cell or cellular population is derived from the large intestine of an individual, said method comprising measuring the level of expression of one or more genes or transcripts selected from:
in a biological sample from said individual wherein a higher level of expression of the genes or transcripts of group (i) and/or group (ii) relative to a gastrointestinal adenoma cell level is indicative of a cancer or a cell predisposed to the onset of a cancerous state.
In a further aspect there is provided a method of characterising a neoplastic cell or cellular population, which cell or cellular population is derived from the large intestine of an individual, said method comprising measuring the level of expression of one or more genes or transcripts selected from:
In another aspect there is provided a method of characterising a neoplastic cell or cellular population, which cell or cellular population is derived from the large intestine of an individual, said method comprising measuring the level of expression of one or more genes or transcripts selected from:
in a biological sample from said individual wherein a higher level of expression of the genes or transcripts of group (i) and/or group (ii) relative to a gastrointestinal cancer control level is indicative of an adenoma cell or a cell predisposed to the onset of an adenoma state.
A further aspect of the present invention is directed to a method of characterising a cell or cellular population, which cell or cellular population is derived from the large intestine of an individual, said method comprising measuring the level of expression of one or more genes or transcripts selected from:
In yet another further aspect the present invention is directed to a method of characterising a cell or cellular population, which cell or cellular population is derived from the large intestine of an individual, said method comprising measuring the level of expression of one or more genes or transcripts selected from:
In still yet another further aspect the present invention is directed to a method of characterising a cell or cellular population, which cell or cellular population is derived from the large intestine of an individual, said method comprising measuring the level of expression of one or more genes or transcripts selected from:
in a biological sample from said individual wherein a higher level of expression of the genes or transcripts of group (i) and/or (ii) relative to a gastrointestinal adenoma control level is indicative of a cancer or a cell predisposed to the onset of a cancerous state.
Yet another aspect of the present invention provides a method of characterising a neoplastic cell or cellular population, which cell or cellular population is derived from the large intestine of an individual, said method comprising measuring the level of expression of one or more genes or transcripts selected from:
in a biological sample from said individual wherein a higher level of expression of the genes or transcripts of group (i) and/or (ii) relative to neoplastic tissue background levels is indicative of an adenoma cell or a cell predisposed to the onset of an adenoma state.
In still another aspect the present invention provides a method of characterising a neoplastic cell or cellular population, which cell or cellular population is derived from the large intestine of an individual, said method comprising measuring the level of expression of one or more genes or transcripts selected from:
A related aspect of the present invention provides a molecular array, which array comprises a plurality of:
wherein the level of expression of said marker genes of (i) or proteins of (iv) is indicative of the neoplastic state of a cell or cellular subpopulation derived from the large intestine.
The present invention is predicated, in part, on the elucidation of gene expression profiles which characterise large intestine cellular populations in terms of their neoplastic state and, more particularly, whether they are malignant or pre-malignant. This finding has now facilitated the development of routine means of screening for the onset or predisposition to the onset of a large intestine neoplasm or characterising cellular populations derived from the large intestine based on screening for upregulation of the expression of these molecules, relative to control expression patterns and levels. To this end, in addition to assessing expression levels of the subject genes relative to normal or non-neoplastic levels, it has been determined that a proportion of these genes are expressed only in the diseased state, thereby facilitating the development of a simple qualitative test based on requiring assessment only relative to test background levels.
In accordance with the present invention, it has been determined that the genes detailed above are modulated, in terms of differential changes to their levels of expression, depending on whether the cell expressing that gene is neoplastic or not. It should be understood that reference to a gene “expression product” or “expression of a gene” is a reference to either a transcription product (such as primary RNA or mRNA) or a translation product such as protein. These genes and their expression products, whether they be RNA transcripts or encoded proteins, are collectively referred to as “neoplastic markers”.
Accordingly, one aspect of the present invention is directed to a method of screening for the onset or predisposition to the onset of a large intestine neoplasm in an individual, said method comprising measuring the level of expression of one or more genes or transcripts selected from:
in a biological sample from said individual wherein a higher level of expression of the genes or transcripts of group (i) and/or group (ii) relative to control levels is indicative of a neoplastic large intestine cell or a cell predisposed to the onset of a neoplastic state.
Reference to “large intestine” should be understood as a reference to a cell derived from one of the six anatomical regions of the large intestine, which regions commence after the terminal region of the ileum, these being:
Reference to “neoplasm” should be understood as a reference to a lesion, tumour or other encapsulated or unencapsulated mass or other form of growth which comprises neoplastic cells. A “neoplastic cell” should be understood as a reference to a cell exhibiting abnormal growth. The term “growth” should be understood in its broadest sense and includes reference to proliferation. In this regard, an example of abnormal cell growth is the uncontrolled proliferation of a cell. Another example is failed apoptosis in a cell, thus prolonging its usual life span. The neoplastic cell may be a benign cell or a malignant cell. In a preferred embodiment, the subject neoplasm is an adenoma or an adenocarcinoma. Without limiting the present invention to any one theory or mode of action, an adenoma is generally a benign tumour of epithelial origin which is either derived from epithelial tissue or exhibits clearly defined epithelial structures. These structures may take on a glandular appearance. An adenoma can comprise a malignant cell population, such as occurs with the progression of a benign adenoma to a malignant adenocarcinoma.
Preferably, said neoplastic cell is an adenoma or adenocarcinoma and even more preferably a colorectal adenoma or adenocarcinoma.
Each of the genes and transcripts detailed in sub-paragraphs (i) and (ii), above, would be well known to the person of skill in the art, as would their encoded proteins. The identification of the expression products of these genes and transcripts as markers of neoplasia occurred by virtue of differential expression analysis using Affymetrix HGU133A or HGU133B gene chips. To this end, each gene chip is characterised by approximately 45,000 probe sets which detect the RNA transcribed from the genome. On average, approximately 11 probe pairs detect overlapping or consecutive regions of the RNA transcript. In general, the genes whose RNA transcripts are identified by the Affymetrix probesets described herein are well known and characterised genes. However, to the extent that some of the probesets detect RNA transcripts which are not yet defined, these transcripts are indicated as “the gene, genes or transcripts detected by Affymetrix probe x”. In some cases a number of genes and/or transcripts may be detectable by a single probeset. It should be understood, however, that this is not intended as a limitation as to how the expression level of the subject gene or transcript can be detected. In the first instance, it would be understood that the subject gene transcript is also detectable by other probesets which would be present on the Affymetrix gene chip. The reference to a single probeset is merely included as an identifier of the gene transcript of interest. In terms of actually screening for the transcript, however, one may utilise a probe or probeset directed to any region of the transcript and not just to the 3′ terminal 600 bp transcript region to which the Affymetrix probesets are often directed.
Reference to each of the genes and transcripts detailed above and their transcribed and translated expression products should therefore be understood as a reference to all forms of these molecules and to fragments or variants thereof. As would be appreciated by the person of skill in the art, some genes are known to exhibit allelic variation between individuals. Accordingly, the present invention should be understood to extend to such variants which, in terms of the present diagnostic applications, achieve the same outcome despite the fact that minor genetic variants between the actual nucleic acid sequences may exist between individuals or that within one individual there may exist two or more splice variants of one subject gene. The present invention should therefore be understood to extend to all forms of RNA (eg mRNA, primary RNA transcript, miRNA, etc), cDNA and peptide isoforms which arise from alternative splicing or any other mutation, polymorphic or allelic variation. It should also be understood to include reference to any subunit polypeptides such as precursor forms which may be generated, whether existing as a monomer, multimer, fusion protein or other complex.
For example, in one embodiment of the invention, the subject gene is CDH3. Analysis of the AceView Database reveals that there exist 12 CDH3 alternative mRNA transcripts. Nine are generated by alternative splicing while three are unspliced forms. In terms of the genes encompassed by the present invention, means for determining the existence of such variants and characterising same, are described in Example 6. To the extent that the genes of the present invention are described by reference to an Affymetrix probeset, Table 9 provides details of the nucleic acid sequence to which each probeset is directed. Based on this information, the skilled person could, as a matter of routine procedure, identify the gene in respect of which that sequence forms part. A typical protocol for doing this is also outlined in Example 6.
It should be understood that the “individual” who is the subject of testing may be any human or non-human mammal. Examples of non-human mammals include primates, livestock animals (e.g. horses, cattle, sheep, pigs, donkeys), laboratory test animals (e.g. mice, rats, rabbits, guinea pigs), companion animals (e.g. dogs, cats) and captive wild animals (e.g. deer, foxes). Preferably the mammal is a human.
The method of the present invention is predicated on the comparison of the level of the neoplastic markers of a biological sample with the control levels of these markers. The “control level” may be either a “normal level”, which is the level of marker expressed by a corresponding large intestine cell or cellular population which is not neoplastic, or the background level which is detectable in a negative control sample.
The normal (or “non-neoplastic”) level may be determined using tissues derived from the same individual who is the subject of testing. However, it would be appreciated that this may be quite invasive for the individual concerned and it is therefore likely to be more convenient to analyse the test results relative to a standard result which reflects individual or collective results obtained from individuals other than the patient in issue. This latter form of analysis is in fact the preferred method of analysis since it enables the design of kits which require the collection and analysis of a single biological sample, being a test sample of interest. The standard results which provide the normal level may be calculated by any suitable means which would be well known to the person of skill in the art. For example, a population of normal tissues can be assessed in terms of the level of the neoplastic markers of the present invention, thereby providing a standard value or range of values against which all future test samples are analysed. It should also be understood that the normal level may be determined from the subjects of a specific cohort and for use with respect to test samples derived from that cohort. Accordingly, there may be determined a number of standard values or ranges which correspond to cohorts which differ in respect of characteristics such as age, gender, ethnicity or health status. Said “normal level” may be a discrete level or a range of levels. An increase in the expression level of the subject genes relative to normal levels is indicative of the tissue being neoplastic.
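One way in which cohort-specific standard ranges of this kind might be derived and applied is sketched below in Python. The sketch is illustrative only: the function names are hypothetical, and the choice of the mean plus or minus two standard deviations as the normal range is an assumption made for the example, since the specification leaves the calculation to any suitable means known to the skilled person.

```python
from statistics import mean, stdev


def reference_range(normal_levels, k=2.0):
    """Return (low, high) as mean +/- k standard deviations.

    Assumption: a symmetric k-SD range; needs at least two normal samples.
    """
    centre = mean(normal_levels)
    spread = stdev(normal_levels)
    return (centre - k * spread, centre + k * spread)


def build_cohort_ranges(samples, k=2.0):
    """Group (cohort, level) pairs from normal tissue and derive one range per cohort."""
    grouped = {}
    for cohort, level in samples:
        grouped.setdefault(cohort, []).append(level)
    return {cohort: reference_range(levels, k)
            for cohort, levels in grouped.items()}


def is_elevated(test_level, cohort, ranges):
    """True if the test level exceeds the upper bound of the cohort's normal range."""
    _low, high = ranges[cohort]
    return test_level > high


if __name__ == "__main__":
    # Hypothetical marker levels measured in normal tissue from one cohort.
    normals = [("A", 10.0), ("A", 12.0), ("A", 11.0), ("A", 9.0), ("A", 13.0)]
    ranges = build_cohort_ranges(normals)
    print(ranges["A"])
    print(is_elevated(20.0, "A", ranges))
```

Because the ranges are computed once from reference tissue, only the test sample need be collected from the individual under examination, which is the practical advantage noted above.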
Without limiting the present invention to any one theory or mode of action, although each of the genes hereinbefore described is differentially expressed, either singly or in combination, as between neoplastic versus non-neoplastic cells of the large intestine, and is therefore diagnostic of the existence of a large intestine neoplasm, the expression of some of these genes was found to exhibit particularly significant levels of sensitivity, specificity and positive and negative predictive value. Accordingly, in a preferred embodiment one would screen for and assess the expression level of one or more of these genes. To this end, and without limiting the present invention to any one theory or mode of action, the following markers were determined to be expressed in neoplastic tissue at a level of 3, 4, 5 or 7 fold greater than non-neoplastic tissue when assessed by virtue of the method exemplified herein.
There is therefore more particularly provided a method of screening for the onset or predisposition to the onset of a large intestine neoplasm in an individual, said method comprising measuring the level of expression of one or more genes or transcripts selected from:
Preferably, said control level is a non-neoplastic level.
In another embodiment, there is provided a method of screening for the onset or predisposition to the onset of a large intestine neoplasm in an individual, said method comprising measuring the level of expression of one or more genes or transcripts selected from:
Preferably, said control level is a non-neoplastic level.
In yet another embodiment there is provided a method of screening for the onset or predisposition to the onset of a large intestine neoplasm in an individual, said method comprising measuring the level of expression of one or more genes or transcripts selected from:
Preferably, said control level is a non-neoplastic level.
In still yet another preferred embodiment, there is provided a method of screening for the onset or predisposition to the onset of a large intestine neoplasm in an individual, said method comprising measuring the level of expression of one or more genes or transcripts selected from:
in a biological sample from said individual wherein a higher level of expression of the genes or transcripts of group (i) and/or group (ii) relative to control levels is indicative of a neoplastic large intestine cell or a cell predisposed to the onset of a neoplastic state.
Preferably, said control level is a non-neoplastic level.
According to these aspects of the present invention, said large intestine tissue is preferably colorectal tissue.
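By way of illustration only, the fold-change comparison underlying the preceding embodiments (for example, expression at a level of 3, 4, 5 or 7 fold greater than non-neoplastic tissue) can be sketched as follows. The function names and the numeric levels are hypothetical, and the sketch assumes that the test and control levels are expressed on the same linear scale.

```python
def fold_change(test_level, control_level):
    """Ratio of the test level to the control level (both on a linear scale)."""
    if control_level <= 0:
        raise ValueError("control_level must be positive")
    return test_level / control_level


def highest_threshold_met(test_level, control_level, thresholds=(3, 4, 5, 7)):
    """Return the largest fold threshold reached, or None if none is reached."""
    ratio = fold_change(test_level, control_level)
    met = [t for t in thresholds if ratio >= t]
    return max(met) if met else None
```

For instance, a test level of 45 against a control level of 10 is a 4.5 fold increase, which meets the 3 and 4 fold thresholds but not the 5 fold threshold.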
The detection method of the present invention can be performed on any suitable biological sample. To this end, reference to a “biological sample” should be understood as a reference to any sample of biological material derived from an animal such as, but not limited to, cellular material, biofluids (eg. blood), faeces, tissue biopsy specimens, surgical specimens or fluid which has been introduced into the body of an animal and subsequently removed (such as, for example, the solution retrieved from an enema wash). The biological sample which is tested according to the method of the present invention may be tested directly or may require some form of treatment prior to testing. For example, a biopsy or surgical sample may require homogenisation prior to testing or it may require sectioning for in situ testing of the qualitative expression levels of individual genes. Alternatively, a cell sample may require permeabilisation prior to testing. Further, to the extent that the biological sample is not in liquid form, (if such form is required for testing) it may require the addition of a reagent, such as a buffer, to mobilise the sample.
To the extent that the neoplastic marker gene expression product is present in a biological sample, the biological sample may be directly tested or else all or some of the nucleic acid or protein material present in the biological sample may be isolated prior to testing. In yet another example, the sample may be partially purified or otherwise enriched prior to analysis. For example, to the extent that a biological sample comprises a very diverse cell population, it may be desirable to enrich for a sub-population of particular interest. It is within the scope of the present invention for the target cell population or molecules derived therefrom to be treated prior to testing, for example, inactivation of live virus or being run on a gel. It should also be understood that the biological sample may be freshly harvested or it may have been stored (for example by freezing) prior to testing or otherwise treated prior to testing (such as by undergoing culturing).
The choice of what type of sample is most suitable for testing in accordance with the method disclosed herein will be dependent on the nature of the situation. Preferably, said sample is a faecal (stool) sample, enema wash, surgical resection, tissue biopsy or blood sample.
In a related aspect, it has been determined that certain of the markers hereinbefore defined are more indicative of adenoma development versus cancer development or vice versa. This is an extremely valuable finding since it enables one to more specifically characterise the likely nature of a neoplasm which is detected by virtue of the method of the present invention.
Accordingly, in a related aspect the present invention is directed to a method of screening for the onset or predisposition to the onset of a large intestine neoplasm in an individual, said method comprising measuring the level of expression of one or more genes or transcripts selected from:
in a biological sample from said individual wherein a higher level of expression of the genes or transcripts of group (i) and/or group (ii) relative to control levels is indicative of an adenoma cell or a cell predisposed to the onset of an adenoma state.
In another preferred embodiment of this aspect of the present invention there is provided a method of screening for the onset or predisposition to the onset of a large intestine neoplasm in an individual, said method comprising measuring the level of expression of one or more genes or transcripts selected from:
in a biological sample from said individual wherein a higher level of expression of the genes or transcripts of group (i) and/or group (ii) relative to control levels is indicative of a cancer cell or a cell predisposed to the onset of a cancerous state.
According to these aspects, said control levels are preferably non-neoplastic levels and said large intestine tissue is colorectal tissue. Even more preferably, said biological sample is a stool sample or blood sample.
In a related aspect, it has been determined that a subpopulation of the markers of the present invention is not only expressed at levels higher than normal levels, but is also uniquely characterised by the fact that expression levels above that of background control levels are not detectable in non-neoplastic tissue. This determination has therefore enabled the development of qualitative screening systems which are simply designed to detect marker expression relative to a control background level. In accordance with this aspect of the present invention, said “control level” is therefore the “background level”. Preferably, said background level is of the chosen testing methodology.
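By way of illustration only, a qualitative screen of this kind reduces to a detected or not detected call for each marker relative to the background of the chosen testing methodology. In the following sketch the background is estimated from replicate negative control readings and the cut-off is set at the mean plus three sample standard deviations; this cut-off rule, the marker names and the numeric readings are assumptions made for the purpose of the example.

```python
from statistics import mean, stdev


def background_cutoff(negative_controls, k=3.0):
    """Assumed cut-off: mean of negative controls plus k sample standard deviations."""
    return mean(negative_controls) + k * stdev(negative_controls)


def qualitative_calls(signals, negative_controls, k=3.0):
    """Map each marker name to True (detected above background) or False."""
    cutoff = background_cutoff(negative_controls, k)
    return {marker: signal > cutoff for marker, signal in signals.items()}


# Hypothetical readings for two placeholder markers.
calls = qualitative_calls(
    {"MARKER_A": 2.0, "MARKER_B": 1.3},
    negative_controls=[1.0, 1.2, 0.8, 1.1, 0.9],
)
```

Because the call is binary, no quantitative normal level is required; only the negative control of the assay itself is needed.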
According to this aspect, there is therefore provided a method of screening for the onset or predisposition to the onset of a large intestine neoplasm in an individual, said method comprising measuring the level of expression of one or more genes or transcripts selected from:
in a biological sample from said individual wherein a higher level of expression of the genes or transcripts of group (i) and/or group (ii) relative to background levels is indicative of a neoplastic cell or a cell predisposed to the onset of a neoplastic state.
In a most preferred embodiment, said genes or transcripts are selected from:
Preferably, said neoplasm is an adenoma or an adenocarcinoma and said gastrointestinal tissue is colorectal tissue.
In yet another embodiment, it has been determined that a further subpopulation of these markers are more characteristic of adenoma development, while others are more characteristic of cancer development. Accordingly, there is provided a convenient means of qualitatively obtaining indicative information in relation to the characteristics of the subject neoplasm.
According to this embodiment there is provided a method of screening for the onset or predisposition to the onset of a large intestine neoplasm in an individual, said method comprising measuring the level of expression of one or more genes or transcripts selected from:
in a biological sample from said individual wherein a higher level of expression of the genes or transcripts of group (i) and/or group (ii) relative to background levels is indicative of an adenoma cell or a cell predisposed to the onset of an adenoma state.
In yet still another preferred embodiment there is provided a method of screening for the onset or predisposition to the onset of a large intestine neoplasm in an individual, said method comprising measuring the level of expression of one or more genes or transcripts selected from:
in a biological sample from said individual wherein a higher level of expression of the genes or transcripts of group (i) and/or group (ii) relative to background levels is indicative of a cancer cell or a cell predisposed to the onset of a cancerous state.
Preferably, said large intestine tissue is colorectal tissue.
More preferably, said biological sample is a blood sample or stool sample.
As detailed hereinbefore, the present invention is designed to screen for a neoplastic cell or cellular population, which is located in the large intestine. Accordingly, reference to “cell or cellular population” should be understood as a reference to an individual cell or a group of cells. Said group of cells may be a diffuse population of cells, a cell suspension, an encapsulated population of cells or a population of cells which take the form of tissue.
Reference to “expression” should be understood as a reference to the transcription and/or translation of a nucleic acid molecule. In this regard, the present invention is exemplified with respect to screening for neoplastic marker expression products taking the form of RNA transcripts (e.g. primary RNA or mRNA). Reference to “RNA” should be understood to encompass reference to any form of RNA, such as primary RNA or mRNA. Without limiting the present invention in any way, the modulation of gene transcription leading to increased or decreased RNA synthesis will also correlate with the translation of some of these RNA transcripts (such as mRNA) to produce a protein product. Accordingly, the present invention also extends to detection methodology which is directed to screening for modulated levels or patterns of the neoplastic marker protein products as an indicator of the neoplastic state of a cell or cellular population. Although one method is to screen for mRNA transcripts and/or the corresponding protein product, it should be understood that the present invention is not limited in this regard and extends to screening for any other form of neoplastic marker expression product such as, for example, a primary RNA transcript. It is well within the skill of the person of skill in the art to determine the most appropriate screening target for any given situation. To this end, the genes which are known to encode an expression product which is either secreted by the cell or membrane bound are detailed in the table below. It would be appreciated that screening for neoplastic markers which are secreted or membrane bound may provide particular advantages in terms of the design of a diagnostic screening product.
Reference to “nucleic acid molecule” should be understood as a reference to both deoxyribonucleic acid molecules and ribonucleic acid molecules and fragments thereof. The present invention therefore extends to both directly screening for mRNA levels in a biological sample or screening for the complementary cDNA which has been reverse-transcribed from an mRNA population of interest. It is well within the skill of the person of skill in the art to design methodology directed to screening for either DNA or RNA. As detailed above, the method of the present invention also extends to screening for the protein product translated from the subject mRNA.
Preferably, the level of gene expression is measured by reference to genes which encode a protein product and, more particularly, said level of expression is measured at the protein level. Accordingly, to the extent that the present invention is directed to screening for markers which are detailed in the preceding table, said screening is preferably directed to the encoded protein.
As detailed hereinbefore, it should be understood that although the present invention is exemplified with respect to the detection of expressed nucleic acid molecules (e.g. mRNA), it also encompasses methods of detection based on screening for the protein product of the subject genes. The present invention should also be understood to encompass methods of detection based on identifying both proteins and/or nucleic acid molecules in one or more biological samples. This may be of particular significance to the extent that some of the neoplastic markers of interest may correspond to genes or gene fragments which do not encode a protein product. Accordingly, to the extent that this occurs it would not be possible to test for a protein and the subject marker would have to be assessed on the basis of transcription expression profiles.
In terms of screening for the upregulation of expression of a marker, it would also be well known to the person of skill in the art that changes which are detectable at the DNA level are indicative of changes to gene expression activity and therefore changes to expression product levels. Such changes include, but are not limited to, changes to DNA methylation. Accordingly, reference herein to “screening the level of expression” and comparison of these “levels of expression” to control “levels of expression” should be understood to include a reference to assessing DNA factors which are related to transcription, such as gene/DNA methylation patterns.
The term “protein” should be understood to encompass peptides, polypeptides and proteins (including protein fragments). The protein may be glycosylated or unglycosylated and/or may contain a range of other molecules fused, linked, bound or otherwise associated to the protein such as amino acids, lipids, carbohydrates or other peptides, polypeptides or proteins. Reference herein to a “protein” includes a protein comprising a sequence of amino acids as well as a protein associated with other molecules such as amino acids, lipids, carbohydrates or other peptides, polypeptides or proteins.
The proteins encoded by the neoplastic markers of the present invention may be in multimeric form meaning that two or more molecules are associated together. Where the same protein molecules are associated together, the complex is a homomultimer. An example of a homomultimer is a homodimer. Where at least one marker protein is associated with at least one non-marker protein, then the complex is a heteromultimer such as a heterodimer.
Reference to a “fragment” should be understood as a reference to a portion of the subject nucleic acid molecule or protein. This is particularly relevant with respect to screening for modulated RNA levels in stool samples since the subject RNA is likely to have been degraded or otherwise fragmented due to the environment of the gut. One may therefore actually be detecting fragments of the subject RNA molecule, which fragments are identified by virtue of the use of a suitably specific probe.
Reference to the “onset” of a neoplasm, such as adenoma or adenocarcinoma, should be understood as a reference to one or more cells of that individual exhibiting dysplasia. In this regard, the adenoma or adenocarcinoma may be well developed in that a mass of dysplastic cells has developed. Alternatively, the adenoma or adenocarcinoma may be at a very early stage in that only relatively few abnormal cell divisions have occurred at the time of diagnosis. The present invention also extends to the assessment of an individual's predisposition to the development of a neoplasm, such as an adenoma or adenocarcinoma. Without limiting the present invention in any way, changed levels of the neoplastic markers may be indicative of that individual's predisposition to developing a neoplasia, such as the future development of an adenoma or adenocarcinoma or another adenoma or adenocarcinoma.
In yet another related aspect of the present invention, markers have been identified which enable the characterisation of neoplastic tissue of the large intestine in terms of whether it is an adenoma or a cancer. This development now provides a simple yet accurate means of characterising tissue using means other than the traditional methods which are currently utilised.
According to this aspect of the present invention, there is provided a method of characterising a neoplastic cell or cellular population, which cell or cellular population is derived from the large intestine of an individual, said method comprising measuring the level of expression of one or more genes or transcripts selected from:
in said cell or cellular population wherein a higher level of expression of the genes or transcripts of group (i) and/or group (ii) relative to a gastrointestinal cancer cell level is indicative of an adenoma cell or a cell predisposed to the onset of an adenoma state.
In another aspect there is provided a method of characterising a neoplastic cell or cellular population, which cell or cellular population is derived from the large intestine of an individual, said method comprising measuring the level of expression of one or more genes or transcripts selected from:
in a biological sample from said individual wherein a higher level of expression of the genes or transcripts of group (i) and/or group (ii) relative to a gastrointestinal adenoma cell level is indicative of a cancer or a cell predisposed to the onset of a cancerous state.
Preferably, said gastrointestinal tissue is colorectal tissue.
Reference to an “adenoma control level” or “cancer control level” should be understood as a reference to the level of said gene expression in a population of adenoma or cancer gastrointestinal cells, respectively. As discussed hereinbefore in relation to “normal levels”, the subject level may be a discrete level or a range of levels. Accordingly, the definition of “adenoma control level” or “cancer control level” should be understood to have a corresponding definition to “normal level”, albeit in the context of the expression of genes by a neoplastic population of large intestine cells.
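By way of illustration only, the characterisation of a neoplastic sample against a cancer control level or an adenoma control level can be sketched as follows. The marker names, the control levels and the rule for resolving conflicting or absent indications as "indeterminate" are assumptions made for the purpose of this example.

```python
def characterise(levels, adenoma_markers, cancer_markers):
    """Characterise a neoplastic sample as 'adenoma', 'cancer' or 'indeterminate'.

    levels          -- marker name -> level measured in the sample
    adenoma_markers -- adenoma-indicative marker -> cancer control level
    cancer_markers  -- cancer-indicative marker -> adenoma control level
    """
    adenoma_hits = [m for m, ctrl in adenoma_markers.items()
                    if levels.get(m, 0.0) > ctrl]
    cancer_hits = [m for m, ctrl in cancer_markers.items()
                   if levels.get(m, 0.0) > ctrl]
    if adenoma_hits and not cancer_hits:
        return "adenoma"
    if cancer_hits and not adenoma_hits:
        return "cancer"
    # Assumed handling: no indication, or conflicting indications.
    return "indeterminate"


# Hypothetical placeholder markers and control levels.
ADENOMA_MARKERS = {"MARKER_A": 3.0}  # compared against a cancer control level
CANCER_MARKERS = {"MARKER_B": 2.0}   # compared against an adenoma control level
```

Where the control level is a range rather than a discrete level, the upper bound of the range could be supplied in place of the single value.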
In terms of this aspect of the present invention, the subject analysis is performed on a population of neoplastic cells. These cells may be derived in any manner, such as sloughed-off neoplastic cells which have been collected via an enema wash or from a gastrointestinal sample, such as a stool sample. Alternatively, the subject cells may have been obtained via a biopsy or other surgical technique.
Without limiting this aspect of the invention in any way, several of the markers of this aspect of the present invention have been determined to be expressed at particularly significant levels above those of neoplastic cells. For example, increased expression levels of 3- and 5-fold have been observed in respect of the following markers, when assessed by the method exemplified herein, which are indicative of gastrointestinal adenomas.
In another example, increased expression levels of between 3- and 9-fold have been observed in respect of the following markers which are indicative of gastrointestinal cancers, when assessed by the method herein exemplified:
According to this embodiment, there is therefore provided a method of characterising a neoplastic cell or cellular population, which cell or cellular population is derived from the large intestine of an individual, said method comprising measuring the level of expression of one or more genes or transcripts selected from:
In another embodiment, there is provided a method of characterising a neoplastic cell or cellular population, which cell or cellular population is derived from the large intestine of an individual, said method comprising measuring the level of expression of one or more genes or transcripts selected from:
in a biological sample from said individual wherein a higher level of expression of the genes or transcripts of group (i) and/or group (ii) relative to a gastrointestinal cancer control level is indicative of an adenoma cell or a cell predisposed to the onset of an adenoma state.
Preferably, said gastrointestinal tissue is colorectal tissue.
Still more preferably, said biological sample is a tissue sample.
In another preferred embodiment the present invention is directed to a method of characterising a cell or cellular population, which cell or cellular population is derived from the large intestine of an individual, said method comprising measuring the level of expression of one or more genes or transcripts selected from:
in a biological sample from said individual wherein a higher level of expression of the genes or transcripts of group (i) and/or (ii) relative to a gastrointestinal adenoma control level is indicative of a cancer or a cell predisposed to the onset of a cancerous state.
In yet another preferred embodiment the present invention is directed to a method of characterising a cell or cellular population, which cell or cellular population is derived from the large intestine of an individual, said method comprising measuring the level of expression of one or more genes or transcripts selected from:
in a biological sample from said individual wherein a higher level of expression of the genes or transcripts of group (i) and/or (ii) relative to a gastrointestinal adenoma control level is indicative of a cancer or a cell predisposed to the onset of a cancerous state.
In still yet another preferred embodiment the present invention is directed to a method of characterising a cell or cellular population, which cell or cellular population is derived from the large intestine of an individual, said method comprising measuring the level of expression of one or more genes or transcripts selected from:
in a biological sample from said individual wherein a higher level of expression of the genes or transcripts of group (i) and/or (ii) relative to a gastrointestinal adenoma control level is indicative of a cancer or a cell predisposed to the onset of a cancerous state.
Preferably, said gastrointestinal tissue is colorectal tissue.
Even more preferably, said biological sample is a tissue sample.
In still another related aspect it has been determined that a subset of the markers of this aspect of the present invention are useful as qualitative markers of neoplastic tissue characterisation in that these markers, if detectable above background levels in neoplastic tissue, are indicative of either adenoma or cancerous tissue.
According to this aspect, the present invention provides a method of characterising a neoplastic cell or cellular population, which cell or cellular population is derived from the large intestine of an individual, said method comprising measuring the level of expression of one or more genes or transcripts selected from:
in a biological sample from said individual wherein a higher level of expression of the genes or transcripts of group (i) and/or (ii) relative to neoplastic tissue background levels is indicative of an adenoma cell or a cell predisposed to the onset of an adenoma state.
In another aspect the present invention provides a method of characterising a neoplastic cell or cellular population, which cell or cellular population is derived from the large intestine of an individual, said method comprising measuring the level of expression of one or more genes or transcripts selected from:
in a biological sample from said individual wherein a higher level of expression of the genes or transcripts of group (i) and/or (ii) relative to neoplastic tissue background levels is indicative of a cancer or a cell predisposed to the onset of a cancerous state.
Preferably, said gastrointestinal tissue is colorectal tissue.
Still more preferably, said biological sample is a tissue sample.
In a most preferred embodiment, the methods of the present invention are preferably directed to screening for proteins encoded by the markers of the present invention.
Although the preferred method is to detect the expression products of the neoplastic markers for the purpose of diagnosing neoplasia development or predisposition thereto, the detection of converse changes in the levels of said markers may be desired under certain circumstances, for example, to monitor the effectiveness of therapeutic or prophylactic treatment directed to modulating a neoplastic condition, such as adenoma or adenocarcinoma development. For example, where elevated levels of the subject markers indicate that an individual has developed a condition characterised by adenoma or adenocarcinoma development, screening for a decrease in the levels of these markers subsequent to the onset of a therapeutic regime may be utilised to indicate reversal or other form of improvement of the subject individual's condition.
The method of the present invention is therefore useful as a one-time test or as an on-going monitor of those individuals thought to be at risk of neoplasia development or as a monitor of the effectiveness of therapeutic or prophylactic treatment regimes directed to inhibiting or otherwise slowing neoplasia development. In these situations, mapping the modulation of neoplastic marker expression levels in any one or more classes of biological samples is a valuable indicator of the status of an individual or the effectiveness of a therapeutic or prophylactic regime which is currently in use. Accordingly, the method of the present invention should be understood to extend to monitoring for increases or decreases in marker expression levels in an individual relative to their normal level (as hereinbefore defined), background control levels, cancer levels, adenoma levels or relative to one or more earlier marker expression levels determined from a biological sample of said individual.
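By way of illustration only, monitoring relative to one or more earlier marker expression levels determined from the same individual can be sketched as follows. The 10% tolerance used to distinguish a change from a stable level, and the numeric levels, are assumptions made for the purpose of this example.

```python
def classify_change(earlier_level, later_level, tolerance=0.10):
    """Classify the relative change between two measurements of one marker."""
    if earlier_level <= 0:
        raise ValueError("earlier_level must be positive")
    relative = (later_level - earlier_level) / earlier_level
    if relative > tolerance:
        return "increase"
    if relative < -tolerance:
        return "decrease"
    return "stable"


def monitor(series, tolerance=0.10):
    """Classify each consecutive pair of measurements in a time-ordered series."""
    return [classify_change(a, b, tolerance) for a, b in zip(series, series[1:])]


# Hypothetical levels of one marker measured at four successive time points.
history = monitor([40.0, 42.0, 30.0, 36.0])
```

In a treatment-monitoring setting, a run of "decrease" calls following the onset of a therapeutic regime would be the pattern of interest, whereas "increase" calls would be of interest in an individual thought to be at risk.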
Means of testing for the subject expressed neoplasm markers in a biological sample can be achieved by any suitable method, which would be well known to the person of skill in the art, such as but not limited to:
(i) In vivo detection.
(ii) Detection of up-regulation of RNA expression in the cells by Fluorescent In Situ Hybridization (FISH), or in extracts from the cells by technologies such as Quantitative Reverse Transcriptase Polymerase Chain Reaction (QRTPCR) or Flow cytometric quantification of competitive RT-PCR products (Wedemeyer et al., Clinical Chemistry 48:9 1398-1405, 2002).
(iii) Assessment of expression profiles of RNA, for example by array technologies (Alon et al., Proc. Natl. Acad. Sci. USA: 96, 6745-6750, June 1999).
(iv) Measurement of altered neoplastic marker protein levels in cell extracts, for example by immunoassay.
(v) Without limiting the present invention to any one theory or mode of action, during development gene expression is regulated by processes that alter the availability of genes for expression in different cell lineages without any alteration in gene sequence, and these states can be inherited through a cell division, a process called epigenetic inheritance. Epigenetic inheritance is determined by a combination of DNA methylation (modification of cytosine to give 5-methyl cytosine, 5meC) and by modifications of the histone chromosomal proteins that package DNA. Thus methylation of DNA at CpG sites and modifications such as deacetylation of histone H3 on lysine 9, and methylation on lysine 9 or 27, are associated with inactive chromatin, while the converse state of a lack of DNA methylation and acetylation of lysine 9 of histone H3 is associated with open chromatin and active gene expression. In cancer, this epigenetic regulation of gene expression is frequently found to be disrupted (Esteller & Herman, 2000; Jones & Baylin, 2002). Genes such as tumour suppressor or metastasis suppressor genes are often found to be silenced by DNA methylation, while other genes may be hypomethylated and inappropriately expressed. Thus, among genes that show elevated or inappropriate expression in cancer, this in some instances is characterised by a loss of methylation of the promoter or regulatory region of the gene.
(vi) Determining altered expression of protein neoplastic markers on the cell surface, for example by immunohistochemistry.
(vii) Determining altered protein expression based on any suitable functional test, enzymatic test or immunological test in addition to those detailed in points (iv) and (v) above.
A person of ordinary skill in the art could determine, as a matter of routine procedure, the appropriateness of applying a given method to a particular type of biological sample.
Without limiting the present invention in any way, and as detailed above, gene expression levels can be measured by a variety of methods known in the art. For example, gene transcription or translation products can be measured. Gene transcription products, i.e., RNA, can be measured, for example, by hybridization assays, run-off assays, Northern blots, or other methods known in the art.
Hybridization assays generally involve the use of oligonucleotide probes that hybridize to the single-stranded RNA transcription products. Thus, the oligonucleotide probes are complementary to the transcribed RNA expression product. Typically, a sequence-specific probe can be directed to hybridize to RNA or cDNA. A “nucleic acid probe”, as used herein, can be a DNA probe or an RNA probe that hybridizes to a complementary sequence. One of skill in the art would know how to design such a probe such that sequence specific hybridization will occur. One of skill in the art will further know how to quantify the amount of sequence specific hybridization as a measure of the amount of gene expression for the gene that was transcribed to produce the specific RNA.
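By way of illustration only, the complementarity between a DNA oligonucleotide probe and its RNA target can be sketched as follows. The target sequence shown is an arbitrary placeholder and is not a sequence of the present invention; practical probe design would also take into account length, melting temperature and specificity, which this sketch does not address.

```python
# Watson-Crick pairing of each RNA base with the DNA base of the probe.
_RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def dna_probe_for_rna(rna_5_to_3):
    """Return the DNA probe (written 5' to 3') complementary to an RNA target."""
    target = rna_5_to_3.upper()
    # The strands are antiparallel, so the target is read in reverse.
    return "".join(_RNA_TO_DNA_COMPLEMENT[base] for base in reversed(target))


probe = dna_probe_for_rna("AUGGCC")  # arbitrary placeholder target
```

A character other than A, U, G or C raises a KeyError, which is a deliberate choice in this sketch so that ambiguous or DNA input is not silently accepted.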
The hybridization sample is maintained under conditions that are sufficient to allow specific hybridization of the nucleic acid probe to a specific gene expression product. “Specific hybridization”, as used herein, indicates near exact hybridization (e.g., with few if any mismatches). Specific hybridization can be performed under high stringency conditions or moderate stringency conditions. In one embodiment, the hybridization conditions for specific hybridization are high stringency. For example, certain high stringency conditions can be used to distinguish perfectly complementary nucleic acids from those of less complementarity. “High stringency conditions”, “moderate stringency conditions” and “low stringency conditions” for nucleic acid hybridizations are explained on pages 2.10.1-2.10.16 and pages 6.3.1-6.3.6 in Current Protocols in Molecular Biology (Ausubel, F. et al., “Current Protocols in Molecular Biology”, John Wiley & Sons, (1998), the entire teachings of which are incorporated by reference herein). The exact conditions that determine the stringency of hybridization depend not only on ionic strength (e.g., 0.2×SSC, 0.1×SSC), temperature (e.g., room temperature, 42° C., 68° C.) and the concentration of destabilizing agents such as formamide or denaturing agents such as SDS, but also on factors such as the length of the nucleic acid sequence, base composition, percent mismatch between hybridizing sequences and the frequency of occurrence of subsets of that sequence within other non-identical sequences. Thus, equivalent conditions can be determined by varying one or more of these parameters while maintaining a similar degree of identity or similarity between the two nucleic acid molecules. Typically, conditions are used such that sequences at least about 60%, at least about 70%, at least about 80%, at least about 90% or at least about 95% or more identical to each other remain hybridized to one another.
By varying hybridization conditions from a level of stringency at which no hybridization occurs to a level at which hybridization is first observed, conditions that will allow a given sequence to hybridize (e.g., selectively) with the most complementary sequences in the sample can be determined.
Exemplary conditions that describe the determination of wash conditions for moderate or low stringency conditions are described in Kraus, M. and Aaronson, S., 1991. Methods Enzymol., 200:546-556; and in Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons, (1998). Washing is the step in which conditions are usually set so as to determine a minimum level of complementarity of the hybrids. Generally, starting from the lowest temperature at which only homologous hybridization occurs, each ° C. by which the final wash temperature is reduced (holding SSC concentration constant) allows an increase by 1% in the maximum mismatch percentage among the sequences that hybridize. Generally, doubling the concentration of SSC results in an increase in Tm of about 17° C. Using these guidelines, the wash temperature can be determined empirically for high, moderate or low stringency, depending on the level of mismatch sought. For example, a low stringency wash can comprise washing in a solution containing 0.2×SSC/0.1% SDS for 10 minutes at room temperature; a moderate stringency wash can comprise washing in a pre-warmed (42° C.) solution containing 0.2×SSC/0.1% SDS for 15 minutes at 42° C.; and a high stringency wash can comprise washing in a pre-warmed (68° C.) solution containing 0.1×SSC/0.1% SDS for 15 minutes at 68° C. Furthermore, washes can be performed repeatedly or sequentially to obtain a desired result as known in the art. Equivalent conditions can be determined by varying one or more of the parameters given as an example, as known in the art, while maintaining a similar degree of complementarity between the target nucleic acid molecule and the primer or probe used (e.g., the sequence to be hybridized).
A related aspect of the present invention provides a molecular array, which array comprises a plurality of:
Preferably, said percent identity is at least 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99%.
Low stringency includes and encompasses from at least about 1% v/v to at least about 15% v/v formamide and from at least about 1M to at least about 2M salt for hybridisation, and at least about 1M to at least about 2M salt for washing conditions. Alternative stringency conditions may be applied where necessary, such as medium stringency, which includes and encompasses from at least about 16% v/v to at least about 30% v/v formamide and from at least about 0.5M to at least about 0.9M salt for hybridisation, and at least about 0.5M to at least about 0.9M salt for washing conditions, or high stringency, which includes and encompasses from at least about 31% v/v to at least about 50% v/v formamide and from at least about 0.01M to at least about 0.15M salt for hybridisation, and at least about 0.01M to at least about 0.15M salt for washing conditions. In general, washing is carried out at Tm=69.3+0.41 (G+C) % [19]=−12° C. However, the Tm of a duplex DNA decreases by 1° C. with every increase of 1% in the number of mismatched base pairs (Bonner et al (1973) J. Mol. Biol. 81:123).
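By way of illustration only, the relationship between G+C content, mismatch and Tm described above can be sketched as follows. The sketch covers only the 69.3+0.41 (G+C) % term and the 1° C. per 1% mismatch rule; the function names are hypothetical and do not form part of the specification.

```python
def gc_content_percent(sequence: str) -> float:
    """Return the percentage of G and C bases in a nucleotide sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def melting_temperature(gc_percent: float) -> float:
    """Estimate duplex Tm in degrees C as 69.3 + 0.41 * (G+C)%."""
    if not 0.0 <= gc_percent <= 100.0:
        raise ValueError("gc_percent must be between 0 and 100")
    return 69.3 + 0.41 * gc_percent


def mismatch_adjusted_tm(gc_percent: float, mismatch_percent: float) -> float:
    """Lower the estimated Tm by 1 degree C for every 1% of mismatched base pairs."""
    if mismatch_percent < 0.0:
        raise ValueError("mismatch_percent must not be negative")
    return melting_temperature(gc_percent) - mismatch_percent
```

For example, a duplex with 50% G+C content and 5% mismatched base pairs would be estimated at about 84.8° C. under these two rules.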
Preferably, the subject probes are designed to bind to the nucleic acid or protein to which they are directed with a level of specificity which minimises the incidence of non-specific reactivity. However, it would be appreciated that it may not be possible to eliminate all potential cross-reactivity or non-specific reactivity, this being an inherent limitation of any probe based system.
In terms of the probes which are used to detect the subject proteins, they may take any suitable form including antibodies and aptamers.
A library or array of nucleic acid or protein probes provides rich and highly valuable information. Further, two or more arrays or profiles (information obtained from use of an array) of such sequences are useful tools for comparing a test set of results with a reference, such as another sample or stored calibrator. In using an array, individual probes typically are immobilized at separate locations and allowed to react for binding reactions. Primers associated with assembled sets of markers are useful for either preparing libraries of sequences or directly detecting markers from other biological samples.
A library (or array, when referring to physically separated nucleic acids corresponding to at least some sequences in a library) of gene markers exhibits highly desirable properties. These properties are associated with specific conditions, and may be characterized as regulatory profiles. A profile, as termed here, refers to a set of members that provides diagnostic information of the tissue from which the markers were originally derived. A profile in many instances comprises a series of spots on an array made from deposited sequences.
A characteristic patient profile is generally prepared by use of an array. An array profile may be compared with one or more other array profiles or other reference profiles. The comparative results can provide rich information pertaining to disease states, developmental state, receptiveness to therapy and other information about the patient.
Another aspect of the present invention provides a diagnostic kit for assaying biological samples comprising an agent for detecting one or more neoplastic markers and reagents useful for facilitating the detection by said agent. Further means may also be included, for example, to receive a biological sample. The agent may be any suitable detecting molecule.
The present invention is further described by the following non-limiting examples:
Methods and Materials
Affymetrix GeneChip Data
Gene expression profiling data and accompanying clinical data was purchased from GeneLogic Inc (Gaithersburg, Md. USA). For each tissue analyzed, oligonucleotide microarray data for 44,928 probesets (Affymetrix HGU133A & HGU133B, combined), experimental and clinical descriptors, and digitally archived microscopy images of histological preparations were received. A quality control analysis was performed to remove arrays not meeting essential quality control measures as defined by the manufacturer.
Transcript expression levels were calculated by both Microarray Suite (MAS) 5.0 (Affymetrix) and the Robust Multichip Average (RMA) normalization techniques (Affymetrix. GeneChip expression data analysis fundamentals. Affymetrix, Santa Clara, Calif. USA, 2001; Hubbell et al. Bioinformatics, 18:1585-1592, 2002; Irizarry et al. Nucleic Acid Research, 31, 2003). MAS normalized data was used for performing standard quality control routines and the final data set was normalized with RMA for all subsequent analyses.
Univariate Differential Expression
Differentially expressed gene transcripts were identified using a moderated t-test implemented in the limma library downloaded from the Bioconductor repository for R (G. K. Smyth. Statistical Applications in Genetics and Molecular Biology, 3(1):Article 3, 2004; G. K. Smyth. Bioinformatics and Computational Biology Solutions using R and Bioconductor. Springer, N.Y., 2005). Significance estimates (p-values) were corrected to adjust for multiple hypothesis testing using the Bonferroni correction.
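The Bonferroni correction referred to above multiplies each p-value by the number of hypotheses tested and caps the result at 1. The following is a minimal sketch only; the analysis described herein was performed in R, and the Python function name below is a hypothetical stand-in used purely for illustration.

```python
from typing import List, Sequence


def bonferroni_adjust(p_values: Sequence[float]) -> List[float]:
    """Return Bonferroni-adjusted p-values: p * m, capped at 1.0."""
    m = len(p_values)  # number of hypotheses tested (e.g. number of probesets)
    return [min(1.0, p * m) for p in p_values]
```

With 44,928 probesets tested, a raw p-value must therefore be below roughly 0.05/44,928 to remain below 0.05 after adjustment.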
Tissue Specific Expression Patterns
To construct a filter for hypothetically ‘turned on’ gene expression the mean expression level for all 44,928 probesets across the full range of 454 tissues was first estimated. To estimate an expression on/off threshold, the 44,928 mean values were ranked and the expression value equivalent to the 30th percentile across the dataset calculated. This arbitrary threshold was chosen because it was theorized that the majority of transcripts (and presumably more than 30%) in a given specimen should be transcriptionally silenced. Thus this threshold represents a conservative upper bound for what is estimated as non-specific, or background, signal.
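The threshold calculation described above can be sketched as follows. This is an illustration under stated assumptions: the specification does not say how the percentile was interpolated, so a nearest-rank percentile is assumed, and the function names and the dictionary layout (probeset identifier mapped to per-tissue expression values) are hypothetical.

```python
import math
from typing import Dict, Sequence


def probeset_means(expression: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Mean expression of each probeset across all tissues."""
    return {probeset: sum(values) / len(values)
            for probeset, values in expression.items()}


def percentile_threshold(means: Sequence[float], percentile: float = 30.0) -> float:
    """Nearest-rank percentile of the ranked mean values (assumed method)."""
    if not means:
        raise ValueError("means must not be empty")
    if not 0.0 < percentile <= 100.0:
        raise ValueError("percentile must be in (0, 100]")
    ranked = sorted(means)
    rank = math.ceil(percentile * len(ranked) / 100.0)  # 1-based rank
    return ranked[rank - 1]
```

Probesets whose mean signal falls at or below the returned value would then be treated as reflecting background, or ‘off’, signal.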
Gene Symbol Annotations
To map Affymetrix probeset names to official gene symbols the annotation metadata available from Bioconductor was used. hgu133plus2 library version 1.16.0, which was assembled using Entrez Gene data downloaded on 15 Mar. 2007, was used.
Estimates of Performance Characteristics
Diagnostic utility for each table of markers shown herein was estimated, including: sensitivity, specificity, positive predictive value, negative predictive value, likelihood ratio positive and likelihood ratio negative. These estimates were calculated in the same data used to discover the markers and will therefore potentially overestimate the performance characteristics in future tissue samples. To improve the generalisability of the estimates a modified jackknife resampling technique was used to calculate a less biased value for each characteristic.
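The performance characteristics listed above follow from the counts of true and false positives and negatives using their standard definitions. The sketch below is illustrative only: the function name is hypothetical, and the modified jackknife resampling step is not shown because its details are not given in this description.

```python
from typing import Dict


def diagnostic_performance(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard diagnostic estimates from a 2x2 table of classification counts."""
    if min(tp, fp, tn, fn) < 0:
        raise ValueError("counts must not be negative")
    if tp + fn == 0 or tn + fp == 0 or tp + fp == 0 or tn + fn == 0:
        raise ValueError("each row and column of the 2x2 table must be non-empty")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),  # positive predictive value
        "npv": tn / (tn + fn),  # negative predictive value
        # Likelihood ratios are infinite when the denominator is zero.
        "lr_positive": (sensitivity / (1.0 - specificity)
                        if specificity < 1.0 else float("inf")),
        "lr_negative": ((1.0 - sensitivity) / specificity
                        if specificity > 0.0 else float("inf")),
    }
```

For a hypothetical table with 90 true positives, 10 false negatives, 80 true negatives and 20 false positives, this gives a sensitivity of 0.9, a specificity of 0.8 and a likelihood ratio positive of about 4.5.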
Results
A range of univariate statistical tests were applied to Affymetrix oligonucleotide microarray data to reveal human genes that could be used to discriminate colorectal neoplastic tissues from non-neoplastic tissues. There were further identified a number of gene transcripts that appear to be useful for differentiating colorectal adenomas from colorectal carcinoma. Also identified was a subset of these transcripts that may have particular diagnostic utility because the protein products are either secreted or displayed on the cell surface of epithelial cells. Finally, there was identified a further subset of transcripts expressed specifically in neoplastic tissues and at low- or near-background levels in non-neoplastic tissues.
Genes Differentially Expressed in Neoplastic Tissues
From a total GeneChip set of 44,928 probesets it was determined that over 11,000 probesets were differentially expressed by moderated t-test using the limma package in BioConductor (G. K. Smyth, 2004 supra) employing conservative (Bonferroni) multiple test correction. When this list was further filtered to include only those probesets demonstrating a 2-fold or greater mean expression change between the neoplastic and non-neoplastic tissues, 205 probesets were found to be expressed higher in neoplasias relative to normals.
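The two-stage filter described above (statistical significance followed by a fold-change cut-off) can be sketched as follows. This is a sketch under assumptions not stated in this passage: expression values are taken to be on a log2 scale (as produced by RMA), so that a 2-fold change corresponds to a difference of 1 between group means; the significance level of 0.05 is a placeholder; and all names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def overexpressed_probesets(neoplastic: Dict[str, Sequence[float]],
                            non_neoplastic: Dict[str, Sequence[float]],
                            adjusted_p: Dict[str, float],
                            alpha: float = 0.05,
                            min_fold: float = 2.0) -> List[str]:
    """Probesets that are significant and at least min_fold higher in neoplasia.

    Expression values are assumed to be log2-transformed, so the fold-change
    cut-off is applied to the difference of group means.
    """
    min_log2_difference = math.log2(min_fold)
    selected = []
    for probeset, tumour_values in neoplastic.items():
        difference = _mean(tumour_values) - _mean(non_neoplastic[probeset])
        if adjusted_p[probeset] < alpha and difference >= min_log2_difference:
            selected.append(probeset)
    return sorted(selected)
```

A probeset that is statistically significant but changes by less than the cut-off, or that is lower in neoplasia, is excluded by this filter.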
These 205 probesets were annotated using the most recent metadata and annotation packages available for the chips. The 205 overexpressed probesets were mapped to 174 gene symbols.
Hypothetical Markers Specific for Colorectal Neoplasia
While differential gene expression patterns are useful for diagnostic purposes, this project also seeks to identify diagnostic proteins shed into the lumen of the gut by neoplastic colorectal epithelia. To discover candidate proteins the list of differentially expressed transcripts was filtered with selection criteria aimed at identifying markers specifically expressed in colorectal neoplasia tissues. These filter criteria are based on a theoretical assumption that most genes on the GeneChip will be turned ‘off’ and that any microarray signals for such ‘off’ transcripts will reflect technical assay background and non-specific oligonucleotide binding. Accordingly, to select genes specifically expressed in neoplastic tumours (i.e. ‘on’) the non-neoplastic signals were compared with a hypothetical background signal threshold from across all genes on the chip. By design, all transcripts in the candidate pool from which the ‘on’ transcripts are chosen are at least two fold overexpressed in the diseased tissues. Combined, it is hypothesized that these criteria yield the subset of differentially expressed genes that are specifically expressed in neoplasia. The expression profile for a representative ‘on’ transcript is shown in
Genes Differentially Expressed Between Adenomas and Cancer Tissues
There were 33 transcripts observed that were differentially expressed at least two-fold higher in adenoma tissues relative to cancer tissues. In particular, there were identified several transcripts that exhibit an expression pattern specific for adenomas, including SLITRK6 and L1TD1, shown in
Further, there were also identified cancer specific transcripts. The expression profile of one such transcript, COL11A1 is shown in
Differential expression analysis was applied to Affymetrix gene chip data measuring RNA concentration in 454 colorectal tissues including 161 adenocarcinoma specimens, 29 adenoma specimens, 42 colitis specimens and 222 non-diseased tissues. Using conservative corrections for multiple hypothesis testing, it was determined that over 25% of the 44,928 probesets measured in each tissue experiment were differentially expressed between the 190 neoplasia specimens and 264 non-neoplasia controls. To identify robust biomarkers for colorectal neoplasia the list of putative probeset biomarkers was further filtered to include only those probesets shown to be expressed at least 2-fold higher in neoplastic vs. non-neoplastic tissues.
205 probesets hybridising to approximately 157 putative genes were observed to be expressed at a statistically significant higher level in neoplastic tissues relative to non-neoplastic controls.
Validation/Hypothesis Testing
To validate these discovery results the hybridisation of 199 candidate probesets was measured against RNA extracts from 68 clinical specimens comprising 19 adenomas, 19 adenocarcinomas, and 30 non-diseased controls using a custom-designed ‘Adenoma Gene Chip’. Six (6) probesets were not tested as they were not included on the custom design. It was confirmed that 186/199 (88%) of the target probesets or probesets which also hybridise to the target locus were likewise differentially expressed (P<0.05) in these independently-derived tissues. The results of testing these probesets in 68 independently collected clinical specimens are shown in Table 1.
We further tested 142 of the 157 unique gene loci to which the 205 probesets are understood to hybridise. We note the remaining 15 gene symbols were not represented in the validation data. We observed that 133 of 142 gene symbols were represented in the validation data by at least one differentially expressed probeset and many symbols included multiple probesets against regions across the putative locus. A complete list of probesets that bind to target loci is shown in Table 2.
Conclusion
The candidate probesets shown in Tables 1 and 2 are differentially expressed in neoplastic colorectal tissues compared to non-neoplastic controls.
During analysis of the data, a novel expression profile was observed between neoplastic and non-neoplastic phenotypes. It was hypothesized that a subset of quantitatively differentially expressed probesets are furthermore qualitatively differentially expressed. Such probesets show evidence of a neoplasia-specific gene expression profile, i.e. these probesets appear to be expressed above background levels in neoplastic tissues only. This observation and the resulting hypothesis are based on two principles:
To generate a list of neoplasia specific probesets the non-neoplastic intensity of differentially expressed probesets was compared with a hypothetical background signal threshold from across all probesets on the chip. By design, all probesets in the candidate pool from which the ‘on’ transcripts are chosen are at least two fold over-expressed in the diseased tissues. Combined, these criteria yield the subset of differentially expressed transcript species that are specifically expressed in neoplasia.
Validation/Hypothesis Testing
The custom gene chip design precludes testing the hypothetically neoplasia-specific probesets using the same principles as used for discovery. In particular, the custom gene chip (by design) does not contain a large pool of probesets anticipated to hybridise to hypothetically ‘off’/'non-transcribed' gene transcripts. This is because the custom gene chip design is biased toward differentially expressed transcripts in colorectal neoplastic tissues.
The usual differential expression testing (limma) was therefore applied to these candidate probesets for neoplasia-specific transcripts. Of the 33 probesets on the custom gene chip, 32 probesets (or probesets which bind to the same locus) were differentially expressed between the 38 neoplastic tissues (adenoma & cancer) and non-neoplastic controls. The results of these validation experiments are shown in Table 3.
All probesets which are known to hybridise to the gene loci targeted by the 33 probesets claimed herein were tested. Of the 32 putative gene loci targeted by the probesets, 29 were present in the validation data. Twenty-eight (28) of these 29 gene symbols demonstrated at least one hybridising probeset which was differentially expressed in the neoplastic tissues. Results for these experiments, including all probesets that bind to each target locus in a differentially expressed manner, are shown in Table 4.
Differential expression analysis was applied to Affymetrix gene chip data measuring RNA concentration in neoplastic tissues including 161 adenocarcinoma specimens and 29 adenoma specimens. It was observed that 43 probesets hybridizing to approximately 33 putative gene symbols were expressed higher (P<0.05) in adenoma tissues relative to cancer tissues. Conversely, 145 probesets (104 gene symbols) were identified to be expressed higher in cancer relative to adenomas.
Validation/Hypothesis Testing
188 (43+145) of these probesets were then measured in a set of independent clinical specimens including 19 adenoma tissues and 19 cancer tissues. It was confirmed that 158 (30+128) of the target probesets (or probesets against the same gene locus) were likewise differentially expressed (P<0.05) in these independently-derived tissues. Probesets elevated in adenoma and cancers relative to each other are shown in Table 5 and Table 6 respectively.
It was further observed that 137 (33+104) gene loci are diagnostically useful for discriminating colorectal adenomas and cancers relative to each phenotype. The validation data included probesets designed to hybridise to 128 of these candidate gene symbols. It was observed that 21 of the 31 genes elevated in adenomas relative to cancers were likewise differentially expressed by at least one probeset. Of the 97 gene symbols elevated in cancer relative to adenoma it was confirmed that 89 gene symbols demonstrated at least one probeset in the validation data to be likewise differentially expressed. The validation testing of the adenoma and cancer elevated gene loci is shown in Table 7 and Table 8, respectively.
Conclusion
It was concluded that the candidate probesets shown in FIXME are differentially expressed between adenomatous and adenocarcinoma tissues and thus useful for distinguishing these tissues. Gene transcripts that hybridise to these probesets are thus diagnostically informative in a clinical setting to classify such neoplastic tissues.
Gene expression profiling data measured in 454 colorectal tissue specimens including neoplastic, normal and non-neoplastic disease controls was purchased from GeneLogic Inc (Gaithersburg, Md. USA). For each tissue specimen, Affymetrix (Santa Clara, Calif. USA) oligonucleotide microarray data totalling 44,928 probesets (HGU133A & HGU133B, combined), experimental and clinical descriptors, and digitally archived microscopy images of histological preparations were received. Prior to applying discovery methods to these data, extensive quality control methods were carried out, including statistical exploration, review of clinical records for consistency and histopathology audit of a random sample of arrays. Microarrays that did not meet acceptable quality criteria were removed from the analysis.
Hypothesis Testing
Candidate transcription biomarkers were tested using a custom oligonucleotide microarray of 25-mer oligonucleotide probesets designed to hybridise to candidate RNA transcripts identified during discovery. Differential expression hypotheses were tested using RNA extracts derived from independently collected clinical samples comprising 30 normal colorectal tissues, 19 colorectal adenoma tissues, and 19 colorectal adenocarcinoma tissues. Each RNA extract was confirmed to meet strict quality control criteria.
Colorectal Tissue Specimens
All tissues used for hypothesis testing were obtained from a tertiary referral hospital tissue bank in metropolitan Adelaide, Australia (Repatriation General Hospital and Flinders Medical Centre). Access to the tissue bank for this research was approved by the Research and Ethics Committee of the Repatriation General Hospital and the Ethics Committee of Flinders Medical Centre. Informed patient consent was received for each tissue studied.
Following surgical resection, specimens were placed in a sterile receptacle and collected from theatre. The time from operative resection to collection from theatre was variable but not more than 30 minutes. Samples, approximately 125 mm³ (5×5×5 mm) in size, were taken from the macroscopically normal tissue as far from pathology as possible, defined both by colonic region as well as by distance either proximal or distal to the pathology. Tissues were placed in cryovials, then immediately immersed in liquid nitrogen and stored at −150° C. until processing.
RNA extraction
RNA extractions were performed using Trizol® reagent (Invitrogen, Carlsbad, Calif., USA) as per manufacturer's instructions. Each sample was homogenised in 300 μL of Trizol reagent using a modified Dremel drill and sterilised disposable pestles. An additional 200 μL of Trizol reagent was added to the homogenate and samples were incubated at RT for 10 minutes. 100 μL of chloroform was then added, samples were shaken/vortexed for 15 seconds, and incubated at RT for 3 further minutes. The aqueous phase containing target RNA was obtained by centrifugation at 12,000 rpm for 15 min at 4° C. RNA was then precipitated by incubating samples at RT for 10 min with 250 μL of isopropanol. Purified RNA precipitate was collected by centrifugation at 12,000 rpm for 10 minutes at 4° C. and supernatants were discarded. Pellets were then washed with 1 mL 75% ethanol, followed by vortexing and centrifugation at 7,500 g for 8 min at 4° C. Finally, pellets were air-dried for 5 min and resuspended in 80 μL of RNase free water. To improve subsequent solubility samples were incubated at 55° C. for 10 min. RNA was quantified by measuring the optical density at A260/280 nm. RNA quality was assessed by electrophoresis on a 1.2% agarose formaldehyde gel.
Gene Chip Processing
To test hypotheses related to biomarker candidates for colorectal neoplasia RNA extracts were assayed using a custom GeneChip designed in collaboration with Affymetrix (Santa Clara, Calif. USA). These custom GeneChips were processed using the standard Affymetrix protocol developed for the HU Gene ST 1.0 array described in (Affy:WTAssay).
Statistical software and Data Processing
The R statistics environment and BioConductor libraries (BioConductor, www.bioconductor.org) (BIOC) were used for most analyses. To map probeset IDs to gene symbols on the Custom GeneChip, hgu133plus2 library version 2.2.0, which was assembled using Entrez Gene data downloaded on Apr 18 12:30:55 2008 (BIOC), was used.
Hypothesis Testing of Differentially Expressed Biomarkers
To assess differential expression between tissue classes, the Student's t test for equal means between two samples or the robust variant provided by the limma library (Smyth)(limma) was used. The impact of false discovery due to multiple hypothesis testing was mitigated by applying a Bonferroni adjustment to P values in the discovery process (MHT:Bonf). For hypothesis testing, the slightly less conservative multiple hypothesis testing correction of Benjamini & Hochberg, which aims to control the false discovery rate of solutions (MHT:BH), was applied.
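The Benjamini & Hochberg adjustment mentioned above can be sketched as follows using its standard step-up definition. The analysis described herein was carried out in R; this Python function is a hypothetical stand-in for illustration only.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```

Because each p-value is scaled by m/rank rather than by m, the adjusted values are never larger than their Bonferroni counterparts, which is why the procedure is described as less conservative.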
Discovery of Tissue-Specific Gene Expression Patterns
Discovery methods using gene expression data often yield numerous candidates, many of which are not suitable for commercial products because they involve subtle gene expression differences that would be difficult to detect in laboratory practice. Pepe et al. note that the ‘ideal’ biomarker is detectable in tumour tissue but not detectable (at all) in non-tumour tissue (Pepe:biomarker:development). To bias toward candidates that meet this criterion, an analysis method was developed that aims to enrich the candidates for biomarkers whose qualitative absence or presence measurement is diagnostic for the phenotype of interest. This method attempts to select candidates that show a prototypical ‘turned-on’ or ‘turned-off’ pattern relative to an estimate of the background/noise expression across the chip. Such RNA transcripts are more likely to correlate with downstream translated proteins with diagnostic potential or to predict upstream genomic changes (e.g. methylation status) that can be used diagnostically. This focus on qualitative rather than quantitative outcomes may simplify the product development process for such biomarkers.
The method is based on the assumption that the pool of extracted RNA species in any given tissue (e.g. colorectal mucosae) will specifically bind to a relatively small subset of the full set of probesets on a GeneChip designed to measure the whole genome. On this assumption, it is estimated that most probesets on a full human gene chip will not exhibit specific, high-intensity signals.
This observation is utilised to approximate the background or ‘non-specific binding’ across the chip by choosing a theoretical level equal to the value of e.g. lowest 30% quantile of the ranked mean values. This quantile can be arbitrarily set to some level below which a reasonable assumption is made that the signals do not represent above-background RNA binding. Finally, this background estimate is used as a threshold to estimate the ‘OFF’ probesets in an experiment for, say, the non-neoplastic tissue specimens.
Conversely, probesets which are 1) expressed above this theoretical threshold level and 2) expressed at differentially higher levels in the tumour specimens may be tumour specific candidate biomarkers. In this case the concept of ‘fold-change’ thresholds can also be conveniently applied to further emphasize the concept of absolute expression increases in a putatively ‘ON’ probeset.
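The ‘ON’/‘OFF’ selection described above can be sketched as follows. This is a minimal illustration under assumptions: group means are taken to be on a log2 scale, a probeset is treated as ‘OFF’ in the controls when its mean is at or below the background estimate, and the function and parameter names are hypothetical.

```python
from typing import Dict, List


def tumour_specific_candidates(tumour_mean: Dict[str, float],
                               control_mean: Dict[str, float],
                               background: float,
                               min_log2_fold: float = 1.0) -> List[str]:
    """Probesets that are OFF in controls, ON in tumours and sufficiently raised.

    background is the chip-wide background estimate (e.g. a low percentile of
    the ranked probeset means); min_log2_fold of 1.0 corresponds to 2-fold.
    """
    candidates = []
    for probeset, tumour_value in tumour_mean.items():
        control_value = control_mean[probeset]
        off_in_controls = control_value <= background
        on_in_tumour = tumour_value > background
        raised = (tumour_value - control_value) >= min_log2_fold
        if off_in_controls and on_in_tumour and raised:
            candidates.append(probeset)
    return sorted(candidates)
```

A probeset that is strongly over-expressed in tumours but already above background in the controls is differentially expressed without being tumour specific, and is therefore not returned.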
Given the assumption of low background binding for a sizeable fraction of the measured probesets, this method was used only in the large GeneLogic discovery data. To construct a filter for hypothetically ‘turned on’ biomarkers in the GeneLogic discovery data, the mean expression level for all 44,928 probesets was first estimated across the full range of 454 tissues. The 44,928 mean values were then ranked and the expression value equivalent to the 25th percentile across the dataset calculated. This arbitrary threshold was chosen because it was theorized that the majority of transcripts (and presumably more than 25%) in a given specimen should exhibit low concentration, which is effectively transcriptional silence. Thus this threshold represents a conservative upper bound for what is estimated as non-specific, or background, expression.
BLAST the sequence of interest using online available Basic Local Alignment Search Tools [BLAST]. e.g. NCBI/BLAST
Assessment of the Open BLAST Search Results
Multiple significant sequence alignments may be identified when “blasting” the sequence.
Identify Gene Nomenclature of the Identified Sequence Match
Determine Promiscuity of Sequence
Assessment of the nBLAST Search Results of the Sequence
Determine Location of the Sequence in the Gene
The Ensembl database is an online database, which produces and maintains automatic annotation on selected eukaryotic genomes (www.ensembl.org/index.html)
Identify Location of the Sequence in the Gene
Alternative Splicing and/or Transcription
The AceView Database provides curated and non-redundant sequence representation of all public mRNA sequences. The database is available through NCBI: http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/
Further Investigation of the Gene mRNA Transcripts
Application of Method to LOC643911/hCG_1815491
Materials and Methods
Extraction of RNA
RNA extractions were performed using Trizol® reagent (Invitrogen, Carlsbad, Calif., USA) as per manufacturer's instructions. Each sample was homogenised in 300 μL of Trizol reagent using a modified Dremel drill and sterilised disposable pestles. An additional 200 μL of Trizol reagent was added to the homogenate and samples were incubated at RT for 10 minutes. 100 μL of chloroform was then added, samples were shaken/vortexed for 15 seconds, and incubated at RT for 3 further minutes. The aqueous phase containing target RNA was obtained by centrifugation at 12,000 rpm for 15 min at 4° C. RNA was then precipitated by incubating samples at RT for 10 min with 250 μL of isopropanol. Purified RNA precipitate was collected by centrifugation at 12,000 rpm for 10 minutes at 4° C. and supernatants were discarded. Pellets were then washed with 1 mL 75% ethanol, followed by vortexing and centrifugation at 7,500 g for 8 min at 4° C. Finally, pellets were air-dried for 5 min and resuspended in 80 μL of RNase free water. To improve subsequent solubility samples were incubated at 55° C. for 10 min. RNA was quantified by measuring the optical density at A260/280 nm. RNA quality was assessed by electrophoresis on a 1.2% agarose formaldehyde gel.
Gene Chip Processing
RNA samples to be analysed on Human Exon 1.0 ST GeneChips were processed using the Affymetrix WT target labeling and control kit (part# 900652) following the protocol described in (Affymetrix 2007 P/N 701880 Rev.4). Briefly: first cycle cDNA was synthesized from 100 ng ribosomal reduced RNA using random hexamer primers tagged with T7 promoter sequence and SuperScript II (Invitrogen, Carlsbad Calif.); this was followed by DNA Polymerase I synthesis of the second strand cDNA. Anti-sense cRNA was then synthesized using T7 polymerase. Second cycle sense cDNA was then synthesised using SuperScript II, dNTP+dUTP, and random hexamers to produce sense strand cDNA incorporating uracil. This single stranded uracil containing cDNA was then fragmented using a combination of uracil DNA glycosylase (UDG) and apurinic/apyrimidinic endonuclease 1 (APE 1). Finally the DNA was biotin labelled using terminal deoxynucleotidyl transferase (TdT) and the Affymetrix proprietary DNA Labeling reagent. Hybridization to the arrays was carried out at 45° C. for 16-18 hours.
Washing and staining of the hybridized GeneChips was carried out using the Affymetrix Fluidics Station 450 and scanned with the Affymetrix Scanner 3000 following recommended protocols.
SYBR Green Based Quantitative Real Time-PCR
Quantitative real time polymerase chain reaction was performed on RNA isolated from clinical samples for the amplification and detection of the various hCG_1815491 transcripts.
Firstly, cDNA was synthesized from 2 μg of total RNA using the Applied Biosystems High Capacity Reverse Transcription Kit (P/N 4368814). After synthesis the reaction was diluted 1:2 with water to obtain a final volume of 40 μL, and 1 μL of this diluted cDNA was used in subsequent PCR reactions.
PCR was performed in a 25 μL volume using 12.5 μL Promega 2× PCR master mix (P/N M7502), 1.5 μL 5 μM forward primer, 1.5 μL 5 μM reverse primer, 7.875 μL water, 0.625 μL of a 1:3000 dilution of 10,000× stock of SYBR green 1 pure dye (Invitrogen P/N S7567), and 1 μL of cDNA.
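As a worked check of the reaction composition given above, the component volumes can be summed and rescaled to another total volume while preserving their relative ratios. The sketch below is illustrative only; the dictionary keys and function names are hypothetical labels, not reagent specifications.

```python
from typing import Dict

# Component volumes in microlitres for one 25 microlitre reaction, as listed above.
REACTION_25_UL: Dict[str, float] = {
    "2x PCR master mix": 12.5,
    "forward primer (5 uM)": 1.5,
    "reverse primer (5 uM)": 1.5,
    "water": 7.875,
    "SYBR green (1:3000 dilution)": 0.625,
    "cDNA": 1.0,
}


def scale_reaction(recipe: Dict[str, float], target_volume_ul: float) -> Dict[str, float]:
    """Rescale every component so the total equals target_volume_ul."""
    factor = target_volume_ul / sum(recipe.values())
    return {name: volume * factor for name, volume in recipe.items()}


def final_concentration(stock_conc: float, added_volume_ul: float,
                        total_volume_ul: float) -> float:
    """Final concentration after dilution (C1 * V1 = C2 * V2)."""
    return stock_conc * added_volume_ul / total_volume_ul
```

The listed volumes sum to 25 μL, and 1.5 μL of a 5 μM primer stock in 25 μL corresponds to a final primer concentration of 0.3 μM. Calling scale_reaction(REACTION_25_UL, 10.0) gives the proportionally reduced volumes for a 10 μL reaction.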
Cycling conditions for amplification were 95° C. for 2 minutes × 1 cycle, followed by 95° C. for 15 seconds and 60° C. for 1 minute × 40 cycles. The amplification reactions were performed in a Corbett Research Rotor-Gene RG3000 or a Roche LightCycler480 real-time PCR machine. When the Roche LightCycler480 real-time PCR machine was used for amplification the reaction volume was reduced to 10 μL and performed in a 384 well plate but the relative ratios between all the components remained the same. Final results were calculated using the ΔΔCt method with the expression levels of the various hCG_1815491 transcripts being calculated relative to the expression level of the endogenous house keeping gene HPRT.
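The ΔΔCt calculation referred to above can be sketched as follows using its conventional form. The sketch assumes approximately 100% amplification efficiency (a doubling per cycle) and the use of a calibrator sample, neither of which is specified in this description; the function names and Ct values in the usage note are hypothetical.

```python
def delta_ct(ct_target: float, ct_reference: float) -> float:
    """Ct of the target transcript minus Ct of the reference gene (e.g. HPRT)."""
    return ct_target - ct_reference


def relative_expression(ct_target_sample: float, ct_reference_sample: float,
                        ct_target_calibrator: float,
                        ct_reference_calibrator: float) -> float:
    """Fold expression of the sample relative to the calibrator: 2 ** (-ddCt)."""
    ddct = (delta_ct(ct_target_sample, ct_reference_sample)
            - delta_ct(ct_target_calibrator, ct_reference_calibrator))
    return 2.0 ** (-ddct)
```

For example, with hypothetical Ct values of 24 (target) and 20 (HPRT) in a test sample and 27 (target) and 20 (HPRT) in a calibrator, ΔΔCt is −3 and the target is estimated to be 8-fold higher in the test sample.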
End-Point PCR
End point PCR was performed on RNA isolated from clinical samples for the various hCG_1815491 transcripts. Conditions were identical to those described for the SYBR green assay above but with the SYBR green dye being replaced with water. The amplification reactions were performed in an MJ Research PTC-200 thermal cycler. 2.5 μl of the amplified products were analysed on a 2% agarose E-gel (Invitrogen) along with a 100-base pair DNA Ladder Marker.
Results
The nucleotide structure and expression levels of transcripts related to hCG_1815491 were analysed based on the identification of diagnostic utility of Affymetrix probesets 238021_s_at and 238022_at from the gene chip analysis.
The gene hCG_1815491 is currently represented in NCBI as a single RefSeq sequence, XM_93911. The RefSeq sequence of hCG_1815491 is based on 89 GenBank accessions from 83 cDNA clones. Prior to March 2006, these clones were predicted to represent two overlapping genes, LOC388279 and LOC650242 (the latter also known as LOC643911). In March 2006, the human genome database was filtered against clone rearrangements, co-aligned with the genome and clustered in a minimal non-redundant way. As a result, LOC388279 and LOC650242 were merged into one gene named hCG_1815491 (earlier references to hCG_1815491 are: LOC388279, LOC643911, LOC650242, XM_944116, AF275804, XM_373688). It has been determined that the Ref Sequence, which is defined by the genomic coordinates 8579310 to 8562303 on human chromosome 16 as defined by the NCBI contig reference NT_010498.15|Hs16_10655, NCBI 36 March 2006 genome, encompasses hCG_1815491. The 10 predicted RNA variants derived from this gene have been aligned with the genomic nucleotide sequence residing in the map region 8579310 to 8562303. This alignment analysis revealed the existence of at least 6 exons, of which several are alternatively spliced. The identified exons are in contrast to the just 4 exons specified in the NCBI hCG_1815491 RefSeq XM_93911. Two additional putative exons were also identified in the Ref Sequence by examination of included probesets on Affymetrix Genechip HuGene Exon 1.0 that target nucleotide sequences embedded in the Ref Sequence. The identified and expanded exon-intron structure of hCG_1815491 has been used to design specific oligonucleotide primers, which allowed measurement of the expression of RNA variants generated from the Ref Sequence by using PCR-based methodology (
Immunohistochemistry is a useful method for evaluating changes in local expression of up or down-regulated markers in human tissue.
Materials and Methods:
Four micrometre sections were incubated in a universal decloaking buffer for 75 minutes at 80° C. to expose masked epitopes. Protein expression was determined using an antibody targeting the C-terminal domain of Mesothelin (MSLN) on colonic biopsies from 30 patients (10 normals, 10 cancers, 10 adenomas). Antibodies were applied for one hour at room temperature. After washing, sections were incubated with polymeric horse-radish peroxidase. Antibody localization was visualized using 3,3′-diaminobenzidine.
Result:
There was a marked upregulation of MSLN in the adenoma and cancer tissues compared to the normal controls. The normal tissues showed mild staining for MSLN in the cytoplasm of the colonic epithelium, but the cancer and particularly the adenoma tissues showed significant upregulation of MSLN in their multilayered epithelium. This upregulation was observed in all 10 adenoma tissues and in 9 out of the 10 cancer tissues. These patterns of staining are illustrated in
Conclusion:
Elevated expression of MSLN has been detected in colon neoplasia, confirming the upregulation observed in the mRNA expression data and verifying the diagnostic utility of both the MSLN mRNA and protein for detection of colorectal neoplasia.
The Affymetrix probeset designated 205828_at was identified as being expressed at a higher level in 190 neoplastic tissue specimens relative to 264 non-neoplastic specimens. The probeset 205828_at hybridizes to RNA transcribed from the gene encoding Matrix Metalloproteinase 3 (MMP3) NM_002422. The differential expression profile of probeset 205828_at was further demonstrated by profiling RNA collected from 68 independent clinical specimens comprising 19 adenomas, 19 adenocarcinomas and 30 non-disease controls,
Materials and Methods
A commercially available bead-suspension immunoassay targeting the protein MMP3 was purchased from R&D Systems (MMP Kit reagents LMP000 and LMP513) to measure MMP3 concentration in stools of human patients diagnosed with colorectal neoplasia. Proteins were extracted from stool specimens using a phosphate buffered saline wash from 6 non-disease controls, 10 adenoma and 11 adenocarcinoma subjects. The resulting protein extracts were analyzed using the Luminex bead-based suspension MMP3 assay as recommended by the manufacturer.
Results
An elevated endogenous expression of MMP3 was observed in stool specimens from patients diagnosed with colon adenomas or adenocarcinomas relative to non-neoplastic controls (
Conclusion
Measurement of MMP3 protein in bodily fluids such as stool samples is useful for diagnosing colorectal neoplasia.
Tables
Probeset designations include both HG-U133plus2 probeset IDs and Human Gene 1.0 ST array probe IDs. The latter can be conveniently mapped to Transcript Cluster ID using the Human Gene 1.0 ST probe tab file provided by Affymetrix (http://www.affymetrix.com/Auth/analysis/downloads/na22/wtgene/HuGene-1_0-st-v1.probe.tab.zip). Using publicly available software such as NetAffx (provided by Affymetrix), the Transcript Cluster ID may be further mapped to gene symbol, chromosomal location, etc.
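The probe-to-transcript-cluster lookup described above may be sketched as follows. This is a minimal illustration that assumes the probe tab file is tab-delimited with a header row. The column names `Probe ID` and `Transcript Cluster ID` and the sample rows are assumptions made for the example and should be checked against the header of the actual file.

```python
import csv
import io


def load_probe_to_cluster(handle, probe_col="Probe ID",
                          cluster_col="Transcript Cluster ID"):
    """Build a dict mapping probe ID -> transcript cluster ID.

    handle is any open text file object containing tab-delimited rows
    with a header line; the column names are assumed, not confirmed.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[probe_col]: row[cluster_col] for row in reader}


# Hypothetical rows standing in for the real probe tab file.
SAMPLE_TAB = (
    "Probe ID\tTranscript Cluster ID\n"
    "1001\t7900001\n"
    "1002\t7900001\n"
    "1003\t7900002\n"
)

if __name__ == "__main__":
    mapping = load_probe_to_cluster(io.StringIO(SAMPLE_TAB))
    print(mapping["1003"])
```

For the real file, the same function could be given a handle opened on the unzipped tab file in place of the in-memory sample.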
Table 1.
Probesets demonstrated to be expressed higher in neoplastic tissues relative to non-neoplastic controls. TargetPS: Affymetrix HG-U133plus2 probeset id; Symbol: putative gene symbol corresponding to target probeset id—multiple symbol names indicate the possibility of probeset hybridisation to multiple gene targets; Signif. FDR: Adjusted p-value for mean difference testing between RNA extracted from neoplasia and non-neoplastic tissues. Adjustment is made using Benjamini & Hochberg correction for multiple hypothesis testing (Benjamini and Hochberg, 1995); D.value50: Diagnostic effectiveness parameter estimate corresponding to the area of a receiver operator characteristic ROC. This parameter provides a convenient estimate of diagnostic utility and is described in (Saunders, 2006); FC: fold change between mean expression level of neoplasia vs. non-neoplasia; Sens-Spec: Estimate of diagnostic performance corresponding to the ROC curve point demonstrating equal sensitivity and specificity; CI (95): 95% confidence interval of sensitivity and specificity estimates.
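For readers unfamiliar with the adjustment named in the Signif. FDR column, the following is a minimal sketch of the Benjamini & Hochberg step-up procedure. It is offered as an illustration of the general method rather than as the code used to generate the tables; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each p-value is multiplied by n / rank (rank 1 = smallest p-value),
    then a running minimum is taken from the largest rank downwards so
    that the adjusted values are monotone in rank.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw p-values for four probesets.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A probeset is then called significant at a chosen false discovery rate when its adjusted p-value falls below that rate.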
Table 2.
Evidence of multiple probesets which correspond to gene symbols claimed herein exhibiting RNA concentration differences between neoplasia and non-neoplastic controls. Symbol: gene symbol; ValidPS_UP: Affymetrix probeset IDs demonstrating statistically significant overexpression in neoplastic RNA extracts relative to non-neoplastic controls. Signif. FDR: Adjusted p-value for mean difference testing between RNA extracted from neoplasia and non-neoplastic tissues. Adjustment is made using Benjamini & Hochberg correction for multiple hypothesis testing (Benjamini and Hochberg, 1995); D.value50: Diagnostic effectiveness parameter estimate corresponding to the area of a receiver operator characteristic ROC. This parameter provides a convenient estimate of diagnostic utility and is described in (Saunders, 2006); FC: fold change between mean expression level of neoplasia vs. non-neoplasia; Sens-Spec: Estimate of diagnostic performance corresponding to the ROC curve point demonstrating equal sensitivity and specificity; CI (95): 95% confidence interval of sensitivity and specificity estimates.
Table 3.
Probesets which demonstrate a qualitatively (in addition to quantitative) elevated profile in neoplastic tissues relative to non-neoplastic controls. TargetPS: Affymetrix HG-U133plus2 probeset id; Symbol: putative gene symbol corresponding to target probeset id—multiple symbol names indicate the possibility of probeset hybridisation to multiple gene targets; Signif. FDR: Adjusted p-value for mean difference testing between RNA extracted from neoplasia and non-neoplastic tissues. Adjustment is made using Benjamini & Hochberg correction for multiple hypothesis testing (Benjamini and Hochberg, 1995); D.value50: Diagnostic effectiveness parameter estimate corresponding to the area of a receiver operator characteristic ROC. This parameter provides a convenient estimate of diagnostic utility and is described in (Saunders, 2006); FC: fold change between mean expression level of neoplasia vs. non-neoplasia; Sens-Spec: Estimate of diagnostic performance corresponding to the ROC curve point demonstrating equal sensitivity and specificity; CI (95): 95% confidence interval of sensitivity and specificity estimates.
Table 4.
Evidence of multiple probesets which correspond to gene symbols claimed herein exhibiting qualitative changes in RNA concentration in neoplastic tissues. Symbol: gene symbol; ValidPS_UP: Affymetrix probeset IDs demonstrating statistically significant overexpression in neoplastic RNA extracts relative to non-neoplastic controls. Signif. FDR: Adjusted p-value for mean difference testing between RNA extracted from neoplasia and non-neoplastic tissues. Adjustment is made using Benjamini & Hochberg correction for multiple hypothesis testing (Benjamini and Hochberg, 1995); D.value50: Diagnostic effectiveness parameter estimate corresponding to the area of a receiver operator characteristic ROC. This parameter provides a convenient estimate of diagnostic utility and is described in (Saunders, 2006); FC: fold change between mean expression level of neoplasia vs. non-neoplasia; Sens-Spec: Estimate of diagnostic performance corresponding to the ROC curve point demonstrating equal sensitivity and specificity; CI (95): 95% confidence interval of sensitivity and specificity estimates.
Table 5
Probesets demonstrated to be expressed higher in adenoma tissues relative to cancer tissues. TargetPS: Affymetrix HG-U133plus2 probeset id; Symbol: putative gene symbol corresponding to target probeset id—multiple symbol names indicate the possibility of probeset hybridisation to multiple gene targets; Signif. FDR: Adjusted p-value for mean difference testing between RNA extracted from neoplasia and non-neoplastic tissues. Adjustment is made using Benjamini & Hochberg correction for multiple hypothesis testing (Benjamini and Hochberg, 1995); D.value50: Diagnostic effectiveness parameter estimate corresponding to the area of a receiver operator characteristic ROC. This parameter provides a convenient estimate of diagnostic utility and is described in (Saunders, 2006); FC: fold change between mean expression level of adenomas vs. cancers; Sens-Spec: Estimate of diagnostic performance corresponding to the ROC curve point demonstrating equal sensitivity and specificity; CI (95): 95% confidence interval of sensitivity and specificity estimates.
Table 6
Probesets demonstrated to be expressed higher in cancer tissues relative to adenoma tissues. TargetPS: Affymetrix HG-U133plus2 probeset id; Symbol: putative gene symbol corresponding to target probeset id—multiple symbol names indicate the possibility of probeset hybridisation to multiple gene targets; Signif. FDR: Adjusted p-value for mean difference testing between RNA extracted from neoplasia and non-neoplastic tissues. Adjustment is made using Benjamini & Hochberg correction for multiple hypothesis testing (Benjamini and Hochberg, 1995); D.value50: Diagnostic effectiveness parameter estimate corresponding to the area of a receiver operator characteristic ROC. This parameter provides a convenient estimate of diagnostic utility and is described in (Saunders, 2006); FC: fold change between mean expression level of cancer tissues vs. adenoma tissues; Sens-Spec: Estimate of diagnostic performance corresponding to the ROC curve point demonstrating equal sensitivity and specificity; CI (95): 95% confidence interval of sensitivity and specificity estimates.
Table 7
Evidence of multiple probesets which correspond to gene symbols claimed herein exhibiting RNA concentration differences between adenoma and cancer tissues. Symbol: gene symbol; ValidPS_UP: Affymetrix probeset IDs demonstrating statistically significant overexpression in neoplastic RNA extracts relative to non-neoplastic controls. Signif. FDR: Adjusted p-value for mean difference testing between RNA extracted from neoplasia and non-neoplastic tissues. Adjustment is made using Benjamini & Hochberg correction for multiple hypothesis testing (Benjamini and Hochberg, 1995); D.value50: Diagnostic effectiveness parameter estimate corresponding to the area of a receiver operator characteristic ROC. This parameter provides a convenient estimate of diagnostic utility and is described in (Saunders, 2006); FC: fold change between mean expression level of adenoma tissues vs. cancer tissues; Sens-Spec: Estimate of diagnostic performance corresponding to the ROC curve point demonstrating equal sensitivity and specificity; CI (95): 95% confidence interval of sensitivity and specificity estimates.
Table 8
Evidence of multiple probesets which correspond to gene symbols claimed herein exhibiting RNA concentration differences between cancer and adenoma tissues. Symbol: gene symbol; ValidPS_UP: Affymetrix probeset IDs demonstrating statistically significant overexpression in neoplastic RNA extracts relative to non-neoplastic controls. Signif. FDR: Adjusted p-value for mean difference testing between RNA extracted from neoplasia and non-neoplastic tissues. Adjustment is made using Benjamini & Hochberg correction for multiple hypothesis testing (Benjamini and Hochberg, 1995); D.value50: Diagnostic effectiveness parameter estimate corresponding to the area of a receiver operator characteristic ROC. This parameter provides a convenient estimate of diagnostic utility and is described in (Saunders, 2006); FC: fold change between mean expression level of cancer tissues vs. adenoma tissues; Sens-Spec: Estimate of diagnostic performance corresponding to the ROC curve point demonstrating equal sensitivity and specificity; CI (95): 95% confidence interval of sensitivity and specificity estimates.
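The Sens-Spec column used in the tables above refers to the point on the ROC curve at which sensitivity equals specificity. One simple empirical way to locate such a point is sketched below, assuming that higher expression values indicate the positive class and that the threshold is chosen from the observed values so as to minimise the gap between sensitivity and specificity. How the tabulated estimates were actually derived is not stated here, so this is an assumption, and the scores are invented.

```python
def equal_sens_spec_point(case_scores, control_scores):
    """Return (threshold, sensitivity, specificity) with the smallest gap.

    A specimen is called positive when its score is >= threshold.
    Candidate thresholds are the observed scores; ties in the gap are
    resolved in favour of the lowest threshold.
    """
    thresholds = sorted(set(case_scores) | set(control_scores))
    best = None
    for t in thresholds:
        sens = sum(s >= t for s in case_scores) / len(case_scores)
        spec = sum(s < t for s in control_scores) / len(control_scores)
        gap = abs(sens - spec)
        if best is None or gap < best[0]:
            best = (gap, t, sens, spec)
    _, threshold, sens, spec = best
    return threshold, sens, spec


if __name__ == "__main__":
    # Hypothetical expression scores, not data from the tables.
    cases = [5.0, 6.0, 7.0, 8.0]
    controls = [1.0, 2.0, 3.0, 6.5]
    print(equal_sens_spec_point(cases, controls))
```

With small sample sizes an exact crossing may not exist, in which case the function returns the closest available point.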
Those skilled in the art will appreciate that the invention described herein is susceptible to variations and modifications other than those specifically described. It is to be understood that the invention includes all such variations and modifications. The invention also includes all of the steps, features, compositions and compounds referred to or indicated in this specification, individually or collectively, and any and all combinations of any two or more of said steps or features.
Number | Date | Country |
---|---|---|
60982114 | Oct 2007 | US |

Relation | Number | Date | Country |
---|---|---|---|
Parent | 15381895 | Dec 2016 | US |
Child | 15809162 | | US |
Parent | 14540583 | Nov 2014 | US |
Child | 15381895 | | US |
Parent | 12739580 | Jul 2010 | US |
Child | 14540583 | | US |