A Sequence Listing in ASCII text format, submitted under 37 C.F.R. § 1.821, entitled 9151-107TSCT2 ST25.txt, 9,934 bytes in size, generated on Feb. 10, 2021 and filed via EFS-Web, is provided in lieu of a paper copy. This Sequence Listing is hereby incorporated by reference herein into the specification for its disclosures.
The present invention provides methods and compositions directed to identification of genetic markers associated with prostate cancer.
Genome-wide association (GWA) studies have identified sequence variants that are consistently associated with risk for complex diseases¹. Such variants have limited utility in the assessment of disease risk in an individual, however, because most of them confer a relatively small risk. What is needed is a determination of whether combinations of individual variants confer larger, more clinically useful, increases in risk.
Age, race, and family history are the three risk factors that are consistently associated with the risk of prostate cancer³. A meta-analysis found a pooled odds ratio of 2.5 for men who have an affected first-degree relative⁴. In the present invention, genetic variants in five chromosomal regions associated with a statistically significant risk of prostate cancer have been identified using genome-wide analysis. These include three independent regions at 8q24⁵⁻⁸ and one region each at 17q12 and 17q24.3⁹. While it is anticipated that these five regions harbor prostate cancer susceptibility genes or regulatory factors affecting critical genes, the specific genes in question have not been identified to date.
Thus, the present invention overcomes previous shortcomings in the art by identifying significant statistical associations between a combination of genetic markers in different chromosomal regions and prostate cancer. Accordingly, the present invention provides methods and compositions for identifying a subject at increased risk of developing prostate cancer by detecting the genetic markers of this invention.
In one aspect, the present invention provides a method of identifying a subject as having an increased risk of developing prostate cancer, comprising detecting in nucleic acid of the subject the presence of two or more polymorphisms associated with an increased risk of prostate cancer, wherein each of the two or more polymorphisms is present in a different chromosome region selected from the group consisting of:
The methods of the present invention can also be employed in identifying a subject having an increased risk of developing prostate cancer by detecting the various polymorphisms and genetic markers described herein and further identifying a family history of prostate cancer in the subject, whereby the presence of any of the combinations of risk markers in the subject's genotypic makeup as described herein and a family history of prostate cancer identify the subject as having an increased risk of developing prostate cancer. The methods of this invention can also be used to supplement the predictive value of prostate-specific antigen (PSA). Thus, a subject having any of the combinations of risk markers as described herein and an elevated and/or rising serum PSA level is a subject that has an increased risk of developing prostate cancer.
In a further aspect, the present invention provides a method of identifying a human subject as having an increased risk of developing prostate cancer, comprising detecting in the subject the presence of two or more alleles selected from the group consisting of:
The present invention is explained in greater detail below. This description is not intended to be a detailed catalog of all the different ways in which the invention may be implemented, or all the features that may be added to the instant invention. For example, features illustrated with respect to one embodiment may be incorporated into other embodiments, and features illustrated with respect to a particular embodiment may be deleted from that embodiment. In addition, numerous variations and additions to the various embodiments suggested herein will be apparent to those skilled in the art in light of the instant disclosure, which do not depart from the instant invention. Hence, the following specification is intended to illustrate some particular embodiments of the invention, and not to exhaustively specify all permutations, combinations and variations thereof.
The present invention is based on the unexpected discovery that the combination of alleles in various chromosome regions is statistically associated with an increased risk of developing prostate cancer. There are numerous benefits of carrying out the methods of this invention to identify a subject having an increased risk of developing prostate cancer, including but not limited to, identifying subjects who are good candidates for prophylactic and/or therapeutic treatment, and screening for cancer at an earlier time or more frequently than might otherwise be indicated, to increase the chances of early detection of a prostate cancer. Thus, in one aspect, the present invention provides a method of identifying a subject (e.g., a human subject) as having an increased risk of developing prostate cancer, comprising detecting in nucleic acid of the subject the presence of two or more polymorphisms, wherein each of the two or more polymorphisms is present in a different chromosome region selected from the group consisting of:
As noted herein, the methods of this invention can comprise detecting three or more polymorphisms, each from a different chromosome region among those listed as (a)-(e) above, in any combination; detecting four or more polymorphisms, each from a different chromosome region among those listed as (a)-(e) above, in any combination; and/or detecting five polymorphisms, each from a different chromosome region among those listed as (a)-(e) above.
Thus, the present invention provides methods for detection of a polymorphism or genetic marker of this invention in any of the following combinations of chromosome regions, wherein a, b, c, d and e represent each chromosome region as listed herein.
Combinations of two alleles include: a and b; a and c; a and d; a and e; b and c; b and d; b and e; c and d; c and e; d and e.
Combinations of three alleles include: a, b and c; a, b and d; a, b and e; a, c and e; a, c and d; a, e and d; b, c and d; b, c and e; b, d and e; c, d and e.
Combinations of four alleles include: a, b, c and d; a, b, c and e; a, b, d and e; a, c, d and e; and b, c, d and e.
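The enumerations above can be checked mechanically. The following is a minimal sketch, assuming only that the labels a through e stand for the five chromosome regions; the names `REGION_LABELS` and `region_combinations` are illustrative and are not part of the specification.

```python
from itertools import combinations

# Labels a-e stand for the five chromosome regions referred to in the text.
REGION_LABELS = ["a", "b", "c", "d", "e"]


def region_combinations(size):
    """Return every unordered combination of `size` distinct region labels."""
    return list(combinations(REGION_LABELS, size))


if __name__ == "__main__":
    for size in (2, 3, 4, 5):
        combos = region_combinations(size)
        # Print the count followed by each combination written as a short string.
        print(size, len(combos), ["".join(combo) for combo in combos])
```

Enumerating in this way yields ten pairs, ten triples, five sets of four and one set of five, with no combination listed twice.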
The two, three, four or five polymorphisms can also be detected in combination with other polymorphisms, present in any one, two, three, four or five of the chromosome regions listed as (a)-(e) above and/or present in other chromosome regions in which polymorphisms and genetic markers associated with prostate cancer risk are known or later identified to be present.
In certain embodiments of this invention, the polymorphism in chromosome region 17q12 can be the T allele of the single nucleotide polymorphism having GenBank® database Accession No. rs4430796. In other embodiments, the polymorphism in chromosome region 17q24.3 can be the G allele of the single nucleotide polymorphism having GenBank® database Accession No. rs1859962. In further embodiments, the polymorphism in chromosome region 8q24 (Region 1) can be the A allele of the single nucleotide polymorphism having GenBank® database Accession No. rs1447295. In still further embodiments, the polymorphism in chromosome region 8q24 (Region 2) can be the A allele of the single nucleotide polymorphism having GenBank® database Accession No. rs16901979. In other embodiments, the polymorphism in chromosome region 8q24 (Region 3) can be the G allele of the single nucleotide polymorphism having GenBank® database Accession No. rs6983267.
In a further aspect, the present invention provides a method of identifying a human subject as having an increased risk of developing prostate cancer, comprising detecting in the subject the presence of two or more alleles selected from the group consisting of:
a) the T allele of the single nucleotide polymorphism having GenBank® database Accession No. rs4430796;
b) the G allele of the single nucleotide polymorphism having GenBank® database Accession No. rs1859962;
c) the A allele of the single nucleotide polymorphism having GenBank® database Accession No. rs16901979;
d) the G allele of the single nucleotide polymorphism having GenBank® database Accession No. rs6983267;
e) the A allele of the single nucleotide polymorphism having GenBank® database Accession No. rs1447295; and
f) any combination of (a), (b), (c), (d) and (e) above,
whereby the presence of said alleles identifies the subject as having an increased risk of developing prostate cancer. Thus, the methods of this invention can further comprise detecting, in a subject, three or more alleles among those listed as (a)-(e) above, in any combination; detecting four or more alleles among those listed as (a)-(e) above, in any combination; and/or detecting all five of the alleles listed as (a)-(e) above. The two, three, four or five alleles can also be detected in combination with other alleles, which can be present in the chromosome regions in which the alleles of (a)-(e) above are located and/or which can be present in other chromosome regions in which alleles associated with prostate cancer risk are known or later identified to be present.
Thus, for example, the following combinations of alleles can be detected according to the methods of this invention to identify a subject as having an increased risk of developing prostate cancer, wherein a, b, c, d and e represent each of the alleles as listed herein.
Combinations of two alleles can include: a and b; a and c; a and d; a and e; b and c; b and d; b and e; c and d; c and e; d and e.
Combinations of three alleles can include: a, b and c; a, b and d; a, b and e; a, c and e; a, c and d; a, e and d; b, c and d; b, c and e; b, d and e; c, d and e.
Combinations of four alleles can include: a, b, c and d; a, b, c and e; a, b, d and e; a, c, d and e; and b, c, d and e.
Additional risk alleles that can be detected in the methods of this invention to identify a subject as having an increased risk of developing prostate cancer, with and without a family history of prostate cancer and/or with and without an elevated and/or rising PSA level are described in Tables 8-12 herein. These alleles can be present in any combination with any of the five alleles described above as (a)-(e) and/or in any combination with one another.
The present invention further provides embodiments wherein a subject of this invention is heterozygous for an allele of this invention and other embodiments wherein a subject of this invention is homozygous for an allele of this invention. In the methods provided herein wherein a combination of alleles is analyzed, the subject can be heterozygous or homozygous for any given allele in any combination relative to the other alleles in the combination.
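By way of illustration only, the check for two or more of the risk alleles (a)-(e), together with the zygosity of each, might be sketched as follows. The input format (a mapping from rs number to a two-letter genotype string) and the function names are assumptions made for this example, as is the premise that genotypes are reported on the same strand as the alleles listed above.

```python
# Risk alleles (a)-(e) as listed in the text, keyed by rs number.
RISK_ALLELES = {
    "rs4430796": "T",
    "rs1859962": "G",
    "rs16901979": "A",
    "rs6983267": "G",
    "rs1447295": "A",
}


def carried_risk_alleles(genotypes):
    """Map each SNP at which the risk allele is present to its zygosity.

    `genotypes` is a hypothetical input format: rs number -> two-letter
    genotype string such as "TT" or "AG". SNPs absent from the input are
    skipped rather than treated as non-carriers.
    """
    carried = {}
    for snp, risk_allele in RISK_ALLELES.items():
        genotype = genotypes.get(snp)
        if genotype is None:
            continue
        copies = genotype.upper().count(risk_allele)
        if copies == 1:
            carried[snp] = "heterozygous"
        elif copies == 2:
            carried[snp] = "homozygous"
    return carried


def has_two_or_more_risk_alleles(genotypes):
    """True when risk alleles are present at two or more of the five SNPs."""
    return len(carried_risk_alleles(genotypes)) >= 2
```

A subject typed as "TT" at rs4430796 and "GT" at rs1859962, with no risk allele at the other three SNPs, would be reported as homozygous at the first, heterozygous at the second, and as meeting the two-or-more criterion.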
In certain embodiments of this invention, the methods described herein can be employed to identify 1) a subject at increased or decreased risk of a more aggressive form of prostate cancer (e.g., having a Gleason score of 7 (4+3) to 10), 2) a subject at increased or decreased risk of a poor prognosis (e.g., increased likelihood the cancer will metastasize, will be poorly responsive to treatment and/or will lead to death) once cancer has been diagnosed in the subject; and/or 3) a subject at increased or decreased risk of an early age of onset of prostate cancer, by identifying in the subject the polymorphisms and/or alleles of this invention.
It is further contemplated that the methods of this invention can be carried out to diagnose prostate cancer in a subject, by detecting the combinations of polymorphisms or genetic markers described herein.
In further aspects, the present invention provides a kit for carrying out the methods of this invention, wherein the kit can comprise primers, probes, primer/probe sets, reagents, buffers, etc., as would be known in the art, for the detection of the polymorphisms and/or alleles of this invention in a nucleic acid sample from a subject. For example, a primer or probe can comprise a contiguous nucleotide sequence that is complementary to a region comprising a polymorphism or genetic marker of this invention. In particular embodiments, a kit of this invention will comprise primers and probes that allow for the specific detection of the polymorphisms and genetic markers of this invention. Such a kit can further comprise blocking probes, labeling reagents, blocking agents, restriction enzymes, antibodies, sampling devices, positive and negative controls, etc., as would be well known to those of ordinary skill in the art.
As used herein, “a,” “an” or “the” can mean one or more than one. For example, “a” cell can mean a single cell or a multiplicity of cells.
Also as used herein, “and/or” refers to and encompasses any and all possible combinations of one or more of the associated listed items, as well as the lack of combinations when interpreted in the alternative (“or”).
Furthermore, the term “about,” as used herein when referring to a measurable value such as an amount of a compound or agent of this invention, dose, time, temperature, and the like, is meant to encompass variations of ±20%, ±10%, ±5%, ±1%, ±0.5%, or even ±0.1% of the specified amount.
As used herein, the term “prostate cancer” describes an uncontrolled (malignant) growth of cells in the prostate gland, which is located at the base of the urinary bladder and is responsible for helping control urination as well as forming part of the semen. Symptoms of prostate cancer can include, but are not limited to, urinary problems (e.g., not being able to urinate; having a hard time starting or stopping the urine flow; needing to urinate often, especially at night; weak flow of urine; urine flow that starts and stops; pain or burning during urination), difficulty having an erection, blood in the urine or semen, and/or frequent pain in the lower back, hips, or upper thighs.
The term “chromosome region” as used herein refers to a part of a chromosome defined either by anatomical details, especially by banding, or by its linkage groups. The particular chromosome regions of this invention are further defined by the following boundaries.
Chromosome region 17q12: Region around rs4430796 (chr17:33,172,153): from 33,163,028 to 33,189,279, ~20 kb, #SNPs=11 (Table 8).
Chromosome region 17q24.3: Region around rs1859962 (chr17:66,20,348): from 66,616,533 to 66,754,527, ~140 kb, #SNPs=174 (Table 9).
Chromosome region 8q24 (Region 2): Region around rs16901979 (chr8:128,194,098): from 128,145,397 to 128,215,780, ~70 kb, #SNPs=112 (Table 11).
Chromosome region 8q24 (Region 3): Region around rs6983267 (chr8:128,482,487): from 128,469,358 to 128,535,996, ~65 kb, #SNPs=70 (Table 12).
Chromosome region 8q24 (Region 1): Region around rs1447295 (chr8:128,554,220): from 128,536,936 to 128,617,860, ~80 kb, #SNPs=116 (Table 10).
All the positions described above are based on Build 35 and the SNPs are based on Hapmap SNP release 21.
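The boundaries above lend themselves to a simple lookup. The sketch below is illustrative only: it copies the start and end coordinates stated in the text (Build 35), and the names `REGION_BOUNDS`, `region_for` and `span_bp` are hypothetical helpers introduced for this example.

```python
# Region boundaries as stated in the text (Build 35 coordinates).
REGION_BOUNDS = {
    "17q12": ("chr17", 33_163_028, 33_189_279),
    "17q24.3": ("chr17", 66_616_533, 66_754_527),
    "8q24 (Region 2)": ("chr8", 128_145_397, 128_215_780),
    "8q24 (Region 3)": ("chr8", 128_469_358, 128_535_996),
    "8q24 (Region 1)": ("chr8", 128_536_936, 128_617_860),
}


def region_for(chrom, position):
    """Return the name of the region containing the position, or None."""
    for name, (region_chrom, start, end) in REGION_BOUNDS.items():
        if chrom == region_chrom and start <= position <= end:
            return name
    return None


def span_bp(name):
    """Return the distance in base pairs between the stated boundaries."""
    _, start, end = REGION_BOUNDS[name]
    return end - start
```

For example, `region_for("chr8", 128_482_487)` places rs6983267 in 8q24 (Region 3), and `span_bp` can be used to compare each stated boundary pair with the approximate size given for that region.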
Also as used herein, “linked” describes a region of a chromosome that is shared more frequently in family members or members of a population manifesting a particular phenotype and/or affected by a particular disease or disorder, than would be expected or observed by chance, thereby indicating that the gene or genes or other identified marker(s) within the linked chromosome region contain or are associated with an allele that is correlated with the phenotype and/or presence of a disease or disorder, or with an increased or decreased likelihood of the phenotype and/or of the disease or disorder. Once linkage is established, association studies (linkage disequilibrium) can be used to narrow the region of interest or to identify the marker (e.g., allele or haplotype) correlated with the phenotype and/or disease or disorder.
Furthermore, as used herein, the term “linkage disequilibrium” or “LD” refers to the occurrence in a population of two linked alleles at a frequency higher or lower than expected on the basis of the gene frequencies of the individual genes. Thus, linkage disequilibrium describes a situation where alleles occur together more often than can be accounted for by chance, which indicates that the two alleles are physically close on a DNA strand.
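The definition above can be made concrete with two widely used measures, the disequilibrium coefficient D and the squared correlation r². The specification does not state which measure of LD is intended, so the choice of D and r² here is an assumption, and the frequencies in the usage note are hypothetical.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r_squared) for allele A at one locus and allele B at another.

    p_ab: frequency of haplotypes carrying both A and B
    p_a:  frequency of allele A
    p_b:  frequency of allele B
    """
    # D is the observed haplotype frequency minus the frequency expected
    # if the two alleles were inherited independently.
    d = p_ab - p_a * p_b
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("allele frequencies must lie strictly between 0 and 1")
    return d, d * d / denominator
```

With hypothetical frequencies p_ab = 0.3, p_a = 0.4 and p_b = 0.5, the haplotype is observed more often than the 0.2 expected under independence, so D is positive; a D of zero would indicate no disequilibrium.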
The term “genetic marker” or “polymorphism” as used herein refers to a characteristic of a nucleotide sequence (e.g., in a chromosome) that is identifiable due to its variability among different subjects (i.e., the genetic marker or polymorphism can be a single nucleotide polymorphism, a restriction fragment length polymorphism, a microsatellite, a deletion of nucleotides, an addition of nucleotides, a substitution of nucleotides, a repeat or duplication of nucleotides, a translocation of nucleotides, and/or an aberrant or alternate splice site resulting in production of a truncated or extended form of a protein, etc., as would be well known to one of ordinary skill in the art).
A “single nucleotide polymorphism” (SNP) in a nucleotide sequence is a genetic marker that is polymorphic for two (or in some cases three or four) alleles. SNPs can be present within a coding sequence of a gene, within noncoding regions of a gene and/or in an intergenic (e.g., intron) region of a gene. A SNP in a coding region in which both forms lead to the same polypeptide sequence is termed synonymous (i.e., a silent mutation) and if a different polypeptide sequence is produced, the alleles of that SNP are non-synonymous. SNPs that are not in protein coding regions can still have effects on gene splicing, transcription factor binding and/or the sequence of non-coding RNA.
The SNP nomenclature provided herein refers to the official Reference SNP (rs) identification number as assigned to each unique SNP by the National Center for Biotechnology Information (NCBI), which is available in the GenBank® database.
In some embodiments, the term genetic marker is also intended to describe a phenotypic effect of an allele or haplotype, including for example, an increased or decreased amount of a messenger RNA, an increased or decreased amount of protein, an increase or decrease in the copy number of a gene, production of a defective protein, tissue or organ, etc., as would be well known to one of ordinary skill in the art.
An “allele” as used herein refers to one of two or more alternative forms of a nucleotide sequence at a given position (locus) on a chromosome. Usually alleles are nucleotides present in a nucleotide sequence that makes up the coding sequence of a gene, but sometimes the term is used to refer to a nucleotide in a non-coding region of a gene. An individual's genotype for a given gene is the set of alleles it happens to possess. As noted herein, an individual can be heterozygous or homozygous for an allele of this invention.
Also as used herein, a “haplotype” is a set of SNPs on a single chromatid that are statistically associated. It is thought that these associations, and the identification of a few alleles of a haplotype block, can unambiguously identify all other polymorphic sites in its region. The term “haplotype” is also commonly used to describe the genetic constitution of individuals with respect to one member of a pair of allelic genes; sets of single alleles or closely linked genes that tend to be inherited together.
The terms “increased risk” and “decreased risk” as used herein define the level of risk that a subject has of developing prostate cancer, as compared to a control subject that does not have the polymorphisms and genetic markers of this invention in the control subject's nucleic acid.
A sample of this invention can be any sample containing nucleic acid of a subject, as would be well known to one of ordinary skill in the art. Nonlimiting examples of a sample of this invention include a cell, a body fluid, a tissue, a washing, a swabbing, etc., as would be well known in the art.
A subject of this invention is any animal that is susceptible to prostate cancer as defined herein and can include, for example, humans, as well as animal models of prostate cancer (e.g., rats, mice, dogs, nonhuman primates, etc.). In some aspects of this invention, the subject can be a Caucasian (e.g., white; European-American; Hispanic) human and in other aspects the subject can be a human of black African ancestry (e.g., black; African American; African-European; African-Caribbean, etc.). In yet other aspects the subject can be Asian. In further aspects of this invention, the subject has a family history of prostate cancer (e.g., having at least one first degree relative diagnosed with prostate cancer) and in some embodiments, the subject does not have a family history of prostate cancer. Additionally a subject of this invention has a diagnosis of prostate cancer in certain embodiments and in other embodiments, a subject of this invention does not have a diagnosis of prostate cancer.
As used herein, “nucleic acid” encompasses both RNA and DNA, including cDNA, genomic DNA, mRNA, synthetic (e.g., chemically synthesized) DNA and chimeras, fusions and/or hybrids of RNA and DNA. The nucleic acid can be double-stranded or single-stranded. Where single-stranded, the nucleic acid can be a sense strand or an antisense strand. The nucleic acid can be synthesized using oligonucleotide analogs or derivatives (e.g., inosine or phosphorothioate nucleotides, etc.). Such oligonucleotides can be used, for example, to prepare nucleic acids that have altered base-pairing abilities or increased resistance to nucleases.
An “isolated nucleic acid” is a nucleotide sequence that is not immediately contiguous with nucleotide sequences with which it is immediately contiguous (one on the 5′ end and one on the 3′ end) in the naturally occurring genome of the organism from which it is derived. Thus, in one embodiment, an isolated nucleic acid includes some or all of the 5′ non-coding (e.g., promoter) sequences that are immediately contiguous to a coding sequence. The term therefore includes, for example, a recombinant DNA that is incorporated into a vector, into an autonomously replicating plasmid or virus, or into the genomic DNA of a prokaryote or eukaryote, or which exists as a separate molecule (e.g., a cDNA or a genomic DNA fragment produced by PCR or restriction endonuclease treatment), independent of other sequences. It also includes a recombinant DNA that is part of a hybrid nucleic acid encoding an additional polypeptide or peptide sequence.
The term “isolated” can refer to a nucleic acid or polypeptide that is substantially free of cellular material, viral material, and/or culture medium (e.g., when produced by recombinant DNA techniques), or chemical precursors or other chemicals (when chemically synthesized). Moreover, an “isolated fragment” is a fragment of a nucleic acid or polypeptide that is not naturally occurring as a fragment and would not be found in the natural state.
The term “oligonucleotide” refers to a nucleic acid sequence of at least about six nucleotides to about 100 nucleotides, for example, about 15 to about 30 nucleotides, or about 20 to about 25 nucleotides, which can be used, for example, as a primer in a PCR amplification and/or as a probe in a hybridization assay or in a microarray. Oligonucleotides of this invention can be natural or synthetic, e.g., DNA, RNA, PNA, LNA, modified backbones, etc., as are well known in the art.
The present invention further provides fragments of the nucleic acids of this invention, which can be used, for example, as primers and/or probes. Such fragments or oligonucleotides can be detectably labeled or modified, for example, to include and/or incorporate a restriction enzyme cleavage site when employed as a primer in an amplification (e.g., PCR) assay.
The detection of a polymorphism, genetic marker or allele of this invention can be carried out according to various protocols standard in the art and as described herein for analyzing nucleic acid samples and nucleotide sequences, as well as identifying specific nucleotides in a nucleotide sequence.
For example, nucleic acid can be obtained from any suitable sample from the subject that will contain nucleic acid and the nucleic acid can then be prepared and analyzed according to well-established protocols for the presence of genetic markers according to the methods of this invention. In some embodiments, analysis of the nucleic acid can be carried out by amplification of the region of interest according to amplification protocols well known in the art (e.g., polymerase chain reaction, ligase chain reaction, strand displacement amplification, transcription-based amplification, self-sustained sequence replication (3SR), Qβ replicase protocols, nucleic acid sequence-based amplification (NASBA), repair chain reaction (RCR) and boomerang DNA amplification (BDA), etc.). The amplification product can then be visualized directly in a gel by staining or the product can be detected by hybridization with a detectable probe. When amplification conditions allow for amplification of all allelic types of a genetic marker, the types can be distinguished by a variety of well-known methods, such as hybridization with an allele-specific probe, secondary amplification with allele-specific primers, by restriction endonuclease digestion, and/or by electrophoresis. Thus, the present invention further provides oligonucleotides for use as primers and/or probes for detecting and/or identifying genetic markers according to the methods of this invention.
The genetic markers of this invention are correlated with (i.e., identified to be statistically associated with) prostate cancer as described herein according to methods well known in the art and as disclosed in the Examples provided herein for statistically correlating genetic markers with various phenotypic traits, including disease states and pathological conditions as well as determining levels of risk associated with developing a particular phenotype, such as a disease or pathological condition. In general, identifying such correlation involves conducting analyses that establish a statistically significant association and/or a statistically significant correlation between the presence of a genetic marker or a combination of markers and the phenotypic trait in a population of subjects and controls (e.g., ethnically matched controls). The correlation can involve one or more than one genetic marker of this invention (e.g., two, three, four, five, or more) in any combination. An analysis that identifies a statistical association (e.g., a significant association) between the marker or combination of markers and the phenotype establishes a correlation between the presence of the marker or combination of markers in a population of subjects and the particular phenotype being analyzed. A level of risk (e.g., increased or decreased) can then be determined for an individual on the basis of such population-based analyses.
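One common way to quantify such an association in a case-control population is an odds ratio with a confidence interval. The sketch below uses Woolf's logit method for the 95% interval; this particular statistic is an assumption for illustration and is not stated in the passage above to be the analysis actually used, and the counts in the usage note are hypothetical.

```python
import math


def odds_ratio(case_carriers, case_noncarriers, control_carriers, control_noncarriers):
    """Return (odds ratio, lower 95% bound, upper 95% bound).

    The interval is computed on the log scale (Woolf's method), which
    requires all four cell counts of the 2x2 table to be positive.
    """
    a, b, c, d = case_carriers, case_noncarriers, control_carriers, control_noncarriers
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    estimate = (a * d) / (b * c)
    # Standard error of the natural log of the odds ratio.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - 1.96 * se_log)
    upper = math.exp(math.log(estimate) + 1.96 * se_log)
    return estimate, lower, upper
```

With hypothetical counts of 60 carriers and 40 non-carriers among cases, and 40 carriers and 60 non-carriers among controls, the estimate is 2.25; an interval that excludes 1 is conventionally read as a statistically significant association at the 5% level.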
Thus, in certain embodiments, the present invention provides a method of screening a subject for polymorphisms that are associated with prostate cancer, comprising: a) performing a population based study to detect polymorphisms in a group of subjects with prostate cancer and ethnically matched controls; b) identifying polymorphisms in the group of subjects that are statistically associated with prostate cancer; and c) screening a subject for the presence of the polymorphisms identified in step (b).
The present invention further provides a method of identifying an effective and/or appropriate (i.e., for a given subject's particular condition or status) treatment regimen for a subject with prostate cancer, comprising detecting one or more of the polymorphisms and genetic markers associated with prostate cancer of this invention in the subject, wherein the one or more polymorphisms and genetic markers are further statistically correlated with an effective and/or appropriate treatment regimen for prostate cancer according to protocols as described herein and as are well known in the art.
Also provided is a method of identifying an effective and/or appropriate treatment regimen for a subject with prostate cancer, comprising: a) correlating the presence of one or more genetic markers of this invention in a test subject or population of test subjects with prostate cancer for whom an effective and/or appropriate treatment regimen has been identified; and b) detecting the one or more markers of step (a) in the subject, thereby identifying an effective and/or appropriate treatment regimen for the subject.
Further provided is a method of correlating a polymorphism or genetic marker of this invention with an effective and/or appropriate treatment regimen for prostate cancer, comprising: a) detecting in a subject or a population of subjects with prostate cancer and for whom an effective and/or appropriate treatment regimen has been identified, the presence of one or more genetic markers or polymorphisms of this invention; and b) correlating the presence of the one or more genetic markers of step (a) with an effective treatment regimen for prostate cancer.
Examples of treatment regimens for prostate cancer are well known in the art. Subjects who respond well to particular treatment protocols can be analyzed for specific genetic markers and a correlation can be established according to the methods provided herein. Alternatively, subjects who respond poorly to a particular treatment regimen can also be analyzed for particular genetic markers correlated with the poor response. Then, a subject who is a candidate for treatment for prostate cancer can be assessed for the presence of the appropriate genetic markers and the most effective and/or appropriate treatment regimen can be provided.
In some embodiments, the methods of correlating genetic markers with treatment regimens of this invention can be carried out using a computer database. Thus the present invention provides a computer-assisted method of identifying a proposed treatment for prostate cancer. The method involves the steps of (a) storing a database of biological data for a plurality of subjects, the biological data that is being stored including for each of said plurality of subjects, for example, (i) a treatment type, (ii) at least one genetic marker associated with prostate cancer and (iii) at least one disease progression measure for prostate cancer from which treatment efficacy can be determined; and then (b) querying the database to determine the dependence on said genetic marker of the effectiveness of a treatment type in treating prostate cancer, to thereby identify a proposed treatment as an effective and/or appropriate treatment for a subject carrying a genetic marker correlated with prostate cancer.
In one embodiment, treatment information for a subject is entered into the database (through any suitable means such as a window or text interface), genetic marker information for that subject is entered into the database, and disease progression information is entered into the database. These steps are then repeated until the desired number of subjects has been entered into the database. The database can then be queried to determine whether a particular treatment is effective for subjects carrying a particular marker or combination of markers, not effective for subjects carrying a particular marker or combination of markers, etc. Such querying can be carried out prospectively or retrospectively on the database by any suitable means, but is generally done by statistical analysis in accordance with known techniques, as described herein.
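The store-then-query steps described above could be sketched with an in-memory SQLite database. Everything in this example is hypothetical and offered only as an illustration: the table and column names, the treatment labels, the reduction of the disease progression measure to a yes/no response, and the example rows.

```python
import sqlite3

# Hypothetical records: (subject_id, treatment, marker, carrier, responded).
EXAMPLE_ROWS = [
    (1, "treatment_A", "rs1447295", 1, 1),
    (2, "treatment_A", "rs1447295", 1, 1),
    (3, "treatment_A", "rs1447295", 0, 0),
    (4, "treatment_A", "rs1447295", 0, 1),
    (5, "treatment_B", "rs1447295", 1, 0),
    (6, "treatment_B", "rs1447295", 0, 1),
]


def build_database(rows):
    """Step (a): store treatment, marker and progression data per subject."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE subject_records (
            subject_id INTEGER PRIMARY KEY,
            treatment  TEXT NOT NULL,
            marker     TEXT NOT NULL,
            carrier    INTEGER NOT NULL,  -- 1 if the subject carries the marker
            responded  INTEGER NOT NULL   -- 1 if the progression measure shows benefit
        )
        """
    )
    conn.executemany("INSERT INTO subject_records VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def response_rates(conn, marker):
    """Step (b): response rate per treatment, split by carrier status."""
    cursor = conn.execute(
        """
        SELECT treatment, carrier, COUNT(*), AVG(responded)
        FROM subject_records
        WHERE marker = ?
        GROUP BY treatment, carrier
        ORDER BY treatment, carrier
        """,
        (marker,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    for row in response_rates(build_database(EXAMPLE_ROWS), "rs1447295"):
        print(row)
```

The grouped response rates are only descriptive; deciding whether a treatment's effectiveness actually depends on the marker would still require a statistical test on a dataset of adequate size.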
The present invention is more particularly described in the following examples that are intended as illustrative only since numerous modifications and variations therein will be apparent to those skilled in the art.
The study sample was described in detail elsewhere10. Briefly, a large-scale, population-based case-control study named CAPS (CAncer Prostate in Sweden) was conducted in Sweden. Prostate cancer patients were identified and recruited from four of the six regional cancer registries in Sweden. The inclusion criterion for case subjects was pathologically or cytologically verified adenocarcinoma of the prostate, diagnosed between July, 2001 and October, 2003. Among 3,648 identified prostate cancer case subjects, 3,161 (87%) agreed to participate. DNA samples from blood and TNM stage, Gleason grade (biopsy), and PSA levels at diagnosis were available for 2,893 patients (91%). These case subjects were classified as having advanced disease if they met any of the following criteria: T3/4, N+, M+, Gleason score sum ≥8, or PSA>50 ng/ml; otherwise, they were classified as localized. Control subjects were recruited concurrently with case subjects. They were randomly selected from the Swedish Population Registry, and matched according to the expected age distribution of cases (groups of five-year intervals) and geographical region. A total of 3,153 controls were invited and 2,149 (68%) agreed to participate. DNA samples from blood were available for 1,781 control subjects (83%). Serum PSA level was measured for all control subjects but was not used as an exclusion variable. A history of prostate cancer among first-degree relatives was obtained from a questionnaire for both cases and controls. Table 1 presents the demographic and clinical characteristics of the study subjects, all of whom were Caucasian. Recruitment of the study population was completed in two phases, each with a similar number of subjects, those before Oct. 31, 2002 (CAPS1) and after Nov. 1, 2002 (CAPS2). Each participant gave written informed consent. The study received institutional approval at the Karolinska Institute, Umeå University, and Wake Forest University School of Medicine.
Sixteen SNPs from five chromosomal regions (three at 8q24 and one each at 17q12 and 17q24.3) that have been reported to be associated with prostate cancer7-9,11 were selected for this study. Polymerase chain reaction (PCR) and extension primers for these SNPs were designed using the MassARRAY Assay Design 3.0 software (Sequenom, Inc). The primer information is shown in Table 13. PCR and extension reactions were performed according to the manufacturer's instructions, and extension product sizes were determined by mass spectrometry using the Sequenom iPLEX system. Duplicate test samples and two water samples (PCR negative controls) that were blinded to the technician were included in each 96-well plate. The rate of concordant results between duplicate samples was >99%.
Tests for Hardy-Weinberg equilibrium were performed for each SNP separately among case patients and control subjects using Fisher's exact test. Pair-wise linkage disequilibrium (LD) was tested for SNPs within each of the five chromosomal regions in control subjects using SAS/Genetics software (Version 9.0).
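As an illustration of an exact test for Hardy-Weinberg equilibrium, the sketch below enumerates every heterozygote count compatible with the observed allele counts and sums the probabilities of outcomes no more likely than the observed one. This is one common form of exact test and is offered as an assumption about the general approach; the study itself used SAS/Genetics software, whose exact procedure is not reproduced here. The function name and its arguments are hypothetical.

```python
import math

def hwe_exact_p(n_aa, n_ab, n_bb):
    """Two-sided exact Hardy-Weinberg P-value from genotype counts (AA, AB, BB)."""
    n = n_aa + n_ab + n_bb          # number of subjects
    n_a = 2 * n_aa + n_ab           # copies of allele A
    n_b = 2 * n_bb + n_ab           # copies of allele B

    def log_prob(hets):
        # Probability of `hets` heterozygotes given the fixed allele counts.
        hom_a = (n_a - hets) // 2
        hom_b = (n_b - hets) // 2
        return (hets * math.log(2.0)
                + math.lgamma(n + 1) - math.lgamma(hom_a + 1)
                - math.lgamma(hets + 1) - math.lgamma(hom_b + 1)
                + math.lgamma(n_a + 1) + math.lgamma(n_b + 1)
                - math.lgamma(2 * n + 1))

    # Heterozygote counts must share the parity of the allele counts.
    candidates = range(n_a % 2, min(n_a, n_b) + 1, 2)
    probs = {h: math.exp(log_prob(h)) for h in candidates}
    observed = probs[n_ab]
    # Sum all outcomes that are no more probable than the observed one.
    total = sum(p for p in probs.values() if p <= observed * (1.0 + 1e-9))
    return min(1.0, total)
```

A SNP would be treated as consistent with Hardy-Weinberg equilibrium when the returned P-value is at or above the chosen threshold, for example 0.05.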
Allele frequency differences between case patients and control subjects were tested for each SNP using a chi-square test with 1 degree of freedom. Allelic odds ratio (OR) and 95% confidence interval (95% CI) were estimated based on a multiplicative model. For genotypes, a series of tests assuming an additive, dominant, or recessive genetic model were performed for each of the five SNPs using unconditional logistic regression with adjustment for age and geographic region, and the model that had the highest likelihood was considered as the best-fitting genetic model for the respective SNP.
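The allelic comparison described above can be illustrated with a short sketch that takes a 2×2 table of allele counts and returns the 1-degree-of-freedom chi-square statistic, its two-sided P-value, the allelic odds ratio, and a 95% confidence interval. The Woolf (log odds ratio) interval used here is an assumption, since the text does not state how the interval was derived, and the allele counts in the example are hypothetical.

```python
import math

def allelic_test(case_risk, case_other, control_risk, control_other):
    """Chi-square test (1 df) and allelic odds ratio from a 2x2 table of allele counts."""
    a, b, c, d = case_risk, case_other, control_risk, control_other
    n = a + b + c + d
    # Pearson chi-square statistic for a 2x2 table, without continuity correction.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (a * d) / (b * c)
    # Woolf standard error of the log odds ratio (assumed method).
    se = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se)
    return chi2, p_value, odds_ratio, (lower, upper)

# Hypothetical allele counts: 610 risk and 390 other alleles in cases,
# 560 risk and 440 other alleles in controls.
example_chi2, example_p, example_or, example_ci = allelic_test(610, 390, 560, 440)
```

The genotype-level tests in the study additionally adjusted for age and geographic region by logistic regression, which this unadjusted sketch does not attempt.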
The independent effect of each of the five previously implicated regions was tested by including the most significant SNP from each of the five regions in a logistic regression model using a backward selection procedure. Multiplicative interactions between SNPs were tested for each pair of SNPs by including both main effects and an interaction term (product of two main effects) in a logistic regression model. The cumulative effects of the five SNPs on prostate cancer were tested by counting the number of prostate cancer associated genotypes (based on the best-fitting genetic model from single SNP analysis) for these five SNPs in each subject. The OR for prostate cancer for men carrying any combination of 1, 2, 3, or ≥4 prostate cancer associated genotypes was estimated by comparing to men carrying none of the prostate cancer associated genotypes using logistic regression analysis. Tests were also performed for cumulative effect on prostate cancer association, which included five SNPs and family history.
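The counting step behind the cumulative analysis can be sketched as below. Each subject receives a count of prostate cancer associated genotypes, counts at or above a cap are pooled, and each group is compared with the group carrying none. The SNP labels and risk genotypes passed to the functions are hypothetical placeholders, and the odds ratios are crude (unadjusted) with Woolf confidence intervals, whereas the study estimated them by logistic regression with adjustment for covariates.

```python
import math
from collections import Counter

def count_risk_genotypes(genotypes, risk_genotypes):
    """Count the SNPs at which a subject carries a risk-associated genotype.

    genotypes: dict mapping SNP label -> genotype string for one subject.
    risk_genotypes: dict mapping SNP label -> set of risk genotypes under the
    best-fitting model (hypothetical values supplied by the caller).
    """
    return sum(1 for snp, risk in risk_genotypes.items() if genotypes.get(snp) in risk)

def odds_ratios_by_count(case_counts, control_counts, cap=4):
    """Crude OR and 95% CI for each count 1..cap (cap pools higher counts) versus 0."""
    cases = Counter(min(k, cap) for k in case_counts)
    controls = Counter(min(k, cap) for k in control_counts)
    result = {}
    for k in range(1, cap + 1):
        a, b = cases[k], controls[k]   # exposed cases and controls
        c, d = cases[0], controls[0]   # reference group: no risk genotypes
        if min(a, b, c, d) == 0:
            continue                   # OR is undefined with an empty cell
        odds_ratio = (a * d) / (b * c)
        se = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
        lower = math.exp(math.log(odds_ratio) - 1.96 * se)
        upper = math.exp(math.log(odds_ratio) + 1.96 * se)
        result[k] = (odds_ratio, lower, upper)
    return result
```

To include family history as a sixth factor, the caller could add 1 to a subject's count when family history is positive and raise `cap` accordingly.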
Population attributable risk (PAR) was estimated for SNPs that remained significant after adjusting for other covariates using the formula PAR=100%×p(OR−1)/[p(OR−1)+1], where p is the prevalence of prostate cancer associated genotypes among control subjects12. The joint PAR was calculated as
joint PAR=1−Πi(1−PARi),
where PARi is the individual PAR for each associated SNP calculated under the full model and assuming no multiplicative interaction between the SNPs.
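The PAR arithmetic can be illustrated with a brief sketch. The individual PAR follows the formula given above, expressed as a fraction rather than a percentage. The joint PAR is assumed here to take the usual product form for non-interacting factors, 1 − Π(1 − PARi). The function names are hypothetical.

```python
def par(prevalence, odds_ratio):
    """Individual PAR as a fraction: p(OR - 1) / [p(OR - 1) + 1]."""
    excess = prevalence * (odds_ratio - 1.0)
    return excess / (excess + 1.0)

def joint_par(individual_pars):
    """Joint PAR as a fraction, assuming no multiplicative interaction between factors."""
    remaining = 1.0
    for value in individual_pars:
        remaining *= 1.0 - value   # fraction of risk not attributable to this factor
    return 1.0 - remaining
```

For instance, a genotype carried by half of the control subjects with an odds ratio of 2 gives a PAR of one third, and two independent factors that each have a PAR of 50% give a joint PAR of 75%.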
Associations of these five SNPs with TNM stages, aggressiveness of prostate cancer (advanced or localized prostate cancer), and family history (yes or no) were tested among cases only using a chi-square test on a 2×N table. A trend test was used to assess the proportion of prostate cancer associated genotypes with each increasing Gleason score, from ≤4 to 10. Associations of SNPs with mean age at diagnosis were tested among cases only using a two-sample t-test. Because serum PSA levels were not normally distributed, a non-parametric analysis (Wilcoxon rank sum test) was used to assess association between SNPs and pre-operative serum PSA level in cases or PSA levels at the time of sampling in controls. All reported P-values were based on a two-sided test.
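A trend test of the kind mentioned above can be sketched with the Cochran-Armitage statistic, which asks whether the proportion of carriers changes linearly across ordered groups such as increasing Gleason scores. The function name, the default equally spaced scores, and the variance form without a finite-sample correction are assumptions made for illustration.

```python
import math

def armitage_trend_test(positives, totals, scores=None):
    """Cochran-Armitage trend test across ordered groups.

    positives: number of carriers in each ordered group.
    totals: number of subjects in each ordered group.
    scores: numeric score per group (defaults to 0, 1, 2, ...).
    Returns (z, two_sided_p).
    """
    if scores is None:
        scores = list(range(len(totals)))
    n = sum(totals)
    p_bar = sum(positives) / n                       # overall carrier proportion
    # Score-weighted deviation of observed carriers from their expectation.
    t_stat = sum(s * (r - m * p_bar) for s, r, m in zip(scores, positives, totals))
    s1 = sum(m * s for s, m in zip(scores, totals))
    s2 = sum(m * s * s for s, m in zip(scores, totals))
    variance = p_bar * (1.0 - p_bar) * (s2 - s1 * s1 / n)
    z = t_stat / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))     # two-sided normal tail
    return z, p_value
```

A positive z indicates that the carrier proportion rises with the group score, and a negative z indicates that it falls.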
Sixteen SNPs in five chromosomal regions (three at 8q24 and two on 17q), which were previously implicated in harboring putative genes related to susceptibility to prostate cancer, were evaluated. In the control group, each SNP was in Hardy-Weinberg equilibrium (P≥0.05). Significant pair-wise linkage disequilibrium (P<0.05) was observed for the SNPs within each region.
Table 2 lists allele frequencies of the 16 SNPs among case and control subjects and shows the results of allelic and genotypic tests. Significantly different frequencies (P<0.05) between case and control subjects were observed for SNPs in each of the five chromosomal regions. At 17q12, SNP rs4430796 had the strongest association with prostate cancer; the frequency of allele ‘T’ (SNP rs4430796) was 0.61 in cases and 0.56 in controls (P=6.0×10^−7). Of the four SNPs at 17q24.3, three were associated with prostate cancer, but only rs1859962 had a highly statistically significant association (P=2.1×10^−4). The results for 17q12 and 17q24.3 were similar to those of a previous report9. For SNPs at 8q24, statistically significant associations with prostate cancer were found for all SNPs examined across the three independent regions at 8q24. Of the 16 SNPs, 13 remained significant at P<0.05 after adjusting for 16 tests using a Bonferroni correction.
Similar to the results of allelic tests, carriers of previously reported risk associated alleles for SNPs at 17q12, 17q24.3, and 8q24 were significantly more likely to be prostate cancer cases (Table 2). When various genetic models were tested for SNPs at each region, a recessive model was the best-fitting genetic model for SNPs at 17q12 and 17q24.3, and a dominant model was the best-fitting genetic model for SNPs at Regions 1, 2, and 3 of 8q24.
Due to strong genetic dependence (linkage disequilibrium) among SNPs within each region, for a combined analysis, it was possible to select one SNP (the most significant SNP from single SNP analysis) to represent each of the five regions in tests for their independent association with prostate cancer (Table 3). When these five SNPs were included in a multivariate logistic regression model, each of the five SNPs remained significantly associated with prostate cancer after adjusting for other SNPs, and each continued to be highly significant when family history was included in the model. The population attributable risks, based on adjusted ORs, for each of these five SNPs and positive family history were estimated to account for 4% to 21% of prostate cancer in this Swedish study population. The estimated joint population attributable risk for prostate cancer of the five associated SNPs plus family history was 46% in the Swedish population studied.
When multiplicative interaction was tested for each possible pair of these five SNPs using an interaction term in logistic regression, none was significant at P<0.05. However, these five SNPs appeared to have a cumulative effect on the association with prostate cancer diagnosis, adjusting for age, geographic region, and family history (Table 4). When compared with men who did not carry any prostate cancer associated genotype of these five SNPs, men that carried any combination of 1, 2, 3, or ≥4 prostate cancer associated genotypes had increasingly higher likelihood to be a prostate cancer case (P-trend=3.33×10^−18). When family history was included as another risk factor (coded as 0 or 1) for a total of 6 possible prostate cancer associated factors, a stronger cumulative effect on prostate cancer association was observed, adjusting for age and geographic region (P-trend=3.93×10^−28). For example, compared with men who carried none of the six prostate cancer associated factors, men that carried any five or more of these associated factors had an OR of 9.48 (95% CI: 3.65-24.64, P=8.94×10^−9) for prostate cancer. This cumulative effect was similarly observed in two subsets of CAPS study subjects, P-trend=1.36×10^−10 for CAPS1 and P-trend=9.03×10^−20 for CAPS2.
The sensitivity and specificity of each regression model were assessed by constructing receiver operating characteristic (ROC) curves and calculating the area under the curve (AUC) statistic to estimate each model's ability to discriminate case subjects from control subjects. The AUC was 57.7 (95% CI: 56.0-59.3), 60.8 (59.1-62.4), and 63.3 (61.7-65.0), respectively, for the model with (1) age and region alone, (2) age, region and family history, and (3) age, region, family history and number of prostate cancer associated genotypes at the five SNPs. The AUC was significantly higher for model (3) than for model (2), P=6.12×10^−6. It is important to note that these results may suffer from model over-fitting.
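The AUC statistic can be illustrated with a short sketch based on its rank interpretation: the AUC equals the probability that a randomly chosen case subject receives a higher model score than a randomly chosen control subject, with ties counted as one half. The function name and the inputs are hypothetical, and the result is a fraction between 0 and 1, whereas the values reported above are on a 0 to 100 scale.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve from model scores of cases and controls."""
    wins = 0.0
    for case_score in case_scores:
        for control_score in control_scores:
            if case_score > control_score:
                wins += 1.0      # case ranked above control
            elif case_score == control_score:
                wins += 0.5      # tie counts as half
    return wins / (len(case_scores) * len(control_scores))
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect separation of cases from controls. The pairwise loop is quadratic in the number of subjects, which is acceptable for illustration but slow for large samples.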
Table 5 shows that none of the five SNPs was significantly associated with aggressiveness of prostate cancer, Gleason score, family history, serum PSA level at diagnosis, or age at diagnosis (P>0.05). Furthermore, no associations with these clinical variables were found when multiple prostate cancer associated SNPs were considered simultaneously. For example, the 154 cases that carried four or more prostate cancer associated genotypes of these five SNPs were not significantly different from 162 cases that did not carry any prostate cancer associated genotype in terms of these clinical variables; positive family history was 17% and 21%, respectively (P=0.39), the proportion of advanced cases was 54% and 48%, respectively (P=0.33), and median serum PSA levels at diagnosis were 15 ng/ml and 14 ng/ml, respectively (P=0.27). A lack of association between these SNPs at 8q24 and clinical characteristics was also observed in previous studies8,13,14,16, while in other studies a trend of 8q24 prostate cancer associated alleles has been reported as occurring more often in patients with higher Gleason grade, stage or aggressive disease5-7,15,17.
Multiple chromosomal regions at 8q24 and 17q have been reported to be associated, at genome-wide significance level, with prostate cancer.5-9 While all three regions at 8q24 have been replicated in all published studies,11,13-17 no replication result has been published for regions at 17q. The highly statistically significant findings at 17q12 and 17q24.3 in this study provide the first independent confirmation for these two regions at 17q. In addition, the association of SNPs at Regions 1, 2, and 3 of 8q24 with prostate cancer was also confirmed. The discovery and confirmation of these five chromosomal regions that are associated with prostate cancer supports the value and potential of genetic association studies in complex diseases.
Although each of the SNPs in the five chromosomal regions was moderately associated with prostate cancer, the present study reveals that they have a stronger cumulative effect on prostate cancer association. It was estimated that men having 5 or more of the prostate cancer associated factors (prostate cancer associated genotypes at five SNPs and a positive family history of prostate cancer) have an odds ratio of 9.48 for prostate cancer. The cumulative effect is highly significant in the overall CAPS sample (P-trend=3.93×10^−28) and consistent between the two subsets of CAPS study subjects, P-trend=1.36×10^−10 for CAPS1 and P-trend=9.03×10^−20 for CAPS2. Thus, the combined information from the five SNPs and family history can be used according to the present invention to assess an individual's risk of prostate cancer.
It was found that the presence of the five prostate cancer associated SNPs was independent of PSA levels in cases (Table 5) and controls, which suggests that some men with low PSA levels may have an increased risk of prostate cancer if they carry one or more prostate cancer associated genotypes described here. Studies using prediagnostic PSA in combination with the associated SNPs and family history will provide further insight into this aspect of the present invention.
The mechanism by which the SNPs analyzed in this study could affect the risk of prostate cancer has not been elucidated. Other than SNP rs4430796, which is located within the TCF2 gene, the specific genes affected by the rest of the SNPs have not been identified. As the five SNPs in this study appear to be associated with risk of prostate cancer in general, rather than with a more or less aggressive form, it is possible that the genetic variants described herein act at an early stage of carcinogenesis.
The African American study population cases consisted of 373 prostate cancer patients undergoing treatment for prostate cancer in the Department of Urology at Johns Hopkins Hospital from 1999 to 2006. The average age at diagnosis was 57 years (median, 56 years), and the range was 36-74 years. The 372 control individuals were men undergoing disease screening and were not thought to have prostate cancer on the basis of a physical exam and a serum prostate-specific antigen (PSA) value below 4 ng/ml. Both cases and controls were self-reported African Americans (i.e., of black African ancestry). The Institutional Review Board of Johns Hopkins University approved the study protocol.
Statistical methods similar to those described in Example 1 are used to assess the cumulative effect of the SNPs of this invention in the five chromosome regions described herein on prostate cancer risk in African Americans.
As shown in Table 6, the risk of developing prostate cancer in African Americans increases as the number of risk genotypes of the five variants of this invention increases, in the same manner as shown for the Caucasian population described in Example 1.
The study population is the same Swedish population described in Example 1.
Statistical methods similar to those described in Example 1 are used to assess the cumulative effect of the SNPs of this invention in the five chromosome regions described herein and family history on early age of onset of prostate cancer. Age-specific odds ratios were calculated in three intervals (<65, 65-69, >69).
As shown in Table 7, ORs for prostate cancer are stronger in prostate cancer subjects with early age of onset (<65 years) than in the other groups. For example, OR was 25.94 for men with ≥5 risk factors (five risk variants and family history) among men <65 years, compared with OR of 8.27 and 4.51 among men at age 65-69 years and at age >69, respectively.
The foregoing is illustrative of the present invention, and is not to be construed as limiting thereof. The invention is defined by the following claims, with equivalents of the claims to be included therein.
All publications, patent applications, patents, patent publications, all sequences identified by GenBank® database and/or SNP accession numbers, and other references cited herein are incorporated by reference in their entireties for the sequences and/or teachings relevant to the sentence and/or paragraph and/or claim in which the reference is presented.
aPosition is based on NCBI Build 35.
bAlleles reported to be associated with prostate cancer in previously published studies (Ref 5-9, 11).
cAllelic odds ratio is based on the multiplicative model
dThe best-fitting model for each SNP was determined after testing associations of a series of genetic models, including dominant and recessive models, with prostate cancer in the current study
eReference and prostate cancer associated genotypes for each SNP were defined based on the best-fitting genetic model
fP-value is based on likelihood ratio test (1-df tests, adjusted for age and geographic region, two-sided)
aFamily history and five SNPs are included in the multivariate logistic regression model adjusting for age and geographic region
bFor SNPs, the reference and prostate cancer associated genotypes at each SNP are determined based on the best-fitting model after testing associations of a series of genetic models with prostate cancer in the current study
cRegression coefficient
dBased on likelihood ratio test
aTesting for cumulative effect of five SNPs (rs4430796, rs1859962, rs16901979, rs6983267, and rs1447295) adjusting for age, geographic region, and family history
b Testing for cumulative effect of the five SNPs plus family history adjusting for age and geographic region
cNumber of prostate cancer associated genotypes at the five SNPs
dNumber of prostate cancer associated factors (the five SNPs plus family history)
eRegression coefficient
fP-value is based on likelihood-ratio test, two-sided
gP-value is based on Armitage trend test
aReference or prostate cancer associated genotypes are determined based on the best-fitting model at each SNP in the current study
b,dPearson Chi-square test, two-sided
cArmitage trend test, two-sided
eWilcoxon rank sum test, two-sided
fTwo-sample t test, two-sided
a Assuming the best-fit model at each SNP
b P-value is based on likelihood-ratio test, two-sided
This application is a continuation of U.S. application Ser. No. 15/863,636, filed Jan. 5, 2018, which is a continuation of U.S. application Ser. No. 12/339,653, filed Dec. 19, 2008, now abandoned, which claims the benefit under 35 U.S.C. § 119(e) of U.S. Provisional Application Ser. No. 61/016,117, filed Dec. 21, 2007, the entire contents of each of which are incorporated by reference herein.
This invention was made with government support under grant #CA106523, CA95052, CA1125117, and CA58236 awarded by the National Institutes of Health and grant #PC051264 awarded by the Department of Defense. The government has certain rights in the invention.
Number | Date | Country | |
---|---|---|---|
61016117 | Dec 2007 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 15863636 | Jan 2018 | US |
Child | 17177568 | | US |
Parent | 12339653 | Dec 2008 | US |
Child | 15863636 | | US |