The methods relate to performing a medical procedure on patients diagnosed as having an increased risk for the development of breast cancer, based on a polygenic risk score derived from single nucleotide polymorphisms. The methods further relate to diagnosing patients as having an increased risk for the development of breast cancer, based on a polygenic risk score derived from single nucleotide polymorphisms. The invention further relates to a unique set of single nucleotide polymorphisms for use in deriving the polygenic risk score.
Breast cancer is the most common cancer affecting women in the world. It is estimated that worldwide over 500,000 women died in 2011 due to breast cancer (Global Health Estimates, WHO 2013).
Breast cancer survival rates vary greatly worldwide. The survival rate can range from 80% in developed countries to below 40% in developing countries (Coleman et al., 2008). Early detection in conjunction with various screening methods can potentially decrease the mortality associated with breast cancer.
Genome-wide association studies (GWAS) are observational studies of a set of genetic variants in individuals to see if any variant is associated with a particular trait. GWASs typically focus on associations between single-nucleotide polymorphisms (SNPs) and human diseases. In contrast to testing a small number of genetic regions, GWASs analyze the entire genome.
Since 2007, GWASs have identified many common SNPs, each with a modest contribution to breast cancer risk (Easton, D. F., et al., 2007).
As these SNPs are associated with relative risks ranging from 1.03-1.41 (Michailidou, K., et al., 2017), no individual SNP is usually informative on its own. However, a score based on combined genotypes across a large number of SNPs may have substantial predictive value for risk stratification (Mavaddat, N., et al., 2015; Dite, G. S., et al., 2016; Mealiffe, M. E., et al., 2010; Reeves, G. K., 2010; Shieh, Y., et al., 2016). While the utility of such a score has been investigated in large studies conducted in the general population, few have assessed its performance in high-risk women referred for genetic testing for breast cancer (Li, H., et al., 2017; Sawyer, S., et al., 2012).
SNP-based scores may have clinically useful predictive power in women referred for genetic testing due to a family history of disease. Sawyer et al. (2012) examined a 22-SNP polygenic risk score (PRS) comparing women who were diagnosed with breast cancer, who were either BRCA1/2 carriers or BRCA1/2 negative, to a set of controls. They found that BRCA1/2 negative cases had a significantly higher PRS than BRCA1/2 carriers or controls, and that BRCA1/2 negative cases in the highest quartile of the PRS distribution were more likely to have had early-onset breast cancer (<30 years of age) compared to those with a score in the lowest PRS quartile. Li et al. assessed a 24-SNP PRS among unaffected women from two familial breast cancer cohorts, and observed that women in the highest quintile of the PRS distribution were more than three times as likely to develop breast cancer as those in the lowest quintile (Li, H., et al., 2017).
Taken together, the data suggested that a SNP-based PRS may be useful for risk stratification in women with family history of breast cancer who are negative for high-penetrance breast cancer-susceptibility genes.
The present disclosure provides a method for performing a medical procedure by determining whether an individual has an increased risk for the development of breast cancer. The present disclosure also provides a method for diagnosis by determining whether an individual has an increased risk for the development of breast cancer. This disclosure sets forth processes, in addition to making and using the same, and other solutions to problems in the relevant field.
In some embodiments, there is provided a method for performing a medical procedure on a patient with a potential pre-disposition to cancer comprising: obtaining a nucleic acid sample from a patient; assaying the nucleic acid sample obtained from the patient for at least 50 single nucleotide polymorphisms (SNPs) set forth in Table 1, wherein for each SNP in this step, one or more of the following is assayed: the SNP from Table 1, another SNP located within 250 kilobases of the SNP from Table 1, and another SNP that has a pairwise $r^2$=1.0 with the SNP from Table 1; calculating a polygenic risk score (PRS) based on the presence or absence of the at least 50 single nucleotide polymorphisms, wherein the polygenic risk score indicates a risk, relative to an average population, that the subject will develop breast cancer; and performing a medical procedure for the patient based on the PRS.
In some embodiments, there is provided a method for diagnosing a patient with a potential pre-disposition to cancer comprising: obtaining a nucleic acid sample from a patient; assaying the nucleic acid sample obtained from the patient for at least 50 single nucleotide polymorphisms (SNPs) set forth in Table 1, wherein for each SNP in this step, one or more of the following is assayed: the SNP from Table 1, another SNP located within 250 kilobases of the SNP from Table 1, and another SNP that has a pairwise $r^2$=1.0 with the SNP from Table 1; and calculating a polygenic risk score (PRS) based on the presence or absence of the at least 50 single nucleotide polymorphisms, wherein the polygenic risk score indicates a risk, relative to an average population, that the subject will develop breast cancer.
The following description is presented to enable one of ordinary skill in the art to make and use the disclosed subject matter and to incorporate it in the context of applications. Various modifications, as well as a variety of uses in different applications, will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to a wide range of embodiments. Thus, the present disclosure is not intended to be limited to the embodiments presented, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
As used herein, the term “biological sample” refers to a sample derived from, obtained by, generated from, provided from, taken from, or removed from an organism; or from fluid or tissue from the organism. Biological samples include, but are not limited to, synovial fluid, whole blood, blood serum, blood plasma, urine, sputum, tissue, saliva, tears, spinal fluid, tissue section(s) obtained by biopsy, cell(s) that are placed in or adapted to tissue culture, sweat, mucous, fecal material, gastric fluid, abdominal fluid, amniotic fluid, cyst fluid, peritoneal fluid, pancreatic juice, breast milk, lung lavage, marrow, gastric acid, bile, semen, pus, aqueous humor, transudate, and the like, including derivatives, portions and combinations of the foregoing. In some examples, biological samples include, but are not limited to, blood and/or plasma. In some examples, biological samples include, but are not limited to, urine or stool. Biological samples include, but are not limited to, saliva. Biological samples include, but are not limited to, tissue dissections and tissue biopsies. Biological samples include, but are not limited to, samples that can provide nucleic acids for analysis. Biological samples include, but are not limited to, any derivative or fraction of the aforementioned biological samples.
As used herein, the term “patient” refers to a human female subject. The methods and uses of the invention described herein are useful to treat a human.
As used herein, the term “Ashkenazi Jew” refers to a population whose recent ancestry over the past millennium traces to Central and Eastern Europe.
As used herein, the term “Caucasian” refers to individuals whose recent ancestry over the past millennium traces to Northern Europe.
As used herein, the term “Northern Europe” is the general term for the geographical region in Europe that is North of the Baltic Sea and includes the British Isles, Greenland, Sweden, Norway, Lithuania, Latvia, Estonia, and Finland.
As used herein, the term “single nucleotide polymorphism” or “SNP” refers to a genetic variation between individuals wherein the variation is a single nitrogenous base position in the DNA of organisms that is variable. In other words, an SNP refers to a polymorphism at a single nucleotide position in a genome where the nucleotide at the specified position varies between individuals or populations.
As used herein, the term, “SNPs” is the plural of SNP.
As used herein, the term “allele frequency” (p) refers to the relative frequency at which an allele is present at a locus within a population, expressed as a fraction or percentage. For example, for a given allele “A”, individuals who are diploid may have the following genotypes: “AA”, “Aa” or “aa”. The number of copies of allele “A” in the population is counted by multiplying the number of individuals who have the genotypes “AA”, “Aa” or “aa” by 2, 1, or 0, respectively, and summing the results. The allele frequency is calculated by dividing the total number of “A” alleles in the population by the total number of alleles at that locus.
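The counting rule above can be illustrated with a minimal sketch. The function name and the genotype counts are hypothetical and chosen only for illustration; they do not come from the disclosure.

```python
def allele_frequency(n_AA: int, n_Aa: int, n_aa: int) -> float:
    """Frequency of allele 'A' in a diploid population, from genotype counts."""
    total_alleles = 2 * (n_AA + n_Aa + n_aa)  # every individual carries two alleles
    if total_alleles == 0:
        raise ValueError("population is empty")
    # AA contributes 2 copies of A, Aa contributes 1, aa contributes 0.
    count_A = 2 * n_AA + 1 * n_Aa + 0 * n_aa
    return count_A / total_alleles


# Hypothetical population: 30 AA, 50 Aa and 20 aa individuals.
freq_A = allele_frequency(30, 50, 20)
```

With these illustrative counts, 110 of the 200 alleles are “A”.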
As used herein, a “risk allele frequency” refers to the allele frequency of a risk allele. A risk allele is an allele that is associated with an increased risk of contracting a disease.
As used herein, the term “per allele odds ratio” (OR) refers to an odds ratio with respect to each copy of an allele. An allelic OR describes the association between disease and allele by comparing the odds of disease in an individual carrying allele “A” to the odds of disease in an individual carrying allele “a”. An OR of 1.0 means that the DNA variant has no effect on the odds of having the disease, while values above 1.0 indicate a statistical association between that variant and having the disease. OR values below 1.0 indicate an association between that variant and lower odds of having the disease.
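As an illustration of the definition above, a crude allelic OR can be computed from a 2×2 table of allele counts in cases and controls. This is a minimal sketch only: the function name and the counts are hypothetical, and published per-allele ORs are typically estimated by regression with covariate adjustment rather than from a raw table.

```python
def allelic_odds_ratio(case_A: int, case_a: int,
                       control_A: int, control_a: int) -> float:
    """Crude allelic odds ratio for allele 'A' versus allele 'a'.

    Arguments are allele counts (not individual counts) in cases and controls.
    """
    if case_a == 0 or control_A == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    # Odds of disease among A alleles divided by odds of disease among a alleles.
    return (case_A * control_a) / (case_a * control_A)


# Hypothetical allele counts: cases carry 60 A / 40 a, controls carry 50 A / 50 a.
example_or = allelic_odds_ratio(60, 40, 50, 50)
```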
An individual has “triple negative breast cancer” if the individual has breast cancer that tests negative for estrogen receptors, tests negative for progesterone receptors, and does not overexpress the HER2 protein.
As used herein, the term “medical procedure” is also synonymous with treatment.
As used herein, the term “treatment” or “treating” means any treatment of a disease or condition in a subject, such as a human female subject, including for example: 1) preventing or protecting against the disease or condition, including, causing the clinical symptoms not to develop; 2) inhibiting the disease or condition, including, arresting or suppressing the development of clinical symptoms; and/or 3) relieving the disease or condition, including, causing the regression or elimination of clinical symptoms. Treating includes administering therapeutic agents to a subject in need thereof.
As used herein, the term “linkage disequilibrium” is the non-random association of alleles at different loci in a given population. Two or more alleles are said to be in linkage equilibrium when they occur randomly in a population. Two or more alleles are in linkage disequilibrium when they do not occur randomly with respect to each other.
As used herein, the term “pairwise $r^2$” indicates the amount of linkage disequilibrium between two SNPs. An $r^2$=1 indicates that the SNPs are in complete linkage disequilibrium.
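For reference, the pairwise statistic is commonly computed from haplotype and allele frequencies as the squared disequilibrium coefficient divided by the product of the four allele frequencies. The sketch below assumes phased haplotype frequencies are available; the function and argument names are hypothetical.

```python
def pairwise_r2(p_AB: float, p_A: float, p_B: float) -> float:
    """Squared correlation between alleles A and B at two biallelic loci.

    p_AB is the frequency of the haplotype carrying both A and B;
    p_A and p_B are the marginal allele frequencies.
    """
    denominator = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denominator == 0:
        raise ValueError("r^2 is undefined when either locus is monomorphic")
    d = p_AB - p_A * p_B  # disequilibrium coefficient D
    return d * d / denominator


# Hypothetical frequencies: A and B always travel together, so r^2 is 1.
r2_complete = pairwise_r2(p_AB=0.3, p_A=0.3, p_B=0.3)
# Hypothetical frequencies: alleles pair at random, so r^2 is about 0.
r2_independent = pairwise_r2(p_AB=0.09, p_A=0.3, p_B=0.3)
```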
In this disclosure, methods are presented demonstrating the effectiveness of a PRS, based on the combined effects of 100 SNPs previously reported in multiple large GWAS studies, in predicting breast cancer in high-risk women referred for genetic testing who tested negative for pathogenic or likely pathogenic variants in known breast cancer susceptibility genes.
The disclosure herein sets forth embodiments for performing a medical procedure on a patient based on calculating a polygenic risk score of the patient. The methods herein provide a polygenic risk score, based on a select number of single nucleotide polymorphisms as listed in Table 1, that indicates the potential of developing breast cancer in a patient.
The disclosure herein sets forth embodiments for diagnosing a patient based on calculating a polygenic risk score of the patient. The methods herein provide a polygenic risk score, based on a select number of single nucleotide polymorphisms as listed in Table 1, that indicates the potential of developing breast cancer in a patient.
In some embodiments, the minimum number of SNPs in Table 1 used to calculate the PRS is: 50, 55, 60, 65, 70, 75, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100.
In some embodiments, at least 50 of the single nucleotide polymorphisms as set forth in Table 1 are used to calculate the PRS. In some embodiments, at least 55 of the single nucleotide polymorphisms as set forth in Table 1 are used to calculate the PRS. In some embodiments, at least 60 of the single nucleotide polymorphisms as set forth in Table 1 are used to calculate the PRS. In some embodiments, at least 65 of the single nucleotide polymorphisms as set forth in Table 1 are used to calculate the PRS. In other embodiments, at least 70 of the single nucleotide polymorphisms as set forth in Table 1 are used to calculate the PRS. In other embodiments, at least 75 of the single nucleotide polymorphisms as set forth in Table 1 are used to calculate the PRS. In other embodiments, at least 80 of the single nucleotide polymorphisms as set forth in Table 1 are used to calculate the PRS. In other embodiments, at least 81 of the single nucleotide polymorphisms as set forth in Table 1 are used to calculate the PRS. In other embodiments, at least 82 of the single nucleotide polymorphisms as set forth in Table 1 are used to calculate the PRS. In other embodiments, at least 83 of the single nucleotide polymorphisms as set forth in Table 1 are used to calculate the PRS. In other embodiments, at least 84 of the single nucleotide polymorphisms as set forth in Table 1 are used to calculate the PRS. In other embodiments, at least 85 of the single nucleotide polymorphisms as set forth in Table 1 are used to calculate the PRS. In other embodiments, at least 86 of the single nucleotide polymorphisms as set forth in Table 1 are used to calculate the PRS. In other embodiments, at least 87 of the single nucleotide polymorphisms as set forth in Table 1 are used to calculate the PRS. In other embodiments, at least 88 of the single nucleotide polymorphisms as set forth in Table 1 are used to calculate the PRS. 
In other embodiments, at least 89 of the single nucleotide polymorphisms as set forth in Table 1 are used to calculate the PRS. In other embodiments, at least 90 of the single nucleotide polymorphisms as set forth in Table 1 are used to calculate the PRS. In other embodiments, at least 91 of the single nucleotide polymorphisms as set forth in Table 1 are used to calculate the PRS. In other embodiments, at least 92 of the single nucleotide polymorphisms as set forth in Table 1 are used to calculate the PRS. In other embodiments, at least 93 of the single nucleotide polymorphisms as set forth in Table 1 are used to calculate the PRS. In other embodiments, at least 94 of the single nucleotide polymorphisms as set forth in Table 1 are used to calculate the PRS. In other embodiments, at least 95 of the single nucleotide polymorphisms as set forth in Table 1 are used to calculate the PRS. In other embodiments, at least 96 of the single nucleotide polymorphisms as set forth in Table 1 are used to calculate the PRS. In other embodiments, at least 97 of the single nucleotide polymorphisms as set forth in Table 1 are used to calculate the PRS. In other embodiments, at least 98 of the single nucleotide polymorphisms as set forth in Table 1 are used to calculate the PRS. In other embodiments, at least 99 of the single nucleotide polymorphisms as set forth in Table 1 are used to calculate the PRS. In other embodiments, all of the single nucleotide polymorphisms as set forth in Table 1 are used to calculate the PRS.
In some embodiments, the 50 SNPs used to calculate the PRS, as set forth in Table 1, are chosen in descending order with respect to the odds ratio. In some embodiments, SNPs with an odds ratio of 1.07 or more, as set forth in Table 1, are selected to calculate the PRS. In some embodiments, SNPs with an odds ratio of 1.08 or more, as set forth in Table 1, are selected to calculate the PRS. In some embodiments, SNPs with an odds ratio of 1.09 or more, as set forth in Table 1, are selected to calculate the PRS. In some embodiments, SNPs with an odds ratio of 1.10 or more, as set forth in Table 1, are selected to calculate the PRS.
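The two selection rules described above (top N by odds ratio, or all SNPs at or above an odds ratio threshold) can be sketched as follows. The `Snp` record, the function names and the three placeholder entries are hypothetical; they stand in for rows of Table 1 and are not actual table values.

```python
from typing import List, NamedTuple


class Snp(NamedTuple):
    snp_id: str              # placeholder identifier, not a real rsID
    odds_ratio: float        # per-allele odds ratio
    risk_allele_freq: float  # risk allele frequency


def top_n_by_odds_ratio(snps: List[Snp], n: int) -> List[Snp]:
    """Return the n SNPs with the largest odds ratios, in descending order."""
    return sorted(snps, key=lambda s: s.odds_ratio, reverse=True)[:n]


def with_min_odds_ratio(snps: List[Snp], min_or: float) -> List[Snp]:
    """Return every SNP whose odds ratio is at least min_or."""
    return [s for s in snps if s.odds_ratio >= min_or]


# Illustrative values only.
table = [Snp("snp_a", 1.05, 0.40), Snp("snp_b", 1.26, 0.12), Snp("snp_c", 1.09, 0.55)]
top_two = top_n_by_odds_ratio(table, 2)
at_least_1_07 = with_min_odds_ratio(table, 1.07)
```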
In some embodiments, the SNPs of Table 2 may be used as a proxy for the SNPs of Table 1. Table 2 lists SNPs that are within 250 kilobases of the SNPs in Table 1 and have a pairwise $r^2$=1.0. Table 2 also indicates whether each listed SNP is present in Table 1, using either 1 (indicating yes) or 0 (indicating no).
In some embodiments, SNPs that are within 50 kilobases of the SNPs in Table 1 may be used as a proxy in the calculation of the PRS. In some embodiments, SNPs that are within 100 kilobases of the SNPs in Table 1 may be used as a proxy in the calculation of the PRS. In some embodiments, SNPs that are within 150 kilobases of the SNPs in Table 1 may be used as a proxy in the calculation of the PRS. In some embodiments, SNPs that are within 200 kilobases of the SNPs in Table 1 may be used as a proxy in the calculation of the PRS. In some embodiments, SNPs that are within 250 kilobases of the SNPs in Table 1 may be used as a proxy in the calculation of the PRS. In some embodiments, SNPs that are within 300 kilobases of the SNPs in Table 1 may be used as a proxy in the calculation of the PRS. In some embodiments, SNPs that are within 350 kilobases of the SNPs in Table 1 may be used as a proxy in the calculation of the PRS. In some embodiments, SNPs that are within 400 kilobases of the SNPs in Table 1 may be used as a proxy in the calculation of the PRS. In some embodiments, SNPs that are within 450 kilobases of the SNPs in Table 1 may be used as a proxy in the calculation of the PRS. In some embodiments, SNPs that are within 500 kilobases of the SNPs in Table 1 may be used as a proxy in the calculation of the PRS.
In some embodiments, SNPs that have a pairwise $r^2$=1.0 with respect to the SNPs in Table 1, may be used as a proxy in the calculation of the PRS. In some embodiments, SNPs that have a pairwise $r^2$=0.9 with respect to the SNPs in Table 1, may be used as a proxy in the calculation of the PRS. In some embodiments, SNPs that have a pairwise $r^2$=0.8 with respect to the SNPs in Table 1, may be used as a proxy in the calculation of the PRS. In some embodiments, SNPs that have a pairwise $r^2$=0.7 with respect to the SNPs in Table 1, may be used as a proxy in the calculation of the PRS. In some embodiments, SNPs that have a pairwise $r^2$=0.6 with respect to the SNPs in Table 1, may be used as a proxy in the calculation of the PRS. In some embodiments, SNPs that have a pairwise $r^2$=0.5 with respect to the SNPs in Table 1, may be used as a proxy in the calculation of the PRS. In some embodiments, SNPs that have a pairwise $r^2$=0.4 with respect to the SNPs in Table 1, may be used as a proxy in the calculation of the PRS. In some embodiments, SNPs that have a pairwise $r^2$=0.3 with respect to the SNPs in Table 1, may be used as a proxy in the calculation of the PRS. In some embodiments, SNPs that have a pairwise $r^2$=0.2 with respect to the SNPs in Table 1, may be used as a proxy in the calculation of the PRS. In some embodiments, SNPs that have a pairwise $r^2$=0.1 with respect to the SNPs in Table 1, may be used as a proxy in the calculation of the PRS.
In some embodiments, the PRS is calculated by a method that comprises: computing, for each SNP, an unscaled population average risk according to the equation $\mu = (1-p)^2 + 2p(1-p)\,\mathrm{OR} + p^2\,\mathrm{OR}^2$, wherein $\mu$ is the unscaled population average risk, $p$ is the risk allele frequency, and OR is the per-allele odds ratio for that SNP; calculating the adjusted risk values using $\mu$ according to: $1/\mu$ when 0 risk alleles are present, $\mathrm{OR}/\mu$ when 1 risk allele is present, and $\mathrm{OR}^2/\mu$ when 2 risk alleles are present; and multiplying together the adjusted risk values for each SNP of the at least 50 SNPs to calculate the PRS for a patient based on the patient's observed genotypes.
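The calculation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a definitive implementation: the function names are hypothetical, the example allele frequency and odds ratio are illustrative rather than taken from Table 1, and `math.prod` requires Python 3.8 or later.

```python
from math import prod
from typing import Sequence, Tuple


def population_mean_risk(p: float, odds_ratio: float) -> float:
    """Unscaled population average risk (mu) for one SNP."""
    return (1 - p) ** 2 + 2 * p * (1 - p) * odds_ratio + p ** 2 * odds_ratio ** 2


def adjusted_risk(n_risk_alleles: int, p: float, odds_ratio: float) -> float:
    """Adjusted risk for a genotype: 1/mu, OR/mu or OR^2/mu for 0, 1 or 2 risk alleles."""
    if n_risk_alleles not in (0, 1, 2):
        raise ValueError("a diploid genotype carries 0, 1 or 2 risk alleles")
    return odds_ratio ** n_risk_alleles / population_mean_risk(p, odds_ratio)


def polygenic_risk_score(genotypes: Sequence[int],
                         snp_params: Sequence[Tuple[float, float]]) -> float:
    """Product of per-SNP adjusted risks.

    genotypes[i] is the risk allele count at SNP i;
    snp_params[i] is (risk allele frequency, per-allele odds ratio) for SNP i.
    """
    if len(genotypes) != len(snp_params):
        raise ValueError("one genotype is required per SNP")
    return prod(adjusted_risk(g, p, or_) for g, (p, or_) in zip(genotypes, snp_params))


# Illustrative two-SNP example: both SNPs have p = 0.5 and OR = 1.2.
example_prs = polygenic_risk_score([2, 0], [(0.5, 1.2), (0.5, 1.2)])
```

A score above 1.0 indicates a risk above the population average and a score below 1.0 indicates a risk below it.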
In some embodiments, if the PRS is at least 20% greater than the average population risk, the method of treatment includes physician-recommended screenings of the patient. In further embodiments, these patient screenings include increased frequency of screenings. In still further embodiments, these screenings include, but are not limited to: mammograms, one or more breast magnetic resonance imaging (MRI) scans, one or more clinical breast exams, ultrasound, and taking one or more additional biological samples for genetic testing. In further embodiments, the biological samples taken for additional testing include tissue taken from biopsies and blood samples.
In some embodiments, if the PRS is at least 20% greater than the average population risk, the method of treatment includes the physician recommending surgery to the patient to remove breast tissue, including, but not limited to: a prophylactic mastectomy, a mastectomy, and breast conservation surgery.
In some embodiments, if the PRS is at least 20% greater than the average population risk, the method of treatment includes the physician recommending drug treatments. The types of drugs prescribed in a treatment include preventative drugs, such as, but not limited to: raloxifene hydrochloride and tamoxifen citrate.
In some embodiments, if the PRS is at least 20% greater than the average population risk, the method of treatment includes the physician recommending drug treatments. The types of drugs prescribed in treatment include drugs such as, but not limited to: Abemaciclib, Ado-Trastuzumab Emtansine, Anastrozole, Capecitabine, Cyclophosphamide, Docetaxel, Doxorubicin Hydrochloride, Epirubicin Hydrochloride, Eribulin Mesylate, Everolimus, Exemestane, Fluorouracil Injection, Fulvestrant, Gemcitabine Hydrochloride, Goserelin Acetate, Ixabepilone, Lapatinib Ditosylate, Letrozole, Megestrol Acetate, Methotrexate, Neratinib Maleate, Olaparib, Paclitaxel, Paclitaxel Albumin-stabilized Nanoparticle Formulation, Palbociclib, Pamidronate Disodium, Pertuzumab, Ribociclib, Tamoxifen Citrate, Thiotepa, Toremifene, Trastuzumab, and Vinblastine Sulfate.
In some embodiments, the medical procedure recommended to the patient, as set forth above, is based on the patient having a polygenic risk score that is at least 30% greater than the average population risk. In other embodiments, the medical procedure recommended to the patient, as set forth above, is based on the patient having a polygenic risk score that is at least 40% greater than the average population risk. In other embodiments, the medical procedure recommended to the patient, as set forth above, is based on the patient having a polygenic risk score that is at least 50% greater than the average population risk. In other embodiments, the medical procedure recommended to the patient, as set forth above, is based on the patient having a polygenic risk score that is at least 60% greater than the average population risk.
In some embodiments, the PRS is combined with a score derived from patient history information to calculate an absolute risk to the patient of developing cancer. In some embodiments, the patient history information includes, but is not limited to: age, sex, breast density, birth control, obesity, alcohol use and family breast cancer history.
In one embodiment, the Tyrer-Cuzick model is used. As described in Tyrer et al. 2016, which is incorporated herein in its entirety, the Tyrer-Cuzick model is a breast cancer risk score that includes information provided by patients. The model uses information including, but not limited to: age, a detailed family history of breast and ovarian cancer in first and second degree relatives with age at onset, prior proliferative benign breast disease or atypical hyperplasia, hormone replacement therapy use, height, weight, age at menopause, and parity, including age at first childbirth. In further embodiments, the information is taken directly from a patient or obtained from the patient's history file, by either the physician or a third party entity given consent to access the file history in order to calculate the score.
In one embodiment, the PRS is used to independently verify the Tyrer-Cuzick score when recommending medical procedures to a patient.
In another embodiment, the patient history score derived using the Tyrer-Cuzick model is multiplied together with the PRS to calculate an absolute risk known as the Ambry Combined Score.
In one embodiment, the medical procedure recommended to the patient, as set forth above, is based on an Ambry Combined Score calculating an absolute risk to the patient of developing cancer within their lifetime of at least 20%.
In some embodiments, the medical procedure recommended to the patient, as set forth above, is based on the Ambry Combined Score calculating an absolute risk to the patient of developing cancer within their lifetime of at least 30%. In some embodiments, the medical procedure recommended to the patient, as set forth above, is based on the Ambry Combined Score calculating an absolute risk to the patient of developing cancer within their lifetime of at least 40%. In some embodiments, the medical procedure recommended to the patient, as set forth above, is based on the Ambry Combined Score calculating an absolute risk to the patient of developing cancer within their lifetime of at least 50%. In some embodiments, the medical procedure recommended to the patient, as set forth above, is based on the Ambry Combined Score calculating an absolute risk to the patient of developing cancer within their lifetime of at least 60%.
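The combination and thresholding described above can be sketched as follows. This is a minimal illustration assuming the patient history score is expressed as a lifetime absolute risk between 0 and 1; the function names and the example inputs are hypothetical.

```python
def combined_score(patient_history_risk: float, prs: float) -> float:
    """Absolute risk obtained by multiplying a patient history score by the PRS.

    patient_history_risk: lifetime absolute risk from a history-based model,
    expressed as a fraction (for example 0.15 for 15%).
    prs: polygenic risk score relative to the population average (1.0 = average).
    """
    if patient_history_risk < 0 or prs < 0:
        raise ValueError("risk inputs must be non-negative")
    return patient_history_risk * prs


def meets_threshold(absolute_risk: float, threshold: float = 0.20) -> bool:
    """True when the combined absolute risk is at least the chosen threshold."""
    return absolute_risk >= threshold


# Illustrative inputs: a 15% history-based lifetime risk and a PRS of 1.5.
risk = combined_score(0.15, 1.5)
recommend = meets_threshold(risk, threshold=0.20)
```

The threshold argument corresponds to the 20%, 30%, 40%, 50% or 60% levels named in the embodiments above.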
In some embodiments, the SNPs are analyzed using next generation sequencing platforms. In further embodiments, the SNPs are sequenced with commercial next generation sequencing probes. In still further embodiments, SNPs are sequenced with commercial next generation sequencing probes that have been either supplemented or augmented based on an experimenter's preference in order to improve the ability to collect data and the efficiency with which it is obtained.
In other embodiments, SNPs are analyzed using a variety of techniques including: SNP microarrays, molecular beacons, dynamic allele-specific hybridization, restriction fragment length polymorphism, PCR-based methods, flap endonuclease, 5′-nuclease assays, primer extension, single strand polymorphism, temperature gradient gel electrophoresis, and denaturing high performance liquid chromatography.
In some embodiments, the PRS is calculated for a woman without a pathogenic or likely pathogenic variant in the BRCA1 and/or BRCA2 gene.
In some embodiments, the PRS is calculated for a woman without pathogenic or likely pathogenic variants of the genes: ATM, BARD1, BLM, BRIP1, CDH1, CHEK2, FANCC, MRE11A, NBN, NF1, PALB2, PTEN, RAD50, RAD51C, RAD51D, STK11, and TP53.
In some embodiments the patient is a woman of Caucasian, non-Ashkenazi Jewish, descent.
In some embodiments the absolute risk indicates a lifetime risk of developing breast cancer up to age 85.
It will be understood that any embodiments from any aspect, where applicable, can be used in combination with other embodiments.
The following non-limiting methods are provided to further illustrate the embodiments of the invention disclosed herein. It should be appreciated by those of skill in the art that the techniques disclosed in the examples that follow represent approaches that have been found to function well in the practice of several embodiments of the invention, and thus be considered to constitute examples of modes for its practice. However, those of skill in the art should, in light of the present disclosure, appreciate that many changes can be made in the specific embodiments that are disclosed and still obtain a like or similar result without departing from the spirit and the scope of the invention.
A total of 100 SNPs were identified from genome wide association studies presented in the literature as set forth in Table 1. These SNPs were chosen to be used in calculating a polygenic risk score. SNPs from individuals or populations of non-Caucasian or Ashkenazi Jewish descent were excluded from Table 1. Additionally, the SNPs listed in Table 1 were chosen because they had p-values that were less than or equal to 5×104.
Women were included in the study sample if they were: self-reported Caucasian, of non-Ashkenazi Jewish descent, between 18 and 84 years of age at the time of testing, and had provided information regarding family history to ordering clinicians.
Women who tested positive for a pathogenic or likely pathogenic variant in a breast cancer-susceptibility gene (ATM, BARD1, BLM, BRCA1, BRCA2, BRIP1, CDH1, CHEK2, FANCC, MRE11A, NBN, NF1, PALB2, PTEN, RAD50, RAD51C, RAD51D, STK11, TP53) were excluded.
Cases were identified as those with a personal history of breast cancer, and were excluded if clinical history included other cancer primaries. Controls were unaffected with any cancer (not including basal or squamous cell carcinoma); those with a first- or second-degree relative with breast or ovarian cancer were further excluded from analysis.
Biological samples taken from patients were analyzed by next generation sequencing; molecular analysis was performed using Illumina's NextSeq 500 system.
Sequencing quality metrics for the Illumina NextSeq 500 were monitored during the sequencing run, and included visualization of Intensity-vs-Cycle (IVC) plots and cluster intensity over the duration of the run. Other quality metrics, evaluated for the entire sequencing run upon completion of sequencing and demultiplexing of the samples, included the % Perfect Index Reads, the % of ≥Q30 Bases, and the overall Mean Quality Score.
Samples passing the sequencing quality metrics were fed into a proprietary next generation sequencing data processing pipeline in a parallelized fashion, starting with alignment of sequencing reads to the human reference genome build GRCh37/hg19, followed by variant and genotype calling on the panel genes and the 100 breast cancer-associated SNP positions. Additionally, next generation sequencing coverage was evaluated for all 100 breast cancer-associated SNPs for every sample, and any SNPs with no or low coverage (<20×) were excluded from genotype calling and were not included in downstream statistical analysis.
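The coverage rule above can be illustrated with a small filter. This is a sketch only and is not the proprietary pipeline; the function name, the per-SNP depth dictionary and the placeholder SNP identifiers are hypothetical.

```python
from typing import Dict, Set

MIN_COVERAGE = 20  # read depth; SNPs below this are treated as not callable


def snps_passing_coverage(depth_by_snp: Dict[str, int],
                          min_coverage: int = MIN_COVERAGE) -> Set[str]:
    """Return the SNP identifiers whose read depth is at least min_coverage."""
    return {snp_id for snp_id, depth in depth_by_snp.items() if depth >= min_coverage}


# Illustrative depths for one sample; identifiers are placeholders.
depths = {"snp_a": 152, "snp_b": 19, "snp_c": 0, "snp_d": 20}
passing = snps_passing_coverage(depths)
```

SNPs that fail the filter would be carried forward as missing genotypes for that sample.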
Next generation sequencing data were examined to assess missing rates for each sample and each SNP. Samples were excluded if more than 10 SNPs were missing due to bioinformatics quality control thresholds (n=12; 0.4% of samples). SNP calls were checked for consistency with publicly available databases (GRCh37/hg19; Ensembl release 91 (Zerbino et al.)) and literature-reported reference and risk alleles. SNP allele frequencies among control subjects were compared to those available in the 1000 Genomes EUR population to ensure consistency with the reference population. Hardy-Weinberg Equilibrium (HWE) was assessed for all SNPs among controls using the R package HardyWeinberg (Graffelman et al.).
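Two of the checks above can be sketched in plain Python: the per-sample missingness rule and a Hardy-Weinberg test. The study used an R package for HWE; the version below substitutes a simple one-degree-of-freedom chi-square test for illustration, and all function names and example counts are hypothetical.

```python
from math import erfc, sqrt
from typing import Optional, Sequence, Tuple

MAX_MISSING_SNPS = 10  # samples with more missing SNPs than this are excluded


def sample_passes_missingness(genotypes: Sequence[Optional[int]],
                              max_missing: int = MAX_MISSING_SNPS) -> bool:
    """True when the sample has at most max_missing missing genotypes (None)."""
    missing = sum(1 for g in genotypes if g is None)
    return missing <= max_missing


def hwe_chi_square(n_AA: int, n_Aa: int, n_aa: int) -> Tuple[float, float]:
    """Chi-square statistic and p-value (1 degree of freedom) for HWE."""
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    q = 1 - p
    if p == 0 or q == 0:
        return 0.0, 1.0  # monomorphic SNP: no departure can be tested
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Illustrative control genotype counts that match HWE exactly.
chi2_stat, p_val = hwe_chi_square(25, 50, 25)
```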
To assess the assumption of SNP effects consistent with a log additive model, all possible pair-wise SNP*SNP interactions were examined using logistic regression, with a likelihood ratio test for the interaction and breast cancer as the outcome. Additional tests were performed for higher-order SNP interactions using logic regression.
Using an approach consistent with prior literature (Dite et al., Mealiffe et al., Cuzick et al., Allman et al.), an SNP-based population-standardized PRS was computed for each patient. Using previously published estimates of the per-allele odds ratio (OR) and risk allele frequency (p) for each SNP, and assuming independent and additive risks on the log OR scale, the unscaled population average risk was calculated as:
$\mu = (1-p)^2 + 2p(1-p)\,\mathrm{OR} + p^2\,\mathrm{OR}^2$ (Equation 1)
Adjusted risk values were then calculated as:
$1/\mu$, $\mathrm{OR}/\mu$ and $\mathrm{OR}^2/\mu$
for the 3 genotypes defined by the number of risk alleles: 0, 1 or 2, respectively. Missing genotypes were assigned a population average risk of 1.0. Adjusted risk values for each SNP were multiplied to compute the overall PRS-associated risk for each individual based on their observed genotypes.
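The PRS calculation can be sketched in a few lines. The sketch assumes, following the cited approach, that the adjusted risk values are the genotype relative risks 1, OR and OR² divided by the population average risk μ of Equation 1. The function names, SNP identifiers and parameter values are hypothetical and do not correspond to the 100 SNPs of the disclosure.

```python
def adjusted_risks(odds_ratio, raf):
    """Adjusted risk values for 0, 1 and 2 risk alleles at one SNP.

    mu is the unscaled population average risk (Equation 1), computed from
    the per-allele odds ratio and the risk allele frequency (raf).
    """
    mu = ((1 - raf) ** 2
          + 2 * raf * (1 - raf) * odds_ratio
          + raf ** 2 * odds_ratio ** 2)
    return (1 / mu, odds_ratio / mu, odds_ratio ** 2 / mu)


def polygenic_risk_score(genotypes, snp_params):
    """Multiply the adjusted risk values over all SNPs for one individual.

    genotypes:  SNP id -> risk-allele count (0, 1, 2), or None if missing
    snp_params: SNP id -> (per-allele OR, risk allele frequency)
    """
    score = 1.0
    for snp_id, (odds_ratio, raf) in snp_params.items():
        n_risk = genotypes.get(snp_id)
        if n_risk is None:
            continue  # missing genotype: population average risk of 1.0
        score *= adjusted_risks(odds_ratio, raf)[n_risk]
    return score


# Made-up parameters for three SNPs
snp_params = {"snp_a": (1.20, 0.30), "snp_b": (1.05, 0.60), "snp_c": (1.10, 0.10)}
patient = {"snp_a": 2, "snp_b": 1, "snp_c": None}
print(round(polygenic_risk_score(patient, snp_params), 3))
```

Dividing by μ standardizes each SNP so that its adjusted risk averages to 1.0 across the genotype frequencies expected under Hardy-Weinberg equilibrium, which is why a missing genotype can be replaced by 1.0 without shifting the score.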
Logistic regression models were used to estimate the ORs for breast cancer by quartile of the PRS, with the 1st quartile category (<25th percentile) as the reference.
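For an unadjusted model with quartile indicators as the only predictors, the logistic regression ORs equal the ratios of case-to-control odds in each quartile to those in the reference quartile, so the comparison can be sketched without a regression library. The sketch assumes that the quartile cut points are taken from a reference set of scores (for example, the controls); the text does not say how the cut points were defined, and the function names are hypothetical.

```python
import bisect
import statistics


def quartile_counts(scores, is_case, reference_scores):
    """Count [cases, controls] in each PRS quartile.

    Cut points are the 25th, 50th and 75th percentiles of reference_scores.
    """
    cuts = statistics.quantiles(reference_scores, n=4)  # three cut points
    counts = [[0, 0] for _ in range(4)]
    for score, case in zip(scores, is_case):
        q = bisect.bisect_right(cuts, score)  # 0 means below the 25th percentile
        counts[q][0 if case else 1] += 1
    return counts


def odds_ratios(counts):
    """Unadjusted OR of each quartile relative to the 1st quartile."""
    ref_cases, ref_controls = counts[0]
    ref_odds = ref_cases / ref_controls
    return [(cases / controls) / ref_odds for cases, controls in counts]


# Made-up counts of [cases, controls] per quartile
print(odds_ratios([[10, 20], [15, 20], [20, 20], [30, 20]]))
```

Adjusting for covariates such as age would require fitting the logistic regression model itself.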
The performance of the PRS in predicting breast cancer cases was examined using receiver operating characteristic (ROC) curves. The area under the ROC curve (AUROC) is a summary measure of a test's discriminative ability, that is, how well the test separates cases from controls in a given clinical situation. An AUROC of 0.5 corresponds to chance-level discrimination, and the closer the AUROC is to 1, the better the discriminative ability of the test.
The AUROC was computed using the R package pROC (Robin et al.). R (v.3.3.3) was used for all statistical analyses; all statistical tests were two-sided, and p-values <0.05 were considered nominally statistically significant.
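The AUROC has a simple interpretation that can be shown directly: it is the probability that a randomly chosen case has a higher score than a randomly chosen control, with ties counted as one half. The following sketch computes it by pairwise comparison, which is equivalent to the trapezoidal area under the empirical ROC curve. It is an illustration in Python, not the pROC implementation, and the function name is hypothetical.

```python
def auroc(case_scores, control_scores):
    """AUROC as the fraction of case-control pairs ranked correctly.

    Ties contribute one half. Runtime is proportional to the number of
    pairs, which is adequate for illustration.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Made-up PRS values
print(auroc([1.4, 0.9, 1.1], [0.8, 1.0, 0.7]))
```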
A total of 3,020 patient samples (1,772 breast cancer cases and 1,248 controls) underwent next generation sequencing. After assessment of quality control and inclusion/exclusion criteria, data from 1,689 breast cancer cases and 1,160 controls were available for analysis. The age at testing (mean±SD) was 55.7±11.3 years for cases and 47.5±12.9 years for controls.
Among cases, the mean±SD age at first diagnosis of breast cancer was 51.0±10.9 years. While 92.0% had at least one close relative (1st, 2nd or 3rd degree) with cancer, 74.8% had a close relative, and 39.7% had at least one first-degree relative, with breast and/or ovarian cancer. Approximately 21.8% of cases were estrogen receptor negative, and 14.0% had triple negative breast cancer.
The mean±SD SNP call rate, or the proportion of individuals for whom a genotype was successfully determined for a given SNP, was 99.7%±1.1% (range 92.2% to 100.0%). SNP risk allele frequencies (RAF) among controls ranged from 0.8% to 93.5%, and were consistent with the 1000 Genomes non-Finnish EUR population (range: 1.0% to 93.3%; mean±SD absolute difference among SNPs: 0.5%±2.5%, p=0.05).
One SNP was monomorphic in both cases and controls (RAF=0%), as observed in the 1000 Genomes non-Finnish EUR population; the Finnish population carries the risk allele at a frequency of 2.5%, and a frequency of 0.7% has been reported among controls in the literature (Michailidou et al.). Consistent with the findings of previous studies (Mavaddat et al., Mealiffe et al., Milne et al.), there were few to no significant pairwise or higher-order interactions among the SNPs after Bonferroni or false discovery rate correction for multiple testing.
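The two multiple-testing corrections named above can be sketched as follows. The false discovery rate correction is assumed here to be the Benjamini-Hochberg procedure, which the text does not specify, and the function names are hypothetical.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up interaction p-values
p_values = [0.01, 0.04, 0.03, 0.5]
print(bonferroni(p_values))
print(benjamini_hochberg(p_values))
```

An interaction is declared significant when its adjusted p-value falls below the chosen significance level.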
The sum of the risk alleles across the 100 SNPs was approximately normally distributed among cases and controls, and ranged from 75 to 119 and 73 to 111, respectively (mean±SD risk allele count: 95.3±6.5 vs. 93.1±6.7, p<0.0001).
The area under the receiver operating characteristic curve (AUROC) was used to compare discrimination of the models. A maximum AUROC for PRS discrimination of cases and controls was reached at a threshold of 0.83, corresponding to a positive predictive value (PPV) of 0.67 and a negative predictive value (NPV) of 0.50 (AUROC=0.61, 95% CI: 0.59-0.63).
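The predictive values at a given PRS threshold follow from the 2×2 table of test result against case status. The sketch below assumes that a PRS at or above the threshold counts as a positive test, which the text does not state; the function name and the example data are hypothetical.

```python
def predictive_values(scores, is_case, threshold):
    """Return (PPV, NPV) when a score >= threshold is called test-positive."""
    tp = fp = tn = fn = 0
    for score, case in zip(scores, is_case):
        if score >= threshold:
            if case:
                tp += 1
            else:
                fp += 1
        else:
            if case:
                fn += 1
            else:
                tn += 1
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv


# Made-up scores for two controls and two cases
print(predictive_values([0.5, 0.9, 1.2, 1.5], [False, True, False, True], 0.83))
```

Unlike the AUROC, PPV and NPV depend on the proportion of cases in the sample, so values obtained in a case-enriched cohort do not transfer directly to a general screening population.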
The results show that, overall, the OR per standard deviation reported by this disclosure for the 100-SNP PRS is similar to the results obtained by Dite et al. and Shieh et al. Dite et al. reported an OR per standard deviation of the PRS of 1.46 (95% CI: 1.29-1.64). Shieh et al. observed unadjusted ORs for breast cancer of 1.34 (95% CI: 0.90-2.00), 1.76 (95% CI: 1.18-2.62) and 2.54 (95% CI: 1.69-3.82) for the 2nd, 3rd and 4th quartiles of the PRS compared to the 1st quartile (Shieh et al.). Further, the results show the validity of the disclosed PRS in predicting breast cancer, as demonstrated by an AUROC greater than 0.5. This is consistent with prior reports in which the AUROC ranged from 0.55 to 0.68 (Mavaddat et al., Dite et al., Mealiffe et al., Shieh et al., Li et al., Sawyer et al., Allman et al., Vachon et al.). The PRS presented in this disclosure therefore has a demonstrable ability to predict breast cancer.