Age-related macular degeneration (AMD) is the most common geriatric eye disorder leading to blindness. Macular degeneration is responsible for visual handicap in what is estimated conservatively to be approximately 16 million individuals worldwide. Among the elderly, the overall prevalence is estimated between 5.7% and 30% depending on the definition of early AMD, and its differentiation from features of normal aging, a distinction that remains poorly understood.
Histopathologically, the hallmark of early neovascular AMD is accumulation of extracellular drusen and basal laminar deposit (abnormal material located between the plasma membrane and basal lamina of the retinal pigment epithelium) and basal linear deposit (material located between the basal lamina of the retinal pigment epithelium and the inner collagenous zone of Bruch's membrane). The end stage of AMD is characterized by a complete degeneration of the neurosensory retina and of the underlying retinal pigment epithelium in the macular area. Advanced stages of AMD can be subdivided into geographic atrophy and exudative AMD. Geographic atrophy is characterized by progressive atrophy of the retinal pigment epithelium. In exudative AMD the key phenomenon is the occurrence of choroidal neovascularisation (CNV). Eyes with CNV have varying degrees of reduced visual acuity, depending on location, size, type and age of the neovascular lesion. The development of choroidal neovascular membranes can be considered a late complication in the natural course of the disease, possibly due to tissue disruption (Bruch's membrane) and decompensation of the underlying longstanding processes of AMD.
Many pathophysiological aspects as well as vascular and environmental risk factors are associated with a progression of the disease. Family, twin, segregation, and case-control studies all suggested an involvement of genetic factors in the etiology of AMD prior to the discovery of various genes associated with AMD.
Knowledge is growing about the extent of heritability, number of genes involved, and mechanisms underlying phenotypic heterogeneity. The search for genes and markers related to AMD faces challenges—onset is late in life, and there is usually only one generation available for studies. The parents of patients are often deceased, and their children are too young to manifest the disease. Generally, the heredity of late-onset diseases has been difficult to estimate because of the uncertainties of the diagnosis in previous generations and the inability to diagnose AMD among the children of an affected individual. Even in the absence of the ambiguities in the diagnosis of AMD in previous generations, the late onset of the condition itself, natural death rates, and small family sizes result in underestimation of genetic forms of AMD, and in overestimation of rates of sporadic disease. Moreover, the phenotypic variability is considerable, and it is conceivable that the currently used diagnostic entity of AMD in fact represents a spectrum of underlying conditions with various genetic and environmental factors involved.
There remains a strong need for improved methods of diagnosing or prognosticating AMD or a susceptibility to AMD in subjects, as well as for evaluating and developing new methods of treatment.
In an aspect of the present invention, a method is provided for determining a risk of age-related macular degeneration (AMD) in a human patient, the method including detecting in a sample from the human patient the presence of polymorphism rs147859257, and correlating the presence of polymorphism rs147859257 to an increased risk of AMD in the human patient.
In another aspect of the invention, a method is provided for determining a risk of age-related macular degeneration (AMD) in a human patient, the method including detecting in a sample from the human patient the presence of polymorphism rs34882957, and correlating the presence of polymorphism rs34882957 to an increased risk of AMD in the human patient.
In still another aspect of the invention, a method is provided for determining a risk of age-related macular degeneration (AMD) in a human patient, the method including detecting in a sample from the human patient the presence of a polymorphism listed in Supplementary Table 2, and correlating the presence of the polymorphism to AMD risk in the human patient.
The features and advantages described herein are not all-inclusive and, in particular, many additional features and advantages will be apparent to one of ordinary skill in the art in view of the drawings, specification, and claims. Moreover, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and not to limit the scope of the inventive subject matter.
The present invention is illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements.
The present invention relates, in part, to the discovery that particular alleles or variants at polymorphic sites associated with genes, including the complement pathway genes C3, C9, and CFI, are useful as markers for AMD etiology, for determining susceptibility to AMD, and for predicting or monitoring disease progression or severity, e.g., to determine a treatment course and/or to titrate dosages of therapeutic agents. More specifically, methods are provided for determining a risk of an individual developing AMD or progressing to advanced forms of AMD (e.g., geographic atrophy and/or wet AMD) using these genetic markers. By non-limiting example, the single nucleotide polymorphisms (SNPs) rs147859257 in the C3 gene (p=5.2×10^-9, OR=3.8) and rs34882957 in the C9 gene (p=6.5×10^-7, OR=2.2) can be used as markers for AMD etiology, for determining susceptibility to AMD, for predicting disease progression or severity, and for distinguishing risk of geographic atrophy, the advanced dry type of AMD, from risk of the advanced wet form of AMD (See Supplementary
For example, in one aspect, the invention provides a method of screening for age-related macular degeneration (AMD) in a human subject. The method can include determining a risk of AMD progression in the subject by analyzing a sample obtained from the subject for the presence in the subject's genome of at least one single nucleotide polymorphism (SNP) identified in Supplementary Tables 2 or 3, or in Table 1, or a proxy therefor. In some embodiments, a proxy is a marker that is in linkage disequilibrium with a particular SNP or marker of interest. The presence of a SNP indicates that the subject has an increased risk of developing AMD or developing an advanced form of AMD. The markers can be used individually or in combination when screening a subject. Preferred SNPs include, but are not limited to, rs147859257 (K155Q variant in C3) and rs34882957 (P167S variant in C9). In some embodiments, the presence of a particular SNP indicates the subject has an increased risk of developing AMD. In some embodiments, the presence of a particular SNP indicates the subject has an increased risk of developing an advanced form of AMD, such as geographic atrophy and/or wet AMD, which also is referred to as neovascular disease, choroidal neovascularisation (CNV), and exudative AMD.
Various techniques can be used for analyzing a sample to determine the presence of a SNP in the subject's genome. For example, in some embodiments, the method of screening can include the steps of (i) combining a nucleic acid sample from the subject with one or more polynucleotide probes capable of hybridizing selectively to a particular SNP (e.g., any SNP identified in Supplementary Tables 2 or 3 and Table 1) or gene allele, or a proxy therefor, and (ii) detecting the presence or absence of hybridization. The probes can be oligonucleotides capable of priming polynucleotide synthesis in an amplification reaction, such as PCR or real time PCR. In some embodiments, the presence of at least one SNP is determined using a microarray. In some embodiments, the presence of at least one SNP is determined using an antibody. In various embodiments, the presence of at least one SNP is determined by sequencing a portion of the patient's genome.
In various embodiments, methods are provided for determining risk of AMD or severe forms of AMD in a human patient, the method comprising detecting the presence of a polymorphism described herein in a sample from a patient who has been determined to be at risk for developing age-related macular degeneration due to one or more patient-specific risk factors, wherein the one or more patient-specific risk factors are genetic, environmental/behavioral, or demographic.
In some embodiments, the patient is asymptomatic at the time of screening for AMD, and in some embodiments, the patient displays one or more AMD-like symptoms at the time of screening. In some embodiments, the sample is from a patient predetermined to be at risk for AMD based on one or more patient-specific factors, such as environmental/behavioral, demographic, or genetic factors. Behavioral and environmental factors include, for example, obesity, body mass index, smoking, vitamin intake, antioxidant intake, mineral intake, dietary supplement intake, use of alcohol or drugs, poor diet, a sedentary lifestyle, medical history of heart disease or other vascular disease, and medical history of kidney or liver disease. In a particular embodiment, elevated BMI is used to determine obesity. Demographic factors can include age, sex, education level, income level, marital status, occupation, religion, birth rate, death rate, average size of a family, and average age at marriage. Genetic factors can include, for example, family history of AMD, presentation of AMD symptoms, and/or detection of one or more AMD risk alleles in the patient.
In some embodiments, the method includes detecting a haplotype that includes a particular SNP (e.g., any SNP listed in Supplementary Tables 2 and 3 and Table 1).
In some embodiments, the method includes screening for a specific subtype of AMD, such as, for example, early AMD, geographic atrophy, wet AMD, neovascular disease, choroidal neovascularisation (CNV), exudative AMD, and combinations thereof.
The invention also provides, in part, a diagnostic system. The diagnostic system can include an array of polynucleotides comprising one or more reference sequences corresponding to the SNPs identified in Supplementary Tables 2 and 3 and Table 1. The polynucleotides can include at least six contiguous nucleotides, and the polynucleotides can include an allelic polymorphism or SNP. The system also can include an array reader, an image processor, a database having AMD allelic data records and patient information records, a processor, and an information output. The system compiles and processes patient data and outputs information relating to the statistical probability of the patient developing AMD.
The system can be used for various methods, including contacting a subject sample or portion thereof to the diagnostic array under high stringency hybridization conditions; inputting patient information into the system; and obtaining from the system information relating to the statistical probability of the patient developing AMD.
Further provided are methods for diagnosing risk of AMD or severe forms of AMD in a human subject. The method includes combining genetic risk with behavioral risk or environmental risk, wherein the genetic risk is determined by detecting in a sample obtained from a subject the presence or absence of a single nucleotide polymorphism (SNP) listed in Supplementary Tables 2 and 3 and Table 1, or a proxy therefor, wherein the presence of the SNP is indicative of an increased risk of the subject developing AMD or a severe form of AMD.
In one embodiment, the present invention is directed to a method for determining AMD or a susceptibility to AMD in a subject comprising combining genetic risk with behavioral risk, wherein the genetic risk is determined by detecting the presence or absence of a particular allele at a polymorphic site associated with a complement pathway gene, wherein the allele is indicative of AMD or a susceptibility to AMD. In a particular embodiment, the presence or absence of a particular allele is detected by a hybridization assay. In a particular embodiment, the presence or absence of a particular allele is determined using a microarray. In a particular embodiment, the presence or absence of a particular allele is determined using an antibody. In a particular embodiment, the presence or absence of a particular allele is determined by sequencing.
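By way of non-limiting illustration, one simple way to combine a genetic risk with a behavioral risk is to multiply odds ratios on a baseline odds, which assumes that the factors act independently. The sketch below is written under that assumption and is not a method stated in this specification. The function name `combined_probability`, the baseline probability, and the smoking odds ratio are hypothetical placeholders; only the odds ratios 3.8 (rs147859257) and 2.2 (rs34882957) are taken from the text above.

```python
def combined_probability(baseline_probability: float, odds_ratios: list[float]) -> float:
    """Scale a baseline probability by a list of odds ratios.

    Assumes the risk factors act independently and multiplicatively
    on the odds scale (an assumption made for illustration only).
    """
    if not 0.0 < baseline_probability < 1.0:
        raise ValueError("baseline_probability must be strictly between 0 and 1")
    odds = baseline_probability / (1.0 - baseline_probability)
    for odds_ratio in odds_ratios:
        if odds_ratio <= 0.0:
            raise ValueError("odds ratios must be positive")
        odds *= odds_ratio
    # Convert the combined odds back to a probability.
    return odds / (1.0 + odds)


if __name__ == "__main__":
    HYPOTHETICAL_BASELINE = 0.05   # placeholder, not from the specification
    HYPOTHETICAL_SMOKING_OR = 2.0  # placeholder, not from the specification
    C3_K155Q_OR = 3.8              # odds ratio reported above for rs147859257
    risk = combined_probability(HYPOTHETICAL_BASELINE,
                                [C3_K155Q_OR, HYPOTHETICAL_SMOKING_OR])
    print(f"Illustrative combined probability: {risk:.3f}")
```

A validated risk model would typically be fitted to cohort data and would account for interactions among factors; the multiplicative form here is chosen only because it is easy to inspect.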
As used herein, “gene” is a term used to describe a genetic element that gives rise to expression products (e.g., pre-mRNA, mRNA and polypeptides). A gene can include regulatory elements, exons and sequences that otherwise appear to have only structural features, e.g., introns and untranslated regions.
The genetic markers disclosed herein are particular “alleles” at “polymorphic sites” associated with various genes, including C3, C9, and any markers identified in Supplementary Tables 2 and 3 and Table 1. A nucleotide position at which more than one nucleotide can be present in a population (either a natural population or a synthetic population, e.g., a library of synthetic molecules), is referred to herein as a “polymorphic site”. Where a polymorphic site is a single nucleotide in length, the site is referred to as a single nucleotide polymorphism (“SNP”). If at a particular chromosomal location, for example, one member of a population has an adenine and another member of the population has a thymine at the same genomic position, then this position is a polymorphic site, and, more specifically, the polymorphic site is a SNP. Polymorphic sites can allow for differences in sequences based on substitutions, insertions or deletions. Each version of the sequence with respect to the polymorphic site is referred to herein as an “allele” of the polymorphic site. Thus, in the previous example, the SNP allows for both an adenine allele and a thymine allele.
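By way of non-limiting illustration, the definitions above can be expressed as a short Python sketch that counts alleles at one site across diploid genotypes and reports whether the site is polymorphic. The function names and the example genotypes are hypothetical and are used only to mirror the adenine/thymine example.

```python
from collections import Counter


def allele_frequencies(genotypes: list[str]) -> dict[str, float]:
    """Return the frequency of each allele at one site.

    Each genotype is a two-character string such as "AT", giving the
    two alleles carried by one diploid individual.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def is_polymorphic_site(genotypes: list[str]) -> bool:
    """A site is polymorphic when more than one allele is observed."""
    return len(allele_frequencies(genotypes)) > 1


if __name__ == "__main__":
    # Hypothetical genotypes at a single nucleotide position.
    observed = ["AA", "AT", "TT", "AT"]
    print(allele_frequencies(observed))
    print(is_polymorphic_site(observed))
```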
A genetic marker is “associated” with a genetic element or phenotypic trait, for example, if the marker is co-present with the genetic element or phenotypic trait at a frequency that is higher than would be predicted by random assortment of alleles (based on the allele frequencies of the particular population). Association also indicates physical association, e.g., proximity in the genome or presence in a haplotype block, of a marker and a genetic element.
A reference sequence is typically referred to for a particular genetic element, e.g., a gene. The reference sequence, often chosen as the most frequently occurring allele, is referred to as a “wild type” allele or the “major allele”. Alleles that are more common or less common in individuals with a disease/trait compared to individuals without the disease/trait, with a certain level of statistical significance, are referred to as the variant alleles. The corresponding genotype is referred to as a genetic variant.
Some variant alleles can include changes that affect a polypeptide or protein, e.g., the polypeptide encoded by a variant allele. These sequence differences, when compared to a reference nucleotide sequence, can include, for example, the insertion or deletion of a single nucleotide, or of more than one nucleotide, resulting in a frame shift; the change of at least one nucleotide, resulting in a change in the encoded amino acid; the change of at least one nucleotide, resulting in the generation of a premature stop codon; the deletion of several nucleotides, resulting in a deletion of one or more amino acids encoded by the nucleotides; the insertion of one or several nucleotides, such as by unequal recombination or gene conversion, resulting in an interruption of the coding sequence of a reading frame; duplication of all or a part of a sequence; transposition; or a rearrangement of a nucleotide sequence.
Alternatively, a polymorphism associated with AMD or a susceptibility to AMD can be a synonymous change in one or more nucleotides (i.e., a change that does not alter the amino acid encoded by a codon of a complement pathway gene). Such a polymorphism can, for example, alter splice sites, affect the stability or transport of mRNA, or otherwise affect the transcription or translation of the polypeptide. The polypeptide encoded by the reference nucleotide sequence is the “reference” polypeptide with a particular reference amino acid sequence, and polypeptides encoded by variant alleles are referred to as “variant” polypeptides with variant amino acid sequences.
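By way of non-limiting illustration, the distinction among synonymous changes, amino acid changes, and premature stop codons can be sketched in Python using the standard genetic code. The function name `classify_codon_change` and the example codons are assumptions made for illustration; the actual codons affected by rs147859257 (K155Q) and rs34882957 (P167S) are not stated in this text, so the codons below merely encode the same amino acid substitutions.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(reference: str, variant: str) -> str:
    """Classify a single-codon substitution by its effect on the protein.

    Returns "synonymous", "nonsense" (premature stop), or "missense".
    Loss of a stop codon is also reported as "missense" in this
    simplified sketch.
    """
    ref_aa = CODON_TABLE[reference.upper()]
    var_aa = CODON_TABLE[variant.upper()]
    if ref_aa == var_aa:
        return "synonymous"
    if var_aa == "*":
        return "nonsense"
    return "missense"


if __name__ == "__main__":
    # Illustrative codons only: lysine -> glutamine and proline -> serine.
    print(classify_codon_change("AAG", "CAG"))
    print(classify_codon_change("CCA", "TCA"))
```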
A haplotype is a combination or set of genetic markers, e.g., particular alleles at polymorphic sites, such as, e.g., SNPs and/or microsatellites. The haplotypes described herein are associated with AMD and/or a susceptibility to AMD. Detection of the presence or absence of the haplotypes herein, therefore, is indicative of AMD, is indicative of a susceptibility to AMD, is indicative of a factor related to progression from early to intermediate or late stages of AMD, is indicative of progression from intermediate to late stages of AMD, or is indicative of a lack of AMD. Detecting haplotypes, therefore, can be accomplished by methods known in the art for detecting sequences at polymorphic sites.
“Linkage” refers to a higher than expected statistical association of genotypes and/or phenotypes with each other. Linkage disequilibrium (“LD”) refers to a non-random assortment of two genetic elements. If a particular genetic element (e.g., an allele at a polymorphic site), for example, occurs in a population at a frequency of 0.25 and another occurs at a frequency of 0.25, then the predicted occurrence of a person's having both elements is 0.0625 (0.25 × 0.25), assuming a random distribution of the elements. If, however, it is discovered that the two elements occur together at a frequency higher than 0.0625, then the elements are said to be in LD since they tend to be inherited together at a higher frequency than what their independent allele frequencies would predict. Roughly speaking, LD is generally correlated with the frequency of recombination events between the two elements. Allele frequencies can be determined in a population, for example, by genotyping individuals in a population and determining the occurrence of each allele in the population. For populations of diploid individuals, e.g., human populations, individuals will typically have two alleles for each genetic element (e.g., a marker or gene).
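By way of non-limiting illustration, the comparison between observed and predicted co-occurrence can be sketched in Python using two commonly used LD measures, the coefficient D and the squared correlation r². Under random assortment, two elements with frequencies 0.25 and 0.25 are expected to occur together at 0.25 × 0.25 = 0.0625. The function name `ld_stats` and the observed co-occurrence frequency of 0.20 are hypothetical values chosen for illustration.

```python
def ld_stats(p_a: float, p_b: float, p_ab: float) -> tuple[float, float]:
    """Return (D, r_squared) for two biallelic elements.

    p_a, p_b : population frequencies of the two elements
    p_ab     : observed frequency at which they occur together
    D is the excess of the observed frequency over the product p_a * p_b.
    """
    d = p_ab - p_a * p_b
    denominator = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    r_squared = (d * d) / denominator if denominator > 0.0 else 0.0
    return d, r_squared


if __name__ == "__main__":
    expected = 0.25 * 0.25                # predicted under random assortment
    d, r_squared = ld_stats(0.25, 0.25, 0.20)  # 0.20 is a hypothetical observation
    print(f"expected={expected}, D={d:.4f}, r^2={r_squared:.4f}")
    print("in LD" if d > 0 else "no excess co-occurrence")
```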
The invention is also directed to markers identified in a “haplotype block” or “LD block”. These blocks are defined either by their physical proximity to a genetic element, e.g., C3, C9, CFI, or the other markers provided herein, or by their “genetic distance” from the element. Markers and haplotypes identified in these blocks, because of their association with AMD and C3, C9, CFI, or the markers identified herein, are encompassed by the invention. One of skill in the art will appreciate that regions of chromosomes that recombine infrequently and regions of chromosomes that are “hotspots”, e.g., exhibiting frequent recombination events, are descriptive of LD blocks. Regions of infrequent recombination events bounded by hotspots will form a block that will be maintained during cell division. Thus, identification of a marker associated with a phenotype, wherein the marker is contained within an LD block, identifies the block as associated with the phenotype. Any marker identified within the block can therefore be used to indicate the phenotype.
Additional markers that are in LD with the markers of the invention or haplotypes are referred to herein as “surrogate” markers (i.e., “proxy” markers). Such a surrogate is a marker for another marker or another surrogate marker. Surrogate markers are themselves markers and are indicative of the presence of another marker, which is in turn indicative of either another marker or an associated phenotype.
Susceptibility for developing AMD includes an asymptomatic patient showing increased risk to develop AMD, and a patient having early or intermediate stages of AMD indicating a progression toward more advanced forms of AMD and expected visual loss. Susceptibility for not developing AMD includes an asymptomatic patient having at least one wild type allele, or a non-risk genotype, or a protective genotype, or a non-risk allele, or a protective allele, or a non-risk haplotype, or a protective haplotype, any of which indicates a lack of a predisposition for developing AMD.
Genetic markers (e.g., SNPs) can be detected in nucleic acids (e.g., DNA or mRNA) in any suitable sample source obtained or taken from an individual, including blood, saliva, feces, bone, epithelial cells, endothelial cells, blood cells, and other bodily fluids, cells, and/or tissues.
In one aspect, the invention provides a method of determining a subject's, for example, a human subject's, risk of developing age-related macular degeneration. The method comprises determining whether the subject has a risk variant at a polymorphic site of the C3 gene, wherein, if the subject has at least one risk variant, the subject is more likely to develop age-related macular degeneration than a person without the risk variant. One exemplary risk variant is at a SNP, rs147859257 (or K155Q) in C3.
In another aspect, the invention provides a method of determining a subject's, for example, a human subject's, risk of developing age-related macular degeneration. The method comprises determining whether the subject has a risk variant at a polymorphic site of the C9 gene, wherein, if the subject has at least one risk variant, the subject is more likely to develop age-related macular degeneration than a person without the risk variant. One exemplary risk variant is at a SNP, rs34882957 (or P167S) in C9.
In one aspect, the invention provides a method of determining a subject's, for example, a human subject's, risk of developing age-related macular degeneration. The method comprises determining whether the subject has a risk or protective variant at a polymorphic site of any of the genes listed in Supplementary Tables 2 or 3, wherein, if the subject has at least one protective variant, the subject is less likely to develop age-related macular degeneration than a person without the protective variant, and wherein, if the subject has at least one risk variant, the subject is more likely to develop age-related macular degeneration than a person without the risk variant.
The presence of a protective and/or risk variant can be determined by standard nucleic acid detection assays including, for example, conventional SNP detection assays, which may include, for example, amplification-based assays, probe hybridization assays, restriction fragment length polymorphism assays, and/or direct nucleic acid sequencing.
In one aspect, the invention comprises an array of gene fragments, particularly nucleic acids including one or more reference sequences identified in Supplementary Table 3 and Table 1 and probes for detecting the allele at the SNPs of one or more reference sequences identified in Supplementary Tables 2 or 3 and Table 1. Polynucleotide arrays provide a high throughput technique that can assay a large number of polynucleotide sequences in a single sample. This technology can be used, for example, as a diagnostic tool to assess the risk potential of developing AMD using the SNPs and probes of the invention. Polynucleotide arrays (for example, DNA or RNA arrays), include regions of usually different sequence polynucleotides arranged in a predetermined configuration on a substrate, at defined x and y coordinates. These regions (sometimes referenced as “features”) are positioned at respective locations (“addresses”) on the substrate. The arrays, when exposed to a sample, will exhibit an observed binding pattern. This binding pattern can be detected upon interrogating the array. For example, all polynucleotide targets (for example, DNA) in the sample can be labeled with a suitable label (such as a fluorescent compound), and the fluorescence pattern on the array accurately observed following exposure to the sample. Assuming that the different sequence polynucleotides were correctly deposited in accordance with the predetermined configuration, then the observed binding pattern will be indicative of the presence and/or concentration of one or more polynucleotide components of the sample.
Arrays can be fabricated by depositing previously obtained biopolymers onto a substrate, or by in situ synthesis methods. The substrate can be any supporting material to which polynucleotide probes can be attached, including but not limited to glass, nitrocellulose, silicon, and nylon. Polynucleotides can be bound to the substrate by either covalent bonds or by non-specific interactions, such as hydrophobic interactions. The in situ fabrication methods include those described in U.S. Pat. No. 5,449,754 for synthesizing peptide arrays, and in U.S. Pat. No. 6,180,351 and WO 98/41531 and the references cited therein for synthesizing polynucleotide arrays. Further details of fabricating biopolymer arrays are described in U.S. Pat. No. 6,242,266; U.S. Pat. No. 6,232,072; U.S. Pat. No. 6,180,351; U.S. Pat. No. 6,171,797; EP No. 0 799 897; PCT No. WO 97/29212; PCT No. WO 97/27317; EP No. 0 785 280; PCT No. WO 97/02357; U.S. Pat. Nos. 5,593,839; 5,578,832; EP No. 0 728 520; U.S. Pat. No. 5,599,695; EP No. 0 721 016; U.S. Pat. No. 5,556,752; PCT No. WO 95/22058; and U.S. Pat. No. 5,631,734. Other techniques for fabricating biopolymer arrays include known light-directed synthesis techniques. Commercially available polynucleotide arrays, such as Affymetrix GeneChip™, can also be used. Use of the GeneChip™ to detect gene expression is described, for example, in Lockhart et al., Nat. Biotechnol., 14:1675, 1996; Chee et al., Science, 274:610, 1996; Hacia et al., Nat. Gen., 14:441, 1996; and Kozal et al., Nat. Med., 2:753, 1996. Other types of arrays are known in the art, and are sufficient for developing an AMD diagnostic array of the present invention.
To create the arrays, single-stranded polynucleotide probes can be spotted onto a substrate in a two-dimensional matrix or array. Each single-stranded polynucleotide probe can comprise at least 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, or 30 or more contiguous nucleotides selected from the nucleotide sequences including the SNPs identified in Supplementary Table 3 and Table 1, or the complement thereof. Preferred arrays comprise at least one single-stranded polynucleotide probe comprising at least 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, or 30 or more contiguous nucleotides selected from the nucleotide sequences including the SNPs identified in Supplementary Tables 2 or 3, or Table 1, or the complement thereof.
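By way of non-limiting illustration, the selection of contiguous nucleotides that include a SNP position, or the complement thereof, can be sketched in Python. The function names, the flanking sequence, and the SNP index are hypothetical; no sequence from Supplementary Tables 2 or 3 or Table 1 is reproduced here.

```python
def probes_covering_snp(sequence: str, snp_index: int, length: int) -> list[str]:
    """Return every contiguous window of `length` nucleotides that
    contains the SNP at 0-based position `snp_index`."""
    if not 0 <= snp_index < len(sequence):
        raise ValueError("snp_index is outside the sequence")
    if not 1 <= length <= len(sequence):
        raise ValueError("length must be between 1 and len(sequence)")
    first_start = max(0, snp_index - length + 1)
    last_start = min(snp_index, len(sequence) - length)
    return [sequence[start:start + length]
            for start in range(first_start, last_start + 1)]


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return sequence.translate(str.maketrans("ACGT", "TGCA"))[::-1]


if __name__ == "__main__":
    # Hypothetical flanking sequence with the SNP at index 4.
    flank = "ACGTACGTAC"
    for probe in probes_covering_snp(flank, snp_index=4, length=6):
        print(probe, reverse_complement(probe))
```

In practice, probe selection would also weigh melting temperature and cross-hybridization, which this sketch does not model.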
Tissue samples from a subject can be treated to form single-stranded polynucleotides, for example by heating or by chemical denaturation, as is known in the art. The single-stranded polynucleotides in the tissue sample can then be labeled and hybridized to the polynucleotide probes on the array. Detectable labels that can be used include but are not limited to radiolabels, biotinylated labels, fluorophores, and chemiluminescent labels. Double-stranded polynucleotides, comprising the labeled sample polynucleotides bound to polynucleotide probes, can be detected once the unbound portion of the sample is washed away. Detection can be visual or with computer assistance. Preferably, after the array has been exposed to a sample, the array is read with a reading apparatus (such as an array “scanner”) that detects the signals (such as a fluorescence pattern) from the array features. Such a reader preferably would have a very fine resolution (for example, in the range of five to twenty microns) for an array having closely spaced features.
The signal image resulting from reading the array can then be digitally processed to evaluate which regions (pixels) of read data belong to a given feature as well as to calculate the total signal strength associated with each of the features. The foregoing steps, separately or collectively, are referred to as “feature extraction” (U.S. Pat. No. 7,206,438). Using any of the feature extraction techniques so described, detection of hybridization of a patient derived polynucleotide sample with one of the AMD markers on the array given as the nucleotide sequences including the SNPs identified in Supplementary Tables 2 or 3, or Table 1 identifies that subject as having or not having a genetic risk factor for AMD, as described above.
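By way of non-limiting illustration, the two steps described above, assigning pixels to a feature and totaling the signal for that feature, can be sketched in Python. This is a simplified stand-in and not the feature extraction technique of U.S. Pat. No. 7,206,438. The function name, the rectangular feature regions, the feature labels, and the pixel values are all hypothetical.

```python
def feature_signals(
    image: list[list[float]],
    features: dict[str, tuple[int, int, int, int]],
) -> dict[str, float]:
    """Sum pixel intensities inside each rectangular feature.

    image    : 2-D grid of pixel intensities (rows of columns)
    features : name -> (row_start, col_start, row_end, col_end),
               with end indices exclusive
    """
    signals = {}
    for name, (row_start, col_start, row_end, col_end) in features.items():
        signals[name] = sum(
            sum(row[col_start:col_end]) for row in image[row_start:row_end]
        )
    return signals


if __name__ == "__main__":
    # Hypothetical 4 x 4 scan with two 2 x 2 features.
    scan = [
        [0, 0, 9, 9],
        [0, 0, 9, 9],
        [1, 1, 0, 0],
        [1, 1, 0, 0],
    ]
    layout = {
        "reference_allele_probe": (0, 2, 2, 4),
        "variant_allele_probe": (2, 0, 4, 2),
    }
    print(feature_signals(scan, layout))
```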
In another aspect, the invention provides a system for compiling and processing patient data, and presenting a risk profile for developing AMD or for the progression to late stages. A computer aided medical data exchange system is preferred. The system is designed to provide high-quality medical care to a patient by facilitating the management of data available to care providers. The care providers will typically include physicians, surgeons, nurses, clinicians, various specialists, and so forth. It should be noted, however, that while general reference is made to a clinician in the present context, the care providers may also include clerical staff, insurance companies, teachers and students, and so forth. The system provides an interface, which allows the clinicians to exchange data with a data processing system. The data processing system is linked to an integrated knowledge base and a database.
The database may be software-based, and includes data access tools for drawing information from the various resources as described below, or coordinating or translating the access of such information. In general, the database will unify raw data into a useable form. Any suitable form may be employed, and multiple forms may be employed, where desired, including hypertext markup language (HTML), extensible markup language (XML), Digital Imaging and Communications in Medicine (DICOM), Health Level Seven™ (HL7), and so forth. In the present context, the integrated knowledge base is considered to include any and all types of available medical data that can be processed by the data processing system and made available to the clinicians for providing the desired medical care. In general, data within the resources and knowledge base are digitized and stored to make the data available for extraction and analysis by the database and the data processing system. Even where more conventional data gathering resources are employed, the data is placed in a form that permits it to be identified and manipulated in the various types of analyses performed by the data processing system.
The integrated knowledge base is intended to include one or more repositories of medical-related data in a broad sense, as well as interfaces and translators between the repositories, and processing capabilities for carrying out desired operations on the data, including analysis, diagnosis, reporting, display and other functions. The data itself may relate to patient-specific characteristics as well as to non-patient specific information, as for classes of persons, machines, systems and so forth. Moreover, the repositories may include devoted systems for storing the data, or memory devices that are part of disparate systems, such as imaging systems. As noted above, the repositories and processing resources making up the integrated knowledge base may be expandable and may be physically resident at any number of locations, typically linked by dedicated or open network links. Furthermore, the data contained in the integrated knowledge base may include both clinical data (e.g., data relating specifically to a patient condition) and non-clinical data. Examples of preferred clinical data include patient medical histories, patient serum, plasma, and/or other biomarkers such as blood levels of certain other nutrients, fats, female and male hormones, etc., and cellular antioxidant levels, and the identification of past or current environmental, lifestyle and other factors that predispose a patient to develop AMD. These include but are not limited to various risk factors such as obesity, smoking, vitamin and dietary supplement intake, use of alcohol or drugs, poor diet, a sedentary lifestyle, medical history of heart disease or other vascular disease, and/or medical history of kidney or liver disease. 
Non-clinical data may include more general information about the patient, such as residential address, data relating to an insurance carrier, and names and addresses or phone numbers of significant or recent practitioners who have seen or cared for the patient, including primary care physicians, specialists, and so forth.
The flow of information can include a wide range of types and vehicles for information exchange. In general, the patient can interface with clinicians through conventional clinical visits, as well as remotely by telephone, electronic mail, forms, and so forth. The patient can also interact with elements of the resources via a range of patient data acquisition interfaces that can include conventional patient history forms, interfaces for imaging systems, systems for collecting and analyzing tissue samples, body fluids, and so forth. Interaction between the clinicians and the interface can take any suitable form, depending upon the nature of the interface. Thus, the clinicians can interact with the data processing system through conventional input devices such as keyboards, computer mice, touch screens, and portable or remote input and reporting devices. The links between the interface, data processing system, the knowledge base, the database and the resources typically include computer data exchange interconnections, network connections, local area networks, wide area networks, dedicated networks, virtual private networks, and so forth.
In general, the resources can be patient-specific or patient-related, that is, collected from direct access either physically or remotely (e.g., via computer link) from a patient. The resource data can also be population-specific so as to permit analysis of specific patient risks and conditions based upon comparisons to known population characteristics. It should be noted that the resources can generally be thought of as processes for generating data. While many of the systems and resources will themselves contain data, these resources are controllable and can be prescribed to the extent that they can be used to generate data as needed for appropriate treatment of the patient. Exemplary controllable and prescribable resources include, for example, a variety of data collection systems designed to detect physiological parameters of patients based upon sensed signals. Such electrical resources can include, for example, electroencephalography resources (EEG), electrocardiography resources (ECG), electromyography resources (EMG), electrical impedance tomography resources (EIT), nerve conduction test resources, electronystagmography resources (ENG), and combinations of such resources. Various imaging resources also can be controlled and prescribed as necessary. Exemplary eye tests include, for example, electrophysiologic tests, electroretinograms, electrooculograms, retinal angiography, retinal photography, ultrasonography, optical coherence tomography, and other imaging modalities such as autofluorescence. A number of modalities of such resources are currently available, such as, for example, X-ray imaging systems, magnetic resonance (MR) imaging systems, computed tomography (CT) imaging systems, positron emission tomography (PET) systems, fluorography systems, sonography systems, infrared imaging systems, nuclear imaging systems, thermoacoustic systems, and so forth.
Imaging systems can draw information from other imaging systems, and electrical resources can interface with imaging systems for direct exchange of information (such as for timing or coordination of image data generation, and so forth).
In addition to such electrical and highly automated systems, various resources of a clinical and laboratory nature can be accessible. Such resources may include blood, urine, saliva and other fluid analysis resources, including gastrointestinal, reproductive, urological, nephrological (kidney function), and cerebrospinal fluid analysis systems. Such resources can further include polymerase chain reaction (PCR) analysis systems, genetic marker analysis systems, radioimmunoassay systems, chromatography and similar chemical analysis systems, receptor assay systems and combinations of such systems. Histologic resources, somewhat similarly, can be included, such as tissue analysis systems, cytology and tissue typing systems and so forth. Other histologic resources can include immunocytochemistry and histopathological analysis systems. Similarly, electron and other microscopy systems, in situ hybridization systems, and so forth can constitute the exemplary histologic resources. Pharmacokinetic resources can include such systems as therapeutic drug monitoring systems, receptor characterization and measurement systems, and so forth. Again, while such data exchange can be thought of as passing through the data processing system, direct exchange between the various resources can also be implemented.
Use of the present system involves a clinician obtaining a patient sample, and evaluation of the presence of a genetic marker in that patient indicating a predisposition (or not) for AMD or its progression, such as one or more of the nucleotide sequences including the SNPs identified in Supplementary Table 3 and Table 1 alone or in combination with other known risk factors. The clinician or their assistant also obtains appropriate clinical and non-clinical patient information, and inputs it into the system. The system then compiles and processes the data, and provides output information that includes a risk profile for the patient, of developing AMD and/or progressing to advanced forms of AMD.
The present invention thus provides for certain polynucleotide sequences that have been correlated to AMD. These polynucleotides are useful as diagnostics, and are preferably used to fabricate an array, useful for screening patient samples. The array, in a currently most preferred embodiment, is used as part of a laboratory information management system, to store and process additional patient information in addition to the patient's genomic profile. As described herein, the system provides an assessment of the patient's risk for developing AMD, risk for disease progression, and likelihood of disease prevention based on patient controllable factors.
The invention relates in part to kits and systems useful for performing the diagnostic methods described herein. The methods described herein can be performed by, for example, diagnostic laboratories, service providers, experimental laboratories, and individuals. The kits can be useful in these settings, among others.
Kits include reagents and materials for obtaining genetic material and assaying one or more markers in a sample from an individual, analyzing the results, diagnosing whether the individual is susceptible to or at risk for developing AMD, monitoring disease progression, and/or determining an appropriate treatment course. For example, in some embodiments, the kit can include a needle, syringe, vial, cotton swab or other apparatus for obtaining and/or containing a sample from an individual. In some embodiments, the kit can include at least one reagent which is used specifically to detect a marker disclosed herein. That is, suitable reagents and techniques readily can be selected by one of skill in the art for inclusion in a kit for detecting or quantifying a marker of interest.
For example, where the marker is a nucleic acid (e.g., DNA or RNA), the kit includes reagents appropriate for detecting nucleic acids using, for example, PCR, hybridization techniques, and microarrays.
Where appropriate, the kit includes: extraction buffers or reagents, amplification buffers or reagents, reaction buffers or reagents, hybridization buffers or reagents, immunodetection buffers or reagents, labeling buffers or reagents, and detection means. The kit can include all or part of the nucleic acids of the nucleotide sequences including the SNPs identified in Supplementary Table 3 and Table 1, or a nucleic acid molecule complementary thereto.
Kits can also include a control, which can be a control sample, a reference sample, an internal standard, or previously generated empirical data. The control may correspond to a known allele, e.g., a wild type and/or a variant allele. In addition, a control may be provided for each marker or the control may be a reference (e.g., a wild type and/or variant sequence).
Kits can include one or more containers for each individual reagent. Kits can further include instructions for performing the methods described herein and/or interpreting the results, in accordance with any regulatory requirements. In addition, software can be included in the kit for analyzing the results. Preferably, the kits are packaged in a container suitable for commercial distribution, sale, and/or use.
The following examples are provided for illustration, not limitation.
To define the role of rare variants in advanced age-related macular degeneration (AMD) risk, we sequenced exons of 779 genes within AMD loci and pathways in 2,493 cases and controls. We found that 7.8% of AMD cases compared to 2.3% of controls are carriers of rare missense CFI variants (OR=3.6, p=2×10⁻⁸); some of these variants result in loss of CFI function. We also observed significant association in rare missense alleles outside of CFI. Genotyping in 5,115 independent samples confirmed associations to AMD with a K155Q allele in C3 (joint p=5.2×10⁻⁹, OR=3.8) and a P167S allele in C9 (joint p=6.5×10⁻⁷, OR=2.2). Finally, we show that the 155Q allele reduces C3 binding to CFH in vitro, mitigating subsequent deactivation of C3 by CFI cleavage. These results implicate loss of C3 protein regulation by CFH and CFI leading to excessive alternative complement activation in AMD pathogenesis, thus informing both the direction of effect and mechanistic underpinnings of this disorder.
Age-related macular degeneration (AMD) is a common progressive disease that can lead to irreversible vision loss in individuals over 55 years of age with genetic risk factors¹⁻³. Most genetic variants identified for AMD are common, without clearly established disease mechanisms⁴. To date, highly penetrant missense variants that confer AMD risk have been discovered only in CFH⁵,⁶. We sought to identify additional rare missense variants that contribute to AMD risk, with the broader goal of identifying specific disease mechanisms and also elucidating the direction of effect of such alleles. Such data could have a direct impact on the design of therapeutic paradigms.
We applied next-generation targeted sequencing to exons, 5′ untranslated regions, and 3′ untranslated regions for 779 candidate genes in known AMD loci and in pathways related to AMD pathogenesis (see Online Methods). In total we sequenced 5.28 megabases in 1,676 cases, 745 controls, and 36 siblings with discordant disease status (n=2,493), obtaining>20× coverage for a median of 95.6% of the targeted region (Supplementary
To assess the accuracy of genotypes obtained by sequencing we compared them to genotype data for the same individuals obtained by Illumina HumanExome BeadChip™ array (i.e. exome-chip)⁸,⁹. We observed 99.97% concordance for 2,426 sequenced missense SNPs with allele frequencies of at least 0.001 that were on the exome-chip (Online Methods). Allelic dosages were almost perfectly correlated (r>0.99) for 96.5% of common variants (f>0.01) and 93.0% of rare variants (Supplementary
We next used our sequencing data to test whether any individual gene had a higher burden of rare variants in cases or in controls. For these analyses we selected only rare variants (f<0.01) that alter coding sequences (missense, nonsense, read-through, or splice variants, n=18,854 SNPs). We used a simple burden test to assess whether the proportion of case individuals who carried at least one rare variant was in excess of chance (Online Methods)¹⁰. We similarly tested increased burden in controls. We saw a significantly increased burden of rare variants in cases in only one gene, CFI (exact one-tailed p=1.6×10⁻⁸, OR=3.57,
The enrichment of rare coding variants in CFI is not likely to be the consequence of population stratification. First, if stratification were driving the observed enrichment, then other genes would also demonstrate enrichment for rare variants; however burden tests for other genes did not exceed expectation due to chance (
The case-enrichment of rare coding variants in CFI was independent of the nearby common risk allele, rs4698775⁴,¹¹. Stratifying our samples for rs4698775 genotypes did not obviate the CFI rare coding variant enrichment (p=1.7×10⁻⁸,
We examined the 59 separate CFI nucleotide variants that conferred 58 coding changes (
Remarkably, many of these CFI rare variants have been seen in atypical hemolytic uremic syndrome (aHUS) cases, including P50A¹⁴, G119R¹⁴, A240G¹⁵, G261D¹⁶, R317W¹⁵, I340T¹⁷, Y369S¹⁵,¹⁸, D403N¹⁴, I416L¹⁸, Y459S¹⁴, R474X¹⁹, and P553S¹⁴. These variants are hypothesized to confer aHUS risk by decreasing the ability of CFI to cleave and thereby deactivate C3b, the cleaved and activated form of C3. Of these variants, R317W and I340T result in CFI functional deficiency²⁰, while P50A, I416L, and R474X produce a quantitative deficiency of CFI protein¹⁴. The G261D variant has been studied extensively and no functional deficiency has been noted to date¹⁶.
We then used the same sequencing data to test rare variants individually for association to AMD risk. This study was not powered to detect an association for variants observed fewer than five times, and we therefore excluded those extremely rare SNPs from this test. Overall, we identified 2,169 variants that had an allele frequency of <1% in cases or controls and passed strict quality control (Online Methods). Of these, we identified five and 16 variants associated with increased and decreased AMD risk respectively (exact 1-tailed p<0.01, Supplementary Table 3). Four of these variants were within or near CFH, including the previously reported CFH R1210C risk variant (p=0.0012)⁵. In addition we observed association to the CFH N1050Y, CFHR2 Y264C, and CFHR5 G278S protective variants. After phasing common variants in samples from the sequencing experiment within this locus, we observed that these variants were in LD with CFH haplotypes (Supplementary
We evaluated 11 of the 17 variants outside of the CFH locus for evidence of association in 2,227 separate cases and 2,888 separate controls from Boston, Baltimore, and France using either the exome-chip or TaqMan (Supplementary Table 4). Of those variants, we observed independent evidence of association for two variants in replication (p<0.0045, the Bonferroni threshold 0.05/11; see Table 1).
The two variants that ultimately demonstrated association in replication were a nonsynonymous rs147859257 (or K155Q) variant in C3 (exact 1-tailed p=4.8×10⁻⁶ in discovery) and the rs34882957 (P167S) variant in C9 (p=2.3×10⁻³ in discovery, Table 1, Supplementary Table 3). Both variants had highly concordant sequence-based genotypes compared to separate exome-chip genotypes (r=1). To assess the possibility that either association could be related to population stratification, we calculated principal components for a subset of samples (n=2,241) as described above for CFI variants. There was no evidence of clustering of individuals that were carriers of either of these variants along the first two principal components (Supplementary
The C3 K155Q variant, conferring a lysine to glutamine change at position 155 (or position 133 in the mature protein excluding the signal peptide), demonstrated compelling evidence of association in replication (p=3.5×10⁻⁵) and was highly significant with a large effect size in joint analysis with discovery samples (p=5.2×10⁻⁹, OR=3.8; Table 1A). This risk is independent of a previously discovered common risk variant, R102G (rs2230199)²¹,²². In the samples with genotypes for both R102G and K155Q (n=6,935), we observed increased significance of the risk conferred by K155Q when we stratified individuals on R102G genotypes (from p=5.4×10⁻⁹ to p=6.2×10⁻¹⁰, Table 1A). We found that the K155Q risk variant is, in fact, in phase with the protective 102R (rs2230199-G/102-R) allele. The C9 P167S variant also demonstrated convincing evidence of association in replication (p=2.4×10⁻⁵) and in joint analysis (p=6.5×10⁻⁷, OR=2.2; Table 1B). While nominal associations have been reported previously²³, our result implicates C9 in AMD pathogenesis definitively for the first time. This expands the repertoire of AMD genes in the complement cascade, specifically implicating the membrane attack complex (C5-C9) formed downstream of alternative complement pathway activation.
Since K155Q is exposed on the protein surface of the C3 β-chain in a positively charged patch, close to the CFH binding site (Supplementary
We conclude that C3 K155Q results in impaired binding of the C3b active form to CFH and subsequent inactivation by CFI, resulting in increased C3 convertase and feedback amplification of the alternative pathway (
Case Definitions.
Board certified ophthalmologists evaluated all case and matched (non-shared) control individuals. We either (1) clinically evaluated subjects with visual acuity measurements, dilated slit lamp biomicroscopy and stereoscopic color fundus photography or (2) reviewed full ophthalmologic medical records and images. Case patients had either geographic atrophy (advanced dry AMD) or neovascular disease (wet AMD) (Clinical Age-Related Maculopathy Grading System (CARMS) stages 4 and 5)²⁹. Controls were also examined and had no signs of intermediate or advanced macular degeneration in either eye and no bilateral early AMD. All Boston and France controls and most Baltimore controls (>80%) were ≥60 years old.
Boston.
Subjects were recruited through ongoing AMD study protocols²,¹¹,³⁰⁻³³. We genotyped a subset of these samples using the Affymetrix SNP 6.0 GeneChip³⁴. These samples included 2,422 unrelated cases and 1,287 unrelated controls, and also 49 discordant sib-pairs. We genotyped all samples with the Illumina HumanExome genotyping array (see below). We selected a subset of these samples for targeted sequencing including 1,676 unrelated cases and 745 unrelated controls, as well as 36 siblings with discordant case status (see below).
France.
We recruited AMD cases and controls, who were self-described white individuals of European descent, at the Hopital Intercommunal de Creteil (FR-CRET), Creteil, France, as previously described³⁴,³⁵. These samples included 953 unrelated cases and 203 unrelated controls. We genotyped all samples with the Illumina HumanExome genotyping array.
Baltimore.
Unrelated subjects were recruited at Johns Hopkins Hospital in Baltimore, Md. as previously described³⁴,³⁶⁻³⁸, and were self-described white individuals of European descent. We genotyped these 516 cases and 163 controls for selected SNPs with TaqMan (see below).
Shared Controls.
We also augmented our samples with a collection of shared controls that had been recruited as controls for four separate studies. These samples had been broadly consented for medical use, and had been genotyped at the Broad Institute as a part of ongoing studies. In aggregate, these comprised 2,466 unrelated Caucasian samples that were not evaluated for retinal diseases. The samples included control individuals recruited for the 1000 Genomes Project (n=448)³⁹, the international Serious Adverse Event Consortium (iSAEC, n=709)⁴⁰, the National Institute of Mental Health controls (n=1,054)⁴¹, and the Prospective Registry in IBD Study at MGH (Prism, n=255)⁴².
Gene Selection.
To identify a set of genes that were most likely to harbor rare variants, we first selected all genes within 19 genomic regions defined by common SNPs which have been associated with AMD and obtained genome-wide significance in a recent large meta-analysis of data from 18 research groups⁴. Additionally, we selected genes closest to any common SNP with nominal significance (p<10⁻⁴) in a smaller previous meta-analysis³⁵. We also selected genes involved in pathways believed to play a critical role in AMD or other retinal diseases, including complement genes, angiogenesis genes, genes involved in the structure of retinal pigment epithelium (RPE) and photoreceptors, HDL metabolism genes, genes involved in inflammation and oxidation pathways, and genes in the extracellular and collagen matrix pathways. We also included genes previously reported to be associated with AMD and related diseases, including Stargardt disease, Best disease or vitelliform macular dystrophy, Alzheimer's disease, and atypical hemolytic uremic syndrome (aHUS).
Targets Capture and Re-Sequencing.
We conducted target capture and sequencing at PerkinElmer, Inc., according to the manufacturer's protocols. Briefly, we designed a custom Agilent SureSelectXT Kit to capture genomic sequences of coding exons, splice junctions, 5′ UTR and 3′ UTR regions in 779 selected genes with indexing barcodes for each sample. We sequenced a total target length of 5.28 Mb including 1.76 Mb of coding exons. We isolated the hybridized library fragments, quantitated them by qPCR, and then sequenced paired-end reads with the Illumina HiSeq2000™ sequencing platform. We required sequencing data for each sample to have over 10× coverage at greater than 90% of targeted regions and over 20× coverage at greater than 80% of targeted regions. We resequenced a small fraction (<5%) of samples that did not achieve this quality standard.
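The per-sample coverage requirement above can be illustrated with a minimal sketch. It assumes that coverage is summarized as one depth value per targeted position and that "over" means strictly greater than; the function name and input layout are hypothetical and are not taken from the sequencing pipeline that was actually used.

```python
def passes_coverage_qc(depths):
    """Return True if a sample meets both coverage thresholds.

    depths: list of read depths, one per targeted position (assumed layout).
    """
    n = len(depths)
    frac_over_10x = sum(d > 10 for d in depths) / n
    frac_over_20x = sum(d > 20 for d in depths) / n
    # Over 10x at more than 90% of targets and over 20x at more than 80%.
    return frac_over_10x > 0.90 and frac_over_20x > 0.80


# Toy sample: 85% of positions at 30x, 10% at 15x, 5% at 5x.
good_sample = [30] * 85 + [15] * 10 + [5] * 5
# Toy sample that fails the 20x criterion (only 75% of positions above 20x).
poor_sample = [30] * 75 + [15] * 20 + [5] * 5
```

In this toy input, `good_sample` satisfies both thresholds and `poor_sample` would be flagged for resequencing.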
Read Mapping, Variant Detection and Annotation.
We aligned sequence reads in each individual to the human reference genome (NCBI build 37.3, hg19) using Burrows-Wheeler Aligner (BWA, v0.59)⁴³. We called the consensus genotypes in the target regions with The Genome Analysis Toolkit (GATK, v2.18) with the workflow and parameters recommended in the best practice variant detection with GATK v4⁷,⁴⁴. Briefly, we applied GATK duplicate removal, indel realignment, base quality score recalibration, and performed multi-sample SNP and indel discovery and genotyping across all samples simultaneously using variant quality score recalibration (VQSR). In addition to high quality variants assigned “PASS” by VQSR, we included variants in lower tranches (truth sensitivity between 99.0 and 100) only if they were also separately recorded in the Exome Sequencing Project (ESP) database of 6500 samples⁴⁵. We annotated variants with snpEff (v2.05)⁴⁶.
Quality Control.
We further excluded SNPs failing the Hardy-Weinberg test (p<10⁻⁶). We also excluded from further analysis alleles that had high missing genotype data (>1%), likely due to systematic low coverage or difficulty mapping reads across a high proportion of samples. We also excluded samples with high missing genotype data (>1%) for common alleles with >1% frequency in our data set.
We genotyped the France and Boston sample sets with the Infinium HumanExome BeadChip (v1.0), which provides coverage of over 240,000 functional exonic variants selected from >12,000 whole exome and whole genome sequences. In addition, we customized our assay by adding 3,214 SNPs in candidate AMD genes. We conducted genotyping at the Johns Hopkins Genotyping Core Laboratory. We genotyped shared control samples separately at the Broad Institute using the Illumina HumanExome v1.0, v1.1 and v1.1+custom content SNP arrays.
We called genotypes using Illumina's GenomeStudio software and then zCall⁸, a rare variant caller developed at the Broad Institute, to recover missed rare genotypes. We required that samples have <2% missing genotype calls for common variants (MAF>5%) before applying zCall. Then, after applying zCall, we removed duplicate SNPs, monomorphic SNPs, SNPs with a low call rate (<98%), and SNPs failing Hardy-Weinberg (p<10⁻⁶). We merged genotype calls from the four shared control cohorts and the AMD cohort by only including SNPs that passed quality control in all 5 cohorts and passed the Hardy-Weinberg test (p>10⁻⁶) across all samples. We then used EIGENSTRAT⁴⁷ to check for related samples and generate the first 5 principal components.
We genotyped Baltimore samples at the Duke University Center for Human Disease Modeling using a custom made TaqMan genotyping assay by Applied Biosystems and with the ABI 7900 Real-Time PCR system.
Because rare SNPs are challenging to genotype with array-based platforms, we selected only those alleles with minor allele counts of ≥5 (i.e. allele frequency>0.1%). We also selected only those SNPs with 0% missingness in the array data to ensure the highest possible accuracy for array-based genotypes. To assess concordance we calculated a simple correlation between genotype dosages across individuals with the two different platforms. This correlation-based metric is comparable across allele frequency spectrums.
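As an illustration of a correlation-based concordance metric, the sketch below computes a Pearson correlation between per-individual allele dosages (0, 1 or 2 copies of the minor allele) from two platforms. The choice of Pearson correlation and the function name are assumptions made for illustration; the text states only that the metric is correlation-based.

```python
import math


def dosage_correlation(seq_dosages, chip_dosages):
    """Pearson r between two equal-length lists of 0/1/2 allele dosages."""
    n = len(seq_dosages)
    mean_seq = sum(seq_dosages) / n
    mean_chip = sum(chip_dosages) / n
    sxy = sum((x - mean_seq) * (y - mean_chip)
              for x, y in zip(seq_dosages, chip_dosages))
    sxx = sum((x - mean_seq) ** 2 for x in seq_dosages)
    syy = sum((y - mean_chip) ** 2 for y in chip_dosages)
    if sxx == 0 or syy == 0:
        # Monomorphic on one platform: correlation is undefined.
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


# Toy data for five individuals: identical calls, then one discordant call.
r_identical = dosage_correlation([0, 0, 1, 0, 2], [0, 0, 1, 0, 2])
r_one_error = dosage_correlation([0, 0, 1, 0, 2], [0, 1, 1, 0, 2])
```

With identical calls the correlation is 1; a single discordant genotype in a rare variant lowers it noticeably, which is why a correlation is informative even when nearly all individuals are non-carriers.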
Ancestry Informative Markers.
We identified 16,008 ancestry informative markers genotyped on the exome-chip. These SNPs had common allele frequencies (f>5%), and excluded SNPs in the CFH locus (chr1, 195.5-197.5 Mb in HG19 coordinates), the ARMS2 locus (chr10, 123-125 Mb), and the Major Histocompatibility Complex locus (chr6, 25-35 Mb). We pruned SNPs using the indep option in plink with default parameters (VIF=2, window size=50 SNPs)⁴⁸.
Combining Shared Controls.
For samples genotyped with exome-chip in the France and Boston data sets, we expanded the set of available controls by including shared controls. To mitigate the potential effects of population stratification in these replication data sets, we assigned shared controls to each collection by matching on case ancestry. First, we applied EIGENSTRAT to these SNP genotypes to calculate principal components for the replication samples together with the shared controls⁴⁷. Then we used the first 5 principal components to calculate Euclidean distances between samples in the Boston and France cohorts and shared control samples. Finally, we randomly selected individual case samples in these cohorts and assigned the nearest unassigned shared controls to the selected case's cohort. Each shared control could be assigned to only one of the two cohorts. We added one shared control for every two cases to the France collection, and two shared controls for every one case to the Boston replication collection. The resulting expanded Boston and France cohorts had minimal evidence of population stratification (λ_gc=1.04 and λ_gc=1.06 respectively for ancestry informative markers).
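The matching procedure described above can be sketched as a greedy nearest-neighbour assignment. This is a minimal illustration under stated assumptions, not the code used in the study: the function name, the data layout, and the way fractional ratios (such as one control per two cases) are tracked with a running allowance are all hypothetical.

```python
import math
import random


def match_controls(cases, controls, per_case, seed=0):
    """Greedily assign shared controls to cohorts by ancestry distance.

    cases:    list of (cohort_name, principal_component_vector)
    controls: dict mapping control id -> principal_component_vector
    per_case: dict mapping cohort_name -> controls to add per case
              (for example 0.5 for one control per two cases)
    Returns a dict mapping control id -> cohort_name.
    """
    rng = random.Random(seed)
    unassigned = dict(controls)
    allowance = {}   # running count of controls each cohort may still take
    assignment = {}
    order = list(cases)
    rng.shuffle(order)  # cases are visited in random order
    for cohort, pcs in order:
        allowance[cohort] = allowance.get(cohort, 0.0) + per_case[cohort]
        while allowance[cohort] >= 1.0 and unassigned:
            # Nearest unassigned control by Euclidean distance on the PCs.
            nearest = min(unassigned,
                          key=lambda c: math.dist(pcs, unassigned[c]))
            assignment[nearest] = cohort
            del unassigned[nearest]  # a control joins only one cohort
            allowance[cohort] -= 1.0
    return assignment


# Toy example with two principal components and two cohorts.
toy_cases = [("Boston", (0.0, 0.0)), ("France", (10.0, 10.0))]
toy_controls = {"c1": (0.1, 0.0), "c2": (9.9, 10.0), "c3": (5.0, 5.0)}
toy_assignment = match_controls(toy_cases, toy_controls,
                                {"Boston": 1, "France": 1})
```

In the toy example each case receives the control closest to it in principal-component space, and the intermediate control `c3` is left unassigned.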
Statistical Framework for Association Testing.
Asymptotic statistics can be inaccurate for rare variants, so we elected to utilize exact statistics instead to test association for individual variants and also for sets of variants in genes (burden tests). We used a strategy similar to Raychaudhuri et al.⁵
For single-stratum case-control sample collections we used a 2×2 Fisher's exact test to calculate a one-tailed p-value. Assuming a single stratum with a total of N individuals, of whom n_case are cases and n_variant are carriers for the variant, we can calculate the one-tailed significance of observing n_{variant,case} individuals who have the variant and also have advanced AMD as follows:
\[ p = \sum_{x = n_{variant,case}}^{\min(n_{case},\, n_{variant})} hg(x;\, N,\, n_{case},\, n_{variant}) \]
where hg is the hypergeometric probability distribution assuming that there are n_variant draws from a total of N samples, of which x of a total of n_case are drawn.
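A minimal sketch of this one-tailed exact test follows. It assumes the p-value is the upper-tail sum of hypergeometric probabilities from the observed count of affected carriers up to the largest possible count; the function and argument names are illustrative and do not come from the original analysis code.

```python
from math import comb


def hypergeom_pmf(x, N, n_case, n_variant):
    """Probability that exactly x of the n_variant carriers are cases."""
    return comb(n_case, x) * comb(N - n_case, n_variant - x) / comb(N, n_variant)


def one_tailed_exact_p(N, n_case, n_variant, n_variant_case):
    """Upper-tail probability of seeing at least n_variant_case affected carriers."""
    upper = min(n_case, n_variant)
    return sum(hypergeom_pmf(x, N, n_case, n_variant)
               for x in range(n_variant_case, upper + 1))


# Toy example: 10 individuals, 5 cases, 3 carriers, all 3 carriers are cases.
p_toy = one_tailed_exact_p(N=10, n_case=5, n_variant=3, n_variant_case=3)
```

For the toy counts the only table at least as extreme as the observed one is the observed table itself, so `p_toy` equals C(5,3)/C(10,3) = 10/120.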
If multiple case-control strata are present, for example if we are stratifying on genotypes of a common variant or combining multiple case-control collections together, we expand the above strategy to calculate an exact p-value. Assuming that we observe a total of n_{variant,case} carriers who are affected across all J strata, we can calculate significance as follows:
\[ p = \sum_{\substack{(x_1,\dots,x_J):\\ \sum_j x_j \,\ge\, n_{variant,case}}} \; \prod_{j=1}^{J} hg(x_j;\, N_j,\, n_{j,case},\, n_{j,variant}) \]
Here, for each stratum j we have separate numbers of total individuals (N_j), separate numbers of individuals who are cases (n_{j,case}), and heterozygote individuals (n_{j,variant}). So, we calculate the hypergeometric probability for each individual stratum for all the possible counts that would result in a total number of heterozygotes with advanced AMD equal to or greater than n_{variant,case}, and total these probabilities together to obtain a p-value.
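One way to carry out this stratified calculation without enumerating every combination of counts is to convolve the per-stratum hypergeometric distributions, which gives the null distribution of the total number of affected carriers. The sketch below takes that approach; convolution is an implementation choice made here for illustration, and the function names are hypothetical.

```python
from math import comb


def stratum_pmf(N, n_case, n_variant):
    """Hypergeometric probabilities for x = 0..n_variant affected carriers."""
    total = comb(N, n_variant)
    return [comb(n_case, x) * comb(N - n_case, n_variant - x) / total
            for x in range(n_variant + 1)]


def convolve(a, b):
    """Distribution of the sum of two independent counts."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, pa in enumerate(a):
        for j, pb in enumerate(b):
            out[i + j] += pa * pb
    return out


def stratified_exact_p(strata, n_variant_case):
    """One-tailed exact p-value across strata.

    strata: list of (N_j, n_j_case, n_j_variant) tuples
    n_variant_case: observed total number of affected carriers
    """
    dist = [1.0]  # distribution of a sum over zero strata
    for N, n_case, n_variant in strata:
        dist = convolve(dist, stratum_pmf(N, n_case, n_variant))
    return sum(dist[n_variant_case:])


# Toy example: two identical strata in which all 3 carriers are cases.
p_two_strata = stratified_exact_p([(10, 5, 3), (10, 5, 3)], n_variant_case=6)
```

Each toy stratum alone gives 10/120 = 1/12, and because the strata are independent under the null, observing all six carriers among cases has probability (1/12)² = 1/144.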
For discordant siblings we calculated statistical significance using the binomial distribution. For a given variant, we consider only those pairs of siblings with discordant genotype for the rare variant. The probability under the null that the affected sibling will have the variant is 0.5. We assign each discordant sibling pair a score, s_i, which is 1 if the affected sibling has the rare variant or 0 if the unaffected sibling has the rare variant. We obtain an aggregate score by summing over all independent siblings:
\[ S_{sibling} = \sum_{i=1}^{n_{siblings}} s_i \]
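Under the null hypothesis, the aggregate score should be distributed according to a binomial distribution. So if there are a total of n_siblings discordant sibling pairs, we can calculate p_sibling, the one-tailed significance:
\[ p_{sibling} = \sum_{x = S_{sibling}}^{n_{siblings}} binomial(x;\, n_{siblings},\, 0.5) \]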
where the function binomial represents the binomial distribution for x successful draws out of n_siblings, each with a 0.5 probability.
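The sibling test can be sketched in a few lines. The sketch assumes one 0/1 score per genotype-discordant sibling pair, as described above, and computes the upper binomial tail with success probability 0.5; the function name is illustrative.

```python
from math import comb


def sibling_p(scores):
    """One-tailed binomial p-value for discordant sibling pairs.

    scores: list with 1 where the affected sibling carries the variant
            and 0 where the unaffected sibling carries it.
    """
    n = len(scores)
    s = sum(scores)
    # P(X >= s) for X ~ Binomial(n, 0.5)
    return sum(comb(n, x) for x in range(s, n + 1)) / 2 ** n


# Toy example: in 3 of 4 informative pairs the affected sibling is the carrier.
p_sib_toy = sibling_p([1, 1, 1, 0])
```

For the toy scores the tail is (C(4,3) + C(4,4)) / 2⁴ = 5/16.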
In order to calculate an aggregate meta-analysis we define a score, S_aggregate, which is the total of s_{variant,case} across all strata and siblings. We can calculate the probability of obtaining the score s or a more significant score to determine the exact one-tailed p-value:
Association Testing.
We filtered variants to include nonsynonymous, splice acceptor-site and donor-site, start lost, stop gained and stop lost variants; these variants are most likely to alter gene function. Then we identified rare variants (f≤0.01 in either cases or controls). In the single variant association test, we only included rare variants with a minor allele count of at least 5 in our dataset. We used the statistical framework described above. To conduct stratified testing for other nearby common SNPs, we further subdivided strata based on common variant genotypes.
Burden Testing.
To test for increased burden of these annotations, we followed the strategy defined by Li and Leal, using our exact statistical framework¹⁰. We first identified those variants that altered coding sequences, that is, variants that result in missense changes, altered splice acceptor-sites or donor-sites, a start lost, or a stop gained or lost. We selected those variants that were present at low allele frequencies (f<0.01) across our full set of sequenced individuals. We tested each gene in two ways: (1) assessing if rare variants are increased in cases compared to controls and then (2) assessing if rare variants are increased in controls compared to cases. We used the statistical framework described above to test for aggregate effects.
Defining CFH Haplotypes.
To ensure accurate phasing, the individuals in this data set had 0% missing genotype for CFH markers used in our previous study to define haplotypes⁵. For all markers we constructed haplotypes with genotype data from sequencing with Beagle⁴⁹. We selected haplotypes with frequencies>0.5%, and calculated case and control frequencies. For each haplotype we calculated odds ratios and 95% confidence intervals relative to the most frequent haplotype.
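As a hedged illustration of an odds ratio and 95% confidence interval relative to a reference haplotype, the sketch below uses the standard log odds ratio (Woolf) approximation on haplotype counts. The text does not state which interval method was used, so the method, the function name and the counts are assumptions for illustration only.

```python
import math


def odds_ratio_ci(case_hap, ctrl_hap, case_ref, ctrl_ref, z=1.96):
    """Odds ratio of a haplotype versus the reference haplotype.

    Arguments are haplotype counts in cases and controls for the tested
    haplotype and for the reference (most frequent) haplotype.
    Returns (odds_ratio, lower_95, upper_95) using the Woolf approximation.
    """
    odds_ratio = (case_hap * ctrl_ref) / (ctrl_hap * case_ref)
    se_log_or = math.sqrt(1 / case_hap + 1 / ctrl_hap
                          + 1 / case_ref + 1 / ctrl_ref)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Toy counts: haplotype seen 20 times in cases and 10 in controls,
# reference haplotype seen 100 times in each group.
or_toy, lower_toy, upper_toy = odds_ratio_ci(20, 10, 100, 100)
```

The approximation requires all four counts to be non-zero; for very rare haplotypes an exact interval would be a more cautious choice.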
Adjusting for Stratification.
We used ancestry informative markers to reassess association statistics for C3 K155Q and rare CFI coding variants, accounting for stratification, in a subset of sequenced samples with exome-chip data (1,558 cases and 683 controls). We applied EIGENSTRAT to calculate principal components to capture ancestry information and exclude outliers in sequenced samples⁴⁷. For sequenced samples we plotted carriers of C3 K155Q and also of CFI rare variants along the first two components to look for gross evidence of stratification. We also re-assessed two-tailed association statistics by adjusting for the top 5 principal components in a logistic regression framework. Given that logistic regression p-values can be biased for rare events, in the case of C3 K155Q we reported two-tailed significance by calculating beta with the actual data, and comparing it to a null beta distribution defined by permuting case-control labels 1,000,000 times.
Reagents.
For these experiments we obtained purified Factor H (CFH) and Factor I (CFI) (Complement Technologies, Tyler, Tex.); chicken anti-human C3 antibody (Biodesign International, Saco, Me.); goat anti-human C3 antibody (Complement Technologies, Tyler, Tex.); donkey anti-chicken horseradish peroxidase (HRP) linked IgG (Jackson Immunoresearch, West Grove, Pa.); rabbit anti-goat HRP linked IgG (Sigma, St. Louis, Mo.); murine monoclonal anti-human C3d antibody (Quidel, San Diego, Calif.); and 3,3′,5,5′-Tetramethylbenzidine (TMB) and SuperSignal ELISA substrate (Pierce, Rockford, Ill.).
Protein Expression.
We applied site-directed mutagenesis using QuikChange (Agilent Technologies, Santa Clara, Calif.) to modify wildtype C3 cDNA carrying the 155K allele so that it instead contained the 155Q mutation. We produced 155K and 155Q C3 proteins by transient transfection of 293T cells with TransIT-293 (Mirus, Madison, Wis.) and collected and concentrated cell supernatants three days post-transfection. We treated transfection supernatants with methylamine (MA) to convert native C3 to an inactive, C3b-like form, C3(MA). We quantified C3 by ELISA, surface plasmon resonance and Western blotting as previously described25.
Ligand Binding Assays.
To assess binding of 155K and 155Q C3 proteins to regulators, we utilized ELISA assays as previously described25. Briefly, we coated either soluble membrane cofactor protein (sMCP), CFH or soluble complement receptor 1 (sCR1) (each at 2 μg/ml) on wells in PBS overnight at 4° C. followed by an incubation with a blocking buffer at 37° C. for 60 minutes. We prepared dilutions of 155K and 155Q C3 proteins in a low salt (25 mM) ELISA buffer. Following incubation at 37° C. for 1 hour, we washed the wells and then applied affinity purified chicken anti-human C3 Ab (1:10,000) (Biodesign International) for 1 hour at 37° C. After washing, we applied HRP linked donkey anti-chicken IgY (1:10,000) for 1 hour at 37° C. Following washing, we added TMB substrate and quantified absorbance at 630 nm.
Surface Plasmon Resonance (SPR) Analysis.
We performed SPR analysis using the BIAcore 2000 (GE Lifesciences, Piscataway, N.J.). Using standard amine coupling technology (GE Lifesciences, Piscataway, N.J.), we coupled sMCP, CFH and anti-C3d mAb to individual flow paths. We activated one flow path in each chip without protein as a reference. The running buffer was composed of 10 mM Hepes, pH 7.4, 0.005% Tween-20 and 25 mM NaCl. We injected the 155K or the 155Q C3 protein for 90 sec at 30 μl/min and monitored dissociation for 300 sec. We regenerated the chip by injecting 0.5 M NaCl. We analyzed each protein at four concentrations, with a minimum of two injections per concentration. We performed these experiments three times using independently produced and quantitated C3 preparations. We analyzed data using the BIAeval software from BIAcore.
Cofactor Assays.
We assessed cleavage of 155K and 155Q C3 proteins by CFI using previously described cofactor assays25. We incubated C3 preparations for 0 to 30 min at 37° C. with CFI (5 ng in MCP assays and 20 ng in CFH assays) and a cofactor protein, either MCP (50 ng; recombinant) or CFH (200 ng), in 15 μl of buffer (10 mM Tris, pH 7.4, 150 mM NaCl). To stop the reaction, we added 7 μl of 3× reducing Laemmli sample buffer. The samples were boiled at 95° C. for 5 min, electrophoresed on 10% Tris-glycine polyacrylamide gels, transferred to nitrocellulose and blocked overnight with 5% non-fat dry milk in PBS. We probed the blots with either a 1:10,000 dilution of chicken anti-human C3 Ab followed by HRP-conjugated donkey anti-chicken IgG or a 1:5000 dilution of goat anti-human C3 Ab followed by HRP-conjugated rabbit anti-goat IgG. We developed the blots with SuperSignal substrate.
The use of headings and sections in the application is not meant to limit the invention; each section can apply to any aspect, embodiment, or feature of the invention.
Throughout the application, where compositions are described as having, including, or comprising specific components, or where processes are described as having, including or comprising specific process steps, it is contemplated that compositions of the present teachings also consist essentially of, or consist of, the recited components, and that the processes of the present teachings also consist essentially of, or consist of, the recited process steps.
In the application, where an element or component is said to be included in and/or selected from a list of recited elements or components, it should be understood that the element or component can be any one of the recited elements or components and can be selected from a group consisting of two or more of the recited elements or components.
It should be understood that the order of steps or order for performing certain actions is immaterial so long as the present teachings remain operable. Moreover, two or more steps or actions may be conducted simultaneously.
Where a range or list of values is provided, each intervening value between the upper and lower limits of that range or list of values is individually contemplated and is encompassed within the invention as if each value were specifically enumerated herein. In addition, smaller ranges between and including the upper and lower limits of a given range are contemplated and encompassed within the invention. The listing of exemplary values or ranges is not a disclaimer of other values or ranges between and including the upper and lower limits of a given range.
This application claims the benefit of U.S. Provisional Patent Applications Ser. Nos. 61/775,673, entitled Genes Associated with Progression to Advanced Stages of Macular Degeneration, filed on Mar. 10, 2013, and 61/778,601, entitled Markers Related to Age-Related Macular Degeneration and Uses Therefor, filed on Mar. 13, 2013, the contents of both of which are incorporated herein by reference in their entireties for all purposes.
This invention was made with government support under grant number R01 EY11309 awarded by the National Institutes of Health and the National Eye Institute. The government has certain rights in the invention.
Number | Date | Country
---|---|---
61775673 | Mar 2013 | US
61778601 | Mar 2013 | US