LUNG CANCER METHYLATION MARKERS

Information

  • Patent Application
  • 20160281175
  • Publication Number
    20160281175
  • Date Filed
    April 12, 2016
  • Date Published
    September 29, 2016
Abstract
The present invention discloses a method of diagnosing lung cancer by using methylation specific markers from a set, having diagnostic power for lung cancer diagnosis and distinguishing lung cancer types in diverse samples; as well as methods to identify sets of prognostic and diagnostic value.
Description

The present invention relates to cancer diagnostic methods and means therefor.


Neoplasms and cancer are abnormal growths of cells. Cancer cells rapidly reproduce despite restriction of space, nutrients shared by other cells, or signals sent from the body to stop reproduction. Cancer cells are often shaped differently from healthy cells, do not function properly, and can spread into many areas of the body. Abnormal growths of tissue, called tumors, are clusters of cells that are capable of growing and dividing uncontrollably. Tumors can be benign (noncancerous) or malignant (cancerous). Benign tumors tend to grow slowly and do not spread. Malignant tumors can grow rapidly, invade and destroy nearby normal tissues, and spread throughout the body. Malignant cancers can be both locally invasive and metastatic. Locally invasive cancers can invade the surrounding tissues by sending out “fingers” of cancerous cells into the normal tissue. Metastatic cancers can send cells into other tissues in the body, which may be distant from the original tumor. Cancers are classified according to the kind of fluid or tissue from which they originate, or according to the location in the body where they first developed. All of these parameters can influence the characteristics, development and progression of a cancer and, subsequently, its treatment. Therefore, reliable methods to classify a cancer state or cancer type, taking diverse parameters into consideration, are desired. Since cancer is predominantly a genetic disease, classifying cancers by genetic parameters is one extensively studied route.


Extensive efforts have been undertaken to discover genes relevant for diagnosis, prognosis and management of (cancerous) disease. Mainly RNA-expression studies have been used for screening to identify genetic biomarkers. Over recent years it has been shown that changes in the DNA-methylation pattern of genes could be used as biomarkers for cancer diagnostics. In concordance with the general strategy for identifying RNA-expression based biomarkers, the most convenient and promising approach would start by identifying marker candidates through genome-wide screening of methylation changes.


The most versatile genome-wide approaches up to now use microarray hybridization based techniques. Although studies have been undertaken at the genomic level (and also the single-gene level) to elucidate methylation changes in diseased versus normal tissue, a comprehensive test with a good success rate for identifying biomarkers is not yet available.


The development of biomarkers for disease (especially cancer) screening, diagnosis and treatment has been improved over the last decade by major advances in different technologies, which have made it easier to discover potential biomarkers through high-throughput screens. Among the so-called “OMICs” approaches, such as Genomics, Proteomics, Metabolomics and their derivatives, Genomics is the best developed and most widely used for biomarker identification. Because of the dynamic nature of RNA expression, the ease of nucleic acid extraction and the detailed knowledge of the human genome, many studies have used RNA expression profiling to elucidate class differences that distinguish the “good” from the “bad” situation, such as diseased vs. healthy, or clinical differences between groups of diseased patients. Over the years, microarray-based expression profiling in particular has become a standard tool for research, and some approaches are currently under clinical validation for diagnostics. While the plasticity of RNA expression levels over a broad dynamic range is an advantage of using RNA and also a prerequisite for successful discrimination of classes, the low stability of RNA itself is often seen as a drawback. Because the stability of DNA is tremendously higher than that of RNA, DNA-based markers are more promising and are expected to give robust assays for diagnostics. Many clinical markers in oncology are more or less DNA based and are well established, e.g. cytogenetic analyses for diagnosis and classification of different tumor species. However, most of these markers are not accessible using the cheap and efficient molecular-genetic PCR routine tests. This might be due to 1) the structural complexity of the changes, 2) the inter-individual differences of these changes at the DNA-sequence level, and 3) the relatively low “quantitative” fold-changes of those “chromosomal” DNA changes.
In comparison, RNA-expression changes range over several orders of magnitude, and these changes can be easily measured using genome-wide expression microarrays. These expression arrays cover the entire translated transcriptome with 20000-45000 probes. Elucidation of DNA changes via microarray techniques generally requires more probes, depending on the requested resolution. Covering the entire 3×10^9 bp human genome requires even order(s) of magnitude more probes than standard expression profiling. To obtain the best resolution when screening biomarkers at the structural genomic DNA level, genomic tiling arrays and SNP arrays are available today. Although the costs of these DNA analysis techniques have decreased over recent years, biomarker screening requires many samples to be tested, and thus these tests are cost intensive.


Another option for obtaining stable DNA-based biomarkers relies on elucidation of the changes in the DNA methylation pattern of (malignant; neoplastic) disease. In the vertebrate genome, methylation affects exclusively the cytosine residues of CpG dinucleotides, which are clustered in CpG islands. CpG islands are often found associated with gene-promoter sequences, present in the 5′-untranslated gene regions, and are unmethylated by default. In a very simplified view, an unmethylated CpG island in the associated gene promoter enables active transcription, whereas a methylated CpG island blocks gene transcription. The DNA methylation pattern is tissue- and clone-specific and almost as stable as the DNA itself. It is also known that DNA methylation is an early event in tumorigenesis, which would be of interest for early and initial diagnosis of disease. In principle, screening for biomarkers suitable for answering clinical questions, including DNA-methylation based approaches, would be most successful when starting with a genome-wide approach.


Shames D et al. (PLOS Medicine 3(12) (2006): 2244-2262) identified multiple genes that are methylated with high penetrance in primary lung, breast, colon and prostate cancers.


Sato N et al. (Cancer Res 63(13) (2003): 3735-3742) identified potential targets with aberrant methylation in pancreatic cancer. These genes were tested using treatment with a demethylating agent (5-aza-2′-deoxycytidine) and/or the histone deacetylase inhibitor trichostatin A, after which transcription of certain genes was increased.


Bibikova M et al. (Genome Res 16(3) (2006): 383-393) analysed lung cancer biopsy samples to identify methylated CpG sites to distinguish lung adenocarcinomas from normal lung tissues.


Yan P S et al. (Clin Cancer Res 6(4) (2000): 1432-1438) analysed CpG island hypermethylation in primary breast tumor.


Cheng Y et al. (Genome Res 16(2) (2006): 282-289) discussed DNA methylation in CpG islands associated with transcriptional silencing of tumor suppressor genes.


Ongenaert M et al. (Nucleic Acids Res 36 (2008) Database issue D842-D846) provided an overview over the methylation database “PubMeth”.


Microarrays for human genome-wide hybridization testing are known, e.g. the Affymetrix Human Genome U133A Array (NCBI Database, Acc. No. GPL96).


A substantial number of differentially methylated genes have been discovered over the years by chance rather than by rational design. Although some of these methylation changes have the potential to be useful markers for specifically defined diagnostic questions, they would lack the power to delineate various diagnostic constellations successfully. Thus, the rational approach would start with a genomic screen for distinguishing the “subtypes” and the diagnostically, prognostically and even therapeutically challenging constellations. These rational expectations are the basis for starting genomic (and also other -omics) screenings but do not guarantee obtaining the marker panel for all clinically relevant constellations which should be distinguished. Nor is it realistic to expect a universal approach (e.g. transcriptomics) to distinguish, for instance, all subtypes in all different malignancies by focusing on a single class of target molecules (e.g. RNA). Rather, all omics approaches together would be necessary and could help to improve diagnostics and finally patient management.


Lung cancer is the third most common malignant neoplasm in the EU following breast and colon cancers. Lung cancer presents the second worst 5-year survival figures following pancreas. Thus, although it accounts for 14% of all cancer diagnoses, lung cancer is responsible for 22% of cancer deaths, indicating the poor prognosis of this tumour type and the comparative lack of progress in treatment. Therapy is hampered by the tendency for lung cancer to be diagnosed at a late stage, hence the need to develop markers for early detection. Approximately 80% of lung cancer cases are of the non-small cell type (NSCLC), with squamous cell carcinoma and adenocarcinoma being the most frequent subtypes. A goal of the present invention is to provide an alternative and more cost-efficient route to identify suitable markers for lung cancer diagnostics.


Therefore, in a first aspect, the present invention provides a set of nucleic acid primers or hybridization probes being specific for a potentially methylated region of marker genes being suitable to diagnose or predict lung cancer or a lung cancer type, preferably being selected from adenocarcinoma or squamous cell carcinoma, the marker genes comprising WT1, SALL3, TERT, ACTB, CPEB4. Preferably the set further comprises any one of the markers ABCB1, ACTB, AIM1L, APC, AREG, BMP2K, BOLL, C5AR1, C5orf4, CADM1, CDH13, CDX1, CLIC4, COL21A1, CPEB4, CXADR, DLX2, DNAJA4, DPH1, DRD2, EFS, ERBB2, ERCC1, ESR2, F2R, FAM43A, GABRA2, GAD1, GBP2, GDNF, GNA15, GNAS, HECW2, HIC1, HIST1H2AG, HLA-G, HOXA1, HOXA10, HSD17B4, HSPA2, IRAK2, ITGA4, JUB, KCNJ15, KCNQ1, KIF5B, KL, KRT14, KRT17, LAMC2, MAGEB2, MBD2, MSH4, MT1G, MT3, MTHFR, NEUROD1, NHLH2, NKX2-1, ONECUT2, PENK, PITX2, PLAGL1, PTTG1, PYCARD, RASSF1, S100A8, SALL3, SERPINB5, SERPINE1, SERPINI1, SFRP2, SLC25A31, SMAD3, SPARC, SPHK1, SRGN, TERT, THRB, TJP2, TMEFF2, TNFRSF10C, TNFRSF25, TP53, ZDHHC11, ZNF256, ZNF711, F2R, HOXA10, KL, SALL3, SPARC, TNFRSF25, WT1.


In a further aspect, the present invention provides a method of determining a subset of diagnostic markers for potentially methylated genes from the genes of gene marker IDs 1-359 of table 1, suitable for the diagnosis or prognosis of lung cancer or lung cancer type, comprising

    • a) obtaining data of the methylation status of at least 50 random genes selected from the 359 genes of gene marker IDs 1-359 in at least 1 sample, preferably 2, 3, 4 or at least 5 samples, of a confirmed lung cancer or lung cancer type state and at least one sample of a lung cancer or lung cancer type negative state,
    • b) correlating the results of the obtained methylation status with the lung cancer or lung cancer type,
    • c) optionally repeating the obtaining a) and correlating b) steps for a different combination of at least 50 random genes selected from the 359 genes of gene marker IDs 1-359 and
    • d) selecting as many marker genes which in a classification analysis have a p-value of less than 0.1 in a random-variance t-test, or selecting as many marker genes which in a classification analysis together have a correct lung cancer or lung cancer type prediction of at least 70% in a cross-validation test,


      wherein the selected markers form the subset of diagnostic markers.
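Steps a) to d) above can be sketched in code. The following is only an illustrative sketch under stated assumptions: the function names, the data layout (a dict mapping a gene marker ID to one methylation value per sample, with labels 1 for the lung cancer state and 0 for the negative state) and the use of a simple label-permutation test are all hypothetical. In particular, the permutation test is a stand-in for the random-variance t-test named in step d), not an implementation of it.

```python
import random
from statistics import mean


def permutation_p_value(values, labels, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in class means.

    Hypothetical stand-in for the random-variance t-test of step d).
    labels: 1 = confirmed cancer state, 0 = negative state.
    """
    rng = random.Random(seed)

    def abs_mean_diff(lab):
        pos = [v for v, l in zip(values, lab) if l == 1]
        neg = [v for v, l in zip(values, lab) if l == 0]
        return abs(mean(pos) - mean(neg))

    observed = abs_mean_diff(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point round-off.
        if abs_mean_diff(shuffled) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


def select_subset(methylation, labels, n_genes=50, p_cutoff=0.1, seed=0):
    """Draw random genes (a), test each against the class labels (b),
    and keep the genes whose p-value is below the cutoff (d)."""
    rng = random.Random(seed)
    drawn = rng.sample(sorted(methylation), n_genes)
    p_values = {g: permutation_p_value(methylation[g], labels, seed=seed)
                for g in drawn}
    return sorted(g for g, p in p_values.items() if p < p_cutoff)
```

Step c), the optional repetition with a different combination of genes, would correspond to calling `select_subset` again with a different `seed`.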


The present invention provides a master set of 359 genetic markers which has been surprisingly found to be highly relevant for aberrant methylation in the diagnosis or prognosis of lung cancer. It is possible to determine a multitude of marker subsets from this master set which can be used to diagnose and differentiate between various lung cancer or tumor types, e.g. adenocarcinoma and squamous cell carcinoma.


The inventive 359 marker genes of table 1 (given in example 1 below) are: NHLH2, MTHFR, PRDM2, MLLT11, S100A9 (control), S100A9, S100A8 (control), S100A8, S100A2, LMNA, DUSP23, LAMC2, PTGS2, MARK1, DUSP10, PARP1, PSEN2, CLIC4, RUNX3, AIM1L, SFN, RPA2, TP73, TP73 (p73), POU3F1, MUTYH, UQCRH, FAF1, TACSTD2, TNFRSF25, DIRAS3, MSH4, GBP2, GBP2, LRRC8C, F3, NANOS1, MGMT, EBF3, DCLRE1C, KIF5B, ZNF22, PGBD3, SRGN, GATA3, PTEN, MMS19, SFRP5, PGR, ATM, DRD2, CADM1, TEAD1, OPCML, CALCA, CTSD, MYOD1, IGF2, BDNF, CDKN1C, WT1, HRAS, DDB1, GSTP1, CCND1, EPS8L2, PIWIL4, CHST11, UNG, CCDC62, CDK2AP1, CHFR, GRIN2B, CCND2, VDR, B4GALNT3, NTF3, CYP27B1, GPR92, ERCC5, GJB2, BRCA2, KL, CCNA1, SMAD9, C13orf15, DGKH, DNAJC15, RB1, RCBTB2, PARP2, APEX1, JUB, JUB (control NM 198086), EFS, BAZ1A, NKX2-1, ESR2, HSPA2, PSEN1, PGF, MLH3, TSHR, THBS1, MYO5C, SMAD6, SMAD3, NOX5, DNAJA4, CRABP1, BCL2A1 (ID NO: 111), BCL2A1 (ID NO: 112), BNC1, ARRDC4, SOCS1, ERCC4, NTHL1, PYCARD, AXIN1, CYLD, MT3, MT1A, MT1G, CDH1, CDH13, DPH1, HIC1, NEUROD2 (control), NEUROD2, ERBB2, KRT19, KRT14, KRT17, JUP, BRCA1, COL1A1, CACNA1G, PRKAR1A, SPHK1, SOX15, TP53 (TP53_CGI23_1 kb), TP53 (TP53_both_CGIs_1 kb), TP53 (TP53_CGI36_1 kb), TP53, NPTX1, SMAD2, DCC, MBD2, ONECUT2, BCL2, SERPINB5, SERPINB2 (control), SERPINB2, TYMS, LAMA1, SALL3, LDLR, STK11, PRDX2, RAD23A, GNA15, ZNF573, SPINT2, XRCC1, ERCC2, ERCC1, C5AR1 (NM_001736), C5AR1, POLD1, ZNF350, ZNF256, C3, XAB2, ZNF559, FHL2, IL1B, IL1B (control), PAX8, DDX18, GAD1, DLX2, ITGA4, NEUROD1, STAT1, TMEFF2, HECW2, BOLL, CASP8, SERPINE2, NCL, CYP1B1, TACSTD1, MSH2, MSH6, MXD1, JAG1, FOXA2, THBD, CTCFL, CTSZ, GATA5, CXADR, APP, TTC3, KCNJ15, RIPK4, TFF1, SEZ6L, TIMP3, BIK, VHL, IRAK2, PPARG, MBD4, RBP1, XPC, ATR, LXN, RARRES1, SERPINI1, CLDN1, FAM43A, IQCG, THRB, RARB, TGFBR2, MLH1, DLEC1, CTNNB1, ZNF502, SLC6A20, GPX1, RASSF1, FHIT, OGG1, PITX2, SLC25A31, FBXW7, SFRP2, CHRNA9, GABRA2, MSX1, IGFBP7, EREG, AREG, ANXA3, BMP2K, APC, HSD17B4 (ID No 249), HSD17B4 (ID No 250), LOX, TERT, NEUROG1, NR3C1, ADRB2, CDX1, SPARC, C5orf4, PTTG1, DUSP1, CPEB4, SCGB3A1, GDNF, ERCC8, F2R, F2RL1, VCAN, ZDHHC11, RHOBTB3, PLAGL1, SASH1, ULBP2, ESR1, RNASET2, DLL1, HIST1H2AG, HLA-G, MSH5, CDKN1A, TDRD6, COL21A1, DSP, SERPINE1 (ID No 283), SERPINE1 (ID No 284), FBXL13, NRCAM, TWIST1, HOXA1, HOXA10, SFRP4, IGFBP3, RPA3, ABCB1, TFPI2, COL1A2, ARPC1B, PILRB, GATA4, MAL2, DLC1, EPPK1, LZTS1, TNFRSF10B, TNFRSF10C, TNFRSF10D, TNFRSF10A, WRN, SFRP1, SNAI2, RDHE2, PENK, RDH10, TGFBR1, ZNF462, KLF4, CDKN2A, CDKN2B, AQP3, TPM2, TJP2 (ID NO 320), TJP2 (ID No 321), PSAT1, DAPK1, SYK, XPA, ARMCX2, RHOXF1, FHL1, MAGEB2, TIMP1, AR, ZNF711, CD24, ABL1, ACTB, APC, CDH1 (Ecad 1), CDH1 (Ecad2), FMR1, GNAS, H19, HIC1, IGF2, KCNQ1, GNAS, CDKN2A (P14), CDKN2B (P15), CDKN2A (P16_VL), PITXA, PITXB, PITXC, PITXD, RB1, SFRP2, SNRPN, XIST, IRF4, UNC13B, GSTP1. Table 1 lists some marker genes twice, e.g. for different loci and control sequences. It should be understood that any methylation specific region which is readily known to the skilled man in the art from prior publications or available databases (e.g. PubMeth at www.pubmeth.org) can be used according to the present invention. Of course, double listed genes only need to be represented once in an inventive marker set (or set of probes or primers therefor) but preferably a second marker, such as a control region, is included (IDs given in the list above relate to the gene ID (or gene loci ID) given in table 1 of the example section).


One advantage that makes DNA methylation an attractive target for biomarker development is the fact that cell-free methylated DNA can be detected in body fluids like serum, sputum, and urine from patients with cancerous neoplastic conditions and disease. For the purpose of biomarker screening, clinical samples have to be available. To obtain a sufficient number of samples with clinical and “outcome” or survival data, the first step would be to use archived (tissue) samples. Preferably these materials should fulfill the requirements for obtaining intact RNA and DNA, but most archives of clinical samples store formalin-fixed paraffin-embedded (FFPE) tissue blocks. This has been the clinicopathological routine for decades, but such fixed samples are, if at all, only suitable for extraction of low-quality RNA. It has now been found that according to the present invention any such samples (as any comprising tumor DNA) can be used for the method of generating an inventive subset, including fixed samples. The samples can be of lung tissue or any body fluid, e.g. sputum, bronchial lavage, or serum derived from peripheral blood or blood cells. Blood or blood derived samples preferably have reduced, e.g. <95%, or no leukocyte content but comprise DNA of the cancerous cells or tumor. Preferably the inventive markers are of human genes. Preferably the samples are human samples.


The present invention provides a multiplexed methylation testing method which 1) outperforms genome-wide screenings via RNA-expression profiling in “classification” success, 2) enables identification of biomarkers for a wide variety of diseases without the need to prescreen candidate markers on a genome-wide scale, 3) is suitable for minimally invasive testing, and 4) is easily scalable.


In contrast to the rational strategy for elucidation of biomarkers for differentiation of disease, the invention presents a targeted multiplexed DNA-methylation test which outperforms genome-scaled approaches (including RNA expression profiling) for disease diagnosis, classification, and prognosis.


The inventive set of 359 markers enables selection of a subset of markers from this 359 set which is highly characteristic of lung cancer and a given lung cancer type. Further indicators differentiating between cancer types or generally neoplastic conditions are e.g. benign (non (or limited) proliferative) or malignant, metastatic or non-metastatic tumors or nodules. It is sometimes possible to differentiate the sample type from which the methylated DNA is isolated, e.g. urine, blood, tissue samples.


The present invention is suitable to differentiate diseases, in particular neoplastic conditions, or tumor types. Diseases and neoplastic conditions should be understood in general including benign and malignant conditions. According to the present invention benign nodules (being at least the potential onset of malignancy) are included in the definition of a disease. After the development of a malignancy the condition is a preferred disease to be diagnosed by the markers screened for or used according to the present invention. The present invention is suitable to distinguish benign and malignant tumors (both being considered a disease according to the present invention). In particular the invention can provide markers (and their diagnostic or prognostic use) distinguishing between a normal healthy state together with a benign state on one hand and malignant states on the other hand. A diagnosis of lung cancer may include identifying the difference to a normal healthy state, e.g. the absence of any neoplastic nodules or cancerous cells. The present invention can also be used for prognosis of lung cancer, in particular a prediction of the progression of lung cancer or lung cancer type. A particularly preferred use of the invention is to perform a diagnosis or prognosis of metastasizing lung cancer (distinguished from non-metastasizing conditions).


In the context of the present invention “prognosis”, “prediction” or “predicting” should not be understood in an absolute sense, as in a certainty that an individual will develop lung cancer or lung cancer type (including cancer progression), but as an increased risk to develop cancer or the lung cancer type or of cancer progression. “Prognosis” is also used in the context of predicting disease progression, in particular to predict therapeutic results of a certain therapy of the disease, in particular neoplastic conditions, or lung cancer types. The prognosis of a therapy can e.g. be used to predict a chance of success (i.e. curing a disease) or chance of reducing the severity of the disease to a certain level. As a general inventive concept, markers screened for this purpose are preferably derived from sample data of patients treated according to the therapy to be predicted. The inventive marker sets may also be used to monitor a patient for the emergence of therapeutic results or positive disease progressions.


Some of the inventive, rationally selected markers have been found methylated in some instances. DNA methylation analyses in principle rely either on bisulfite deamination-based methylation detection or on using methylation sensitive restriction enzymes. Preferably the restriction enzyme-based strategy is used for elucidation of DNA-methylation changes. Further methods to determine methylated DNA are e.g. given in EP 1 369 493 A1 or U.S. Pat. No. 6,605,432. Combining restriction digestion and multiplex PCR amplification with a targeted microarray-hybridization is a particularly advantageous strategy to perform the inventive methylation test using the inventive marker sets (or subsets). A microarray-hybridization step can be used for reading out the PCR results. For the analysis of the hybridization data, statistical approaches for class comparisons and class prediction can be used. Such statistical methods are known from analysis of RNA-expression derived microarray data.


If only limiting amounts of DNA are available for analysis, an amplification protocol can be used that enables selective amplification of the methylated DNA fraction prior to methylation testing. By subjecting these amplicons to the methylation test, it was possible to successfully distinguish DNA from sensitive cases from normal healthy controls. In addition, it was possible to distinguish lung-cancer patients from healthy normal controls using DNA from serum by the inventive methylation test upon preamplification. Both examples clearly illustrate that the inventive multiplexed methylation testing can be successfully applied when only limiting amounts of DNA are available. Thus, this principle might be the preferred method for minimally invasive diagnostic testing.


In most situations several genes are necessary for classification. Although the 359 marker set test is not a genome-wide test and might be used as it is for diagnostic testing, running a subset of markers—comprising the classifier which enables best classification—would be easier for routine applications. The test is easily scalable. Thus, to test only the subset of markers, comprising the classifier, the selected subset of primers/probes could be applied directly to set up of the lower multiplexed test (or single PCR-test). Serum DNA can be used to classify or distinguish healthy patients from individuals with lung-tumors. Only the specific primers comprising the gene-classifier obtained from the methylation test may be set up together in multiplexed PCR reactions.


In summary, the inventive methylation test is a suitable tool for differentiation and classification of neoplastic disease. This assay can be used for diagnostic purposes and for defining biomarkers for clinically relevant issues to improve diagnosis of disease, and to classify patients at risk for disease progression, thereby improving disease treatment and patient management.


The first step of the inventive method of generating a subset, step a) of obtaining data of the methylation status, preferably comprises determining data of the methylation status, preferably by methylation specific PCR analysis or methylation specific digestion analysis. Methylation specific digestion analysis can include either or both of hybridization of suitable probes for detection to non-digested fragments or PCR amplification and detection of non-digested fragments.


The inventive selection can be made by any (known) classification method to obtain a set of markers with the given diagnostic (or also prognostic) value to categorize a lung cancer or lung cancer type. Such methods include class comparisons wherein a specific p-value is selected, e.g. a p-value below 0.1, preferably below 0.08, more preferred below 0.06, in particular preferred below 0.05, below 0.04, below 0.02, most preferred below 0.01.


Preferably the correlated results for each gene b) are rated by their correct correlation to lung cancer or lung cancer type positive state, preferably by p-value test or t-value test or F-test. Rated (best first, i.e. low p- or t-value) markers are then subsequently selected and added to the subset until a certain diagnostic value is reached, e.g. the herein mentioned at least 70% (or more) correct classification of lung cancer or lung cancer type.
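The best-first addition of rated markers can be illustrated with a short sketch. Everything here is an assumption made for illustration: samples are dicts mapping a gene name to a methylation value, `ranked_genes` is a list already ordered best first, and the diagnostic value is estimated by leave-one-out accuracy of a simple nearest-centroid rule, which is not a classifier prescribed by the text. Each class is assumed to contain at least two samples.

```python
from statistics import mean


def loo_accuracy(samples, labels, genes):
    """Leave-one-out accuracy of a nearest-centroid rule on the given genes."""
    correct = 0
    for i, (held_out, truth) in enumerate(zip(samples, labels)):
        best_label, best_dist = None, float("inf")
        for cls in sorted(set(labels)):
            # Class centroid computed without the held-out sample.
            members = [s for j, (s, l) in enumerate(zip(samples, labels))
                       if j != i and l == cls]
            centroid = {g: mean(s[g] for s in members) for g in genes}
            dist = sum((held_out[g] - centroid[g]) ** 2 for g in genes)
            if dist < best_dist:
                best_label, best_dist = cls, dist
        correct += best_label == truth
    return correct / len(samples)


def grow_subset(samples, labels, ranked_genes, target=0.70, max_genes=20):
    """Add markers best first until the target accuracy is reached."""
    subset = []
    for gene in ranked_genes[:max_genes]:
        subset.append(gene)
        if loo_accuracy(samples, labels, subset) >= target:
            break
    return subset
```

Because the ranking in this sketch is computed once on all samples, the resulting accuracy is optimistic; an unbiased estimate would repeat the ranking inside each cross-validation training set.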


Class comparison procedures include identification of genes that are differentially methylated between the two classes using a random-variance t-test. The random-variance t-test is an improvement over the standard separate t-test as it permits sharing information among genes about within-class variation without assuming that all genes have the same variance (Wright G. W. and Simon R, Bioinformatics 19:2448-2455, 2003). Genes are considered statistically significant if their p-value is less than a certain value, e.g. 0.1 or 0.01. A stringent significance threshold can be used to limit the number of false positive findings. A global test can also be performed to determine whether the methylation profiles differ between the classes by permuting the labels of which arrays correspond to which classes. For each permutation, the p-values can be re-computed and the number of genes significant at the e.g. 0.01 level can be noted. The proportion of the permutations that give at least as many significant genes as the actual data is then the significance level of the global test. If there are more than 2 classes, then the “F-test” instead of the “t-test” should be used.
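The global permutation test can be sketched as follows. This is a simplified illustration under assumptions that are not in the text: it uses an ordinary Welch t statistic rather than the random-variance t-test, and it calls a gene "significant" when |t| exceeds a hypothetical cutoff `t_cutoff` instead of converting the statistic into a p-value. The data layout (a dict of gene name to per-sample values, labels 1 and 0) is likewise assumed.

```python
import random
from math import copysign, inf, sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch two-sample t statistic (ordinary t-test, not random-variance)."""
    diff = mean(a) - mean(b)
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / se


def count_significant(matrix, labels, t_cutoff):
    """Number of genes whose |t| exceeds the cutoff for the given labelling."""
    count = 0
    for values in matrix.values():
        pos = [v for v, l in zip(values, labels) if l == 1]
        neg = [v for v, l in zip(values, labels) if l == 0]
        if abs(welch_t(pos, neg)) > t_cutoff:
            count += 1
    return count


def global_test(matrix, labels, t_cutoff=3.0, n_perm=1000, seed=0):
    """Return (observed count, proportion of label permutations that give
    at least as many significant genes as the actual labelling)."""
    rng = random.Random(seed)
    observed = count_significant(matrix, labels, t_cutoff)
    shuffled = list(labels)
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if count_significant(matrix, shuffled, t_cutoff) >= observed:
            at_least += 1
    return observed, at_least / n_perm
```

The second return value corresponds to the significance level of the global test described above; each class needs at least two samples so that a variance can be computed.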


Class Prediction includes the step of specifying a significance level to be used for determining the genes that will be included in the subset. Genes that are differentially methylated between the classes at a univariate parametric significance level less than the specified threshold are included in the set. It doesn't matter whether the specified significance level is small enough to exclude enough false discoveries. In some problems better prediction can be achieved by being more liberal about the gene sets used as features. The sets may be more biologically interpretable and clinically applicable, however, if fewer genes are included. Similar to cross-validation, gene selection is repeated for each training set created in the cross-validation process. That is for the purpose of providing an unbiased estimate of prediction error. The final model and gene set for use with future data is the one resulting from application of the gene selection and classifier fitting to the full dataset.


Models for utilizing gene methylation profiles to predict the class of future samples can also be used. These models may be based on the Compound Covariate Predictor (Radmacher et al. Journal of Computational Biology 9:505-511, 2002), Diagonal Linear Discriminant Analysis (Dudoit et al. Journal of the American Statistical Association 97:77-87, 2002), Nearest Neighbor Classification (also Dudoit et al.), and Support Vector Machines with linear kernel (Ramaswamy et al. PNAS USA 98:15149-54, 2001). The models incorporate genes that are differentially methylated at a given significance level (e.g. 0.01, 0.05 or 0.1) as assessed by the random variance t-test (Wright G. W. and Simon R. Bioinformatics 19:2448-2455, 2003). The prediction error of each model is preferably estimated using cross-validation, preferably leave-one-out cross-validation (Simon et al. Journal of the National Cancer Institute 95:14-18, 2003). For each leave-one-out cross-validation training set, the entire model building process is repeated, including the gene selection process. It may also be evaluated whether the cross-validated error rate estimate for a model is significantly less than one would expect from random prediction. To this end, the class labels can be randomly permuted and the entire leave-one-out cross-validation process repeated. The significance level is the proportion of the random permutations that give a cross-validated error rate no greater than the cross-validated error rate obtained with the real methylation data. About 1000 random permutations are typically used.
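The key point of this paragraph, that gene selection is redone inside every leave-one-out training set, can be illustrated with a small sketch of Diagonal Linear Discriminant Analysis. The details are assumptions for illustration only: samples are lists of methylation values indexed by gene position, there are exactly two classes with at least three samples each, and genes are chosen as the `top_k` by an ordinary Welch t score rather than by a random-variance t-test significance level.

```python
from math import sqrt
from statistics import mean, variance


def t_score(a, b):
    """Absolute Welch t statistic for one gene (ordinary t-test)."""
    diff = abs(mean(a) - mean(b))
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return float("inf") if diff > 0 else 0.0
    return diff / se


def fit_dlda(train_x, train_y, top_k):
    """Select the top_k genes on the training set only, then fit DLDA."""
    classes = sorted(set(train_y))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    by_class = {c: [x for x, y in zip(train_x, train_y) if y == c]
                for c in classes}
    n_genes = len(train_x[0])
    scores = [t_score([x[g] for x in by_class[classes[0]]],
                      [x[g] for x in by_class[classes[1]]])
              for g in range(n_genes)]
    genes = sorted(range(n_genes), key=lambda g: -scores[g])[:top_k]
    means = {c: {g: mean(x[g] for x in by_class[c]) for g in genes}
             for c in classes}
    # Pooled within-class variance per gene (the "diagonal" covariance).
    dof = len(train_x) - len(classes)
    pooled = {}
    for g in genes:
        ss = sum((x[g] - means[c][g]) ** 2
                 for c in classes for x in by_class[c])
        pooled[g] = max(ss / dof, 1e-9)
    return genes, means, pooled


def predict_dlda(model, x):
    """Assign x to the class with the smallest variance-scaled distance."""
    genes, means, pooled = model
    return min(means, key=lambda c: sum(
        (x[g] - means[c][g]) ** 2 / pooled[g] for g in genes))


def loocv_error(xs, ys, top_k=5):
    """Leave-one-out error with gene selection repeated in every fold."""
    errors = 0
    for i in range(len(xs)):
        model = fit_dlda(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], top_k)
        errors += predict_dlda(model, xs[i]) != ys[i]
    return errors / len(xs)
```

The permutation check described above could be layered on top by shuffling `ys` repeatedly and recording how often `loocv_error` is no greater than the value obtained with the real labels.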


Another classification method is the greedy-pairs method described by Bo and Jonassen (Genome Biology 3(4):research0017.1-0017.11, 2002). The greedy-pairs approach starts with ranking all genes based on their individual t-scores on the training set. The procedure selects the best ranked gene g_i and finds the one other gene g_j that together with g_i provides the best discrimination, using as a measure the distance between centroids of the two classes with regard to the two genes when projected to the diagonal linear discriminant axis. These two selected genes are then removed from the gene set and the procedure is repeated on the remaining set until the specified number of genes have been selected. This method attempts to select pairs of genes that work well together to discriminate the classes.
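A compact sketch of the greedy-pairs procedure is given below. It follows one reading of the pair measure described above (the class-centroid difference in the two-gene space, projected onto the diagonal linear discriminant axis); the original publication should be consulted for the exact scoring. The function names, the list-of-lists data layout and the use of a pooled-variance t-score are assumptions for illustration.

```python
from math import sqrt
from statistics import mean


def gene_stats(xs, ys, g):
    """Class-mean difference, pooled variance and class sizes for gene g."""
    pos = [x[g] for x, y in zip(xs, ys) if y == 1]
    neg = [x[g] for x, y in zip(xs, ys) if y == 0]
    m1, m0 = mean(pos), mean(neg)
    ss = sum((v - m1) ** 2 for v in pos) + sum((v - m0) ** 2 for v in neg)
    var = max(ss / (len(pos) + len(neg) - 2), 1e-9)
    return m1 - m0, var, len(pos), len(neg)


def t_score(xs, ys, g):
    """Absolute pooled-variance t statistic used for the initial ranking."""
    diff, var, n1, n0 = gene_stats(xs, ys, g)
    return abs(diff) / sqrt(var * (1 / n1 + 1 / n0))


def pair_score(xs, ys, gi, gj):
    """Centroid distance for genes (gi, gj) projected onto the DLD axis."""
    d, w = [], []
    for g in (gi, gj):
        diff, var, _, _ = gene_stats(xs, ys, g)
        d.append(diff)
        w.append(diff / var)  # DLD axis: mean difference over variance
    norm = sqrt(sum(c * c for c in w))
    return sum(a * b for a, b in zip(d, w)) / norm if norm > 0 else 0.0


def greedy_pairs(xs, ys, n_pairs):
    """Repeatedly take the best-ranked gene and its best partner."""
    remaining = sorted(range(len(xs[0])), key=lambda g: -t_score(xs, ys, g))
    pairs = []
    while len(pairs) < n_pairs and len(remaining) >= 2:
        gi = remaining.pop(0)  # best-ranked gene still available
        gj = max(remaining, key=lambda g: pair_score(xs, ys, gi, g))
        remaining.remove(gj)
        pairs.append((gi, gj))
    return pairs
```

With this measure a partner gene with a small mean difference but a very small variance can lower the score, which is why the partner is chosen by the pair score and not simply by its own rank.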


Furthermore, a binary tree classifier for utilizing gene methylation profiles can be used to predict the class of future samples. The first node of the tree incorporates a binary classifier that distinguishes two subsets of the total set of classes. The individual binary classifiers are based on “Support Vector Machines” incorporating genes that are differentially methylated at the significance level (e.g. 0.01, 0.05 or 0.1) as assessed by the random variance t-test (Wright G. W. and Simon R. Bioinformatics 19:2448-2455, 2003). Classifiers for all possible binary partitions are evaluated and the partition selected is that for which the cross-validated prediction error is minimum. The process is then repeated successively for the two subsets of classes determined by the previous binary split. The prediction error of the binary tree classifier can be estimated by cross-validating the entire tree building process. This overall cross-validation includes re-selection of the optimal partitions at each node and re-selection of the genes used for each cross-validated training set as described by Simon et al. (Simon et al. Journal of the National Cancer Institute 95:14-18, 2003). 10-fold cross-validation can be utilized, in which one-tenth of the samples is withheld, a binary tree is developed on the remaining 9/10 of the samples, and class membership is then predicted for the 10% of the samples withheld. This is repeated 10 times, each time withholding a different 10% of the samples. The samples are randomly partitioned into 10 test sets (Simon R and Lam A. BRB-ArrayTools User Guide, version 3.2. Biometric Research Branch, National Cancer Institute).
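The 10-fold scheme at the end of this paragraph can be sketched generically. The helper names are hypothetical, and `fit` and `predict` are placeholder callables standing in for the entire tree-building process (including partition and gene re-selection), which is not implemented here.

```python
import random


def ten_fold_splits(n_samples, n_folds=10, seed=0):
    """Randomly partition sample indices into n_folds test sets and yield
    (train_indices, test_indices) for each fold."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    for test in folds:
        held = set(test)
        train = [i for i in idx if i not in held]
        yield train, test


def cross_validated_error(xs, ys, fit, predict, n_folds=10, seed=0):
    """Refit the whole model on each training part and score it on the
    withheld part; returns the overall misclassification rate."""
    errors = 0
    for train, test in ten_fold_splits(len(xs), n_folds, seed):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors += sum(predict(model, xs[i]) != ys[i] for i in test)
    return errors / len(xs)
```

Passing the complete model-building routine as `fit` keeps every selection step inside the training part of each fold, which is what makes the resulting error estimate unbiased.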


Preferably the correlated results for each gene b) are rated by their correct correlation to lung cancer or lung cancer type positive state, preferably by p-value test. It is also possible to include a step in that the genes are selected d) in order of their rating.


Independent from the method that is finally used to produce a subset with certain diagnostic or predictive value, the subset selection preferably results in a subset with at least 60%, preferably at least 65%, at least 70%, at least 75%, at least 80% or even at least 85%, at least 90%, at least 92%, at least 95%, in particular preferred 100% correct classification of test samples of lung cancer or lung cancer type. Such levels can be reached by repeating c) steps a) and b) of the inventive method, if necessary.


To prevent an increase in the number of members of the subset, only marker genes with a significance value of at most 0.1, preferably at most 0.08, even more preferred at most 0.06, at most 0.05, at most 0.04, at most 0.02, or more preferred at most 0.01 are selected.


In particular preferred embodiments the at least 50 genes of step a) are at least 70, preferably at least 90, at least 100, at least 120, at least 140, at least 160, at least 180, at least 190, at least 200, at least 220, at least 240, at least 260, at least 280, at least 300, at least 320, at least 340, at least 350 or all, genes.


Since the subset should be small it is preferred that not more than 60, or not more than 40, preferably not more than 30, in particular preferred not more than 20, marker genes are selected in step d) for the subset.
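
By way of a non-limiting sketch, the combination of a significance cut-off with an upper limit on the subset size may be expressed in Python as follows. The function name, the marker names M1 to M5 and the p-values are placeholders invented for this sketch and are not results of the present examples.

```python
def select_subset(p_values, alpha=0.1, max_size=20):
    """Keep markers with p <= alpha, best first, and at most max_size of them."""
    passing = sorted((p, marker) for marker, p in p_values.items() if p <= alpha)
    return [marker for _, marker in passing[:max_size]]


# Placeholder markers with invented p-values.
p_values = {"M1": 0.003, "M2": 0.2, "M3": 0.04, "M4": 0.0005, "M5": 0.09}
print(select_subset(p_values, alpha=0.1, max_size=3))  # ['M4', 'M1', 'M3']
```

M2 fails the cut-off of 0.1, and M5 passes it but is dropped because only the three best rated markers are kept.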


In a further aspect the present invention provides a method of identifying lung cancer or lung cancer type in a sample comprising DNA from a patient, comprising providing a diagnostic subset of markers identified according to the method depicted above, determining the methylation status of the genes of the subset in the sample and comparing the methylation status with the status of a confirmed lung cancer or lung cancer type positive and/or negative state, thereby identifying lung cancer or lung cancer type in the sample.
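
The comparison step of this method may, purely as an illustration, be sketched in Python as follows. The sketch assumes that the methylation status of each marker has already been reduced to a methylated/non-methylated call and that one reference profile each is available for the confirmed positive and the confirmed negative state; the simple agreement count, the function name, the marker names and all status values are assumptions of this sketch.

```python
def classify_sample(sample, positive_ref, negative_ref):
    """Compare per-marker methylation calls with both reference profiles and
    report the state with which the sample agrees more often."""
    markers = sample.keys() & positive_ref.keys() & negative_ref.keys()
    agree_pos = sum(sample[m] == positive_ref[m] for m in markers)
    agree_neg = sum(sample[m] == negative_ref[m] for m in markers)
    if agree_pos == agree_neg:
        return "undetermined"
    return "positive" if agree_pos > agree_neg else "negative"


# Placeholder subset of eight markers with invented reference calls.
subset = ["M1", "M2", "M3", "M4", "M5", "M6", "M7", "M8"]
positive_ref = {m: True for m in subset}    # True = methylated (invented)
negative_ref = {m: False for m in subset}   # False = non-methylated (invented)
sample = dict(positive_ref, M6=False, M7=False)
print(classify_sample(sample, positive_ref, negative_ref))  # positive
```

In practice any of the classification methods outlined above can take the place of the agreement count.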


The methylation status can be determined by any method known in the art including methylation dependent bisulfite deamination (and consequently the identification of mC—methylated C—changes by any known methods, including PCR and hybridization techniques). Preferably, the methylation status is determined by methylation specific PCR analysis, methylation specific digestion analysis and either or both of hybridisation analysis to non-digested or digested fragments or PCR amplification analysis of non-digested fragments. The methylation status can also be determined by any probes suitable for determining the methylation status including DNA, RNA, PNA, LNA probes which optionally may further include methylation specific moieties.


As further explained below the methylation status can be particularly determined by using hybridisation probes or amplification primers (preferably PCR primers) specific for methylated regions of the inventive marker genes. Discrimination between methylated and non-methylated genes, including the determination of the methylation amount or ratio, can be performed by using e.g. either one of these tools.


The determination using only specific primers aims at specifically amplifying methylated (or in the alternative non-methylated) DNA. This can be facilitated by using (methylation dependent) bisulfite deamination, methylation specific enzymes or by using methylation specific nucleases to digest methylated (or alternatively non-methylated) regions, so that only the non-methylated (or alternatively methylated) DNA is obtained. By using a genome chip (or simply a gene chip including hybridization probes for all genes of interest such as all 359 marker genes), all amplification or non-digested products are detected. I.e. discrimination between methylated and non-methylated states as well as gene selection (the inventive set or subset) is performed before the step of detection on a chip.


Alternatively it is possible to use universal primers and amplify a multitude of potentially methylated genetic regions (including the genetic markers of the invention) which are, as described either methylation specific amplified or digested, and then use a set of hybridisation probes for the characteristic markers on e.g. a chip for detection. I.e. gene selection is performed on the chip.


Either set, a set of probes or a set of primers, can be used to obtain the relevant methylation data of the genes of the present invention. Of course, both sets can be used.


The method according to the present invention may be performed by any method suitable for the detection of methylation of the marker genes. In order to provide a robust and optionally re-useable test format, the determination of the gene methylation is preferably performed with a DNA-chip, real-time PCR, or a combination thereof. The DNA chip can be a commercially available general gene chip (also comprising a number of spots for the detection of genes not related to the present method) or a chip specifically designed for the method according to the present invention (which predominantly comprises marker gene detection spots).


Preferably the methylated DNA of the sample is detected by a multiplexed hybridization reaction. In further embodiments a methylated DNA is preamplified prior to hybridization, preferably also prior to methylation specific amplification, or digestion. Preferably, also the amplification reaction is multiplexed (e.g. multiplex PCR).


The inventive methods (for the screening of subsets or for diagnosis or prognosis of lung cancer or lung cancer type) are particularly suitable to detect low amounts of methylated DNA of the inventive marker genes. Preferably the DNA amount in the sample is below 500 ng, below 400 ng, below 300 ng, below 200 ng, below 100 ng, below 50 ng or even below 25 ng. The inventive method is also particularly suitable to detect low concentrations of methylated DNA of the inventive marker genes. Preferably the DNA concentration in the sample is below 500 ng, below 400 ng, below 300 ng, below 200 ng, below 100 ng, below 50 ng or even below 25 ng per ml of sample.


In another aspect the present invention provides a subset comprising or consisting of nucleic acid primers or hybridization probes being specific for a potentially methylated region of at least marker genes selected from a set of nucleic acid primers or hybridization probes being specific for a potentially methylated region of marker genes being suitable to diagnose or predict lung cancer or a lung cancer type, preferably being selected from adenocarcinoma or squamous cell carcinoma, the marker genes comprising WT1, SALL3, TERT, ACTB, CPEB4 or any other subset selected from one of the following groups

    • a) WT1, DLX2, SALL3, TERT, PITX2, HOXA10, F2R, CPEB4, NHLH2, SMAD3, ACTB, HOXA1, BOLL, APC, MT1G, PENK, SPARC, DNAJA4, RASSF1, HLA-G, ERCC1, ONECUT2, APC, ABCB1, ZNF573, KCNJ15, ZDHHC11, SFRP2, GDNF, PTTG1, SERPINI1, TNFRSF10C
    • b) WT1, PITX2, SALL3, F2R, DLX2, TERT, HOXA10, MSH4, NHLH2, GNA15, PENK, RASSF1, BOLL, HOXA1, ONECUT2, ABCB1, SPARC, MT1G, HSPA2, SFRP2, PYCARD, GAD1, C5orf4, C5AR1, GDNF, ZDHHC11, SERPINE1, NKX2-1, PITX2, C5AR1, ZNF256, FAM43A, SFRP2, MT3, SERPINE1, CLIC4, TNFRSF10C, GABRA2, MTHFR, ESR2, NEUROG1, PITX2, PLAGL1, TMEFF2, PTTG1, CADM1, S100A8, EFS, JUB, ITGA4, MAGEB2, ERBB2, SRGN, GNAS, TJP2, KCNJ15, SLC25A31, ZNF573, TNFRSF25, APC, KCNQ1, LAMC2, SPHK1, DNAJA4, APC, MBD2, ERCC1, HLA-G, CXADR, TP53, ACTB, KL, SMAD3, HIST1H2AG, CPEB4
    • c) WT1, DLX2, SALL3, TERT, TNFRSF25, ACTB, SMAD3, CPEB4
    • d) WT1, DLX2, SALL3, TERT, PITX2, TNFRSF25, KL, ACTB, SMAD3, CPEB4
    • e) WT1, PITX2, SALL3, DLX2, TERT, HOXA10, RASSF1, SPARC, IRAK2, ZNF711, DNAJA4, HLA-G, CXADR, TP53, ACTB, CPEB4
    • f) WT1, PITX2, SALL3, F2R, TERT, HOXA10, RASSF1, SPARC, IRAK2, ZNF711, DRD2, DNAJA4, CXADR, TP53, ACTB, CPEB4
    • g) WT1, ACTB, DLX2, PITX2, SALL3, HOXA10, TERT, CPEB4, HLA-G, SPARC, RASSF1, DNAJA4, CXADR, TP53, IRAK2, ZNF711
    • h) F2R, ZNF256, CDH13, SERPINB5, KRT14, DLX2, AREG, THRB, HSD17B4, SPARC, HECW2, COL21A1
    • i) KL, HIST1H2AG, TJP2, SRGN, CDX1, TNFRSF25, APC, HIC1, APC, GNA15, ACTB, WT1, KRT17, AIM1L, DPH1, PITX2, PITX2, KIF5B, BMP2K, GBP2, NHLH2, GDNF, BOLL
    • j) WT1, DLX2, SALL3, TERT, PITX2, HOXA10, F2R, CPEB4, NHLH2, SMAD3, ACTB, HOXA1, BOLL, APC, MT1G, PENK, SPARC, DNAJA4, RASSF1, HLA-G, ERCC1, ONECUT2, APC, ABCB1, ZNF573, KCNJ15, ZDHHC11, SFRP2, GDNF, PTTG1, SERPINI1, TNFRSF10C
    • k) HOXA10, NEUROD1
    • l) WT1, PITX2, SALL3, F2R, TERT, HOXA10, RASSF1, SPARC, IRAK2, ZNF711, DRD2, DNAJA4, CXADR, TP53, ACTB, CPEB4, DLX2, TNFRSF25, KL, SMAD3
    • m) TNFRSF25, SALL3, RASSF1, TERT, SPARC, F2R, HOXA10, ZNF711, PITX2
    • n) SALL3, PITX2, SPARC, F2R, TERT, RASSF1, HOXA10, CXADR, KL
    • o) SALL3, SPARC, PITX2, F2R, TERT, RASSF1, HOXA10, KL
    • p) SALL3, PITX2, SPARC, F2R, HOXA10, DRD2, ACTB, DNAJA4, CXADR, KL
    • q) SALL3, SPARC, PITX2, F2R, TERT, RASSF1, HOXA10, TNFRSF25, DNAJA4, TP53, CXADR, KL
    • r) SPARC, SALL3, F2R, PITX2, RASSF1, HOXA10, TERT, KL, TNFRSF25
    • s) SALL3, SPARC, PITX2, F2R, TERT, RASSF1, HOXA10, KL, TNFRSF25, CXADR
    • t) HOXA10, RASSF1, F2R
    • or


a set of at least 50%, preferably at least 60%, at least 70%, at least 80%, at least 90% or 100% of the markers of any one of the above a) to t). The present inventive set also includes sets with at least 50% of the above markers for each set, since it is also possible to substitute parts of these subsets that are specific (in the case of binary conditions/differentiations) for e.g. good or bad prognosis or for distinguishing between lung cancer or lung cancer types, wherein one part of the subset points into one direction for a certain lung cancer type or cancer/differentiation. It is possible to further complement the 50% part of the set by additional markers specific for diagnosing lung cancer or determining the other part of the good or bad differentiation or differentiation between two lung cancer types. Methods to determine such complementing markers follow the general methods as outlined herein.
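
Whether a given panel contains at least the stated percentage of the markers of one of the sets a) to t) can be checked mechanically. The following Python sketch uses set c) as listed above; the function names and the example panel are assumptions made for illustration only.

```python
# Set c) as listed above.
SET_C = {"WT1", "DLX2", "SALL3", "TERT", "TNFRSF25", "ACTB", "SMAD3", "CPEB4"}


def coverage(panel, reference):
    """Fraction of the reference set's markers that are present in the panel."""
    reference = set(reference)
    return len(set(panel) & reference) / len(reference)


def meets_minimum(panel, reference, minimum=0.5):
    """True if the panel contains at least `minimum` of the reference markers."""
    return coverage(panel, reference) >= minimum


# Hypothetical panel: five markers of set c) plus one marker from outside it.
panel = {"WT1", "SALL3", "TERT", "ACTB", "CPEB4", "HOXA10"}
print(coverage(panel, SET_C))            # 0.625
print(meets_minimum(panel, SET_C, 0.5))  # True
print(meets_minimum(panel, SET_C, 0.7))  # False
```

Markers outside the reference set, such as HOXA10 in this panel, do not count towards the percentage; they correspond to the complementing markers mentioned above.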


Each of these marker subsets is particularly suitable to diagnose lung cancer or lung cancer type or distinguish between certain cancers, samples or cancer types in a methylation specific assay of these genes.


The inventive primers or probes may be of any nucleic acid, including RNA, DNA, PNA (peptide nucleic acids), LNA (locked nucleic acids). The probes might further comprise methylation specific moieties.


The present invention provides a (master) set of 360 marker genes, further also specific gene locations by the PCR products of these genes wherein significant methylation can be detected, as well as subsets therefrom with a certain diagnostic value to detect or diagnose lung cancer or distinguish lung cancer type(s). Preferably the set is optimized for a lung cancer or a lung cancer type. Lung cancer types include, without being limited thereto, adenocarcinoma and squamous cell carcinoma. Further indicators differentiating between disease(s), including the diagnosis of any type of lung cancer or lung tumor, or between tumor type(s) are e.g. benign (non (or limited) proliferative) or malignant, metastatic or non-metastatic. The set can also be optimized for a specific sample type in which the methylated DNA is tested. Such samples include blood, urine, saliva, hair, skin, tissues, in particular tissues of the cancer origin mentioned above, in particular lung tissue such as potentially affected or potentially cancerous lung tissue, or serum, sputum, bronchial lavage. The sample may be obtained from a patient to be diagnosed. In preferred embodiments the test sample to be used in the method of identifying a subset is of the same type as a sample to be used in the diagnosis.


In practice, probes specific for potentially aberrant methylated regions are provided, which can then be used for the diagnostic method.


It is also possible to provide primers suitable for a specific amplification, like PCR, of these regions in order to perform a diagnostic test on the methylation state.


Such probes or primers are provided in the context of a set corresponding to the inventive marker genes or marker gene loci as given in table 1.


Such a set of primers or probes may have all 359 inventive markers present and can then be used for a multitude of different cancer detection methods. Of course, not all markers would have to be used to diagnose a lung cancer or lung cancer type. It is also possible to use certain subsets (or combinations thereof) with a limited number of marker probes or primers for diagnosis of certain categories of lung cancer.


Therefore, the present invention provides sets of primers or probes comprising primers or probes for any single marker subset or any combination of marker subsets disclosed herein. In the following sets of marker genes should be understood to include sets of primer pairs and probes therefor, which can e.g. be provided in a kit.


Set a, WT1, DLX2, SALL3, TERT, PITX2, HOXA10, F2R, CPEB4, NHLH2, SMAD3, ACTB, HOXA1, BOLL, APC, MT1G, PENK, SPARC, DNAJA4, RASSF1, HLA-G, ERCC1, ONECUT2, APC, ABCB1, ZNF573, KCNJ15, ZDHHC11, SFRP2, GDNF, PTTG1, SERPINI1, TNFRSF10C and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers are in particular suitable to detect lung cancer and to distinguish between normal lung tissue (non-cancerous) from lung tumor tissue.


Set b, WT1, PITX2, SALL3, F2R, DLX2, TERT, HOXA10, MSH4, NHLH2, GNA15, PENK, RASSF1, BOLL, HOXA1, ONECUT2, ABCB1, SPARC, MT1G, HSPA2, SFRP2, PYCARD, GAD1, C5orf4, C5AR1, GDNF, ZDHHC11, SERPINE1, NKX2-1, PITX2, C5AR1, ZNF256, FAM43A, SFRP2, MT3, SERPINE1, CLIC4, TNFRSF10C, GABRA2, MTHFR, ESR2, NEUROG1, PITX2, PLAGL1, TMEFF2, PTTG1, CADM1, S100A8, EFS, JUB, ITGA4, MAGEB2, ERBB2, SRGN, GNAS, TJP2, KCNJ15, SLC25A31, ZNF573, TNFRSF25, APC, KCNQ1, LAMC2, SPHK1, DNAJA4, APC, MBD2, ERCC1, HLA-G, CXADR, TP53, ACTB, KL, SMAD3, HIST1H2AG, CPEB4 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers are also suitable to detect lung cancer and to distinguish between normal lung tissue and lung tumor tissue. The distinction or diagnosis can be made by using any sample as described above, including serum, sputum, bronchial lavage.


Set c, WT1, DLX2, SALL3, TERT, TNFRSF25, ACTB, SMAD3, CPEB4 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers are suitable to detect lung cancer and to distinguish between normal lung tissue (non-cancerous) from lung tumor tissue. The distinction or diagnosis can be made by using any sample as described above, including serum, sputum, bronchial lavage.


Set d, WT1, DLX2, SALL3, TERT, PITX2, TNFRSF25, KL, ACTB, SMAD3, CPEB4 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers are in particular suitable to detect lung cancer and to distinguish between normal lung tissue (non-cancerous) from lung tumor tissue. The distinction or diagnosis can be made by using any sample as described above, including serum, sputum, bronchial lavage.


Set e, WT1, PITX2, SALL3, DLX2, TERT, HOXA10, RASSF1, SPARC, IRAK2, ZNF711, DNAJA4, HLA-G, CXADR, TP53, ACTB, CPEB4 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers are also suitable to detect lung cancer and to distinguish between normal lung tissue (non-cancerous) from lung tumor tissue. The distinction or diagnosis can be made by using any sample as described above, including serum, sputum, bronchial lavage.


Set f, WT1, PITX2, SALL3, F2R, TERT, HOXA10, RASSF1, SPARC, IRAK2, ZNF711, DRD2, DNAJA4, CXADR, TP53, ACTB, CPEB4 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to detect lung cancer and to distinguish between normal lung tissue (non-cancerous) from lung tumor tissue. The distinction or diagnosis can be made by using any sample as described above, including serum, sputum, bronchial lavage.


Set g, WT1, ACTB, DLX2, PITX2, SALL3, HOXA10, TERT, CPEB4, HLA-G, SPARC, RASSF1, DNAJA4, CXADR, TP53, IRAK2, ZNF711 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose lung carcinoma, in particular using blood samples, e.g. to distinguish blood from healthy persons from tumor samples, including tumor tissue sample or blood from tumor patients. The distinction or diagnosis can be made by using any sample as described above, including serum, sputum, bronchial lavage.


Set h, F2R, ZNF256, CDH13, SERPINB5, KRT14, DLX2, AREG, THRB, HSD17B4, SPARC, HECW2, COL21A1 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose lung cancer and distinguish the grade of differentiation of poor, moderate and well predictions. The distinction or diagnosis can be made by using any sample as described above, including serum, sputum, bronchial lavage.


Set i, KL, HIST1H2AG, TJP2, SRGN, CDX1, TNFRSF25, APC, HIC1, APC, GNA15, ACTB, WT1, KRT17, AIM1L, DPH1, PITX2, PITX2, KIF5B, BMP2K, GBP2, NHLH2, GDNF, BOLL and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose lung cancer and distinguish between malign states (in particular adenocarcinoma and squamous cell carcinoma) together with lung tissue against healthy blood or serum samples. The distinction or diagnosis can be made by using any sample as described above, including serum, sputum, bronchial lavage.


Set j, WT1, DLX2, SALL3, TERT, PITX2, HOXA10, F2R, CPEB4, NHLH2, SMAD3, ACTB, HOXA1, BOLL, APC, MT1G, PENK, SPARC, DNAJA4, RASSF1, HLA-G, ERCC1, ONECUT2, APC, ABCB1, ZNF573, KCNJ15, ZDHHC11, SFRP2, GDNF, PTTG1, SERPINI1, TNFRSF10C and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose lung cancer and distinguish malign states selected from adenocarcinoma and squamous cell carcinoma from healthy lung tissue. The distinction or diagnosis can be made by using any sample as described above, including serum, sputum, bronchial lavage.


Set k, HOXA10, NEUROD1 and/or either HOXA10 or NEUROD1 can be used to diagnose lung cancer and further to distinguish adenocarcinoma from squamous cell carcinoma.


Set l, WT1, PITX2, SALL3, F2R, TERT, HOXA10, RASSF1, SPARC, IRAK2, ZNF711, DRD2, DNAJA4, CXADR, TP53, ACTB, CPEB4, DLX2, TNFRSF25, KL, SMAD3 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose lung cancer and distinguish between cancerous lung tissue from healthy lung tissue.


Set m, TNFRSF25, SALL3, RASSF1, TERT, SPARC, F2R, HOXA10, ZNF711, PITX2 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose lung cancer and distinguish between cancerous lung tissue from healthy lung tissue. The distinction or diagnosis can be made by using any sample as described above, including serum, sputum, bronchial lavage.


Set n, SALL3, PITX2, SPARC, F2R, TERT, RASSF1, HOXA10, CXADR, KL and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose lung cancer and distinguish between cancerous lung tissue from healthy lung tissue. The distinction or diagnosis can be made by using any sample as described above, including serum, sputum, bronchial lavage.


Set o, SALL3, SPARC, PITX2, F2R, TERT, RASSF1, HOXA10, KL and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose lung cancer and distinguish between cancerous lung tissue from healthy lung tissue. The distinction or diagnosis can be made by using any sample as described above, including serum, sputum, bronchial lavage.


Set p, SALL3, PITX2, SPARC, F2R, HOXA10, DRD2, ACTB, DNAJA4, CXADR, KL and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose lung cancer and to distinguish between normal lung tissue (non-cancerous) from lung tumor tissue.


Set q, SALL3, SPARC, PITX2, F2R, TERT, RASSF1, HOXA10, TNFRSF25, DNAJA4, TP53, CXADR, KL and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose lung cancer and to distinguish between normal lung tissue (non-cancerous) from lung tumor tissue. The distinction or diagnosis can be made by using any sample as described above, including serum, sputum, bronchial lavage.


Set r, SPARC, SALL3, F2R, PITX2, RASSF1, HOXA10, TERT, KL, TNFRSF25 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose lung cancer, distinguish between adenocarcinoma, healthy lung tissue and squamous cell carcinoma. The distinction or diagnosis can be made by using any sample as described above, including serum, sputum, bronchial lavage.


Set s, SALL3, SPARC, PITX2, F2R, TERT, RASSF1, HOXA10, KL, TNFRSF25, CXADR and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose lung cancer, distinguish adenocarcinoma and squamous cell carcinoma from healthy (benign) lung tissue. The distinction or diagnosis can be made by using any sample as described above, including serum, sputum, bronchial lavage.


Set t, HOXA10, RASSF1, F2R and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose lung cancer, distinguish between adenocarcinoma and squamous cell carcinoma. The distinction or diagnosis can be made by using any sample as described above, including serum, sputum, bronchial lavage.


Also provided are combinations of the above mentioned subsets a) to t), in particular sets comprising markers of 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 or more of these subsets, preferably for the lung cancer type or preferably complete sets a) to t). One preferred set comprises gene markers WT1, SALL3, TERT, ACTB and CPEB4. These markers are common in a set for the diagnosis of lung cancer and suitable to distinguish normal from lung cancer samples. This set preferably is supplemented by the marker genes DLX2, TNFRSF25 or SMAD3. Furthermore, the inventive set may comprise any one of the markers ABCB1, ACTB, AIM1L, APC, AREG, BMP2K, BOLL, C5AR1, C5orf4, CADM1, CDH13, CDX1, CLIC4, COL21A1, CPEB4, CXADR, DLX2, DNAJA4, DPH1, DRD2, EFS, ERBB2, ERCC1, ESR2, F2R, FAM43A, GABRA2, GAD1, GBP2, GDNF, GNA15, GNAS, HECW2, HIC1, HIST1H2AG, HLA-G, HOXA1, HOXA10, HSD17B4, HSPA2, IRAK2, ITGA4, JUB, KCNJ15, KCNQ1, KIF5B, KL, KRT14, KRT17, LAMC2, MAGEB2, MBD2, MSH4, MT1G, MT3, MTHFR, NEUROD1, NHLH2, NKX2-1, ONECUT2, PENK, PITX2, PLAGL1, PTTG1, PYCARD, RASSF1, S100A8, SALL3, SERPINB5, SERPINE1, SERPINI1, SFRP2, SLC25A31, SMAD3, SPARC, SPHK1, SRGN, TERT, THRB, TJP2, TMEFF2, TNFRSF10C, TNFRSF25, TP53, ZDHHC11, ZNF256, ZNF711, F2R, HOXA10, KL, SALL3, SPARC, TNFRSF25, WT1 or any combination thereof, in particular preferred are markers ACTB, APC, CPEB4, CXADR, DLX2, DNAJA4, F2R, HOXA10, KL, PITX2, RASSF1, SALL3, SPARC, TERT, (either TNFRSF10C or TNFRSF25 or both), WT1 or any combination thereof, even more preferred are markers HOXA10, PITX2, RASSF1, SALL3, SPARC, TERT or any combination thereof, in a marker set according to the present invention, in particular as additional markers for any one of sets a) to t), especially the marker set of markers WT1, SALL3, TERT, ACTB and CPEB4.
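
That the markers WT1, SALL3, TERT, ACTB and CPEB4 are common to several of the subsets can be verified directly from the lists given above. The following Python sketch does so for sets c) to g); restricting the check to these five sets is a choice made for this illustration.

```python
# Sets c) to g) as listed above.
SETS = {
    "c": {"WT1", "DLX2", "SALL3", "TERT", "TNFRSF25", "ACTB", "SMAD3", "CPEB4"},
    "d": {"WT1", "DLX2", "SALL3", "TERT", "PITX2", "TNFRSF25", "KL", "ACTB",
          "SMAD3", "CPEB4"},
    "e": {"WT1", "PITX2", "SALL3", "DLX2", "TERT", "HOXA10", "RASSF1", "SPARC",
          "IRAK2", "ZNF711", "DNAJA4", "HLA-G", "CXADR", "TP53", "ACTB", "CPEB4"},
    "f": {"WT1", "PITX2", "SALL3", "F2R", "TERT", "HOXA10", "RASSF1", "SPARC",
          "IRAK2", "ZNF711", "DRD2", "DNAJA4", "CXADR", "TP53", "ACTB", "CPEB4"},
    "g": {"WT1", "ACTB", "DLX2", "PITX2", "SALL3", "HOXA10", "TERT", "CPEB4",
          "HLA-G", "SPARC", "RASSF1", "DNAJA4", "CXADR", "TP53", "IRAK2", "ZNF711"},
}

# Markers present in every one of the five sets.
common = set.intersection(*SETS.values())
print(sorted(common))  # ['ACTB', 'CPEB4', 'SALL3', 'TERT', 'WT1']
```

DLX2 is missing only from set f) and PITX2 only from set c), which is consistent with DLX2 being named above as a preferred supplement to the common set.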


According to a preferred embodiment of the present invention, the methylation of at least two genes, preferably of at least three genes, especially of at least four genes, is determined. Specifically if the present invention is provided as an array test system, at least ten, especially at least fifteen genes, are preferred. In preferred test set-ups (for example in microarrays (“gene-chips”)) preferably at least 20, even more preferred at least 30, especially at least 40 genes, are provided as test markers. As mentioned above, these markers or the means to test the markers can be provided in a set of probes or a set of primers, preferably both.


In a further embodiment the set comprises up to 100000, up to 90000, up to 80000, up to 70000, up to 60000 or up to 50000 probes or primer pairs (set of two primers for one amplification product), preferably up to 40000, up to 35000, up to 30000, up to 25000, up to 20000, up to 15000, up to 10000, up to 7500, up to 5000, up to 3000, up to 2000, up to 1000, up to 750, up to 500, up to 400, up to 300, or even more preferred up to 200 probes or primers of any kind, in particular in the case of immobilized probes on a solid surface such as a chip.


In certain embodiments the primer pairs and probes are specific for a methylated upstream region of the open reading frame of the marker genes.


Preferably the probes or primers are specific for a methylation in the genetic regions defined by SEQ ID NOs 1081 to 1440, including the adjacent up to 500 base pairs, preferably up to 300, up to 200, up to 100, up to 50 or up to 10 adjacent, corresponding to gene marker IDs 1 to 359 of table 1, respectively. I.e. probes or primers of the inventive set (including the full 359 set, as well as subsets and combinations thereof) are specific for the regions and gene loci identified in table 1, last column with reference to the sequence listing, SEQ ID NOs: 1081 to 1440. As can be seen these SEQ IDs correspond to a certain gene, the latter being a member of the inventive sets, in particular of the subsets a) to t), e.g.


Examples of specific probes or primers are given in table 1 with reference to the sequence listing, SEQ ID NOs 1 to 1080, which form especially preferred embodiments of the invention.
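
In the rows of table 1 reproduced in this document, the SEQ ID NOs of the hybridisation probe, the two primers and the PCR product follow from the gene ID by fixed offsets of 0, 360, 720 and 1080. The following Python sketch encodes this observation. That the pattern holds for every one of the 360 entries is an assumption of the sketch, and the function and key names are illustrative.

```python
N_ENTRIES = 360  # size of the master set (359 marker genes and one control)


def seq_ids(gene_id):
    """Return the SEQ ID NOs assumed to belong to one gene ID of table 1."""
    if not 1 <= gene_id <= N_ENTRIES:
        raise ValueError("gene ID must lie between 1 and 360")
    return {
        "probe": gene_id,                        # SEQ ID NOs 1 to 360
        "primer_lp": gene_id + N_ENTRIES,        # SEQ ID NOs 361 to 720
        "primer_rp": gene_id + 2 * N_ENTRIES,    # SEQ ID NOs 721 to 1080
        "pcr_product": gene_id + 3 * N_ENTRIES,  # SEQ ID NOs 1081 to 1440
    }


print(seq_ids(61))
# {'probe': 61, 'primer_lp': 421, 'primer_rp': 781, 'pcr_product': 1141}
```

Gene ID 61 is WT1 in table 1, and the values returned agree with the corresponding table row.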


Preferably the set of the present invention comprises probes or primers for at least one gene or gene product of the list according to table 1, wherein at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, especially preferred 100%, of the total probes or primers are probes or primers for genes of the list according to table 1. Preferably the set, in particular in the case of a set of hybridization probes, is provided immobilized on a solid surface, preferably a chip or in form of a microarray. Since, according to current technology, detection means for genes on a chip allow easier and more robust array design, gene chips using DNA molecules (for detection of methylated DNA in the sample) are a preferred embodiment of the present invention. Such gene chips also allow detection of a large number of nucleic acids.


Preferably the set is provided on a solid surface, in particular a chip, whereon the primers or probes can be immobilized. Solid surfaces or chips may be of any material suitable for the immobilization of biomolecules such as the moieties, including glass, modified glass (aldehyde modified) or metal chips.


The primers or probes can also be provided as such, including lyophilized forms or being in solution, preferably with suitable buffers. The probes and primers can of course be provided in a suitable container, e.g. a tube or micro tube.


The present invention also relates to a method of identifying lung cancer or lung cancer type in a sample comprising DNA from a subject or patient, comprising obtaining a set of nucleic acid primers (or primer pairs) or hybridization probes as defined above (comprising each specific subset or combinations thereof), determining the methylation status of the genes in the sample for which the members of the set are specific for and comparing the methylation status of the genes with the status of a confirmed lung cancer or lung cancer type positive and/or negative state, thereby identifying the lung cancer or lung cancer type in the sample. In general the inventive method has been described above and all preferred embodiments of such methods also apply to the method using the set provided herein.


The inventive marker set, including certain disclosed subsets and subsets, which can be identified with the methods disclosed herein, are suitable to diagnose lung cancer and distinguish between different lung cancer forms, in particular for diagnostic or prognostic uses. Preferably the markers used (e.g. by utilizing primers or probes of the inventive set) for the inventive diagnostic or prognostic method may be used in smaller amounts than e.g. in the set (or kit) or chip as such, which may be designed for more than one fine tuned diagnosis or prognosis. The markers used for the diagnostic or prognostic method may be up to 100000, up to 90000, up to 80000, up to 70000, up to 60000 or 50000, preferably up to 40000, up to 35000, up to 30000, up to 25000, up to 20,000, up to 15000, up to 10000, up to 7500, up to 5000, up to 3000, up to 2000, up to 1000, up to 750, up to 500, up to 400, up to 300, up to 200, up to 100, up to 80, or even more preferred up to 60. The inventive set of marker primers or probes can be employed in chip (immobilised) based assays, products or methods, or in PCR based kits or methods. Both, PCR and hybridisation (e.g. on a chip) can be used to detect methylated genes.


The inventive marker set, including certain disclosed subsets, which can be identified with the methods disclosed herein, is suitable to distinguish lung cancer from normal tissue, in particular for diagnostic or prognostic uses.


The inventive marker set, including certain disclosed subsets, which can be identified with the methods disclosed herein, is suitable to distinguish adenocarcinoma from squamous cell carcinoma, in particular for diagnostic or prognostic uses.


The present invention is further illustrated by the following examples, without being restricted thereto.





FIGURES


FIG. 1: Cross-Validation ROC curve from the Bayesian Compound Covariate Predictor.





EXAMPLES
Example 1
Gene List









TABLE 1

360 master set (with the 359 marker genes and one control) and sequence annotation

gene  Gene      alt. Gene                     hybridisation  primer 1      primer 2      PCR product
ID    Symbol    Symbol                        probe          (lp)          (rp)
                                              (SEQ ID NO:)   (SEQ ID NO:)  (SEQ ID NO:)  (SEQ ID NO:)
1     NHLH2     NHLH2                         1              361           721           1081
2     MTHFR     MTHFR                         2              362           722           1082
3     PRDM2     RIZ1 (PRDM2)                  3              363           723           1083
4     MLLT11    MLLT11                        4              364           724           1084
5     S100A9    control_S100A9                5              365           725           1085
6     S100A9    S100A9                        6              366           726           1086
7     S100A8    S100A8                        7              367           727           1087
8     S100A8    control_S100A8                8              368           728           1088
9     S100A2    S100A2                        9              369           729           1089
10    LMNA      LMNA                          10             370           730           1090
11    DUSP23    DUSP23                        11             371           731           1091
12    LAMC2     LAMC2                         12             372           732           1092
13    PTGS2     PTGS2                         13             373           733           1093
14    MARK1     MARK1                         14             374           734           1094
15    DUSP10    DUSP10                        15             375           735           1095
16    PARP1     PARP1                         16             376           736           1096
17    PSEN2     PSEN2                         17             377           737           1097
18    CLIC4     CLIC4                         18             378           738           1098
19    RUNX3     RUNX3                         19             379           739           1099
20    AIM1L     NM_017977                     20             380           740           1100
21    SFN       SFN                           21             381           741           1101
22    RPA2      RPA2                          22             382           742           1102
23    TP73      TP73                          23             383           743           1103
24    TP73      p73                           24             384           744           1104
25    POU3F1    01.10.06                      25             385           745           1105
26    MUTYH     MUTYH                         26             386           746           1106
27    UQCRH     UQCRH                         27             387           747           1107
28    FAF1      FAF1                          28             388           748           1108
29    TACSTD2   TACSTD2                       29             389           749           1109
30    TNFRSF25  TNFRSF25                      30             390           750           1110
31    DIRAS3    DIRAS3                        31             391           751           1111
32    MSH4      MSH4                          32             392           752           1112
33    GBP2      Control                       33             393           753           1113
34    GBP2      GBP2                          34             394           754           1114
35    LRRC8C    LRRC8C                        35             395           755           1115
36    F3        F3                            36             396           756           1116
37    NANOS1    NM_001009553                  37             397           757           1117
38    MGMT      MGMT                          38             398           758           1118
39    EBF3      EBF3                          39             399           759           1119
40    DCLRE1C   DCLRE1C                       40             400           760           1120
41    KIF5B     KIF5B                         41             401           761           1121
42    ZNF22     ZNF22                         42             402           762           1122
43    PGBD3     ERCC6                         43             403           763           1123
44    SRGN      Control                       44             404           764           1124
45    GATA3     GATA3                         45             405           765           1125
46    PTEN      PTEN                          46             406           766           1126
47    MMS19     MMS19L                        47             407           767           1127
48    SFRP5     SFRP5                         48             408           768           1128
49    PGR       PGR                           49             409           769           1129
50    ATM       ATM                           50             410           770           1130
51    DRD2      DRD2                          51             411           771           1131
52    CADM1     IGSF4                         52             412           772           1132
53    TEAD1     Control                       53             413           773           1133
54    OPCML     OPCML                         54             414           774           1134
55    CALCA     CALCA                         55             415           775           1135
56    CTSD      CTSD                          56             416           776           1136
57    MYOD1     MYOD1                         57             417           777           1137
58    IGF2      IGF2                          58             418           778           1138
59    BDNF      BDNF                          59             419           779           1139
60    CDKN1C    CDKN1C                        60             420           780           1140
61    WT1       WT1                           61             421           781           1141
62    HRAS      HRAS1                         62             422           782           1142
63    DDB1      DDB1                          63             423           783           1143
64    GSTP1     GSTP1                         64             424           784           1144
65    CCND1     CCND1                         65             425           785           1145
66    EPS8L2    EPS8L2                        66             426           786           1146
67    PIWIL4    PIWIL4                        67             427           787           1147
68    CHST11    CHST11                        68             428           788           1148
69    UNG       UNG                           69             429           789           1149
70    CCDC62    CCDC62                        70             430           790           1150
71    CDK2AP1   CDK2AP1                       71             431           791           1151
72    CHFR      CHFR                          72             432           792           1152
73    GRIN2B    GRIN2B                        73             433           793           1153
74    CCND2     CCND2                         74             434           794           1154
75    VDR       VDR                           75             435           795           1155
76    B4GALNT3  control (wrong chr of HRAS1)  76             436           796           1156
77    NTF3      NTF3                          77             437           797           1157
78    CYP27B1   CYP27B1                       78             438           798           1158
79    GPR92     GPR92                         79             439           799           1159
80    ERCC5     ERCC5                         80             440           800           1160
81    GJB2      GJB2                          81             441           801           1161
82    BRCA2     BRCA2                         82             442           802           1162
83    KL        KL                            83             443           803           1163
84    CCNA1     CCNA1                         84             444           804           1164
85    SMAD9     SMAD9                         85             445           805           1165
86    C13orf15  RGC32                         86             446           806           1166


87
DGKH
DGKH
87
447
807
1167


88
DNAJC15
DNAJC15
88
448
808
1168


89
RB1
RB1
89
449
809
1169


90
RCBTB2
RCBTB2
90
450
810
1170


91
PARP2
PARP2
91
451
811
1171


92
APEX1
APEX1
92
452
812
1172


93
JUB
JUB
93
453
813
1173


94
JUB
control_
94
454
814
1174




NM_19808






95
EFS
EFS
95
455
815
1175


96
BAZ1A
BAZ1A
96
456
816
1176


97
NKX2-1
TITF1
97
457
817
1177


98
ESR2
ESR2
98
458
818
1178


99
HSPA2
HSPA2
99
459
819
1179


100
PSEN1
PSEN1
100
460
820
1180


101
PGF
PGF
101
461
821
1181


102
MLH3
MLH3
102
462
822
1182


103
TSHR
TSHR
103
463
823
1183


104
THBS1
THBS1
104
464
824
1184


105
MYO5C
MYO5C
105
465
825
1185


106
SMAD6
SMAD6
106
466
826
1186


107
SMAD3
SMAD3
107
467
827
1187


108
NOX5
SPESP1
108
468
828
1188


109
DNAJA4
DNAJA4
109
469
829
1189


110
CRABP1
CRABP1
110
470
830
1190


111
BCL2A1
BCL2A1
111
471
831
1191


112
BCL2A1
BCL2A1
112
472
832
1192


113
BNC1
BNC1
113
473
833
1193


114
ARRDC4
ARRDC4
114
474
834
1194


115
SOCS1
SOCS1
115
475
835
1195


116
ERCC4
ERCC4
116
476
836
1196


117
NTHL1
NTHL1
117
477
837
1197


118
PYCARD
PYCARD
118
478
838
1198


119
AXIN1
AXIN1
119
479
839
1199


120
CYLD
NM_015247
120
480
840
1200


121
MT3
MT3
121
481
841
1201


122
MT1A
MT1A
122
482
842
1202


123
MT1G
MT1G
123
483
843
1203


124
CDH1
CDH1
124
484
844
1204


125
CDH13
CDH13
125
485
845
1205


126
DPH1
DPH1
126
486
846
1206


127
HIC1
HIC1
127
487
847
1207


128
NEUROD2
control_
128
488
848
1208




NEUROD2






129
NEUROD2
NEUROD2
129
489
849
1209


130
ERBB2
ERBB2
130
490
850
1210


131
KRT19
KRT19
131
491
851
1211


132
KRT14
KRT14
132
492
852
1212


133
KRT17
KRT17
133
493
853
1213


134
JUP
JUP
134
494
854
1214


135
BRCA1
BRCA1
135
495
855
1215


136
COL1A1
COL1A1
136
496
856
1216


137
CACNA1G
CACNA1G
137
497
857
1217


138
PRKAR1A
PRKAR1A
138
498
858
1218


139
SPHK1
SPHK1
139
499
859
1219


140
SOX15
SOX15
140
500
860
1220


141
TP53
TP53_
141
501
861
1221




CGI23_1kb






142
TP53
TP53_
142
502
862
1222




bothCGIs_








1kb






143
TP53
TP53_
143
503
863
1223




CGI36_1kb






144
TP53
TP53
144
504
864
1224


145
NPTX1
NPTX1
145
505
865
1225


146
SMAD2
SMAD2
146
506
866
1226


147
DCC
DCC
147
507
867
1227


148
MBD2
MBD2
148
508
868
1228


149
ONECUT2
ONECUT2
149
509
869
1229


150
BCL2
BCL2
150
510
870
1230


151
SERPINB5
SERPINB5
151
511
871
1231


152
SERPINB2
Control
152
512
872
1232


153
SERPINB2
SERPINB2
153
513
873
1233


154
TYMS
TYMS
154
514
874
1234


155
LAMA1
LAMA1
155
515
875
1235


156
SALL3
SALL3
156
516
876
1236


157
LDLR
LDLR
157
517
877
1237


158
STK11
STK11
158
518
878
1238


159
PRDX2
PRDX2
159
519
879
1239


160
RAD23A
RAD23A
160
520
880
1240


161
GNA15
GNA15
161
521
881
1241


162
ZNF573
ZNF573
162
522
882
1242


163
SPINT2
SPINT2
163
523
883
1243


164
XRCC1
XRCC1
164
524
884
1244


165
ERCC2
ERCC2
165
525
885
1245


166
ERCC1
ERCC1
166
526
886
1246


167
C5AR1
NM_001736
167
527
887
1247


168
C5AR1
C5AR1
168
528
888
1248


169
POLD1
POLD1
169
529
889
1249


170
ZNF350
ZNF350
170
530
890
1250


171
ZNF256
ZNF256
171
531
891
1251


172
C3
C3
172
532
892
1252


173
XAB2
XAB2
173
533
893
1253


174
ZNF559
ZNF559
174
534
894
1254


175
FHL2
FHL2
175
535
895
1255


176
IL1B
IL1B
176
536
896
1256


177
IL1B
control_IL1B
177
537
897
1257


178
PAX8
PAX8
178
538
898
1258


179
DDX18
DDX18
179
539
899
1259


180
GAD1
GAD1
180
540
900
1260


181
DLX2
DLX2
181
541
901
1261


182
ITGA4
ITGA4
182
542
902
1262


183
NEUROD1
NEUROD1
183
543
903
1263


184
STAT1
STAT1
184
544
904
1264


185
TMEFF2
TMEFF2
185
545
905
1265


186
HECW2
HECW2
186
546
906
1266


187
BOLL
BOLL
187
547
907
1267


188
CASP8
CASP8
188
548
908
1268


189
SERPINE2
SERPINE2
189
549
909
1269


190
NCL
NCL
190
550
910
1270


191
CYP1B1
CYP1B1
191
551
911
1271


192
TACSTD1
TACSTD1
192
552
912
1272


193
MSH2
MSH2
193
553
913
1273


194
MSH6
MSH6
194
554
914
1274


195
MXD1
MXD1
195
555
915
1275


196
JAG1
JAG1
196
556
916
1276


197
FOXA2
FOXA2
197
557
917
1277


198
THBD
THBD
198
558
918
1278


199
CTCFL
BORIS
199
559
919
1279


200
CTSZ
CTSZ
200
560
920
1280


201
GATA5
GATA5
201
561
921
1281


202
CXADR
CXADR
202
562
922
1282


203
APP
APP
203
563
923
1283


204
TTC3
TTC3
204
564
924
1284


205
KCNJ15
Control
205
565
925
1285


206
RIPK4
RIPK4
206
566
926
1286


207
TFF1
TFF1
207
567
927
1287


208
SEZ6L
SEZ6L
208
568
928
1288


209
TIMP3
TIMP3
209
569
929
1289


210
BIK
BIK
210
570
930
1290


211
VHL
VHL
211
571
931
1291


212
IRAK2
IRAK2
212
572
932
1292


213
PPARG
PPARG
213
573
933
1293


214
MBD4
MBD4
214
574
934
1294


215
RBP1
RBP1
215
575
935
1295


216
XPC
XPC
216
576
936
1296


217
ATR
ATR
217
577
937
1297


218
LXN
LXN
218
578
938
1298


219
RARRES1
RARRES1
219
579
939
1299


220
SERPINI1
SERPINI1
220
580
940
1300


221
CLDN1
CLDN1
221
581
941
1301


222
FAM43A
FAM43A
222
582
942
1302


223
IQCG
IQCG
223
583
943
1303


224
THRB
THRB
224
584
944
1304


225
RARB
RARB
225
585
945
1305


226
TGFBR2
TGFBR2
226
586
946
1306


227
MLH1
MLH1
227
587
947
1307


228
DLEC1
DLEC1
228
588
948
1308


229
CTNNB1
CTNNB1
229
589
949
1309


230
ZNF502
ZNF502
230
590
950
1310


231
SLC6A20
SLC6A20
231
591
951
1311


232
GPX1
GPX1
232
592
952
1312


233
RASSF1
RASSF1A
233
593
953
1313


234
FHIT
FHIT
234
594
954
1314


235
OGG1
OGG1
235
595
955
1315


236
PITX2
PITX2
236
596
956
1316


237
SLC25A31
SLC25A31
237
597
957
1317


238
FBXW7
FBXW7
238
598
958
1318


239
SFRP2
SFRP2
239
599
959
1319


240
CHRNA9
CHRNA9
240
600
960
1320


241
GABRA2
GABRA2
241
601
961
1321


242
MSX1
MSX1
242
602
962
1322


243
IGFBP7
IGFBP7
243
603
963
1323


244
EREG
EREG
244
604
964
1324


245
AREG
AREG
245
605
965
1325


246
ANXA3
ANXA3
246
606
966
1326


247
BMP2K
BMP2K
247
607
967
1327


248
APC
APC
248
608
968
1328


249
HSD17B4
HSD17B4
249
609
969
1329


250
HSD17B4
HSD17B4
250
610
970
1330


251
LOX
LOX
251
611
971
1331


252
TERT
TERT
252
612
972
1332


253
NEUROG1
NEUROG1
253
613
973
1333


254
NR3C1
NR3C1
254
614
974
1334


255
ADRB2
ADRB2
255
615
975
1335


256
CDX1
CDX1
256
616
976
1336


257
SPARC
SPARC
257
617
977
1337


258
C5orf4
Control
258
618
978
1338


259
PTTG1
PTTG1
259
619
979
1339


260
DUSP1
DUSP1
260
620
980
1340


261
CPEB4
CPEB4
261
621
981
1341


262
SCGB3A1
SCGB3A1
262
622
982
1342


263
GDNF
GDNF
263
623
983
1343


264
ERCC8
ERCC8
264
624
984
1344


265
F2R
F2R
265
625
985
1345


266
F2RL1
F2RL1
266
626
986
1346


267
VCAN
CSPG2
267
627
987
1347


268
ZDHHC11
ZDHHC11
268
628
988
1348


269
RHOBTB3
RHOBTB3
269
629
989
1349


270
PLAGL1
PLAGL1
270
630
990
1350


271
SASH1
SASH1
271
631
991
1351


272
ULBP2
ULBP2
272
632
992
1352


273
ESR1
ESR1
273
633
993
1353


274
RNASET2
RNASET2
274
634
994
1354


275
DLL1
DLL1
275
635
995
1355


276
HIST1H2AG
HIST1H2AG
276
636
996
1356


277
HLA-G
HLA-G
277
637
997
1357


278
MSH5
MSH5
278
638
998
1358


279
CDKN1A
CDKN1A
279
639
999
1359


280
TDRD6
TDRD6
280
640
1000
1360


281
COL21A1
COL21A1
281
641
1001
1361


282
DSP
DSP
282
642
1002
1362


283
SERPINE1
SERPINE1
283
643
1003
1363


284
SERPINE1
SERPINE1
284
644
1004
1364


285
FBXL13
FBXL13
285
645
1005
1365


286
NRCAM
NRCAM
286
646
1006
1366


287
TWIST1
TWIST1
287
647
1007
1367


288
HOXA1
HOXA1
288
648
1008
1368


289
HOXA10
HOXA10
289
649
1009
1369


290
SFRP4
SFRP4
290
650
1010
1370


291
IGFBP3
IGFBP3
291
651
1011
1371


292
RPA3
RPA3
292
652
1012
1372


293
ABCB1
ABCB1
293
653
1013
1373


294
TFPI2
TFPI2
294
654
1014
1374


295
COL1A2
COL1A2
295
655
1015
1375


296
ARPC1B
ARPC1B
296
656
1016
1376


297
PILRB
PILRB
297
657
1017
1377


298
GATA4
GATA4
298
658
1018
1378


299
MAL2
NM_052886
299
659
1019
1379


300
DLC1
DLC1
300
660
1020
1380


301
EPPK1
NM_031308
301
661
1021
1381


302
LZTS1
LZTS1
302
662
1022
1382


303
TNFRSF10B
TNFRSF10B
303
663
1023
1383


304
TNFRSF10C
TNFRSF10C
304
664
1024
1384


305
TNFRSF10D
TNFRSF10D
305
665
1025
1385


306
TNFRSF10A
TNFRSF10A
306
666
1026
1386


307
WRN
WRN
307
667
1027
1387


308
SFRP1
SFRP1
308
668
1028
1388


309
SNAI2
SNAI2
309
669
1029
1389


310
RDHE2
RDHE2
310
670
1030
1390


311
PENK
PENK
311
671
1031
1391


312
RDH10
RDH10
312
672
1032
1392


313
TGFBR1
TGFBR1
313
673
1033
1393


314
ZNF462
ZNF462
314
674
1034
1394


315
KLF4
KLF4
315
675
1035
1395


316
CDKN2A
p14_
316
676
1036
1396




CDKN2A






317
CDKN2B
CDKN2B
317
677
1037
1397


318
AQP3
AQP3
318
678
1038
1398


319
TPM2
TPM2
319
679
1039
1399


320
TJP2
TJP2
320
680
1040
1400


321
TJP2
TJP2
321
681
1041
1401


322
PSAT1
PSAT1
322
682
1042
1402


323
DAPK1
DAPK1
323
683
1043
1403


324
SYK
SYK
324
684
1044
1404


325
XPA
XPA
325
685
1045
1405


326
ARMCX2
ARMCX2
326
686
1046
1406


327
RHOXF1
OTEX
327
687
1047
1407


328
FHL1
FHL1
328
688
1048
1408


329
MAGEB2
MAGEB2
329
689
1049
1409


330
TIMP1
TIMP1
330
690
1050
1410


331
AR
AR_humara
331
691
1051
1411


332
ZNF711
ZNF6
332
692
1052
1412


333
CD24
CD24
333
693
1053
1413


334
ABL1
ABL
334
694
1054
1414


335
ACTB
Aktin_VL
335
695
1055
1415


336
APC
APC
336
696
1056
1416


337
CDH1
Ecad1
337
697
1057
1417


338
CDH1
Ecad2
338
698
1058
1418


339
FMR1
FX
339
699
1059
1419


340
GNAS
GNASexAB
340
700
1060
1420


341
H19
H19
341
701
1061
1421


342
HIL1
Igf2
342
702
1062
1422


343
IGF2
Igf2
343
703
1063
1423


344
KCNQ1
LIT1
344
704
1064
1424


345
GNAS
NESP55
345
705
1065
1425


346
CDKN2A
P14
346
706
1066
1426


347
CDKN2B
P15
347
707
1067
1427


348
CDKN2A
P16_VL
348
708
1068
1428


349
PITX2
PitxA
349
709
1069
1429


350
PITX2
PitxB
350
710
1070
1430


351
PITX2
PitxC
351
711
1071
1431


352
PITX2
PitxD
352
712
1072
1432


353
RB1
Rb
353
713
1073
1433


354
SFRP2
SFRP2_VL
354
714
1074
1434


355
SNRPN
SNRPN
355
715
1075
1435


356
XIST
XIST
356
716
1076
1436


357
IRF4
chr6_
357
717
1077
1437




control






358
UNC13B
chr9_
358
718
1078
1438




control






359
GSTP1
GSTP1
360
720
1080
1440


360
Lamda
lambda_
359
719
1079
1439



(control)
PCR













Example 2
Samples

Samples from solid tumors were derived from initial surgical resection of primary tumors. Tumor tissue sections were obtained from histopathology, and histopathological as well as clinical data were monitored over the time of clinical management of the patients and/or collected from patient reports in the study center. Anonymised data and DNA were provided.


Example 3
Principle of the Assay and Design

The inventive assay is a multiplexed assay for DNA methylation testing of up to (or even more than) 360 methylation candidate markers, enabling convenient methylation analyses for tumor-marker definition. In its best mode the test is a combined multiplex-PCR and microarray hybridization technique for multiplexed methylation testing. The inventive marker genes, PCR primer sequences, hybridization probe sequences and expected PCR products are given in Table 1, above.
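The numbering in Table 1 appears to follow a fixed offset: a row with hybridisation probe SEQ ID NO: n lists primer 1 (lp) as n+360, primer 2 (rp) as n+720 and the PCR product as n+1080. The following is a minimal sketch of that lookup; the function name and dictionary keys are illustrative and not part of the disclosure. It is keyed on the probe SEQ ID NO rather than the gene ID, because the last two rows (gene IDs 359 and 360) are listed with probe SEQ ID NOs 360 and 359 respectively.

```python
# Offsets read off Table 1: probe n, primer lp n+360, primer rp n+720, PCR product n+1080.
MASTER_SET_SIZE = 360


def seq_ids_for_probe(probe_seq_id: int) -> dict:
    """Return the SEQ ID NOs that share a Table 1 row with the given probe SEQ ID NO."""
    if not 1 <= probe_seq_id <= MASTER_SET_SIZE:
        raise ValueError("probe SEQ ID NO must be between 1 and 360")
    return {
        "probe": probe_seq_id,
        "primer_lp": probe_seq_id + MASTER_SET_SIZE,
        "primer_rp": probe_seq_id + 2 * MASTER_SET_SIZE,
        "pcr_product": probe_seq_id + 3 * MASTER_SET_SIZE,
    }


# WT1 is listed with probe SEQ ID NO: 61 in Table 1.
wt1_ids = seq_ids_for_probe(61)
```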


Targeting DNA regions in the inventive marker genes that are hypermethylated in several neoplasias, methylation analysis is performed via methylation-sensitive restriction enzyme (MSRE) digestion of 500 ng of starting DNA. A combination of several MSREs ensures complete digestion of unmethylated DNA. All targeted DNA regions have been selected such that sequences containing multiple MSRE sites are flanked by methylation-independent restriction enzyme sites. This strategy enables pre-amplification of the methylated DNA fraction before methylation analysis. Thus, the design and pre-amplification would enable methylation testing on serum, urine, stool etc. when DNA is limiting.
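To illustrate the selection criterion of multiple MSRE sites per target region, the sketch below counts recognition sites of the three MSREs named in Example 4 (AciI, Hin6I, HpaII) in a sequence. The recognition sequences used are the standard ones for these enzymes (AciI: CCGC, read as GCGG on the opposite strand; Hin6I: GCGC; HpaII: CCGG). The example sequence is made up for illustration and is not taken from the sequence listing, and the function names are assumptions.

```python
# Recognition sequences (5'->3'). AciI is non-palindromic, so the motif as it
# appears on either strand is listed; Hin6I and HpaII sites are palindromic.
MSRE_SITES = {
    "AciI": ("CCGC", "GCGG"),
    "Hin6I": ("GCGC",),
    "HpaII": ("CCGG",),
}


def count_overlapping(sequence: str, motif: str) -> int:
    """Count occurrences of motif in sequence, allowing overlaps."""
    count = 0
    start = sequence.find(motif)
    while start != -1:
        count += 1
        start = sequence.find(motif, start + 1)
    return count


def count_msre_sites(sequence: str) -> dict:
    """Return the number of recognition sites per enzyme in the given sequence."""
    seq = sequence.upper()
    return {
        enzyme: sum(count_overlapping(seq, motif) for motif in motifs)
        for enzyme, motifs in MSRE_SITES.items()
    }


example_amplicon = "ACCGGTTGCGCAACCGCTT"  # hypothetical sequence, for illustration only
site_counts = count_msre_sites(example_amplicon)
```

A candidate region with a count of zero for all three enzymes would not be digested regardless of methylation status, which is the property the description attributes to some of the control amplicons.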


When testing DNA without pre-amplification, the methylated DNA fraction remaining after digestion of 500 ng is amplified in 16 multiplex PCRs and detected via microarray hybridization. Within these 16 multiplex-PCR reactions, 360 different human DNA products can be amplified. Of these, about 20 amplicons serve as digestion and amplification controls and are derived either from known differentially methylated human DNA regions or from several regions without any sites for the MSREs used in this system. The primer set used (every reverse primer is biotinylated) targets 347 different sites located in the 5′UTR of 323 gene regions.


After PCR, amplicons are pooled and positives are detected via microarray hybridization using streptavidin-Cy3. Although the melting temperature of CpG-rich DNA is very high, primer and probe design as well as hybridization conditions have been optimized, so that this assay enables unequivocal multiplexed methylation testing of human DNA samples. The assay has been designed such that 24 samples can be run in parallel using 384-well PCR plates.
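The figure of 24 samples per plate follows from 16 multiplex reactions per sample: 24 × 16 = 384 wells. A standard 384-well plate has 16 rows (A to P) and 24 columns, so one possible layout, assumed here purely for illustration since the description does not specify one, puts each sample in its own column and each multiplex reaction in its own row.

```python
import string

PLATE_ROWS = string.ascii_uppercase[:16]  # rows A..P of a 384-well plate
PLATE_COLUMNS = 24
N_MULTIPLEX = 16
N_SAMPLES = 24


def well_for(sample_index: int, multiplex_index: int) -> str:
    """Map a 0-based sample and multiplex index to a well name (hypothetical layout)."""
    if not 0 <= sample_index < N_SAMPLES:
        raise ValueError("sample_index must be in 0..23")
    if not 0 <= multiplex_index < N_MULTIPLEX:
        raise ValueError("multiplex_index must be in 0..15")
    return f"{PLATE_ROWS[multiplex_index]}{sample_index + 1}"


all_wells = {well_for(s, m) for s in range(N_SAMPLES) for m in range(N_MULTIPLEX)}
```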


Many DNA samples can easily be handled in several plates in parallel, enabling completion of analyses within 1-2 days.


The entire procedure enables the user to set up a specific PCR test and subsequent gel-based or hybridization-based testing of selected markers, using single primer pairs or primer subsets as provided herein or identified by the inventive method from the 360 marker set.


Example 4
MSRE Digestion of DNA

MSRE digestion of DNA (about 500 ng) was performed at 37° C. overnight in a volume of 30 μl in 1× Tango restriction enzyme digestion buffer (MBI Fermentas) using 8 units each of the MSREs AciI (New England Biolabs), Hin6I and HpaII (both from MBI Fermentas). Digestions were stopped by heat inactivation (10 min, 75° C.) and the digests were subjected to PCR amplification.


Example 5
PCR Amplification

An aliquot of 20 μl of MSRE-digested DNA (or, in the case of preamplification of methylated DNA, see below, about 500 ng in a volume of 20 μl) was added to 280 μl of PCR premix (without primers). The premix contained all reagents to give final concentrations of 1× HotStarTaq Buffer (Qiagen), 160 μM dNTPs, 5% DMSO and 0.6 U Hot Firepol Taq (Solis Biodyne) per 20 μl reaction. Alternatively, an equal amount of HotStarTaq (Qiagen) could be used. Eighteen (18) μl of the premix including digested DNA were aliquoted into each of sixteen 0.2 ml PCR tubes, and 2 μl of one of the primer premixes 1-16 (containing 0.83 pmol/μl of each primer) were added to each tube. PCR reactions were amplified using a thermal cycling profile of 15 min/95° C. and 40 cycles of 40 sec/95° C., 40 sec/65° C. and 1 min 20 sec/72° C., followed by a final elongation of 7 min/72° C.; reactions were then cooled. After amplification the 16 different multiplex-PCR amplicons from each DNA sample were pooled. Successful amplification was checked using 10 μl of the pooled 16 PCR reactions per sample. Positive amplification yielded a smear in the range of 100-300 bp on EtBr-stained agarose gels; negative amplification controls must not show a smear in this range.
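The volumes stated above can be checked with a short calculation. The sketch below uses only the figures given in this example; the variable names are illustrative. It shows that the 300 μl of premix plus DNA covers sixteen 18 μl aliquots with 12 μl to spare, and that each primer ends up at roughly 83 nM in the 20 μl reaction (1 pmol/μl equals 1 μM).

```python
DNA_ALIQUOT_UL = 20.0            # MSRE-digested DNA added to the premix
PREMIX_UL = 280.0                # PCR premix without primers
N_MULTIPLEX = 16                 # multiplex reactions per sample
MIX_PER_REACTION_UL = 18.0       # premix plus DNA aliquoted per tube
PRIMER_PREMIX_UL = 2.0           # primer premix added per tube
PRIMER_STOCK_PMOL_PER_UL = 0.83  # concentration of each primer in the primer premix

total_mix_ul = DNA_ALIQUOT_UL + PREMIX_UL
used_ul = N_MULTIPLEX * MIX_PER_REACTION_UL
spare_ul = total_mix_ul - used_ul

reaction_volume_ul = MIX_PER_REACTION_UL + PRIMER_PREMIX_UL
primer_pmol_per_reaction = PRIMER_PREMIX_UL * PRIMER_STOCK_PMOL_PER_UL

# pmol/ul is numerically equal to uM, so multiply by 1000 to express in nM.
primer_final_nm = primer_pmol_per_reaction / reaction_volume_ul * 1000.0
```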


Example 6
Microarray Hybridization and Detection

Microarrays with the probes of the 360 marker set are blocked for 30 min in 3 M urea containing 0.1% SDS at room temperature, submerged in a stirred Coplin jar. After blocking, slides are washed in 0.1×SSC/0.2% SDS for 5 min, dipped into water and dried by centrifugation.


The PCR amplicon pool of each sample is mixed with an equal amount of 2× hybridization buffer (7×SSC, 0.6% SDS, 50% formamide), denatured for 5 min at 95° C. and held at 70° C. until an aliquot of 100 μl is loaded onto an array covered by a gasket slide (Agilent). Arrays are hybridized at maximum rotation speed in an Agilent hybridization oven for 16 h at 52° C. After removal of the gasket slides, the microarray slides are washed at room temperature in wash solution I (1×SSC, 0.2% SDS) for 5 min and wash solution II (0.1×SSC, 0.2% SDS) for 5 min, followed by a final wash in which the slides are dipped 3 times into wash solution III (0.1×SSC); the slides are then dried by centrifugation.


For detection of hybridized biotinylated PCR amplicons, streptavidin-Cy3 conjugate (Caltag Laboratories) is diluted 1:400 in PBST-MP (1×PBS, 0.1% Tween 20; 1% skimmed dry milk powder [Sucofin; Germany]), pipetted onto microarrays covered with a coverslip and incubated for 30 min at room temperature in the dark. Coverslips are then washed off the slides using PBST (1×PBS, 0.1% Tween 20), and the slides are washed in fresh PBST for 5 min, rinsed with water and dried by centrifugation.


Example 7
DNA Preamplification for Methylation Profiling (Optional)

In many situations the amount of DNA is limited. Although the inventive methylation test performs well with low amounts of DNA (see above), minimally invasive testing using cell-free DNA from serum, stool, urine and other body fluids is of particular diagnostic relevance.


Samples can be preamplified prior to methylation testing as follows: DNA was digested with the restriction enzyme FspI (and/or Csp6I, and/or MseI, and/or Tsp509I; or their isoschizomers), and after (heat) inactivation of the restriction enzyme the fragments were circularized using T4 DNA ligase. Ligation products were digested using a mixture of methylation-sensitive restriction enzymes. Upon enzyme inactivation the entire mixture was amplified using rolling circle amplification (RCA) by phi29 phage polymerase. The RCA amplicons were then directly subjected to the multiplex PCRs of the inventive methylation test without further need for digestion of the DNA prior to amplification.


Alternatively, the preamplified DNA, which is enriched for methylated DNA regions, can be directly subjected to fluorescent labelling, and the labelled products can be hybridized onto the microarrays using the same conditions as described above for hybridization of PCR products. In that case the streptavidin-Cy3 detection step has to be omitted, and slides should be scanned directly after the stringency washes and drying. Depending on the experimental design for microarray analyses, either single-labelled or dual-labelled hybridizations might be generated. In our experience, the single-label design was used successfully for class comparisons. Although the preamplification protocol enables analyses of minute amounts of DNA, it is also suited for performing genomic methylation screens.


To elucidate methylation biomarkers for prediction of metastasis risk on a genome-wide level, we subjected 500 ng of DNA derived from primary tumor samples to amplification of the methylated DNA using the procedure outlined above. RCA amplicons derived from metastasized and non-metastasized samples were labelled using the CGH Labeling Kit (Enzo, Farmingdale, N.Y.), and labelled products were hybridized onto human 244 k CpG island arrays (Agilent, Waldbronn, Germany). All manipulations were performed according to the manufacturers' instructions.


Example 8
Data Analysis

Hybridizations performed on a chip with probes for the inventive 360 marker genes were scanned using a GenePix 4000A scanner (Molecular Devices, Ismaning, Germany) with a PMT setting of 700V/cm (equal for both wavelengths). Raw image data were extracted using GenePix 6.0 software (Molecular Devices, Ismaning, Germany).


Microarray data analyses were performed using BRB-ArrayTools, developed by Dr. Richard Simon and the BRB-ArrayTools Development Team. The software package (version 3.6; on the web at linus.nci.nih.gov/BRB-ArrayTools.html) was used according to the recommendations of its authors, and the settings used for analyses are given in the results where appropriate. For every hybridization, background intensities were subtracted from foreground intensities for each spot. Global normalization was used to median-center the log-ratios on each array in order to adjust for differences in spot/label intensities.
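The preprocessing described above (background subtraction per spot, then median centering of log values per array) can be sketched as follows. This is a simplified illustration and not the BRB-ArrayTools implementation: the function name, the use of log base 2 and the floor applied to non-positive net intensities are assumptions made so that the example runs on its own.

```python
import math
import statistics


def normalize_array(foreground, background, floor=1.0):
    """Background-subtract, log2-transform and median-center one array.

    `floor` is an assumed lower bound that keeps the logarithm defined when
    the background exceeds the foreground for a spot.
    """
    net = [max(fg - bg, floor) for fg, bg in zip(foreground, background)]
    logs = [math.log2(value) for value in net]
    array_median = statistics.median(logs)
    return [value - array_median for value in logs]


# Made-up spot intensities for one array.
centered = normalize_array([1100.0, 2100.0, 4100.0], [100.0, 100.0, 100.0])
```

After this step the median of every array is zero, so arrays with globally brighter or dimmer labelling become comparable.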


P-values (p) used for feature selection for classification and prediction were based on the univariate significance levels (alpha). P-values (p) and the misclassification rate during cross-validation (MCR) are given alongside the result data.
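As an illustration of how a misclassification rate during cross-validation can be obtained, the sketch below runs leave-one-out cross-validation with a nearest-centroid classifier on made-up data. The cross-validation scheme is not stated in this description, so leave-one-out is an assumption, and the feature selection step at the chosen alpha, which would be repeated inside every fold, is omitted for brevity. All names and values are hypothetical.

```python
from statistics import mean


def nearest_centroid_predict(train, labels, sample):
    """Assign `sample` to the class whose centroid is closest (squared Euclidean distance)."""
    centroids = {}
    for label in set(labels):
        rows = [row for row, lab in zip(train, labels) if lab == label]
        centroids[label] = [mean(column) for column in zip(*rows)]

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(sample, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


def loocv_misclassification_rate(data, labels):
    """Leave each sample out once, train on the rest, and count wrong predictions."""
    errors = 0
    for i in range(len(data)):
        train = data[:i] + data[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        if nearest_centroid_predict(train, train_labels, data[i]) != labels[i]:
            errors += 1
    return errors / len(data)


# Made-up log-intensities for two markers in three normal (N) and three tumour (T) samples.
toy_data = [[8.0, 12.0], [8.5, 11.5], [7.5, 12.5], [11.0, 10.0], [11.5, 9.5], [10.5, 10.5]]
toy_labels = ["N", "N", "N", "T", "T", "T"]
mcr = loocv_misclassification_rate(toy_data, toy_labels)
```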


Example 9
Lung Cancer Test

DNA methylation analysis was performed on 96 DNA samples derived from both normal and lung-tumour tissue of 48 patients and on 8 DNA samples isolated from peripheral blood (PB) of healthy individuals; all samples were analysed for methylation deviations in the inventive set of 359 genes.


From this analysis, DNA-methylation biomarkers suitable for distinction of tumour and normal lung DNA, as well as DNA-methylation profiles from blood DNA of healthy controls, were deduced. Diagnostic and prognostic marker subsets suitable for diagnostic testing and presymptomatic screening for early detection of lung cancer were determined for DNA derived from lung tissue, but also for DNA extracts from patient material other than lung, such as sputum, serum or plasma.


DNA Methylation testing results and data analyses of chip results as well as qPCR validation of a subset of markers derived from chip-based testing are provided.


DNA Samples analysed were from blood of 8 healthy individuals (PB), 19 tumours (AdenoCa, adenocarcinoma) and 19 normal lung tissue (N) of adenocarcinoma patients and 29 tumours (SqCCL, squamous cell carcinoma) and 29 normal lung tissue (N) of squamous cell carcinoma patients.


For DNA methylation testing, 600 ng of DNA were digested, and data derived from DNA-microarray hybridizations were analysed using the BRB-ArrayTools statistical software package. Class comparison and class prediction analyses were performed with respect to the sample groups listed above; for delineation of biomarkers for tumour samples, AdenoCa and SqCCL were treated as one tumour sample group (TU).


The design of the test enables methylation testing on DNA directly derived from the biological source. The test is also suitable for use with DNA preamplification upon MSRE digestion (as outlined above). Using the methylation-specific preamplification of minute amounts of DNA, biomarker testing is feasible on small samples and limited amounts of DNA, and multiplexed PCR and methylation testing is easily performed on the preamplified DNA obtained from these samples. This strategy would also improve testing of serum, urine, stool, synovial fluid, sputum and other body fluids using the conceptual design of the methylation test.


The possibility of preamplification also enables differential methylation hybridization of the preamplified DNA itself. This option is provided for by the design of the test and the probes. Thus the probes of the methylation test (or the array) can be used for hybridization of labelled DNA after enrichment of either the methylated or the unmethylated DNA fraction of any DNA sample, allowing methylation testing without the multiplex PCR.


In addition, the biomarkers described herein could be applied for methylation testing using alternative approaches, e.g. methylation-sensitive PCR and strategies which are based on sodium-bisulfite deamination of DNA rather than on MSRE digestion. These sets of methylation markers are suitable markers for disease monitoring, progression, prediction, therapy decision and therapy response.


Example 10
Biomarkers from Microarray-Testing of Patient Samples
Example 10a
CLASS COMPARISON: TU Vs. Normal: p<0.005, Unpaired Samples; 2 Fold Change

This list of methylation markers was found significant (p<0.005) between TU and N using “unpaired” statistical testing of DNA methylation of 48 tumour samples versus 48 healthy lung tissue samples. Significant markers (p<0.005) with at least a 2-fold difference in signal intensity between the two classes are listed.









TABLE 2
Sorted by p-value of the univariate test. Class 1: N; Class 2: T.
The 32 genes are significant at the nominal 0.005 level of the univariate test with the fold change 2.

No. | Parametric p-value | FDR | Permutation p-value | Geom mean of intensities in class 1 | Geom mean of intensities in class 2 | Fold-change | Gene symbol
1 | <1e−07 | <1e−07 | <1e−07 | 1411.8016 | 13554.578246 | 0.1041568 | WT1
2 | <1e−07 | <1e−07 | <1e−07 | 85.5069224 | 1125.7940428 | 0.0759525 | DLX2
3 | <1e−07 | <1e−07 | 1e−07 | 852.3850013 | 7392.282404 | 0.1153074 | SALL3
4 | 1e−07 | <1e−07 | 1e−07 | 235.4745892 | 592.5077157 | 0.3974203 | TERT
5 | <1e−07 | <1e−07 | <1e−07 | 274.9097126 | 833.6648468 | 0.3297605 | PITX2
6 | <1e−07 | <1e−07 | <1e−07 | 80.5286413 | 265.3042755 | 0.3035331 | HOXA10
7 | <1e−07 | <1e−07 | <1e−07 | 112.6645619 | 855.6410585 | 0.1316727 | F2R
8 | 1e−07 | 4.5e−06 | <1e−07 | 2002.2452679 | 266.6906343 | 7.507745 | CPEB4
9 | 4e−07 | 1.46e−05 | 1e−07 | 718.311462 | 4609.4380991 | 0.1558349 | NHLH2
10 | 4e−07 | 1.46e−05 | <1e−07 | 10347.8184959 | 3603.9811381 | 2.8712188 | SMAD3
11 | 5e−07 | 1.65e−05 | <1e−07 | 2993.3054637 | 1117.4218527 | 2.6787604 | ACTB
12 | 2.8e−06 | 8.49e−05 | 1e−07 | 296.6448711 | 3941.769913 | 0.0752568 | HOXA1
13 | 3.6e−06 | 0.0001008 | <1e−07 | 2792.0699393 | 17199.6551909 | 0.1623329 | BOLL
14 | 5.9e−06 | 0.0001342 | <1e−07 | 8664.2840567 | 2178.4607085 | 3.9772506 | APC
15 | 1.21e−05 | 0.0002591 | <1e−07 | 96.7848387 | 472.6945117 | 0.2047513 | MT1G
16 | 1.36e−05 | 0.000275 | 1e−07 | 653.0579403 | 2188.6201533 | 0.298388 | PENK
17 | 1.97e−05 | 0.0003774 | <1e−07 | 1710.9865406 | 4044.9737351 | 0.4229908 | SPARC
18 | 3.16e−05 | 0.0005751 | <1e−07 | 1639.128227 | 811.4430136 | 2.0200164 | DNAJA4
19 | 3.85e−05 | 0.0006673 | <1e−07 | 114.7065029 | 292.8694482 | 0.3916643 | RASSF1
20 | 4.28e−05 | 0.0007081 | <1e−07 | 564.6571983 | 189.2105463 | 2.9842797 | HLA-G
21 | 4.98e−05 | 0.0007881 | 1e−04 | 1339.8175413 | 446.1370253 | 3.0031525 | ERCC1
22 | 6e−05 | 0.00091 | 1e−04 | 395.6248705 | 1158.1502714 | 0.3416006 | ONECUT2
23 | 6.58e−05 | 0.000958 | <1e−07 | 2517.3232246 | 1024.0897145 | 2.4581081 | APC
24 | 8.45e−05 | 0.0011392 | <1e−07 | 232.2537844 | 701.7843246 | 0.3309475 | ABCB1
25 | 0.0002382 | 0.0029898 | 1e−04 | 3027.5067641 | 1165.5391698 | 2.5975161 | ZNF573
26 | 0.0003469 | 0.003946 | <1e−07 | 360.9888133 | 148.6109072 | 2.4290869 | KCNJ15
27 | 0.0003582 | 0.0039511 | 3e−04 | 1818.1186026 | 4147.2970277 | 0.4383864 | ZDHHC11
28 | 0.0012332 | 0.01192 | 0.0013 | 238.5488592 | 512.9101159 | 0.465089 | SFRP2
29 | 0.0019349 | 0.0176076 | 0.0015 | 310.5591882 | 1215.8855725 | 0.2554181 | GDNF
30 | 0.002818 | 0.0227945 | 0.0022 | 4930.1368809 | 2261.9370298 | 2.1796084 | PTTG1
31 | 0.0038228 | 0.0267596 | 0.0045 | 2402.9850212 | 974.5347994 | 2.4657765 | SERPINI1
32 | 0.0039256 | 0.0269326 | 0.0031 | 208.6539745 | 417.3186041 | 0.4999872 | TNFRSF10C
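The fold-change column of Table 2 is the geometric mean intensity of class 1 (N) divided by that of class 2 (T), so values below 1 indicate higher signal in tumour and values above 1 higher signal in normal tissue. The sketch below reproduces two of the tabulated fold changes from the tabulated geometric means and applies a 2-fold filter; the function names are illustrative and not taken from BRB-ArrayTools.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive intensities, computed on the log scale."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def fold_change(geo_mean_class1, geo_mean_class2):
    """Ratio of class 1 (N) to class 2 (T) geometric mean intensities."""
    return geo_mean_class1 / geo_mean_class2


def passes_two_fold(fc):
    """True if the classes differ by at least a factor of 2 in either direction."""
    return fc >= 2.0 or fc <= 0.5


# Geometric means as listed in Table 2.
wt1_fc = fold_change(1411.8016, 13554.578246)
cpeb4_fc = fold_change(2002.2452679, 266.6906343)
```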









Example 10b
CLASS PREDICTION: TU Vs. Normal: p<0.005, Unpaired Samples; 2 Fold Change

Class prediction using different statistical methods for elucidating marker panels enabling best correct classification of TU and N (p<0.005).


Performance of Classifiers During Cross-Validation.






















Classifier | Mean percent of correct classification
Compound Covariate Predictor | 100
Diagonal Linear Discriminant Analysis | 100
1-Nearest Neighbor | 98
3-Nearest Neighbors | 98
Nearest Centroid | 98
Support Vector Machines | 98























TABLE 3
Composition of classifier: Sorted by t-value

No. | Parametric p-value | t-value | % CV support | Geometric mean of intensities (class N/class T) | Gene symbol
1 | <1e−07 | −10.859 | 100 | 0.1041568 | WT1
2 | <1e−07 | −7.903 | 100 | 0.3297605 | PITX2
3 | <1e−07 | −7.314 | 100 | 0.1153074 | SALL3
4 | <1e−07 | −7.063 | 100 | 0.1316727 | F2R
5 | <1e−07 | −7.028 | 100 | 0.0759525 | DLX2
6 | <1e−07 | −6.592 | 100 | 0.3974203 | TERT
7 | <1e−07 | −6.539 | 100 | 0.3035331 | HOXA10
8 | <1e−07 | −6.495 | 100 | 0.7772068 | MSH4
9 | <1e−07 | −6.357 | 100 | 0.1558349 | NHLH2
10 | 4e−07 | −5.915 | 100 | 0.5405671 | GNA15
11 | 4e−07 | −5.908 | 100 | 0.298388 | PENK
12 | 4.2e−06 | −5.206 | 100 | 0.3916643 | RASSF1
13 | 5e−06 | −5.155 | 100 | 0.1623329 | BOLL
14 | 1.05e−05 | −4.935 | 100 | 0.0752568 | HOXA1
15 | 3.1e−05 | −4.61 | 100 | 0.3416006 | ONECUT2
16 | 4.26e−05 | −4.514 | 100 | 0.3309475 | ABCB1
17 | 4.59e−05 | −4.491 | 100 | 0.4229908 | SPARC
18 | 4.96e−05 | −4.467 | 100 | 0.2047513 | MT1G
19 | 8.53e−05 | −4.301 | 100 | 0.6381881 | HSPA2
20 | 0.0002478 | −3.966 | 100 | 0.465089 | SFRP2
21 | 0.0002786 | −3.929 | 100 | 0.7532617 | PYCARD
22 | 0.0003286 | −3.876 | 100 | 0.6491186 | GAD1
23 | 0.0004296 | −3.789 | 100 | 0.8137828 | C5orf4
24 | 0.0004695 | −3.76 | 100 | 0.7676414 | C5AR1
25 | 0.0004699 | −3.76 | 100 | 0.2554181 | GDNF
26 | 0.0006369 | −3.66 | 100 | 0.4383864 | ZDHHC11
27 | 0.0008023 | −3.584 | 100 | 0.8171479 | SERPINE1
28 | 0.0009028 | −3.544 | 100 | 0.6392075 | NKX2-1
29 | 0.0009179 | −3.539 | 100 | 0.5993327 | PITX2
30 | 0.0010255 | −3.501 | 100 | 0.7691876 | C5AR1
31 | 0.0011267 | −3.47 | 100 | 0.5118859 | ZNF256
32 | 0.0014869 | −3.375 | 100 | 0.5593175 | FAM43A
33 | 0.0015714 | −3.356 | 100 | 0.6862518 | SFRP2
34 | 0.0019233 | −3.287 | 100 | 0.3698669 | MT3
35 | 0.0019731 | −3.278 | 100 | 0.7715219 | SERPINE1
36 | 0.0019838 | −3.276 | 100 | 0.8088555 | CLIC4
37 | 0.0023911 | −3.21 | 100 | 0.4999872 | TNFRSF10C
38 | 0.0027742 | −3.158 | 92 | 0.8776257 | GABRA2
39 | 0.0028024 | −3.154 | 92 | 0.7069999 | MTHFR
40 | 0.0030868 | −3.12 | 81 | 0.6837301 | ESR2
41 | 0.0033263 | −3.093 | 79 | 0.6327604 | NEUROG1
42 | 0.0036825 | −3.057 | 67 | 0.6444277 | PITX2
43 | 0.0044243 | −2.99 | 44 | 0.732542 | PLAGL1
44 | 0.004896 | −2.953 | 40 | 0.4992372 | TMEFF2
45 | 0.0037996 | 3.046 | 65 | 2.1796084 | PTTG1
46 | 0.0034628 | 3.079 | 73 | 1.1394289 | CADM1
47 | 0.0024932 | 3.196 | 100 | 1.0870547 | S100A8
48 | 0.0024284 | 3.205 | 100 | 1.3497772 | EFS
49 | 0.0020087 | 3.271 | 100 | 1.2801593 | JUB
50 | 0.0017007 | 3.329 | 100 | 1.1823596 | ITGA4
51 | 0.0015061 | 3.371 | 100 | 1.5959594 | MAGEB2
52 | 0.0013429 | 3.41 | 100 | 1.294098 | ERBB2
53 | 0.0011103 | 3.475 | 100 | 1.3485708 | SRGN
54 | 0.0007894 | 3.589 | 100 | 1.3193821 | GNAS
55 | 0.0007437 | 3.609 | 100 | 1.9621539 | TJP2
56 | 0.000457 | 3.769 | 100 | 2.4290869 | KCNJ15
57 | 0.0004291 | 3.789 | 100 | 1.3004513 | SLC25A31
58 | 0.0001587 | 4.107 | 100 | 2.5975161 | ZNF573
59 | 0.0001331 | 4.163 | 100 | 1.4996674 | TNFRSF25
60 | 9.26e−05 | 4.276 | 100 | 2.4581081 | APC
61 | 4.88e−05 | 4.472 | 100 | 1.9612086 | KCNQ1
62 | 3.62e−05 | 4.564 | 100 | 1.4971047 | LAMC2
63 | 1.82e−05 | 4.77 | 100 | 1.5467277 | SPHK1
64 | 1.68e−05 | 4.794 | 100 | 2.0200164 | DNAJA4
65 | 1.45e−05 | 4.838 | 100 | 3.9772506 | APC
66 | 9e−06 | 4.979 | 100 | 1.388284 | MBD2
67 | 8.6e−06 | 4.994 | 100 | 3.0031525 | ERCC1
68 | 4.5e−06 | 5.182 | 100 | 2.9842797 | HLA-G
69 | 4.2e−06 | 5.202 | 100 | 1.7516486 | CXADR
70 | 1.4e−06 | 5.521 | 100 | 1.9112579 | TP53
71 | 1.1e−06 | 5.605 | 100 | 2.6787604 | ACTB
72 | 9e−07 | 5.647 | 100 | 1.9365988 | KL
73 | 6e−07 | 5.755 | 100 | 2.8712188 | SMAD3
74 | 2e−07 | 6.05 | 100 | 1.4368727 | HIST1H2AG
75 | 2e−07 | 6.115 | 100 | 7.507745 | CPEB4
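Table 3 lists a t-value per marker, which is the kind of weight a compound covariate predictor uses. As a rough sketch of that general idea (not the BRB-ArrayTools implementation, whose details are not given here), each marker's log-intensity is weighted by its two-sample t-statistic, the weighted values are summed into one score per sample, and the score is compared with the midpoint between the two class means. The data and all names below are made up for illustration.

```python
import math
from statistics import mean, variance


def t_statistic(values_n, values_t):
    """Two-sample t-statistic with pooled variance (class N minus class T)."""
    n_n, n_t = len(values_n), len(values_t)
    pooled = ((n_n - 1) * variance(values_n) + (n_t - 1) * variance(values_t)) / (n_n + n_t - 2)
    return (mean(values_n) - mean(values_t)) / math.sqrt(pooled * (1 / n_n + 1 / n_t))


def fit_compound_covariate(class_n, class_t):
    """Return (weights, threshold) from training samples given as lists of log-intensities."""
    n_markers = len(class_n[0])
    weights = [
        t_statistic([s[g] for s in class_n], [s[g] for s in class_t])
        for g in range(n_markers)
    ]

    def score(sample):
        return sum(w * x for w, x in zip(weights, sample))

    mean_n = mean([score(s) for s in class_n])
    mean_t = mean([score(s) for s in class_t])
    return weights, (mean_n + mean_t) / 2.0


def predict(weights, threshold, sample):
    """With N-minus-T weights, class N scores lie above the midpoint threshold."""
    score = sum(w * x for w, x in zip(weights, sample))
    return "N" if score > threshold else "T"


# Made-up training data: two markers, three samples per class.
train_n = [[8.0, 12.0], [8.5, 11.5], [7.5, 12.5]]
train_t = [[11.0, 10.0], [11.5, 9.5], [10.5, 10.5]]
ccp_weights, ccp_threshold = fit_compound_covariate(train_n, train_t)
```

Markers with negative t-values (higher signal in tumour, as for WT1 in Table 3) pull the score down for tumour samples, while markers with positive t-values push it up for normal samples, so both directions contribute to the separation.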









Example 10c
4 Greedy Pairs>>92% Correct Using SVM (Support Vector Machine)

Using “4 pairs of methylation markers” derived from greedy pairs class prediction with support vector machines enables 92% correct classification of TU and N.


Performance of Classifiers During Cross-Validation.




















| | Compound Covariate Predictor Correct? | Diagonal Linear Discriminant Analysis Correct? | 1-Nearest Neighbor | 3-Nearest Neighbors Correct? | Nearest Centroid Correct? | Support Vector Machines Correct? |
|---|---|---|---|---|---|---|
| Mean percent of correct classification: | 90 | 90 | 90 | 89 | 91 | 92 |









Performance of the Support Vector Machine Classifier:



















| Class | Sensitivity | Specificity | PPV | NPV |
|---|---|---|---|---|
| N | 0.917 | 0.917 | 0.917 | 0.917 |
| T | 0.917 | 0.917 | 0.917 | 0.917 |

















TABLE 4







Composition of classifier: Sorted by t-value


(Sorted by gene pairs)


Class 1: N; Class 2: T.















| # | Parametric p-value | t-value | % CV support | Geom mean of intensities in class 1 | Geom mean of intensities in class 2 | Fold-change | Gene symbol |
|---|---|---|---|---|---|---|---|
| 1 | <1e−07 | −9.452 | 100 | 1411.8016 | 13554.578246 | 0.1041568 | WT1 |
| 2 | <1e−07 | −7.222 | 100 | 85.5069224 | 1125.7940428 | 0.0759525 | DLX2 |
| 3 | <1e−07 | −6.648 | 99 | 852.3850013 | 7392.282404 | 0.1153074 | SALL3 |
| 4 | <1e−07 | −6.48 | 70 | 235.4745892 | 592.5077157 | 0.3974203 | TERT |
| 5 | 0.0017994 | 3.213 | 27 | 437.7037557 | 291.867223 | 1.4996674 | TNFRSF25 |
| 6 | 5e−07 | 5.391 | 100 | 2993.3054637 | 1117.4218527 | 2.6787604 | ACTB |
| 7 | 4e−07 | 5.474 | 76 | 10347.818495 | 3603.9811381 | 2.8712188 | SMAD3 |
| 8 | <1e−07 | 5.832 | 98 | 2002.2452679 | 266.6906343 | 7.507745 | CPEB4 |
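The fold-change column in Table 4 appears to be the ratio of the class 1 (N) geometric mean to the class 2 (T) geometric mean. The following is a minimal sketch that recomputes it for three rows of the table; the function names `geometric_mean` and `fold_change` are illustrative and are not part of the BRB-ArrayTools software.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive intensities, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def fold_change(geom_mean_class1, geom_mean_class2):
    """Ratio of the class 1 (N) to the class 2 (T) geometric mean intensity."""
    return geom_mean_class1 / geom_mean_class2


# Geometric means copied from Table 4 (class 1 = N, class 2 = T).
table4 = {
    "WT1": (1411.8016, 13554.578246),
    "DLX2": (85.5069224, 1125.7940428),
    "CPEB4": (2002.2452679, 266.6906343),
}

ratios = {gene: fold_change(*means) for gene, means in table4.items()}
for gene, ratio in ratios.items():
    print(f"{gene}: {ratio:.7f}")
```

A ratio below 1 means the geometric mean intensity is higher in T than in N; a ratio above 1 means the opposite.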









Example 10d
(BRB v3.8) 5 Greedy Pairs

Using “5 pairs of methylation markers” derived from greedy pairs class prediction, a support vector machine achieves 95% correct classification of TU and N.


Performance of Classifiers During Cross-Validation:





















| | Mean Number of genes in classifier | Compound Covariate Predictor Correct? | Diagonal Linear Discriminant Analysis Correct? | 1-Nearest Neighbor | 3-Nearest Neighbors Correct? | Nearest Centroid Correct? | Support Vector Machines Correct? | Bayesian Compound Covariate Predictor Correct? |
|---|---|---|---|---|---|---|---|---|
| Mean percent of correct classification: | | 92 | 94 | 90 | 94 | 92 | 95 | 95 |





Note: NA denotes that the sample is unclassified. These samples are excluded from the computation of the mean percent of correct classification.
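The exclusion of unclassified samples described in the note can be sketched as follows. The labels are hypothetical and serve only to illustrate the calculation; `None` stands in for an NA (unclassified) prediction, and the function name `percent_correct` is an assumption.

```python
def percent_correct(true_labels, predicted_labels):
    """Percent of classified samples predicted correctly.

    A prediction of None marks an unclassified (NA) sample, which is
    left out of both the numerator and the denominator.
    """
    pairs = [(t, p) for t, p in zip(true_labels, predicted_labels) if p is not None]
    if not pairs:
        raise ValueError("no classified samples")
    correct = sum(1 for t, p in pairs if t == p)
    return 100.0 * correct / len(pairs)


# Hypothetical labels for illustration only (not data from this application).
truth = ["N", "N", "T", "T", "T"]
predicted = ["N", "T", "T", None, "T"]
result = percent_correct(truth, predicted)
print(result)  # 3 of the 4 classified samples are correct
```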






Performance of the Support Vector Machine Classifier:



















| Class | Sensitivity | Specificity | PPV | NPV |
|---|---|---|---|---|
| N | 0.958 | 0.938 | 0.939 | 0.957 |
| T | 0.938 | 0.958 | 0.957 | 0.939 |
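The four figures per class follow from a 2×2 confusion matrix. The sketch below assumes 48 N and 48 T samples, with 2 N samples and 3 T samples misclassified; these counts are inferred from the rounded values in the table and are not stated in the text.

```python
def binary_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# N treated as the "positive" class (assumed counts):
# 46 of 48 N samples called N, 45 of 48 T samples called T.
n_metrics = binary_metrics(tp=46, fn=2, fp=3, tn=45)

# Swapping the roles of the two classes gives the T row.
t_metrics = binary_metrics(tp=45, fn=3, fp=2, tn=46)

for name, metrics in (("N", n_metrics), ("T", t_metrics)):
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

Because there are only two classes, the sensitivity of one class equals the specificity of the other, and likewise for PPV and NPV, which is why the two rows mirror each other.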

















TABLE 5







Composition by classifier: Sorted by t-value (Sorted by gene pairs)


Class 1: N; Class 2: T.


















| # | Parametric p-value | t-value | % CV support | Geom mean of intensities in class 1 | Geom mean of intensities in class 2 | Fold-change | Gene symbol |
|---|---|---|---|---|---|---|---|
| 1 | <1e−07 | −9.531 | 100 | 1378.5556347 | 13613.2679786 | 0.1012656 | WT1 |
| 2 | <1e−07 | −7.419 | 100 | 78.691453 | 1122.0211285 | 0.0701337 | DLX2 |
| 3 | <1e−07 | −6.702 | 100 | 832.1044249 | 7415.7421008 | 0.1122078 | SALL3 |
| 4 | <1e−07 | −6.625 | 100 | 223.339058 | 595.0731922 | 0.3753136 | TERT |
| 5 | <1e−07 | −6.586 | 100 | 267.2568518 | 837.2745062 | 0.3191986 | PITX2 |
| 6 | 0.0029082 | 3.057 | 35 | 427.3964613 | 286.9546694 | 1.4894215 | TNFRSF25 |
| 7 | 1.26e−05 | 4.612 | 70 | 7297.8279144 | 3875.9637585 | 1.8828421 | KL |
| 8 | 9e−07 | 5.255 | 99 | 2922.8174216 | 1122.2601272 | 2.6044028 | ACTB |
| 9 | 9e−07 | 5.266 | 98 | 10104.1419624 | 3617.8969167 | 2.792822 | SMAD3 |
| 10 | 2e−07 | 5.603 | 100 | 1911.6531674 | 265.654275 | 7.1960188 | CPEB4 |









Example 10e
Recursive Feature Elimination Method

Using “16 methylation markers” derived from the Recursive Feature Elimination method for class prediction with Diagonal Linear Discriminant Analysis enables 100% correct classification of TU and N.


Performance of Classifiers During Cross-Validation.




















| | Mean Number of genes in classifier | Compound Covariate Predictor Correct? | Diagonal Linear Discriminant Analysis Correct? | 1-Nearest Neighbor | 3-Nearest Neighbors Correct? | Nearest Centroid Correct? | Support Vector Machines Correct? |
|---|---|---|---|---|---|---|---|
| Mean percent of correct classification: | | 98 | 100 | 96 | 96 | 94 | 96 |
















TABLE 6







Composition of classifier: Sorted by t-value
















| # | Parametric p-value | t-value | % CV support | Geometric mean of intensities (class N/class T) | Gene symbol |
|---|---|---|---|---|---|
| 1 | <1e−07 | −10.859 | 100 | 0.1041568 | WT1 |
| 2 | <1e−07 | −7.903 | 100 | 0.3297605 | PITX2 |
| 3 | <1e−07 | −7.314 | 98 | 0.1153074 | SALL3 |
| 4 | <1e−07 | −7.028 | 81 | 0.0759525 | DLX2 |
| 5 | <1e−07 | −6.592 | 98 | 0.3974203 | TERT |
| 6 | <1e−07 | −6.539 | 98 | 0.3035331 | HOXA10 |
| 7 | 4.2e−06 | −5.206 | 98 | 0.3916643 | RASSF1 |
| 8 | 4.59e−05 | −4.491 | 94 | 0.4229908 | SPARC |
| 9 | 0.0329896 | −2.197 | 88 | 0.5237754 | IRAK2 |
| 10 | 0.0496307 | −2.015 | 98 | 0.6640548 | ZNF711 |
| 11 | 1.68e−05 | 4.794 | 79 | 2.0200164 | DNAJA4 |
| 12 | 4.5e−06 | 5.182 | 79 | 2.9842797 | HLA-G |
| 13 | 4.2e−06 | 5.202 | 79 | 1.7516486 | CXADR |
| 14 | 1.4e−06 | 5.521 | 75 | 1.9112579 | TP53 |
| 15 | 1.1e−06 | 5.605 | 100 | 2.6787604 | ACTB |
| 16 | 2e−07 | 6.115 | 100 | 7.507745 | CPEB4 |









Example 10f
(BRB v3.8) Recursive Feature Elimination Method

Owing to some differences in data importing and normalisation when the data were collated again for statistics (using BRB v. 3.8), a gene list with minor differences (compared to example 12e) was calculated from the data and is given below:


Performance of Classifiers During Cross-Validation.




















| | Mean Number of genes in classifier | Compound Covariate Predictor Correct? | Diagonal Linear Discriminant Analysis Correct? | 1-Nearest Neighbor | 3-Nearest Neighbors Correct? | Nearest Centroid Correct? | Support Vector Machines Correct? |
|---|---|---|---|---|---|---|---|
| Mean percent of correct classification: | | 96 | 100 | 96 | 96 | 96 | 96 |
















TABLE 7







Composition of classifier: Sorted by t-value
















| # | Parametric p-value | t-value | % CV support | Geometric mean of intensities (class N/class TU) | Gene symbol |
|---|---|---|---|---|---|
| 1 | <1e−07 | −10.777 | 100 | 0.1012656 | WT1 |
| 2 | <1e−07 | −8.046 | 88 | 0.3191986 | PITX2 |
| 3 | <1e−07 | −7.336 | 98 | 0.1122078 | SALL3 |
| 4 | <1e−07 | −7.232 | 85 | 0.1264427 | F2R |
| 5 | <1e−07 | −6.712 | 100 | 0.3753136 | TERT |
| 6 | <1e−07 | −6.524 | 98 | 0.2930706 | HOXA10 |
| 7 | 1.6e−06 | −5.49 | 98 | 0.3695951 | RASSF1 |
| 8 | 3.87e−05 | −4.543 | 83 | 0.4112493 | SPARC |
| 9 | 0.0313421 | −2.219 | 88 | 0.5143877 | IRAK2 |
| 10 | 0.0366617 | −2.151 | 98 | 0.6452171 | ZNF711 |
| 11 | 0.3333009 | 0.978 | 58 | 1.1102014 | DRD2 |
| 12 | 4.91e−05 | 4.471 | 77 | 1.9749991 | DNAJA4 |
| 13 | 2.25e−05 | 4.707 | 75 | 1.7030259 | CXADR |
| 14 | 7.4e−06 | 5.036 | 88 | 1.8582045 | TP53 |
| 15 | 2.1e−06 | 5.402 | 100 | 2.6044028 | ACTB |
| 16 | 5e−07 | 5.815 | 100 | 7.1960188 | CPEB4 |









Example 10g
Recursive Geneset for “PB-N-TU” Distinction Using CLASS Prediction

Distinguishing PB, N, and TU is of interest when minimally invasive testing for lung cancer is to be performed using serum or plasma from peripheral blood; markers that distinguish PB, N and TU are therefore best suited to this purpose. Using “16 methylation markers” derived from the Recursive Feature Elimination method for class prediction with Diagonal Linear Discriminant Analysis enables 91% correct classification.


Performance of Classifiers During Cross-Validation:


















| | Diagonal Linear Discriminant Analysis Correct? | 1-Nearest Neighbor | 3-Nearest Neighbors Correct? | Nearest Centroid Correct? |
|---|---|---|---|---|
| Mean percent of correct classification: | 91 | 89 | 87 | 88 |









Performance of the Diagonal Linear Discriminant Analysis Classifier:



















| Class | Sensitivity | Specificity | PPV | NPV |
|---|---|---|---|---|
| N | 0.875 | 0.946 | 0.933 | 0.898 |
| PB | 1 | 0.948 | 0.615 | 1 |
| T | 0.938 | 0.982 | 0.978 | 0.948 |










Performance of the 1-Nearest Neighbor Classifier:



















| Class | Sensitivity | Specificity | PPV | NPV |
|---|---|---|---|---|
| N | 0.979 | 0.821 | 0.825 | 0.979 |
| PB | 0.75 | 0.99 | 0.857 | 0.979 |
| T | 0.833 | 1 | 1 | 0.875 |










Performance of the 3-Nearest Neighbors Classifier:



















| Class | Sensitivity | Specificity | PPV | NPV |
|---|---|---|---|---|
| N | 1 | 0.75 | 0.774 | 1 |
| PB | 0.125 | 1 | 1 | 0.932 |
| T | 0.854 | 1 | 1 | 0.889 |










Performance of the Nearest Centroid Classifier:



















| Class | Sensitivity | Specificity | PPV | NPV |
|---|---|---|---|---|
| N | 0.812 | 0.929 | 0.907 | 0.852 |
| PB | 1 | 0.917 | 0.5 | 1 |
| T | 0.917 | 0.982 | 0.978 | 0.932 |

















TABLE 8







Composition by classifier: Sorted by p-value


Class 1: N; Class 2: PB; Class 3: T.


















| # | Parametric p-value | t-value | % CV support | Geom mean of intensities in class 1 | Geom mean of intensities in class 2 | Geom mean of intensities in class 3 | Gene symbol |
|---|---|---|---|---|---|---|---|
| 1 | <1e−07 | 65.961 | 100 | 1411.8016 | 335.9542052 | 13554.578246 | WT1 |
| 2 | <1e−07 | 34.742 | 100 | 2993.3054637 | 240.5599546 | 1117.4218527 | ACTB |
| 3 | <1e−07 | 30.862 | 100 | 85.5069224 | 70.3843498 | 1125.7940428 | DLX2 |
| 4 | <1e−07 | 30.03 | 100 | 274.9097126 | 128.8159291 | 833.6648468 | PITX2 |
| 5 | <1e−07 | 28.153 | 100 | 852.3850013 | 349.2428569 | 7392.282404 | SALL3 |
| 6 | <1e−07 | 23.333 | 100 | 80.5286413 | 62.0661721 | 265.3042755 | HOXA10 |
| 7 | <1e−07 | 21.159 | 100 | 235.4745892 | 296.8149796 | 592.5077157 | TERT |
| 8 | 2e−07 | 17.8 | 100 | 2002.2452679 | 1697.5965438 | 266.6906343 | CPEB4 |
| 9 | 4.3e−06 | 13.991 | 100 | 564.6571983 | 1254.1750649 | 189.2105463 | HLA-G |
| 10 | 1.54e−05 | 12.388 | 100 | 1710.9865406 | 1310.5286603 | 4044.9737351 | SPARC |
| 11 | 1.9e−05 | 12.132 | 100 | 114.7065029 | 81.1382549 | 292.8694482 | RASSF1 |
| 12 | 6.55e−05 | 10.614 | 100 | 1639.128227 | 1576.0887022 | 811.4430136 | DNAJA4 |
| 13 | 0.0008203 | 7.63 | 100 | 1484.6917542 | 1429.9219493 | 847.5968076 | CXADR |
| 14 | 0.0008501 | 7.589 | 100 | 11761.052468 | 9062.1655722 | 6153.5665863 | TP53 |
| 15 | 0.041843 | 3.276 | 100 | 105.5844903 | 94.1143599 | 201.5835284 | IRAK2 |
| 16 | 0.3946752 | 0.938 | 100 | 483.3048928 | 567.8776158 | 727.8087385 | ZNF711 |









Example 10h
Class Prediction “Differentiation”→Poor-Moderate-Well

Distinguishing the grade of differentiation of the tumours could also be achieved by DNA methylation marker testing. Although correct classification is only about 60% in this example, the lung tumour groups “AdenoCa” and “SqCCL” can be split and used separately to determine the grade of tumour differentiation with better performance.


Performance of Classifiers During Cross-Validation.


















| | Diagonal Linear Discriminant Analysis Correct? | 1-Nearest Neighbor | 3-Nearest Neighbors Correct? | Nearest Centroid Correct? |
|---|---|---|---|---|
| Mean percent of correct classification: | 50 | 52 | 57 | 62 |
















TABLE 9







Composition by classifier: Sorted by p-value


Class 1: moderate; Class 2: poor; Class 3: well.


















| # | Parametric p-value | t-value | % CV support | Geom mean of intensities in class 1 | Geom mean of intensities in class 2 | Geom mean of intensities in class 3 | Gene symbol |
|---|---|---|---|---|---|---|---|
| 1 | 0.0002337 | 10.127 | 100 | 2426.5840626 | 190.6171197 | 840.042225 | F2R |
| 2 | 0.002636 | 6.796 | 100 | 409.0809522 | 178.099004 | 3103.6338503 | ZNF256 |
| 3 | 0.0034931 | 6.432 | 100 | 67.1145733 | 81.4305823 | 63.5786575 | CDH13 |
| 4 | 0.0044626 | 6.118 | 100 | 30915.9294466 | 15055.465308 | 6829.1471271 | SERPINB5 |
| 5 | 0.0082321 | 5.35 | 100 | 289.011498 | 400.2767665 | 163.1721958 | KRT14 |
| 6 | 0.0092929 | 5.2 | 100 | 2890.2702155 | 418.2345934 | 211.3575002 | DLX2 |
| 7 | 0.0111512 | 4.977 | 100 | 68.3488191 | 83.3593382 | 60.6607364 | AREG |
| 8 | 0.0286999 | 3.846 | 98 | 62.1904027 | 62.94364 | 74.3029102 | THRB |
| 9 | 0.0326517 | 3.696 | 92 | 64.7904336 | 80.1596633 | 60.6607364 | HSD17B4 |
| 10 | 0.0414877 | 3.418 | 62 | 5631.0373836 | 2622.6315852 | 3310.1373187 | SPARC |
| 11 | 0.0449927 | 3.325 | 79 | 894.5655128 | 1191.0908574 | 510.2671098 | HECW2 |
| 12 | 0.0480858 | 3.249 | 40 | 441.1103703 | 1018.9640546 | 852.4793505 | COL21A1 |









Example 10i
BinTreePred “Differentiation” AdenoCa, SqCCL, N PB

Using Binary Tree prediction (applicable for elucidation of markers for more than 2 classes) provides several sets of predictors which enable classification of PB, AdenoCa, SqCCL, N. These marker sets could be used alternatively for classification.


Optimal Binary Tree:
Cross-Validation Error Rates for a Fixed Tree Structure Shown Below


















| Node | Group 1 Classes | Group 2 Classes | Mis-classification rate (%) |
|---|---|---|---|
| 1 | AdenoCa, N, SqCCL | PB | 0.0 |
| 2 | AdenoCa, SqCCL | N | 9.4 |
| 3 | AdenoCa | SqCCL | 31.2 |
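The fixed tree structure above can be read as three successive two-group decisions. The sketch below shows only that control flow; the three node rules are hypothetical callables standing in for the per-node classifiers, and the sample keys `node1_score`, `node2_score` and `node3_score` are invented for the usage example.

```python
from typing import Callable, Mapping

Sample = Mapping[str, float]
NodeRule = Callable[[Sample], bool]


def classify_binary_tree(
    sample: Sample,
    is_pb: NodeRule,
    is_normal: NodeRule,
    is_adenoca: NodeRule,
) -> str:
    """Walk the three-node tree: PB first, then N, then AdenoCa versus SqCCL."""
    # Node 1: {AdenoCa, N, SqCCL} versus PB.
    if is_pb(sample):
        return "PB"
    # Node 2: {AdenoCa, SqCCL} versus N.
    if is_normal(sample):
        return "N"
    # Node 3: AdenoCa versus SqCCL.
    return "AdenoCa" if is_adenoca(sample) else "SqCCL"


# Hypothetical node rules: each thresholds a made-up per-node score at zero.
rules = dict(
    is_pb=lambda s: s["node1_score"] < 0.0,
    is_normal=lambda s: s["node2_score"] < 0.0,
    is_adenoca=lambda s: s["node3_score"] > 0.0,
)

example = {"node1_score": 1.2, "node2_score": 0.4, "node3_score": -0.7}
print(classify_binary_tree(example, **rules))
```

A sample reaches node 3 only after passing nodes 1 and 2, so an error at an earlier node cannot be corrected further down the tree.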







embedded image








Results of Classification, Node 1:









TABLE 10







Composition of classifier (23 genes): Sorted by p-value

















| # | Parametric p-value | t-value | % CV support | Geom mean of intensities in group 1 | Geom mean of intensities in group 2 | Gene symbol |
|---|---|---|---|---|---|---|
| 1 | <1e−07 | 11.494 | 100 | 5370.6044342 | 241.377309 | KL |
| 2 | <1e−07 | 13.624 | 100 | 15595.1182874 | 226.4099812 | HIST1H2AG |
| 3 | <1e−07 | 14.042 | 100 | 15562.4306923 | 62.0661607 | TJP2 |
| 4 | <1e−07 | 20.793 | 100 | 36238.4478078 | 169.7749739 | SRGN |
| 5 | <1e−07 | 8.845 | 92 | 2847.6405879 | 176.5970582 | CDX1 |
| 6 | <1e−07 | 7.452 | 100 | 357.4232278 | 64.4047416 | TNFRSF25 |
| 7 | <1e−07 | 6.909 | 97 | 4344.5133099 | 90.5259025 | APC |
| 8 | <1e−07 | 6.607 | 100 | 38027.3831138 | 10046.5061814 | HIC1 |
| 9 | <1e−07 | 6.428 | 100 | 1605.6039019 | 115.3436683 | APC |
| 10 | 2e−07 | 5.611 | 100 | 439.58106 | 107.9138518 | GNA15 |
| 11 | 2e−07 | 5.53 | 100 | 1828.8750958 | 240.5597144 | ACTB |
| 12 | 2.47e−05 | 4.42 | 100 | 4374.5147937 | 335.954606 | WT1 |
| 13 | 3.53e−05 | −4.327 | 100 | 693.9070151 | 2419.282873 | KRT17 |
| 14 | 4.73e−05 | −4.251 | 100 | 3086.6035554 | 8432.6551975 | AIM1L |
| 15 | 5.58e−05 | −4.207 | 100 | 11780.3636838 | 25260.4242674 | DPH1 |
| 16 | 0.0001755 | 3.895 | 96 | 2120.616338 | 688.5899191 | PITX2 |
| 17 | 0.0005056 | 3.593 | 100 | 478.7300449 | 128.8159563 | PITX2 |
| 18 | 0.0012022 | −3.332 | 100 | 167.4354555 | 461.2140013 | KIF5B |
| 19 | 0.0015431 | −3.254 | 100 | 865.090709 | 2041.1567322 | BMP2K |
| 20 | 0.0020491 | −3.164 | 100 | 10857.4258468 | 26743.6730071 | GBP2 |
| 21 | 0.0023603 | 3.119 | 100 | 1819.6185255 | 218.3422479 | NHLH2 |
| 22 | 0.0040506 | 2.941 | 96 | 614.495327 | 62.0661607 | GDNF |
| 23 | 0.0043281 | 2.918 | 98 | 6929.8366248 | 784.5416613 | BOLL |









Results of Classification, Node 2:









TABLE 11







Composition of classifier (32 genes): Sorted by p-value

















| # | Parametric p-value | t-value | % CV support | Geom mean of intensities in group 1 | Geom mean of intensities in group 2 | Gene symbol |
|---|---|---|---|---|---|---|
| 1 | <1e−07 | 9.452 | 92 | 13554.5792299 | 1411.801824 | WT1 |
| 2 | <1e−07 | 7.222 | 92 | 1125.7939487 | 85.5069135 | DLX2 |
| 3 | <1e−07 | 6.648 | 69 | 7392.2771156 | 852.3852836 | SALL3 |
| 4 | <1e−07 | 6.48 | 92 | 592.5077475 | 235.4746794 | TERT |
| 5 | <1e−07 | 6.445 | 92 | 833.6646395 | 274.909652 | PITX2 |
| 6 | <1e−07 | 6.123 | 92 | 265.3043233 | 80.5286481 | HOXA10 |
| 7 | <1e−07 | 6.019 | 92 | 855.6411657 | 112.6645794 | F2R |
| 8 | <1e−07 | −5.832 | 92 | 266.6907851 | 2002.2457379 | CPEB4 |
| 9 | 4e−07 | 5.482 | 92 | 4609.4395265 | 718.3111003 | NHLH2 |
| 10 | 4e−07 | −5.474 | 92 | 3603.9808376 | 10347.8149677 | SMAD3 |
| 11 | 5e−07 | −5.391 | 92 | 1117.4212918 | 2993.3062317 | ACTB |
| 12 | 2.8e−06 | 4.984 | 92 | 3941.7717994 | 296.6448908 | HOXA1 |
| 13 | 3.6e−06 | 4.922 | 92 | 17199.6559171 | 2792.0695552 | BOLL |
| 14 | 5.9e−06 | −4.802 | 92 | 2178.4609569 | 8664.280092 | APC |
| 15 | 1.21e−05 | 4.622 | 92 | 472.6943985 | 96.784825 | MT1G |
| 16 | 1.36e−05 | 4.593 | 69 | 2188.6204084 | 653.0580827 | PENK |
| 17 | 1.97e−05 | 4.497 | 92 | 4044.9730493 | 1710.9865557 | SPARC |
| 18 | 3.16e−05 | −4.373 | 92 | 811.4434055 | 1639.128128 | DNAJA4 |
| 19 | 3.85e−05 | 4.321 | 92 | 292.869462 | 114.7064501 | RASSF1 |
| 20 | 4.28e−05 | −4.293 | 92 | 189.210499 | 564.6573579 | HLA-G |
| 21 | 4.98e−05 | −4.253 | 92 | 446.1371701 | 1339.8173509 | ERCC1 |
| 22 | 6e−05 | 4.203 | 92 | 1158.1503785 | 395.6249449 | ONECUT2 |
| 23 | 6.58e−05 | −4.178 | 92 | 1024.089614 | 2517.3225611 | APC |
| 24 | 8.45e−05 | 4.11 | 92 | 701.7840426 | 232.2538242 | ABCB1 |
| 25 | 0.0002382 | −3.821 | 92 | 1165.5392514 | 3027.5052576 | ZNF573 |
| 26 | 0.0003469 | −3.713 | 92 | 148.6108699 | 360.9887854 | KCNJ15 |
| 27 | 0.0003582 | 3.704 | 92 | 4147.2987214 | 1818.1188972 | ZDHHC11 |
| 28 | 0.0012332 | 3.332 | 46 | 512.9098469 | 238.5488699 | SFRP2 |
| 29 | 0.0019349 | 3.19 | 92 | 1215.8855046 | 310.5592635 | GDNF |
| 30 | 0.002818 | −3.068 | 92 | 2261.9371454 | 4930.1357863 | PTTG1 |
| 31 | 0.0038228 | −2.966 | 92 | 974.5345902 | 2402.9849125 | SERPINI1 |
| 32 | 0.0039256 | 2.957 | 90 | 417.3184202 | 208.6541481 | TNFRSF10C |









Results of Classification, Node 3:









TABLE 12







Composition of classifier (2 genes): Sorted by p-value

















| # | Parametric p-value | t-value | % CV support | Geom mean of intensities in group 1 | Geom mean of intensities in group 2 | Gene symbol |
|---|---|---|---|---|---|---|
| 1 | 0.000302 | 3.91 | 40 | 584.5327307 | 158.116767 | HOXA10 |
| 2 | 0.0038089 | 3.048 | 46 | 180.3474561 | 67.115885 | NEUROD1 |









Example 11
qPCR Validation of Biomarkers

Quantitative PCR with primers for the markers elucidated by microarray analysis was run on MSRE-digested DNAs from the same sample groups as analyzed on microarrays. Marker sets for SYBRGreen qPCR were taken from Example 10f and Example 10d.









TABLE 13







Markers used for SYBRGreen-qPCR:









| Unique id | Gene symbol |
|---|---|
| Ahy_61_chr11:32411664-32412266 +_401-464 | WT1 |
| 349_hy_35-PitxA_chr4:111777754-111778067 | PITX2 |
| Ahy_156_chr18:74841510-74841935 +_336-389 | SALL3 |
| Ahy_265_chr5:76046889-76047178 +_134-197 | F2R |
| Ahy_252_chr5:1348529-1348893 +_138-187 | TERT |
| Ahy_289_chr7:27180142-27180796 +_181-238 | HOXA10 |
| Ahy_233_chr3:50352877-50353278 +_108-157 | RASSF1 |
| Ahy_257_chr5:151046476-151047183 +_57-106 | SPARC |
| Ahy_212_chr3:10181572-10181986 +_249-298 | IRAK2 |
| Ahy_332_chrX:84385510-84385717 +_42-106 | ZNF711 |
| Ahy_51_chr11:112851438-112851650 +_57-107 | DRD2 |
| Ahy_109_chr15:76343347-76343876 +_373-428 | DNAJA4 |
| Ahy_202_chr21:17806218-17806561 +_104-167 | CXADR |
| Ahy_143_chr17:7532353-7532949 +_415-476 | TP53 |
| 335_hy_4-Aktin_VL_chr7:5538506-5538805 | ACTB |
| Ahy_261_chr5:173247753-173248208 +_350-404 | CPEB4 |
| Ahy_181_chr2:172672873-172673656 +_177-227 | DLX2 |
| Ahy_30_chr1:6448693-6448938 +_57-107 | TNFRSF25 |
| Ahy_83_chr13:32489371-32489688 +_181-245 | KL |
| Ahy_107_chr15:65146236-65146654 +_305-366 | SMAD3 |









Negative amplifications (no Cp value generated after 45 cycles of PCR amplification with SYBR green) were set to Cp=45. All qPCR Cp values were then subtracted from 45.01 to obtain transformed data directly comparable to the microarray data; thus, the higher the transformed value, the more product was generated (corresponding to a lower Cp value). Statistical testing of the transformed data was performed in the same manner as for the microarray data, using BRB-AT software.
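The transformation described above can be sketched in a few lines. Representing a negative amplification as `None` and the names `transform_cp`, `MAX_CYCLES` and `OFFSET` are assumptions made for illustration.

```python
MAX_CYCLES = 45.0  # Cp assigned when no product was detected after 45 cycles
OFFSET = 45.01     # value from which every Cp is subtracted


def transform_cp(cp):
    """Return 45.01 - Cp; cp is None when no Cp value was generated."""
    if cp is None:
        cp = MAX_CYCLES
    return OFFSET - cp


# Hypothetical Cp readings: an early, a late and a negative amplification.
for cp in (24.3, 38.9, None):
    print(cp, round(transform_cp(cp), 2))
```

With this transformation a negative amplification maps to 0.01 rather than 0, so every transformed value is strictly positive.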


Class comparison and different strategies and methods for class prediction using the qPCR data enable correct classification of the different sample groups. Although the qPCR conditions were not optimized but run under our standard conditions, the successful classification of groups with markers deduced from microarray analysis confirms the reliability of the methylation markers.









TABLE 14







9 markers from Table 13 showed significant class difference fold changes













| Unique id | Gene symbol | mean of log intensities for N | mean of log intensities for T | FoldDiff |
|---|---|---|---|---|
| Ahy_30_chr1:6448693-6448938 +_57-107 | TNFRSF25 | 7.40354 | 8.5125 | 0.46 |
| Ahy_156_chr18:74841510-74841935 +_336-389 | SALL3 | 1.59063 | 7.04229 | 0.02 |
| Ahy_233_chr3:50352877-50353278 +_108-157 | RASSF1 | 5.80167 | 7.95708 | 0.22 |
| Ahy_252_chr5:1348529-1348893 +_138-187 | TERT | 0.01 | 1.1725 | 0.45 |
| Ahy_257_chr5:151046476-151047183 +_57-106 | SPARC | 11.76 | 14.10521 | 0.20 |
| Ahy_265_chr5:76046889-76047178 +_134-197 | F2R | 0.70917 | 4.87917 | 0.06 |
| Ahy_289_chr7:27180142-27180796 +_181-238 | HOXA10 | 1.67708 | 3.88125 | 0.22 |
| Ahy_332_chrX:84385510-84385717 +_42-106 | ZNF711 | 4.635 | 6.48875 | 0.28 |
| 349_hy_35-PitxA_chr4:111777754-111778067 | PITX2 | 5.48854 | 8.61813 | 0.11 |
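The FoldDiff column is numerically consistent with 2 raised to the difference of the two means (N minus T), that is, with the means behaving as base-2 logarithms. This reading is inferred from the numbers in Table 14 and is not stated in the text; the function name `fold_diff` is illustrative.

```python
def fold_diff(mean_log_n, mean_log_t):
    """Fold difference N/T, assuming the means are on a log2 scale."""
    return 2.0 ** (mean_log_n - mean_log_t)


# (mean for N, mean for T, reported FoldDiff) copied from Table 14.
table14 = {
    "TNFRSF25": (7.40354, 8.5125, 0.46),
    "SALL3": (1.59063, 7.04229, 0.02),
    "PITX2": (5.48854, 8.61813, 0.11),
}

recomputed = {
    gene: round(fold_diff(mean_n, mean_t), 2)
    for gene, (mean_n, mean_t, _reported) in table14.items()
}
for gene, value in recomputed.items():
    print(gene, value, table14[gene][2])
```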









Example 11a
CLASS Prediction: TU Vs Normal: p<0.01>>SVM 100%, Paired Samples
Performance of Classifiers During Cross-Validation
Mean Percentage of Correct Classification:




















| | Compound Covariate Predictor Correct? | Diagonal Linear Discriminant Analysis Correct? | 1-Nearest Neighbor | 3-Nearest Neighbors Correct? | Nearest Centroid Correct? | Support Vector Machines Correct? |
|---|---|---|---|---|---|---|
| Mean percent of correct classification: n = 48 | 96 | 98 | 94 | 94 | 94 | 100 |













TABLE 15







Composition of classifier: Sorted by t-value
















| # | Parametric p-value | t-value | % CV support | Geometric mean of intensities (class N/class T) | Gene symbol |
|---|---|---|---|---|---|
| 1 | 1e−07 | −6.184 | 100 | 0.0228499 | SALL3 |
| 2 | 2e−07 | −6.162 | 100 | 0.1142619 | PITX2 |
| 3 | 4e−07 | −5.879 | 100 | 0.1967986 | SPARC |
| 4 | 3.5e−06 | −5.254 | 100 | 0.0555527 | F2R |
| 5 | 8.08e−05 | −4.318 | 100 | 0.4467377 | TERT |
| 6 | 0.0009183 | −3.538 | 100 | 0.2244683 | RASSF1 |
| 7 | 0.0011335 | −3.468 | 100 | 0.21701 | HOXA10 |
| 8 | 0.0045818 | 2.978 | 100 | 1.7787126 | CXADR |
| 9 | 0.0012761 | 3.427 | 100 | 3.3134481 | KL |









Example 11b
CLASS Prediction: TU vs Normal: p<0.01
Performance of the Support Vector Machine Classifier:



















| Class | Sensitivity | Specificity | PPV | NPV |
|---|---|---|---|---|
| N | 0.917 | 0.875 | 0.88 | 0.913 |
| T | 0.875 | 0.917 | 0.913 | 0.88 |










Performance of the Bayesian Compound Covariate Classifier:



















| Class | Sensitivity | Specificity | PPV | NPV |
|---|---|---|---|---|
| N | 0.792 | 0.604 | 0.667 | 0.744 |
| T | 0.604 | 0.792 | 0.744 | 0.667 |

















TABLE 16







Composition of classifier: Sorted by t-value


Class 1: N; Class 2: T.



















| # | Parametric p-value | t-value | % CV support | Geom mean of intensities in class 1 | Geom mean of intensities in class 2 | Fold-change | Unique id | Gene symbol |
|---|---|---|---|---|---|---|---|---|
| 1 | <1e−07 | −6.713 | 100 | 3.011798 | 131.8077746 | 0.0228499 | Ahy_156_chr18:74841510-74841935 +_336-389 | SALL3 |
| 2 | <1e−07 | −6.491 | 100 | 3468.2688243 | 17623.4446406 | 0.1967986 | Ahy_257_chr5:151046476-151047183 +_57-106 | SPARC |
| 3 | <1e−07 | −6.208 | 100 | 44.8968301 | 392.9290497 | 0.1142619 | 349_hy_35-PitxA_chr4:111777754-111778067 | PITX2 |
| 4 | 1e−06 | −5.248 | 100 | 1.6348595 | 29.429 | 0.0555527 | Ahy_265_chr5:76046889-76047178 +_134-197 | F2R |
| 5 | 3.91e−05 | −4.318 | 100 | 1.0069555 | 2.2540195 | 0.4467377 | Ahy_252_chr5:1348529-1348893 +_138-187 | TERT |
| 6 | 0.0003748 | −3.691 | 100 | 55.7796365 | 248.4967761 | 0.2244683 | Ahy_233_chr3:50352877-50353278 +_108-157 | RASSF1 |
| 7 | 0.0009309 | −3.419 | 100 | 3.1978081 | 14.7357642 | 0.21701 | Ahy_289_chr7:27180142-27180796 +_181-238 | HOXA10 |
| 8 | 0.0009772 | 3.404 | 100 | 3114.5146028 | 939.9618007 | 3.3134481 | Ahy_83_chr13:32489371-32489688 +_181-245 | KL |
















TABLE 16b







Prediction rule from the linear predictors











Gene Weights

| # | Genes | Compound Covariate Predictor | Diagonal Linear Discriminant Analysis | Support Vector Machines |
|---|---|---|---|---|
| 1 | Ahy_83_chr13:32489371-32489688 +_181-245 | 3.4041 | 0.2794 | 1.2796 |
| 2 | Ahy_156_chr18:74841510-74841935 +_336-389 | −6.7126 | −0.3444 | −0.2136 |
| 3 | Ahy_233_chr3:50352877-50353278 +_108-157 | −3.6907 | −0.2633 | 0.0512 |
| 4 | Ahy_252_chr5:1348529-1348893 +_138-187 | −4.3175 | −0.6681 | −1.1674 |
| 5 | Ahy_257_chr5:151046476-151047183 +_57-106 | −6.4911 | −0.7486 | −0.7093 |
| 6 | Ahy_265_chr5:76046889-76047178 +_134-197 | −5.2477 | −0.2752 | −0.0135 |
| 7 | Ahy_289_chr7:27180142-27180796 +_181-238 | −3.419 | −0.221 | −0.3187 |
| 8 | 349_hy_35-PitxA_chr4:111777754-111778067 | −6.2083 | −0.5132 | −0.353 |









The prediction rule is defined by the inner product of the weights ($w_i$) and the expression values ($x_i$) of the significant genes. The expression is the log ratio for dual-channel data and the log intensity for single-channel data.


A sample is classified to the class N if the sum is greater than the threshold; that is, $\sum_i w_i x_i > \text{threshold}$.


The threshold for the Compound Covariate predictor is −172.255


The threshold for the Diagonal Linear Discriminant predictor is −15.376


The threshold for the Support Vector Machine predictor is 0.838
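The rule above can be sketched for the Support Vector Machine predictor using the weights of Table 16b and the stated threshold of 0.838. Several points are assumptions made for illustration: the unique ids are mapped to gene symbols via Table 13, the logarithm is taken to base 2, and the two test profiles are simply the logs of the class geometric means from Table 16 rather than real samples. BRB-ArrayTools may apply further normalisation that is not reproduced here.

```python
import math

# Support Vector Machine weights from Table 16b, keyed by gene symbol.
SVM_WEIGHTS = {
    "KL": 1.2796,
    "SALL3": -0.2136,
    "RASSF1": 0.0512,
    "TERT": -1.1674,
    "SPARC": -0.7093,
    "F2R": -0.0135,
    "HOXA10": -0.3187,
    "PITX2": -0.353,
}
SVM_THRESHOLD = 0.838


def linear_score(weights, expression):
    """Weighted sum of log intensities over the genes in the classifier."""
    return sum(weight * expression[gene] for gene, weight in weights.items())


def classify(weights, threshold, expression):
    """Class N if the weighted sum exceeds the threshold, otherwise T."""
    return "N" if linear_score(weights, expression) > threshold else "T"


# Geometric mean intensities (class N, class T) from Table 16.
GEOM_MEANS = {
    "SALL3": (3.011798, 131.8077746),
    "SPARC": (3468.2688243, 17623.4446406),
    "PITX2": (44.8968301, 392.9290497),
    "F2R": (1.6348595, 29.429),
    "TERT": (1.0069555, 2.2540195),
    "RASSF1": (55.7796365, 248.4967761),
    "HOXA10": (3.1978081, 14.7357642),
    "KL": (3114.5146028, 939.9618007),
}

# Illustrative "average" profiles on an assumed log2 scale.
mean_profile_n = {gene: math.log2(n) for gene, (n, _t) in GEOM_MEANS.items()}
mean_profile_t = {gene: math.log2(t) for gene, (_n, t) in GEOM_MEANS.items()}

for name, profile in (("mean N", mean_profile_n), ("mean T", mean_profile_t)):
    score = linear_score(SVM_WEIGHTS, profile)
    print(name, round(score, 3), classify(SVM_WEIGHTS, SVM_THRESHOLD, profile))
```

Under these assumptions the average N profile scores above the threshold and the average T profile below it, in line with the direction of the rule.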


Example 11c
Recursive Feature Extraction (n=10) Prediction: TU Vs Normal 98% Correct, Paired Samples









TABLE 17







Composition of classifiers: Sorted by t-value
















| # | Parametric p-value | t-value | % CV support | Geometric mean of intensities (class N/class T) | Gene symbol |
|---|---|---|---|---|---|
| 1 | 1e−07 | −6.184 | 100 | 0.0228499 | SALL3 |
| 2 | 2e−07 | −6.162 | 100 | 0.1142619 | PITX2 |
| 3 | 4e−07 | −5.879 | 100 | 0.1967986 | SPARC |
| 4 | 3.5e−06 | −5.254 | 100 | 0.0555527 | F2R |
| 5 | 0.0011335 | −3.468 | 100 | 0.21701 | HOXA10 |
| 6 | 0.0188086 | −2.434 | 92 | 0.5671786 | DRD2 |
| 7 | 0.3539709 | 0.936 | 94 | 1.2886257 | ACTB |
| 8 | 0.1083921 | 1.637 | 100 | 1.8305684 | DNAJA4 |
| 9 | 0.0045818 | 2.978 | 98 | 1.7787126 | CXADR |
| 10 | 0.0012761 | 3.427 | 100 | 3.3134481 | KL |









Example 11d
Greedy Pairs (6) Prediction: TU Vs Normal: 88% SVM, UNpaired Samples
Performance of the Support Vector Machine Classifier:



















| Class | Sensitivity | Specificity | PPV | NPV |
|---|---|---|---|---|
| N | 0.896 | 0.854 | 0.86 | 0.891 |
| T | 0.854 | 0.896 | 0.891 | 0.86 |










Performance of the Bayesian Compound Covariate Classifier:



















| Class | Sensitivity | Specificity | PPV | NPV |
|---|---|---|---|---|
| N | 0.812 | 0.604 | 0.672 | 0.763 |
| T | 0.604 | 0.812 | 0.763 | 0.672 |

















TABLE 18







Composition of classifier: Sorted by t-value (Sorted by gene pairs)


Class 1: N; Class 2: T.


















| # | Parametric p-value | t-value | % CV support | Geom mean of intensities in class 1 | Geom mean of intensities in class 2 | Fold-change | Gene symbol |
|---|---|---|---|---|---|---|---|
| 1 | <1e−07 | −6.713 | 100 | 3.011798 | 131.8077746 | 0.0228499 | SALL3 |
| 2 | <1e−07 | −6.491 | 100 | 3468.2688243 | 17623.4446406 | 0.1967986 | SPARC |
| 3 | <1e−07 | −6.208 | 100 | 44.8968301 | 392.9290497 | 0.1142619 | PITX2 |
| 4 | 1e−06 | −5.248 | 100 | 1.6348595 | 29.429 | 0.0555527 | F2R |
| 5 | 3.91e−05 | −4.318 | 100 | 1.0069555 | 2.2540195 | 0.4467377 | TERT |
| 6 | 0.0003748 | −3.691 | 100 | 55.7796365 | 248.4967761 | 0.2244683 | RASSF1 |
| 7 | 0.0009309 | −3.419 | 100 | 3.1978081 | 14.7357642 | 0.21701 | HOXA10 |
| 8 | 0.0137274 | −2.512 | 100 | 169.3121483 | 365.1891236 | 0.4636287 | TNFRSF25 |
| 9 | 0.1465343 | 1.464 | 98 | 4255.1669082 | 2324.5057894 | 1.8305684 | DNAJA4 |
| 10 | 0.1463194 | 1.465 | 50 | 326.8534389 | 203.1873409 | 1.6086309 | TP53 |
| 11 | 0.0176345 | 2.416 | 100 | 2588.5288498 | 1455.2822633 | 1.7787126 | CXADR |
| 12 | 0.0009772 | 3.404 | 100 | 3114.5146028 | 939.9618007 | 3.3134481 | KL |










Cross-Validation ROC curve from the Bayesian Compound Covariate Predictor. The area under the curve is 0.944 (FIG. 1).
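For orientation, the area under a ROC curve equals the probability that a randomly chosen sample of one class receives a higher predictor score than a randomly chosen sample of the other class. The sketch below computes it by direct pair counting; the scores are hypothetical and are not the cross-validated values behind FIG. 1, and the function name `roc_auc` is an assumption.

```python
def roc_auc(scores_positive, scores_negative):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    wins = 0.0
    for pos in scores_positive:
        for neg in scores_negative:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical predictor scores for illustration only.
tumour_scores = [0.9, 0.8, 0.35, 0.7]
normal_scores = [0.1, 0.4, 0.2, 0.3]

auc = roc_auc(tumour_scores, normal_scores)
print(auc)  # 15 of the 16 pairs are ranked correctly
```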


Example 11e
CLASS Prediction: Histology: p<0.05 Using all qPCRs for Class Prediction Analysis of Tumor-Subtype Versus Normal Lung Tissue









TABLE 19







Composition of classifier: Sorted by p-value


Class 1: AdenoCa; Class 2: N; Class 3: SqCCL.


















| # | Parametric p-value | t-value | % CV support | Geom mean of intensities in class 1 | Geom mean of intensities in class 2 | Geom mean of intensities in class 3 | Gene symbol |
|---|---|---|---|---|---|---|---|
| 1 | <1e−07 | 23.305 | 100 | 11832.9848147 | 3468.2688243 | 22878.8045137 | SPARC |
| 2 | <1e−07 | 22.546 | 100 | 98.6115161 | 3.011798 | 159.4048479 | SALL3 |
| 3 | 1e−07 | 19.146 | 100 | 7.6044403 | 1.6348595 | 71.4209691 | F2R |
| 4 | 1e−07 | 19.124 | 100 | 359.9316118 | 44.8968301 | 416.1715345 | PITX2 |
| 5 | 2.81e−05 | 11.753 | 100 | 90.8736104 | 55.7796365 | 480.3462809 | RASSF1 |
| 6 | 3.15e−05 | 11.611 | 100 | 48.8581148 | 3.1978081 | 6.7191365 | HOXA10 |
| 7 | 0.0001543 | 9.66 | 100 | 1.9602703 | 1.0069555 | 2.4699516 | TERT |
| 8 | 0.0042218 | 5.802 | 100 | 1047.8074626 | 3114.5146028 | 875.3966524 | KL |
| 9 | 0.0233243 | 3.914 | 100 | 263.7738716 | 169.3121483 | 451.9439364 | TNFRSF25 |









Performance of Classifiers During Cross-Validation

Mean Percent of Correct Classification, n=96:


















| | Diagonal Linear Discriminant Analysis Correct? | 1-Nearest Neighbor | 3-Nearest Neighbors Correct? | Nearest Centroid Correct? |
|---|---|---|---|---|
| Mean percent of correct classification: | 72 | 74 | 74 | 72 |









Example 11f
Bintree Prediction: Histology—p<0.05 UNpaired Samples “Compound Covariate Classifier”
Optimal Binary Tree: Cross-Validation Error Rates for a Fixed Tree Structure Shown Below
















| Node | Group 1 Classes | Group 2 Classes | Mis-classification rate (%) |
|---|---|---|---|
| 1 | AdenoCa, SqCCL | N | 14.6 |
| 2 | AdenoCa | SqCCL | 31.2 |









Results of Classification, Node 1:









TABLE 20







Composition of classifiers (10 genes): Sorted by p-value

















| # | Parametric p-value | t-value | % CV support | Geom mean of intensities in group 1 | Geom mean of intensities in group 2 | Gene symbol |
|---|---|---|---|---|---|---|
| 1 | <1e−07 | 6.713 | 100 | 131.8077753 | 3.011798 | SALL3 |
| 2 | <1e−07 | 6.491 | 100 | 17623.4448347 | 3468.2687994 | SPARC |
| 3 | <1e−07 | 6.208 | 100 | 392.9290438 | 44.8968296 | PITX2 |
| 4 | 1e−06 | 5.248 | 100 | 29.4290011 | 1.6348595 | F2R |
| 5 | 3.91e−05 | 4.317 | 100 | 2.2540195 | 1.0069556 | TERT |
| 6 | 0.0003748 | 3.691 | 100 | 248.4967776 | 55.779638 | RASSF1 |
| 7 | 0.0009309 | 3.419 | 100 | 14.7357644 | 3.197808 | HOXA10 |
| 8 | 0.0009772 | −3.404 | 100 | 939.9618108 | 3114.5147006 | KL |
| 9 | 0.0137274 | 2.511 | 100 | 365.1891266 | 169.3121466 | TNFRSF25 |
| 10 | 0.0176345 | −2.416 | 100 | 1455.2823102 | 2588.528822 | CXADR |









Results of Classification, Node 2:









TABLE 21







Composition of classifier (3 genes): Sorted by p-value

















| # | Parametric p-value | t-value | % CV support | Geom mean of intensities in group 1 | Geom mean of intensities in group 2 | Gene symbol |
|---|---|---|---|---|---|---|
| 1 | 0.0058346 | 2.892 | 50 | 48.8581156 | 6.7191366 | HOXA10 |
| 2 | 0.0253305 | −2.312 | 50 | 90.8736092 | 480.3462899 | RASSF1 |
| 3 | 0.0330755 | −2.197 | 49 | 7.6044405 | 71.4209719 | F2R |



















SEQUENCE LISTING








SEQ ID NO:
DNA-SEQUENCE











1
CGGCCGGTCAGGAATCCCCATCCTGGAGCGCAGGCGGAGAGCCAGTGGCT





2
CCAAAAAAGGTGACACTGCCCCCTCCCAGTGGCTCCATGCTCCTCAGCTATGGCTGTCCGGGCC





3
CGCCCCGCCCCCGCCAACAACCGCCGCTCTGATTGGCCCGGCGCTTGTCTCTT





4
AGCGGCCTCAGCCTGCGCACCCCAGGAGCGTGGATGACTACGGCCACCCC





5
GCAGCCGAGAGGGTCAGGCCCCCATAGGTCCTCAGCCTGCTTCAACCTCAAAGGGGATGGGGG





6
TCCTGGCAGCATTACCACACTGCTCACCTGTGAAGCAATCTTCCGGAGACAGGGCCAAAGGGCCA





7
CTGACAAGAGACATGCAGGGCTGAGAGGCAGCTCCTTTTTATAGCGGTTAGGCTTGGCCAGCTGC





8
TGGCATCCACTTGCTTGATCCAGCCAGATTCCCACTCCCATGCCCTCTCCACTATTGCGATTGC





9
CTGCTTCGTGCCCTCTGGTGGCTAAGGCGTGTCATTGCAGTGCCGGCCTCCTGTCATCCTCC





10
CCGGCGCACTCCGACTCCGAGCAGTCTCTGTCCTTCGACCCGAGCCCCGC





11
TAGGTGGTGAGTTACTTGGCTCGGAGCGGGCGAGGGGACGCGTGGGCGGAGCG





12
AACCACCTGATCAAGGAAAAGGAAGGCACAGCGGAGCGCAGAGTGAGAACCACCAACCGAGGCGC





13
CGGGGGTAGGCTTTGCTGTCTGAGGGCGTCTGGCTGTGGAGCTGAAGGAGGCGCTGCTGAG





14
GCCCCGCATCCCTAATGAGGGAATGAATGGAGAGGCCCCCTCGGCTGGCGCCC





15
CGGGGCCACGCGCTAAGGGCCCGAACTTGGCAGCTGACCGTCCCGGACAG





16
CCACCGAACACGCCGCACCGGCCACCGCCGTTCCCTGATAGATTGCTGATGC





17
GAACTGGGTCGTGGAAGGATCGCGGGGAGCGGCCCTCAGGCCTTCGGCCTCACT





18
CCAGCACTTTGGGAGGCCGAGGCGGGCGGATCACGAGGTCAGGAGATCGAGACCATCCT





19
GGCGGCTGGTGCTTGGGTCTACGGGAATACGCATAACAGCGGCCGTCAGGGCGCC





20
TCAGATTCCTCAGGGCCGCAGAGGTGTGGAGCTGGTTGGGCCGGTTCTTCACCCTCCTCCC





21
CTGGCCGAGGTGGCCACCGGTGACGACAAGAAGCGCATCATTGACTCAGCCCGG





22
CTAACCTTCCTCGCCGCCTTCCTGCGGGTGACCCCCAAACGCCCCAGCTCCGC





23
CCGACTTGGACGCGGCCAGCTGGAGAGGCGGAGCGCCGGGAGGAGACCTT





24
ACAGAGTCGGCACCGGCGTCCCCAGCTCTGCCGAAGATCGCGGTCGGGTC





25
GGGGATGGAGAACTCTCCTCGCTTCGTCCTCTCTCCCGGGGAATCCCTAACCCCGCACTGCG





26
GTGGCTCGGGTCCACCCGGGCTGCGAGCCGGCAGCACAGGCCAATAGGCAATTAG





27
CTCACCCCGCGACTTACCCCACACCCCGCTCTCCAGAACCCCCATATGGGCGCTCACC





28
ACACACCACTGCAGCGTTCAAACGCTGGGAAGAAGACTCCCTTGTGGCACCGGAAACCCACGAGG





29
CCGCCACGAACTTGGGGTGCAGCCGATAGCGCTCGCGGAAGAGCCGCCTC





30
CTCCATAGCCCTCCGACGGGCGCCCAGGGGCTTCCCGGCTCCGTGCTCTCT





31
TGGACACCCCAAGAGCTCACTCCTCCGCGGTTTTATATTCCGACTTGCGCACAGGAGCGGGGTGC





32
CCGCCCGTTTCAGCGGCGCAGCTTCTGTAGTTGGGCTACTGGAGGGGTCGCTCAGAAACCTCA





33
AACCCAGGCTTGTCAGCCTAAGAACACGGGATCTCTTCACTGTGGTTCATGTGTAGAGTGGAGTTTCCA





34
CAGTCCCCTGCCGTGCGCTCGCATTCCTCAGCCCTTGGGTGGTCCATGGGA





35
CAGGTGGGCGTCTCAGGGGTGGGAGTGGCCGCGTCGTGAAGCGGAGAGAGGA





36
CTGCAAGGGATGACTCACCCCAGTGATTCAACCGCGCCACCGAGCGCGGAGCTG





37
TTGTATGGATTTCGCCCAGGGGAAAGCGCTCCAACGCGCGGTGCAAACGGAAGCCACTG





38
GAGGACCAGGGCCGGCGTGCCGGCGTCCAGCGAGGATGCGCAGACTGCCT





39
ACGCACCGCGGCTCCTCGCGTCCAGCCGCGGCCAAGGAAGTTACTACTCGCCCAAAT





40
CGCTGCCTCGCCATTGGGCGGCCGAACGCAGCCACGTCCAATCAGAGGAGT





41
GAGGTTCTGGGGACCGGGAGAGTGGCCACCTTCTTCCTCCTCGCGAAGAGCAGGCCGGG





42
AGTGGGATTGGGGCACTTGGGGCGCTCGGGGCCTGCGTCGGATACTCGGGTC





43
TCAAGCCGCCTCAGGTGAGCGCTCCTTGGCGCTACTTCCGGTCTCAGGTGAGGCCGC





44
TTGTGACGTGTGTTCTGGGCAGGGTTTGAGGTTTTGGAACATTTTCTAAAAGGGACAGAGAGCACCCTGC





45
CGGGGGGAGAAGTCCTGGAGCGGGTTTGGGTTGCAGTTTCCTTGTGCCGGGGATCCTGTCC





46
GAGGATTATTCGTCTTCTCCCCATTCCGCTGCCGCCGCTGCCAGGCCTCTGGCTGCTGAGG





47
CCCTCTCTCCCCTGGCCCGCAAAGTTTTGGCGGAGCCATCGCTGGGGCTGAGC





48
CCAGGGGGAACTTGTGGCAGTGCAGCATCTCAGGCCAGGGGAAGCCGTAGGCCTCCATGA





49
CGCCACCCAGAGCCCGAGGTTTGCCCTTCAGAAGCGGACCCGCAGACTCCTCGGACT





50
CGCCGAAATGAAACCCGCCTCCGTTCGCCTTCGGAACTGTCGTCACTTCCGTCCTCAGACTTGGA





51
TCCCTTGTTTTGAGGCGGGAACGCAACCCTCGACCGCCCACTGCGCTCCCA





52
GGCAGCCGGGAAATCCCGTGTCCCCACTCGTGGCAGAGGACGCTGTGGGG





53
CCCCCACAGTTTTCATGTGATCAGGAATTCAGCATAGGCTATAAGACGGAGTGCTCCATGTCAA





54
GGGGTTGTCATGGCAGCAGCTCCATCCCTGACCGCCACTTTCTCCCGGTGCCG





55
AAGTTCCGCCAGTGCACAGCAACCAATGGGCGGAGGGGTCCTTTGCCCCTGGGTTGC





56
AGTTGGGCCGGATCAGCTGACCCGCGTGTTTGCACCCGGACCGGTCACGTG





57
GGGCCGCTGCCTACTGTGGGCCTGCAAGGCGTGCAAGCGCAAGACCACCA





58
ACCTCCCTGCTGCGTGTCGCAAACCGAACAGCGGGCGTTGGCCCTCCTGC





59
GGGACCCGGAGCTCCAGGCTGCGCCTTGCGCCCGGGTCAGACATTATTTAGCTCTTCGGTTGAGC





60
GGCCGTGCGGGGCTCACCGGAGATCAGAGGCCCGGACAGCTTCTTGATCGCC





61
CCACTGCCTGCGGTAGAACCTGGTCCCGCATAGCTTGGACTCGGATAAGTCAAGTTCTCTTCCA





62
GGGCCGCAGGCCCCTGAGGAGCGATGACGGAATATAAGCTGGTGGTGGTGGGC





63
GCAGGACCCGGATGAGAGCGCACGCTTCGGGGTCTCCGGGAAGTCGCGGC





64
AAGAGGGAAAGGCTTCCCCGGCCAGCTGCGCGGCGACTCCGGGGACTCCAG





65
AGGGATGGCTTTTGGGCTCTGCCCCTCGCTGCTCCCGGCGTTTGGCGCCC





66
GCCCGCTCTCGGGTGACTCCGCAACCTGTCGCTCAGGTTCCTCCTCTCCCGGCC





67
TGCTGGACATCCACCGCCTCCAGGCAGTTTCGCCGTCACACCGTCGCCATCTGTAGC





68
GGCCGCGAAGCGACTCCGATCCTCCCTCTGAGCCTTGCTCAGCTCTGCCCCGC





69
CGCGCGTTCGCTGCCTCCTCAGCTCCAGGATGATCGGCCAGAAGACGCTCTACTCCTTTTTCTCC





70
CGGGGGCGGAGGAAACACCTATGAACCCTCCGGCAGCCTTCCTTGCCGGGCG





71
AGGGCCAGCCCTTGGGGGCTCCCAGATGGGGCGTCCACGTGACCCACTGC





72
GTGAAAGGTCGGCGAAAGAGGAGTAAAGACGGCGAGACGCGTCCACGCAGGGGGAGTCTGTGCG





73
GCGCTGAGGTGCAGCGCACGGGGCTTCACCTGCAACGTGTCGATTGGACG





74
GAGGCCTCATGCCTCCGGGGAAAGGAAGGGGTGGTGGTGTTTGCGCAGGGGGAGC





75
CGAAGTGGAAACCGGAGTTGCGTCATTGCTCCCACCCGATATCACCTTGGCAGCGACCGCG





76
ATGGGGTGCTCATCTTCCTGGAGCTGAGGAGCTGGGACGGGCATGGGGTGCTCATCCTCCTG





77
TTCCAGCCGGTGATTGCAATGGACACCGAACTGCTGCGACAACAGAGACGCTACAACTCACCGCG





78
CAGAGAAGACTCACGCAGTGAGCAGTCCGCAAGCCCGCTGGCGGCAGCGGC





79
GACACACCCACCTCAGCAGATCTCAGCCCATCCCTCCCAGCTCAGTGCACTCACCCAACCCCAC





80
CGGAGTGCTGCAAGCGCAGAAAATATACGTCATGTGCGGAGGCGGAGCTTCCGCCCTGCG





81
GGCCCAAGGACGTGTGTTGGTCCAGCCCCCCGGTTCCCCGAGACCCACGC





82
ACCTCTGGAGCGGGTTAGTGGTGGTGGTAGTGGGTTGGGACGAGCGCGTCTTCCGCAGTCCCA





83
CCCTTGGAAGGCGTGGAATTAGGAGAGAAATCCCTTAGTGGGCACACGAGTGAGTGCCCCTTGGA





84
CCGGCCGCCTCCCAGGCTGGAATCCCTCGACACTTGGTCCTTCCCGCCCC





85
TGCGTGGGTCGCCTCGCGTCTCTCTCTCCCACCCCACCTCTGAGATTTCTTGCCAGCACC





86
GACTTCGCGTCGCCCTTCCACGAGCGCCACTTCCACTACGAGGAGCACCTGGAGCGCAT





87
GAGGCTGCGAGCCTGGGCTCCCAGGGAGTTCGACTGGCAGAGGCGGGTGCAG





88
CCATTCTCCTGCCTCAGCCTCCCAAGTAGCTGGGACTACAGGCGCCTGCCACCACTCCCGGC





89
CAGGGGACGTTGAAATTATTTTTGTAACGGGAGTCGGGAGAGGACGGGGCGTGCCCCGACGTGCG





90
ACCCTGGAACGACGCCAAACGCGACCCCTACCAGAGGACTCGCGCATGCGCAGC





91
GTTCCCAAAGGGTTTCTGCAGTTTCACGGAGCTTTTCACATTCCACTCGG





92
GAAAGACACCGCGGAACTCCCGCGAGCGGAGACCCGCCAAGGCCCCTCCAG





93
CCCTCTCCGCCCCAAACAGCTCCCCACTCCCCCAGCCTGCCCCCACCCTC





94
ATTGGGGCTACACTCACCACAAGAGCAGCAAACAAAGCACTGGGTGTGGTAGAGGCTGTCCAGGG





95
CCCAGCGGGGCCCTTAGCAGAGCCTCTCCAATCCTCGGCGCCTCCCCTACACAGGGTTCG





96
GCGCCCAAGGCCCTGCTTCTTCCCCCTTCCTCTTCCCCTTGCCCAGCCGCGACTTC





97
CCCAGCCGAGCAGGGGGAAGCATCCCCAGCTCCCGCACCCAAGTCCCTGG





98
GCCGCCACCTGTTGAGGAAAGCGAGCGCACCTCCTGCAGCTCAGGCTCCGGG





99
CGGGAGCGGATTGGGTCTGGGAGTTCCCAGAGGCGGCTATAAGAACCGGGAACTGGGCGCG





100
GGCGGGGAAGCGTATGTGCGTGATGGGGAGTCCGGGCAAGCCAGGAAGGCACC





101
GGAGCCCGCAGTGCGTGCGAGGGGCTCTCGGCAGGTCCAGACGCCTCGCC





102
CGCATCCGGCTCCGAAAGCTGCGCGCAGCCATCATCAGGGCCCTTCTGGTGTT





103
GCCGCTGCCAGTCGACTCAACCACCGGAGTGGCCCCTGCAGTTGGATAGCAACGAGAATCCTCC





104
GGCAGGAAAGGGCCCGAAGGCAGCGAAGGCGAACGCGGCGCACCAACCTG





105
ACAGGGTCTTCCCACCCACAGGGCACCCAGGCGCAGCGGAGCCAGGAGGG





106
ACCAGCCGCACAACTTTTGAAGGCTCGCCGGCCCATGTGGGGTCTTTCTGGCGGC





107
CAGCCGGGCAGATAACAAAACACACCCCAAAGTGGGCCTCGCATCGGCCCTCGCATTCCTGT





108
GGCCTCGACGCCGAGGGGTGTCCCTCTCCTCTCCTGGTCAGGGAACGCAGCAACTGA





109
GGGCGGCAGTCAGAGCTGGAGCTCCGGGGAATCAGACGGGCAGCCAAAGGAGCAGA





110
CGGAAGTGCCCCGGTCCTGGAGGGGGTGGAAGTTGGGGAGCCCAGGCAGGA





111
CCGAGAGGGAAGAAAAAAATACCCTCTTTGGGCCAGGCACGGTGGCTCACCCCTGTAATCCCAGC





112
TCCCAGCACTTTGGGAGGCTGAGGCGAGCGGATCACGAGATCAGAAGATCGAGACCATCCTGGC





113
CCCCGGGACCGGATAACGCCCTAAATCAGCGCAGCTGAGGCGAGGCCGTGGCC





114
CTCGCGACCCCGGCTCCGGGCCTCTGCCGACCTCAGGGGCAGGAAAGAGTC





115
CCCGAGGCTCGCCCGACTCCTGGCTGCCCTGGACTCCCCTCCCTCCTCCCT





116
CTCCAGCTGCACTGCCACCCAGCCTGCCTGGTGCTGGTGCTCAACACGCAGC





117
CCGGCCTTTCCGCCAGAGGGCGGCACAGAACTACAACTCCCAGCAAGCTCCCAAGGCG





118
GGGAAGGAGCCTCAGCTCCGCTCCAGGTCCTCCACCAGGTAGGACTGGGACTCCCTTAGGGCCTG





119
GGGAGTGTCCTCCTCCGGGACAGCCGGACTCCCGCCGACTTCTGGGCGGC





120
GGGGAGCGTGCGGGGTCGCCACCATCGGGACCCCCAGAGGAGAGAGGACTTG





121
GACAGATGCAGTGCGTGCGCCGGAGCCCAAGCGCACAAACGGAAAGAGCGGG





122
TCCTTTGCGTCCGGCCCTCTTTCCCCTGACCATAAAAGCAGCCGCTGGCTGCTGGGCC





123
TGCGGCTTCTCTCACCCTGCCAGGCCTTCCCAGCTTCCCTGAGGTTGCCTGCTACACCCG





124
GCCCCAGCCCTGCGCCCCTTCCTCTCCCGTCGTCACCGCTTCCCTTCTTCCA





125
CCCGCACCCCTATTGTCCAGCCAGCTGGAGCTCCGGCCAGATCCCGGGCTG





126
GCAGAGTTCGTGCAGGGAGTTCGCACATAGGAGAGCACCGGTCCGGGAGTGCCAGGCTCG





127
CGGCCGGTGTGTGTCCCCGCAGGAGAGTGTGCTGGGCAGACGATGCTGGAC





128
TTTTTGGGACAACCATGGAGGGGTCCTCCGTCTCGGCCTCTTCGCATATCCCCCTCCGTGATCC





129
CGGCGGGTCAGATCTCGCTCCCTTTCGGACAACTTACCTCGGAGAGGAGTCAAGGGGAGAGGGGA





130
CCCGGACGAGCTCTCCTATCCCGAAGTTGTGGACAGTCGAGACGCTCAGGGCAGCCGGGC





131
CGGCCGGTGGAGGGGGGAAGGGAGGAATGGTGTCAGGGGCGGATATCTGAGCCCTGAG





132
CACCAAAGCCACCACCCAAGCCAGCACCAAGGCCACCACCATATCCTCCCCCAAAGCCACTACCA





133
CCGCCAGGCCCGCTGGGTGGAATGTGGTCATGTTTCAGACTGCCGATGGCTTCCA





134
CCTGTCCGGATCCCTCCCCGCCTTGCTCAGATCTCTGGTTCGCGGAGCTCCGAGGC





135
GCGCAGGGGCCCAGTTATCTGAGAAACCCCACAGCCTGTCCCCCGTCCAGGAAGTCTCAGCGAG





136
TCCTGCCCCAGTAAGCGTTGGACCGGGAGACGCAGTGCTCAGCATCGGTCAGCAGGG





137
GCGCCGAGGAGTCGGGACAGCCCCGGAGCTTCATGCGGCTCAACGACCTG





138
GGCCCCAGCGGAGACTCGGCAGGGCTCAGGTTTCCTGGACCGGATGACTGACCTGAGC





139
CGCCGGCTGCGAAGTTGAGCGAAAAGTTTGAGGCCGGAGGGAGCGAGGCCGG





140
GGAGCCGCTTGGCCTCCTCCACGAAGGGCCGCTTCTCGTCCTCGTCCAGCAGC





141
AAATGTGGAGCCAAACAATAACAGGGCTGCCGGGCCTCTCAGATTGCGACGGTCCTCCTCGGCC





142
CCTCTCAGATTGCGACGGTCCTCCTCGGCCTGGCGGGCAAACCCCTGGTTTAGCACTTCTCA





143
TCTCCCCACGCTTCCCCGATGAATAAAAATGCGGACTCTGAACTGATGCCACCGCCTCCCGA





144
GCCCAATCGGAAGGTGGACCGAAATCCCGCGACAGCAAGAGGCCCGTAGCGACCCG





145
CGTGGGGGGCTGTTTCCCGTCTGTCCAGCCGCGCCCACTTCTCAGGCCCAAAG





146
GGGGCCCTCGTGTTGCTGAACGAGGGCGGGTTCGCGATGTAAATAAGCCCAGAGGTGGGGTC





147
CCTGGGTCCCCTCGGCTCTCGGAAGAAAAACCAACAGCATCTCCAGCTCTCGCGCGGAATTGTC





148
CATAAGATGCCCTCCTGCGGGCCCTCACCTTTTGACACTGCCTCCCACCGCACTGGGGTCAA





149
ATCCCGCTGCACCACGCCATGAGCATGTCCTGCGACTCGTCTCCGCCTGGC





150
GCGCGGTGAAGGGCGTCAGGTGCAGCTGGCTGGACATCTCGGCGAAGTCG





151
CATTTCTTTCAATTGTGGACAAGCTGCCAAGAGGCTTGAGTAGGAGAGGAGTGCCGCCGAGGCGG





152
AAGTTCACTGAGGGTTGTAAGAGTCAGAATGGACTCCATGGAAGTTATGGGGTGTGAATCAAACCTCACA





153
CAGCACTTTGGGAGGCCGAGGTGGGCGGATTGCCTGAGGTCAGGAGTTTGAGACCAGCCTGG





154
GGGCAACACACACAGCAGCGACAGCCGGGAGGTAAGCCGCGTCCCAGCGG





155
CTGAGGGGAGGAGAAACTGGGCTGCGGGGGTCCGGGAGGGTGGATTCCGAGAAACTATGTGCCC





156
GTGTCCCAGCGCGTTGACGCAGCCTGTGATCCCTCGCGAGGCGAGGAGAAGGTC





157
AACCCCGACCTCAGGTGATCTGCCCAAAAGTGCTGGGATTACAGGCGTCAGCCACCGCGCC





158
AGGACGAAGTTGACCCTGACCGGGCCGTCTCCCAGTTCTGAGGCCCGGGTCCCACTGGAACT





159
GGAGACGCGTTGCCTTCGGCCGGGACCACTGCACCTGCCCGCGTGGGTAAT





160
CACAAAGGCCAAGGAGGGAGTGCGCAGGTCACGTGCGCCGGTGGTCAGCG





161
CTGACCTGGCGCTGCTGCCCCTGGTGCCTGACGGAGGATGAGAAGGCCGCC





162
AAAAGTGGCTCGGAACCCCAAATCCCGGTTAGATTGCAGGCACCGCCGGACGCTGGCTCCC





163
GTTCTGTTGGGGGCGAGGCCCGCGCAAGCCCCGCCTCTTCCCCGGCACCAG





164
GCGTCGACACTGCGCAAGCCCAGTCGCGCCTCTCCAGAGCGGGAAGAGCG





165
TGTCTGAGTATTGATCGAACCCAGGAGTTCGAGATCAGCTTGAGCAAGATAGCGAGAACCCCCGC





166
GAAAGACTGCAGAGGGATCGAGGCGGCCCACTGCCAGCACGGCCAGCGTGG





167
TTAGAGTCCCCTGGGTGTGTGCCCCGCAGAGGGAGCTCTGGCCTCAGTGCCCAGTGTGC





168
TTAGAGTCCCCTGGGTGTGTGCCCCGCAGAGGGAGCTCTGGCCTCAGTGCCCAGTGTGC





169
GGGGACGAGCAGGAAAAGGCCGGGGTGGGGGTGGAATTCCTCGGCGGGCAG





170
GGGAGCCTGAGGCAGGAGAATCGCTTGAATCCGGGAGGCGGAGGTTGCAGTAAGCCGAGATCGC





171
CTTTCGGAGGCCTCATTGGCTGAAGGTCGCCGTCGCCCAACGCAGGCCATTCTGGGT





172
CCTCCTGGGGTCAAGTGATCATCCTGGCTCAACCACCCAAGTAGCCGGGACTACGGGTGGCCGC





173
CCAATGCCCCAACGCAGGCCACCCCCGGCTCCTCTGTGGACTCACGAAGACAAGGTC





174
CTCTGAGAGCCACAGTCAGGTCTGTCCTCAGGGGTCGAGGCGGCTGCGCTGGGGCCT





175
GGACAGCCCGCTCGGGAGTCGGGCCTGGAAGCAGGCGGACAGCGTCACCT





176
GCCAGGATGGTCTCGATCTCCTGACCTTGTGATCTGCCCGCCTCGGCCTCCCAAAGTGTTGGG





177
TGCCCAGGGGAGCCCTCCATTTGTAGAATGAATGAGAGTCCAGGTTATGAACAGTGCCTGGAGTG





178
GACCGGTTTTATCCCGCTGAGGCCCTGGGAGATGGGTCTGGCGAGGCTCGTAGGCCGC





179
GCGGAACCTCAAATTGCGGCAGCGGAACCTAAAGTTTCAGGGTGAGATGCGTTGACTCGCGGTGG





180
GCTCAGTCCCTCCGGTGTGCAGGACCCCGGAAGTCCTCCCCGCACAGCTCTCGCT





181
CGGGCAGGCGGGACCGGGAGGTCAATAACTGCAGCGTCCGAGCTGAGCCCA





182
CGCGGTGGGCCGACTTCCCCTCCTCTTCCCTCTCTCCTTCCTTTAGCCCGCTGGCGCC





183
TCCCCGGCATGCGCCATATGGTCTTCCCGGTCCAGCCAAGAGCCTGGAACCACG





184
CTCCGCGCTCAGCCAATTAGACGCGGCTGTTCCGTGGGCGCCACCGCCTC





185
GCGAGAGGGTCGTCCGCTGAGAAGCTGCGCCGGAGACGCGGGAAGCTGCTG





186
GACCCGCCTGCGTCGCCACCCTCTCGCCGCTCCCTGCCGCCACCTTCCTC





187
GAGGGGTCCGGGACGAAGCCACCCGCGCGGTAGGGGGCGACTTAGCGGTTTCA





188
CCCCGAACAAAAAATTCAAATGGGAAAGAGAGGCAGATGGCAGAGAACAGGGGAGGGGCTGGGCA





189
GCGGCGAGGAGGGTCACAGCCGGAAAGAGGCAGCGGTGGCGCCTGCAGAC





190
GGCGGTCTCCGGTTCGCCAATGTGGCTGGGTCCGTAGGCTTGGGCAGCCT





191
CCTCCCCTTTGCGTGCGGAGCTGGGCTTTGCGTGCGCCGCTTCTGGAAAGTCG





192
AGCCTACTCACTCCCCCAACTCCCGGGCGGTGACTCATCAACGAGCACCAGCGGCCAGA





193
CAGGAGGTGAGGAGGTTTCGACATGGCGGTGCAGCCGAAGGAGACGCTGCAGTTGGAGAGCG





194
AGATTTCCCGCCAGCAGGAGCCGCGCGGTAGATGCGGTGCTTTTAGGAGCTCCGTCCGACA





195
CGGGCGTGGTGGTGGGCACCTGTAATCCCAGCTACTCAGAAGGTTGAGGCAGGAGAATCGCTTGA





196
TCCCAAATCCGAGTCTGCGGAGCCTGGGAGGGCTCCCAGCTTCCTATCCAAACCGCGCC





197
CCCTGGTCGAGCCCCCTTTCCTCCCGGGTCCACAGCGAGTCCCCTGAGGAAGGAGGG





198
CAGGGACCCGCGAGTCCCTGGACACGCACTGGCCAACGCCAGACCCCATC





199
CAAGCAGCCCTCGGCCAGACCAAGCACACTCCCTCGGAGGCCTGGCAGGG





200
GAGAAGGAGCGACCCCCAAAACGAAGCGGCTGGATCTGACCTTCCAAGGCCTGTTGGCGACGC





201
TTCTTCCCCGCAGGGTCAGCGCTGGGGCTCCGGCCGTAGAGCCACGTGACC





202
ATTCATTTCTGTTATGGAACTGTCGCGGCACTACAAAGTCTCTATGTAGTTATAAATAAACGTT





203
ACCGAGTGCGCTGCTGTGCGAGTGGGATCCGCCGCGTCCTTGCTCTGCCC





204
GTGTGGTGAGTGTGGGTGTGTGCGCGTCTCCTCGCGTCCCTCGCTGAGGTGCCT





205
GCCTGGGCTGCCAGACGTCGCCATCATTGTTCCATGCAGATCATGCCCATCCTGTGCAGAAG





206
GCGGGTCCGAGGCGCAAGGCGAGCTGGAGACCCCGAAAACCAGGGCCACTC





207
TCTCCATGGTGGCCATTGCCTCCTCTCTGCTCCAAAGGCGACCCCGAGTCAGGGATGAGAGGC





208
CGCGGGACTCCGCGGGATCTCGCTGTTCCTCGCTCTGCTCCTGGGGAGCC





209
CGCCCCCTTTTTGGAGGGCCGATGAGGTAATGCGGCTCTGCCATTGGTCTGAGGGGGC





210
GTTCTGTTGCCAATGCCATTCAGACCCCAGTCCGGGATTCCGCGCTCGGGGTGCG





211
TTTCCGCGAGCGCGTTCCATCCTCTACCGAGCGCGCGCGAAGACTACGGAGGTCGA





212
ACCCGGGTTCAGCGGGTCCCGATCCGAGGGCGTGCGAGCTGAGCCTCCTG





213
GAGAGTGGACGCGGGAAAGCCGGTGGCTCCCGCCGTGGGCCCTACTGTGC





214
GGCTACAGCCGCCATTTCCACGCTCCACCAATCAAATCCATTCTCGAGGAAGACGCACCGCCCC





215
AGCGCGCACAAAGCCTGCGGGAGGATCCATTGTAGCGGTCGCTCCTCCCCGCTTAGCG





216
ATCGGGCGAAGCTCGCGGGAAACCGCTCTGGGTGCGCAGGACAAAGACGCG





217
CGACGGAGCCGTGTGGAGGCCAAAACTCCTCCCGGAAGCCGCTACTGGCCCCG





218
CGCCCCACTACTGCCTGCAGCGGGCTTCCTTACTCCGCCTGCTGGTTCCTACTGGAGGAGAGGCC





219
GCACTCGTAGCGCGCTGGGCGAGCCGGACCGGAAGTTGAAGAAGTGAAGCGCCG





220
TGAAGGGAGGGCTTGGTGTGGGGACTTGCACTGGGCAGAGGGGCAGCTTCCCTGAGAGCAGCTA





221
CGGGAGCGCCCGGTTGGGGAACGCGCGGCTGGCGGCGTGGGGACCACCCG





222
CAGCACCGGAGAGGGCGCACTGCAAAGGCGGGCAGCAGACCGTGGAGAGC





223
GGCGCAGAGGCGTCACGCACTCCATGGTAACGACGCTCGGCCCGAAGATGGC





224
GCCGCGTCTGCGAACCGGTGACCTGGTTTCCCCTCCAGCCCTCACGGCTG





225
CGAGCTGTTTGAGGACTGGGATGCCGAGAACGCGAGCGATCCGAGCAGGGTTTGTCTGGGC





226
GCAGCGCTGAGTTGAAGTTGAGTGAGTCACTCGCGCGCACGGAGCGACGACACCCC





227
CGCGCGCTCGCCGTCCGCCACATACCGCTCGTAGTATTCGTGCTCAGCCTCGTAGTGGC





228
CGGAAGGGGTGAGGCCGGAAGCCGAAGTGCCGCAGGGAGTTAGCGGCGTCTCG





229
GGGGGCGTCGGGCTTGGGACAGGGGAGGATACCAGGGCCACCTTCCCCAACCC





230
CGGGCTGGAGGGTTATCTGGGAAGTCAGCCCCGGCCTCGGTCCTCTCCACGTTGCTGC





231
GGAACGAGGTGTCCTGGGAACACTCCCGGGTCTGTAACTTCGGACAAATCACGCTCGCTTTCCCG





232
AAACGAGAGAGTAGCCAGACTCTCCGCGCATGGAGCCGACGGCACCCACCAGCACACCG





233
TACTCACGCGCGCACTGCAGGCCTTTGCGCACGACGCCCCAGATGAAGTC





234
TGACCGGACAGAGCAGAGCGGGGACTGCAATTCCCAGAAGACCCCACGGTAGGGGCGG





235
AGACAATCCCGGAGGGGGAAAGGCGAGCAGCTGGCAGAGAGCCCAGTGCCGGCC





236
GGCCGAAGAGTCGGGAGCCGGAGCCGGGAGAGCGAAAGGAGAGGGGACCTGGC





237
CCAGGCTCCGCTCGTAGAAGTGCGCAGGCGTCACCGCGCATCCAGGAGCCAC





238
CTCTGATGACGCTCCAAGGGAAGAGGAAGTGGGGATCGGCGAGCGGGTGGGTGCGC





239
TGAAGGGTAATCCGAGGAGGGCTGGTCACTACTTTCTGGGTCTGGTTTTGCGTTGAGAATGCCCC





240
CGGTCCTGCATGCAATGCAAGCCTGAGCTCTCCCGCCATAAGGCTGCAGCGGTGTGG





241
CCTGGAGGAGGAGGAGTCAGGCCGGGTAGGAGGGCTAAGGAGGTTCCCGGGAAGGCAGGGCCC





242
GCTGCTGACATGACTTCTTTGCCACTCGGTGTCAAAGTGGAGGACTCCGCCTTCGGCAAGCCGGC





243
GAGCGGCGCAGGGTTGGAGAGGGAAGCGCTCGTGCCCACCTTGCTCGCAG





244
CCGATGACCGCGGGGAGGAGGATGGAGATGCTCTGTGCCGGCAGGGTCCC





245
GCCGCCCTACAGACGTTCGCACACCTGGGTGCCAGCGCCCCAGAGGTCCC





246
GGGCCGCAATCAGGTGGAGTCGAGAGGCCGGAGGAGGGGCAGGAGGAAGGGGTG





247
CGGCGGGACCATGAAGAAGTTCTCTCGGATGCCCAAGTCGGAGGGCGGCAGCGG





248
GTGGGCGCACGTGACCGACATGTGGCTGTATTGGTGCAGCCCGCCAGGGTGT





249
GAAAGAGCCGGAAACACCTGGTCTCTCAAGCAGGTACAGCCCGCTTCTCCCCAGCACCCCGGTG





250
GCAGCCGCAGCTGAGGTCACCCCGCTGAGGTGGTGGGGAGGGGAATGGTT





251
GGGCGGCCAGCGGTGACTCCAGATGAGCCGGCCGTCCGCGTTCGCGCCGC





252
GGGCACCACGAATGCCGGACGTGAAGGGGAGGACGGAGGCGCGTAGACGC





253
GAGGCCGCCATCGCCCCTCCCCCAACCCGGAGTGTGCCCGTAATTACCGCC





254
CGCGGGGAACGATGCAACCTGTTGGTGACGCTTGGCAACTGCAGGGGCGC





255
CTTGAGACCTCAAGCCGCGCAGGCGCCCAGGGCAGGCAGGTAGCGGCCAC





256
CACACCGTCCTCGCCCGGAGCGCAGAGGCCGACGCCCTACGAGTGGATGC





257
CCCTTGCACACGAGCTGACGGCGTGAACGGGGGTGTCGGGGTTGGTGCAA





258
GCAAAGTGATACCTGGCCGTCCCACCCTCTGGTCCCAGAAGGAGCTCTCGCTGGAGCCAGGCA





259
GGTTGGGGGACTGCCCGGGGCTTAGATGGCTCCGAGCCCGTTTGAGCGTGGTCTCG





260
CGTTGAAAGCGAAGAAGGAGCGGCAGTCCAGCAGCAGGCATTGCGCCGCTCGCTC





261
GAGTCCTCAACAACGACAGCGGGGACTGCGGGACCAGGGTAAAGCGGCGACGGCG





262
GCTCCTGAGAAAGCCCTGCCCGCTCCGCTCACGGCCGTGCCCTGGCCAACTT





263
GATGCTGCTGCCGGAGCTGAGGTCTTGCCTGGAGATCCGAACGAGACACCACGTCAACCGG





264
TGGTGGCAGGAGAGCGATGAGACGGGAAAGTGTGGGGCAAAGCTTACAGTCATTGGTCCAGA





265
CCACTCGCAGTCTGCGTGTGGGGGAAACGAGTGCCCGGCGTATGAAACGCCTAACTTCGCGAAA





266
CAGGCGGCTCCCGCAGTCTAAGGGACCTGGCGCGAGTCCGGGAAGCGGAGG





267
CTGCACGCGGTGCGAAGGGGCCAGCAGGGAAGGAGCAGAGGATGGGGGGT





268
CGGGGCCACAGGACCCTGGGGCTTGAGTCACACAAGAATGTCTCTGGGAGACCCGAGAGACTCA





269
CTTAGAGGAGGAGGAGCAGCGGCAGCGGCAGCAGGAGGCGACAGCTGCCAGCCG





270
CTCATACCAGATAGGCGCGAACGCCTCTGGCAGCGGCGTCCAGGGGGTCCGGC





271
GGGTGCTGGCACATCCGAGGCGTTCTCCCGACTCTGGACCGACGTGATGGGTATCCTGG





272
CATGATAAGCCAGGGACCTCGCGGCGCAGGCGGAGGGAGGGAGAGCGTCGC





273
CCCCCCACTCAACAGCGTGTCTCCGAGCCCGCTGATGCTACTGCACCCGCCG





274
TCCCACCTGCTGCCCGAGGAAGACTTCCGGGAGAAACGCTGTCTCCGAGCCCCCG





275
CCAGGTGAAGCCGAAGGGGAAGCGGATGGGGTTGCTGAACGCGGAGTCGGCG





276
CAGTGGCCCTGCGCGACGTTCGGCGCTACCAGAACTCCGAGCTGCTGATCAGCAAGC





277
AAGGATTACCTCGCCCTGAACGAGGACCTGCGCTCCTGGACCGCAGCGGACACTGCGG





278
GCAGGCTCGTGGCGGTCGGTCAGCGGGGCGTTCTCCCACCTGTAGCGACTCAGGTTACTGAAAA





279
GAGGGAAGTGCCCTCCTGCAGCACGCGAGGTTCCGGGACCGGCTGGCCTG





280
GAAGCGCGACCTCGGGCGGTTGGAGGGGCTACCGGGTCTTACCAGTCCGTGGCG





281
CCCAACCCGAGCAAGACCTGCGCTGAAACGGATTGGCTGCCCTCCGCCCG





282
AGCCGCTCTCCCGATTGCCCGCCGACATGAGCTGCAACGGAGGCTCCCACC





283
ACCACACGGCCAAGGGCACCTGACCCTGTCAAAACCCCAAATCCAGCTGGGCGCG





284
CCGAGGCAGCCGGATCACGAAGTCAGGAGTTCGAGACCAGCCTGACCAACATGGTGAAACCCCGT





285
CCGGCGTCTCCGCGTGGGGCGCACCGTCCGACCCCCCCCTCCCGGTGTGC





286
GGCGCAGATGGCGCTCGCTGCGAGATGGATGCTCCAGGGCGGGTAATCACTCCTG





287
CCAGGCCTCCTGGAAACGGTGCCGGTGCTGCAGAGCCCGCGAGGTGTCTG





288
GGCGAGAGGTGAGAAGGGAAGAGGGCTCCCGGCTCTCTCGGGGCGGGAATCAGTGGGC





289
GCCTGCCTCGCCTCTGCCCGAGCTGATGAGCGAGTCGACCAAAAAAGAGTTCGCGGCG





290
CATTGCGGGACCCTATTTATCCCGACACCTCCCCTGACGTGGGCTCGGAACGCTCCCTTGGCAG





291
CGAAGGCCGGAGCCACAGCGCTCGGTGTAGATGCCGCACGGCTGGCCCTC





292
GGGCTGGATGAGTCCGGAAGTGGAGATTGGCTGCTTAGTGACGCGCGGCGTCCCGG





293
CGCCAGTGCGATTCTCCCTCCCGGTTCCAGTCGCCGCGGACGATGCTTCCTC





294
CGTCCGAGAAAGCGCCTGGCGGGAGGAGGTGCGCGGCTTTCTGCTCCAGG





295
TCCGGCTGCGCCACGCTATCGAGTCTTCCCTCCCTCCTTCTCTGCCCCCTCCGCTCC





296
CAGCCTCAGTTTCCCCATTGGTAAAGCATTGACGGTGGTTGCGGACGGCTTCTGCGGACAGAGCC





297
CCTGAGACAGGCCGAACCCAACTCTTCACAGGGCCGAATTCTTTGCCCGCAGCCCAGCACC





298
CAGAGGGGGGTGCCGGGGTCGCGGACTGCCACCAGGTTGAGGAAAGGAGGGG





299
CGACATCCTGCGGACCTACTCGGGCGCCTTCGTCTGCCTGGAGATTGTAAGTGGGGCCGC





300
ACCGCCTCCTCCCCGCTGTCTGGGTCGCAGGCCTTAGCGACGGGCTGTTCTCCG





301
CTCGGGACTCCAGGGCTGTCCCTCCCGCAGGCTGTCCTTCCACCTCCACCCCA





302
CGGCCGCTCCTCGTAGGCCAGGCTGGAGGCAAGCTCCTTCTCCTCAAAGCTGCGCTGC





303
CATCTCTTCCCCCGACTCCGACGACTGGTGCGTCTTGCCCGGACATGCCCGG





304
CCCAAGACCCTAAAGTTCGTCGTCGTCATCGTCGCGGTCCTGCTGCCAGTGAGTCCCGGCC





305
CCCACTCTTCCCCTGACTCCGACGGCGGGTTCGTCCTGCCCAGACATGCCCG





306
GTCCCCCTCTCTCTCTGCCCCCTCCCGGTGCCAGGCGCGCTTTTCCCCAGG





307
CAGCCTGCTGAGGGGAAGAGGGGGTCTCCGCTCTTCCTCAGTGCACTCTCTGACTGAAGCCCGGC





308
ACTGACTCCGGAGGCTGCAGGGCTGGAGTGCGCGGGGCTCCTACGGCCGAG





309
GGCCAGGCTCGGGCAGGGGCCGTGCTCAGGTGCGGCAGACGGACGGGCCG





310
CCGGGCTTCTGGGACGCTCAGCCGTGCGCTACCCGGTGCAGCTGCTTTCTCACC





311
TTTAGGTAGACGTGGAGGCGACTCAGATCGCCTCGCGGTTCCCGGGATGGCGCGGTCG





312
TGACCAGGACCGCAGGCAAGCACCGCGGCGACGGTTCCAGCCAGGAAAATGAG





313
GGGCCGGACCCGGCCTCTGGCTCGCTCCTGCTCTTTCTCAAACATGGCGCG





314
GCCGCGCTCCTCGCACCGCCTTCTCCGCAGGTCTTTATTCATCATCTCATCTCCCTCTTCCCC





315
GAGCTGCGAACTGGTCGGCGGCGCAAGGCGCGGACTCCGGTGAGTTGTGT





316
GCCCGCGTTCCTCTCCCTCCCGCCTACCGCCACTTTCCCGCCCTGTGTGC





317
ACGCGTCGCGGAGTCCTCACTGCCCCGCCTCGCTCTGGCAGAGTGGGGAG





318
GCGAGCAGCGGCCTCCAGCGCTGGTGGCTCCCTTTATAGGAGCGCTGGAGACACGGG





319
GGGGAAGGCGGAGGGCGAGGGGAAGAGTCACTGAGCTGCGGGGCATAGGGGGTCC





320
CTCTGCTCGCGTGCTGCTCTGAAGTTGTTCCCCGATGCGCCGTAGGAAGCTGGGATTCTCCCA





321
AGGGAGGTCGTTTTCTTCAGCTCCCCAGGTGGTCTGTGCTGGGTGTGCTGACGGTCCTTTTGGGA





322
GCCCCTGGCCCTGACTGCTGGTGCGAGGCAGTGCACGACTCAGCTGGCCG





323
GGCCGGGTAACGGAGAGGGAGTCGCCAGGAATGTGGCTCTGGGGACTGCCTCGCTCG





324
GGCCGGGACTTTCTGGTAAGGAGAGGAGGTTACGGGGAACGACGCGCTGCTTTCATGCCC





325
CAGTCTGGGGACCGGGGAGGCGGGGAGAGGGAAGGGGAAAGCGCGGACGC





326
GTCTATCAAAAGTCTTTTCGTTTCCCCCTCCCCCTTTCCCCACCGCCCACCAAAATGAGCCGCG





327
ATGCCGCCATCGCGGTTCATGCCGTTCTCGTGGTTCACACCGCCCTCAGGG





328
TCCCGGTCTTCGGATCCGAGCCGGTCCTCGGGAAAGAGCCTGCCACCGCGT





329
TGAGAGGCTCCGGTAAAGCCGTCCGGCAATGTTCCACCTGGAAAGTTCCAGGGCAGGGGAAGGG





330
CCCAGGGAGAGGGAGAGGAGGCGGGTGGGAGAGGAGGAGGGTGTATCTCCTTTCGTCGGCCCG





331
CCCGTCTTCTCTCCCGCAGCTGCCTCAGTCGGCTACTCTCAGCCAACCCC





332
GACCCCCCTTTGGCCCCCTACCCTGCAGCAAGGGTAGCGTGACGTAATGCAACCTCAGCATGTCA





333
CCCCGAAGCCCTTGCTTTGTTCTGTGAGCGCCTCGTGTCAGCCAGGCGCAGTGAGCTCAC





334
CGCGCGGCCTTCCCCCTGCGAGGATCGCCATTGGCCCGGGTTGGCTTTGGAAAGCGG





335
CCACCCAGTTCAACGTTCCACGAACCCCCAGAACCAGCCCTCATCAACAGGCAGCAAGAAGGGCC





336
AAGCAGCTGTGTAATCCGCTGGATGCGGACCAGGGCGCTCCCCATTCCCGTCGGGAG





337
CCACGCACCCCCTCTCAGTGGCGTCGGAACTGCAAAGCACCTGTGAGCTTGCGGAAGTCAGT





338
CCACGCACCCCCTCTCAGTGGCGTCGGAACTGCAAAGCACCTGTGAGCTTGCGGAAGTCAGT





339
CCCTCCACCGGAAGTGAAACCGAAACGGAGCTGAGCGCCTGACTGAGGCCGAACCCCC





340
TTGTCCCTTTTTCGTTTGCTCATCCTTTTTGGCGCTAACTCTTAGGCAGCCAGCCCAGCAGCCCG





341
TTCTCAGGCCTATGCCGGAGCCTCGAGGGCTGGAGAGCGGGAAGACAGGCAGTGCTCGG





342
CAGCGTTTCCTGTGGCCTCTGGGACCTCTTGGCCAGGGACAAGGACCCGTGACTTCCTTGCTTGC





343
AGGCAGGCCCGCAAGCCGTGTGAGCCGTCGCAGCCGTGGCATCGTTGAGGAGTGCTGTTT





344
GACTCTGGGTATGTTCTCGAAAGTTGTTACAACCCCAACCCAGGGTTGACCTCAAACACAGGAGG





345
CTCTGGCTCTCCTGCTCCATCGCGCTCCTCCGCGCCCTTGCCACCTCCAACGCCCGT





346
CGGGAGCGCGGCTGTTCCTGGTAGGGCCGTGTCAGGTGACGGATGTAGCTAGGGGGCG





347
CCCCAAGCCGCAGAAGGACGACGGGAGGGTAATGAAGCTGAGCCCAGGTCTCCTAGGAAGGAGA





348
GGGCTCTTCCGCCAGCACCGGAGGAAGAAAGAGGAGGGGCTGGCTGGTCACCAGAGGGTG





349
TTCTCTTCCATCCCATCCTCCCTTCTGGTCCTCCTTTCCACAGTGGGAGTCCGTGCTCCTGCTCC





350
CCGCCTCTGTGCCTCCGCCAACCCGACAACGCTTGCTCCCACCCCGATCCCCGCACC





351
CCGCGCCACGTGAGGGCGGCAAGAGGGCACTGGCCCTGCGGCGAGGCCCCAGCGAGG





352
CACTGCTGATAGGTGCAGGCAGGACAGTCCCTCCACCGCGGCTCGGGGCGTCCTGATT





353
CGGGAGCCTCGCGGACGTGACGCCGCGGGCGGAAGTGACGTTTTCCCGCGGTTGGAC





354
TGTCCTCCCGGTGTCCCGCTTCTCCGCGCCCCAGCCGCCGGCTGCCAGCTTTTCGGG





355
GGTGTCGCGACAGGTCCTATTGCGGGTGTCTGCGGTGGGAAGGGCGGTGGTGACTGG





356
ACATATGACAACGCCTGCCATATTGTCCCTGCGGCAAAACCCAACACGAAAAGCACACAGCA





357
GGAAACCCTCACCCAGGAGATACACAGGAGCACTGGCTTTGGCAGCAGCTCACAATGAGAAAGA





358
TTACCATTGGCTTAGGGAAAGGAGCTTACTGGGAACTGGGAGCTAGGTGGCCTGAGGAGACTGGG





359
AAGAACAGGCACGCGTGCTGGCAGAAACCCCCGGTATGACCGTGAAAACGGCCCGCC





360
ccggggactccagggcgcccctctgcggccgacgcccggggtgcagcggccgccggggctggggccggcgggagtccgcgggaccctccagaagagcggccggcgccgtgactca





361
TCACGGGGGCGGGGAGACGC





362
GCACAGGGTGGGGCAGGGAGCA





363
accgggccttccgcgcccct





364
TCCCACCTCCCCCAACATTCCAGTTCCT





365
TCACAGAGCCAGGCAAGCATGGGTGA





366
ggagcagcaggctcgctcgggga





367
gcccaaagtgcggggccaaccc





368
CGGAAAGAGGAAGGCATTTGCTGGGCAAT





369
CCAGCGGCCCCGCGGGATTT





370
ccgacagcgcccggcccaga





371
TGGGCCAATCCCCGCGGCTG





372
GGGCGGCTGCGGGGAGCGAT





373
CGCCAGGACCGCGCACAGCA





374
GCGGGCAAGAGAGCGCGggag





375
AGCGCGCAGCCAGGGGCGAC





376
CGTGCGCTCACCCAGCCGCAG





377
TGAGGGCCCGGGGTGGGGCT





378
ATATGCgcccggcgcggtgg





379
CCGCAGGGGAAGGCCGGGGA





380
TCCTGAGGCGGGGCCGTCCG





381
GGAGGCCGGGGACGCCGAGA





382
GCCGCCGGCTCCCCCGTATG





383
GCAGGAGCGACGCGCGCCAA





384
cgggggaaacgcaggcgtcgg





385
ccccccaccctggacccgcag





386
CGCCCGGCTTTCCGGCGCAC





387
ccgctgggccgccccTTGCT





388
CGCTTCTCCATAGCTCGCCACACACACAC





389
TCCGCGCACGCGCAAGTCCA





390
CGTCTCAACTCACCGCCGCCACCG





391
GACAAATGCGCTGCTCGGAGAGACTGCC





392
TGCGCCTGCGCAGTGCAGCTTAGTG





393
gaagtcaagggctttcaacctcccctgcc





394
tggatcccgcacaggggctgca





395
GCCGCCTGTGGTTTTCCGCGCAT





396
Gcgcgctctcccgcgcctct





397
TTCCGGCCCAGCCCCAACCC





398
TCCGGGTCAGGCGCACAGGGC





399
GGGGGCGGTGCCTGCGCCATA





400
GGCGCGGGCCCTCAGGTTCTCC





401
gcgtccgcggcTCCTCAGCG





402
GGGAGGCGCCCAGCGAGCCA





403
GCGCGCAGGGGGCCTTATACAAAGTCG





404
CCCCCACCCCCTTTCTTTCTGGGTTTTG





405
CGCGCGTTCCCTCCCGTCCG





406
gccggcggAGGCAGCCGTTC





407
TGCCTGGTGCCCCGAGCGAGC





408
CGGCGGCGGCGCTACCTGGA





409
GTGGTGGCCAGCGGGGAGCG





410
GGCGGCACTGAACTCGCGGCAA





411
CCTCGGCGATCCCCGGCCTGA





412
ACGCAGGGAGCGCGCGGAGG





413
TGAAATACTCCCCCACAGTTTTCATGTG





414
TCCGGGCGCACGGGGAGCTG





415
ggcggcggcgTCCAGCCAGA





416
AGGGTCGCCGAGGCCGTGCG





417
CCGCGCCTGATGCACGTGGG





418
gccgggagcgggcggaggaa





419
AGGGGCGCACCGGGCTGGCT





420
TGCCACGGGAGGAGGCGGGAA





421
cgggcatcggcgcgggatga





422
acaccgccggcgcccaccac





423
CCCCCAACAGCGCGCAGCGA





424
GCCCCGCTGGGGACCTGGGA





425
TCCCGGGGGACCCACTCGAGGC





426
GCCCGCGGAGGGGCACACCA





427
GGCCCACGTGCTCGCGCCAA





428
CGGCGGAGCGGCGAGGAGGA





429
GCCTCGCCGGTTCCCGGGTG





430
gcaggcgcgccgATGGCGTT





431
CCTCCCGGCTTCTGCATCGAGGGC





432
GCGGTCCGCGAGTGGGAGCG





433
AGCAGCGCCGCCTCCCACCC





434
CCGACCGTGCTGGCGGCGAC





435
TCCCGGGCTCCGCTCGCCAA





436
GCATGGGGTGCTCATCTTCCCGGAGC





437
CCCGAGAGCCGGAGCGGGGA





438
GCCGCTGCAGGGCGTCTGGG





439
gcgctgccccaagctggcttcc





440
TCAGGATGCCAGCGTGACGGAAGCAA





441
GGGCGGTGCCATCGCGTCCA





442
GGTGGGTCGCCGCCGGGAGA





443
AGGCGGAGGGCCACGCAGGG





444
GGTCCGGGGGCGCCGCTGAT





445
GCGGCCTGCGGCTCGGTTCC





446
CGGGAACCGTGGCGGCCCCT





447
gcggggaaggcggggaaggc





448
gcctcccggtttcaggcc





449
CAGCCCGCGCACCGACCAGC





450
CCCCCAGCCACACCAGACGTGGG





451
tgggcttcctgccccatggttccct





452
TCCGCGCTGGGCCGCAGCTTT





453
gcatggcccggtggcctgca





454
TGGGCAGGGGAGGGGAGTGCTTGA





455
TCCCCGGCGCCTTCCTCCTCC





456
TCCACCGCGCTTCCCGGCTATGC





457
CCCGCATCTGACCGCAGGACCCC





458
TGCGGACACGTGCTTTTCCCGCAT





459
GGAGCTGGAAGAGTTTGTGAGGGCGGTCC





460
CGGCCGCCAACGACGCCAGA





461
AGCGCCCGGTCAGCCCGCAG





462
TCCCGCCAGGCCCAGCCCCT





463
CCGATTCTTCCCAGCAGATGGCCCCAA





464
ACGCACACCGCCCCCAAGCG





465
TAGGCCCCGAGGCCGGAGCG





466
GGGGTTCGCGCGAGCGCTTTG





467
GCCAGTCTCCCGCCCCCTGAGCA





468
TGAGGAGGCAGCGGACCGGGGA





469
GCCGGCTCCACGGACCCACG





470
GCCGCCACCGCCACCATGCC





471
TTGAGTAAGGATGATACCGAGAGGGAAGA





472
tgggccaggcacggtggctca





473
CCCGGCGAAGTGGGCGGCTC





474
GGCGGCCTTACCCTGCCGCGAG





475
ggtggggccggcgAGGGTCA





476
TCGGCGCGGACCGGCTCCTCTA





477
GGCCCATGCGGCCCCGTCAC





478
TGGGATTGCCAGGGGCTGACCG





479
CGCCGGAGCACGCGGCTACTCA





480
CCCTCGGCGCCGGCCCGTTA





481
GCACAGCGGCGGCGAGTGGG





482
TCACCTCGGGCGGGGCGGAC





483
GAGACGGGGCCGGGCGCAGA





484
CGCATTCGGGCCGCAAGCTCC





485
GGCCCGAAAGGGCCGGAGCG





486
ACGGCGGCCGGGTGACCGAC





487
TCCACCGGCGGCCGCTCACC





488
GCGGTCAGGGACCCCCTTCCCC





489
CGGCCGAAGCTGCCGCCCCT





490
GGCGGCCTTGTGCCGCTGGG





491
TCGCGGGAGGAGCGGCGAGG





492
TGCCCACCAGAAGCccatcaccacc





493
TGGGCCATGTGCCCCACCCC





494
CCCGCCAGCCCAGGGCGAGA





495
gccccctgtccctttcccgggact





496
GGTGGGGGTCCGCACCCAGCAAT





497
ggggcccccgggTTGCGTGA





498
TGCCTGCACAGACGACAGCACCCC





499
AGGCCGCGCCGGGCTCAGGT





500
CGGGGTAGTCGCGCAGGTGTCGG





501
tgcaggcggagaatagcagcctccctc





502
ccggaaatgctgctgcaagaggca





503
gcgtcggatccctgagaacttcgaagcca





504
CCCGGCTCCGCGGGTTCCGT





505
GCGTCGCCGGGGCTGGACGTT





506
GGGGCCTGCCGCCTCGTCCA





507
CGCACACCGCTGGCGGACACC





508
CGCAAACCATCTTCCCCGACGCCTT





509
GGGCCCTCCGCCGCCTCCAA





510
CCACCACCGTGGCAAAGCGTCCC





511
TCACAGCCCCTTCCTGCCCGAACA





512
TGCTTGATGCTCACCACTGTTCTTGCTGC





513
ggccaggcccggtggctcaca





514
TGCGGGACGGGTGGCGGGAA





515
gGCTTGGCCCCGCCACCCAG





516
GGCGGGGAAGGCGACCGCAG





517
ggcgcccaaccaccacgcc





518
GAAAAGCCCCGGCCGGCCTCC





519
CCGCAGGTGCGGGGGAGCGT





520
CCCCGCCCACAGCGCGGAGTT





521
AGCAGGGGCCCGGGGGCGAT





522
CCATGACCGCGGTGGCTTGTGGG





523
GGCAGGTGCTCAGCGGGCAGACG





524
GGGTGCGCCCTGCGCTGGCT





525
GAATTTGGTCCTCCTGCGCCTGCCA





526
TGGCTTCCGCGGCGCCAATC





527
GGCCAGGAGAGGGGCCGAGCCT





528
cgagcgccggccccccttct





529
CGGTTGCGAGGGCACCCTTTGGC





530
tacccggacgcggtggcg





531
GCGCCGCCGAGCCTCAGCCA





532
tgcagcctcaacctcctgggg





533
CCTTGCCGACCCAGCCTCGATCCC





534
GGCGGCGTTCGGTGGTGTCCC





535
CCCGGACTCCCCCGCGCAGA





536
cggccccctgcaagttccgc





537
TGCCCAGGGGAGCCCTCCA





538
GCCGGCTGCAGGCCCTCACTGGT





539
TGTCACACCTGCCGATGAAACTCCTGCG





540
CCCCTGCGCACCCCTACCAGGCA





541
TCCTGGGGGAGCGCGGTGGG





542
AGTGGGGCCGGGCGAGTGCG





543
GCGTCCAGGCTGTGCGctcccc





544
GGCGCGGCGGTGCAGCCTCT





545
gaggcggcggcggtggcagt





546
CGCGCGACCCGCCGATTGTG





547
CCGCGGACGCCGCTCTGCAC





548
tgaacccgggaggcggaggttgc





549
TCTCGGCGGCGCGGGGAGTC





550
aggcggccacgggaggggga





551
GGACCCGAGCGGGGCGGAGA





552
AAGCACCTggggcggggcggag





553
GCCGCTCGGGGGACGTGGGA





554
CACCGCCAGCGTGCCAGCCC





555
TATTCTTggccgggtgcggt





556
CCGCTTCCCGCGAGCGAGCC





557
CAGCCGGCGCTCCGCACCTG





558
GCGGAGCGCGCTTGGCCTCA





559
ggcctcgagcccacccagacttggc





560
TGCCGCGCCGTAAGGGCCACC





561
ACGGCGGTGGCGGTGGGTCG





562
AACCTGCCCAGTTACTGCCCCACTCCG





563
TCCAGCGCCCGAGCCGTCCA





564
GCTGCTGCTGCCCGCGTCCG





565
CACTGCTTAGGCCACACGATCCCCCAA





566
GGCCGGACGCGCCTCCCAAG





567
TCGGCCAGGGTGCCGAGGGC





568
tccgcccgcccCACAGCCAG





569
CGCGCCCCAGCCCACCCACT





570
ccgtgctgggcgcaggggaa





571
TGCGCACGCGCACAGCCTCC





572
CGGTGAGTGCGGCCCGGGGA





573
TGGCCGAGAGGGAGCCCCACACC





574
CCCAGCGCCGCAACGCCCAG





575
GCCACAAGCGGGCGGGACGG





576
TCCTCTGGACAACGGGGAGCGGGAA





577
CGCGGGTTCCCGGCGTCTCC





578
GCGCCGCCCGTCCTGCTTGC





579
ACGCGCGGCCCTCCTGCACC





580
GGGCGGGGCAAGCCCTCACCTG





581
GGGAGCGCCCCCTGGCGGTT





582
GCGAATGGTTCGCGCCGGCCT





583
TTTCCGCCGGCTGGGCCCTC





584
TCTCCGGGTcccccgcgtgc





585
GCAGCCCGGGTAGGGTTCACCGAAA





586
GGGCGGAGAGAGGTCCTGCCCAGC





587
CCCTCACCCCAGCCGCGACCCTT





588
GCGATGACGGGATCCGAGAGAAAGGCA





589
TCCGCAGGCCGCGGGAAAGG





590
GGCCCCAGTCCACCTCTGGGAGCG





591
GCTTGGCCGCCCCCGGGATG





592
CCCTCCATGCGCAATCCCAAGGGC





593
gcggcgactgcgctgcccct





594
TGGGCTTGCCTCCCCGCCCCT





595
GGCGGCCCAAGGAGGGCGAA





596
gctgcgcggcTGGCGATCCA





597
TCACCGCCTCCGGACCCCTCCC





598
CCCTTCCAGCCACCCCGCCCTG





599
GCGGGACACCGGGAGGACAGCG





600
CCCTGGGTTCCCGGCTTCTCAGCCA





601
TGGCGGTGATGGGCggaggagg





602
CCAGCCCGCCCGGAGCCCAT





603
TGCCCGCGGGGGAATCGCAG





604
TGCCGCGAGCCCGTCTGCTCC





605
TGCGGCCCCCTCCCGGCTGA





606
GCAGCAGGGCGCGGCTTCCC





607
GCCGCAGCACGCTCGGACGG





608
TGCGGAGTGCGGGTCGGGAAGC





609
GGCGCGGGGGCAGGTGAGCA





610
ggcgcgggggcaggtgagcat





611
CAGTGACGGGCGGTGGGCCTG





612
CGGCGACCCTTTGGCCGCTGG





613
CCGCGGCAGCCCGGGTGAA





614
GGGCGAGCGAGCGGGACCGA





615
TGGGGCAGTGCCGGTGTGCTG





616
TCGCTGGCATTCGGGCCCCCT





617
GGAGCCGTGATGGAGCCGGGAGG





618
TGCCAGGGTGTCTTGGCTCTGGCCT





619
CCGGCTCCGGCGGGGAAGGA





620
GGCCAGGGTGCCGTCGCGCTT





621
TCGGCTCGGTCCTGAGGAGAAGGACTCA





622
GCGCGGGGAACCTGCGGCTG





623
GCCGCCGCTGCTTTGGGTGGG





624
CACCTGAGCCCGCGGGGGAAcc





625
GAACGCCGGCCTCACCGGCA





626
CCCGTGGTCCCAGCGCTCCTGCT





627
GTGCGACCCGGCGCCCAAGC





628
TGGCTCTGCGCTGCCTTTGGTGGC





629
cgcgcgggcggcTCCTTTGT





630
TGGCCCGTTGGCGAGGTTAGAGCG





631
gacccggcatccgggcaggc





632
GCCCGGACTGTAATCACGTCCACTGGGA





633
CCGCCGCCAACGCGCAGGTC





634
CGCTGCCAGCTGCCGCTCCG





635
AGCGCCCACCTGCGCCTCGC





636
GCGGGCCAGGGCGGCATGAA





637
GGCTGCGACCTGGGGTCCGACG





638
GGTTAGGAGGGCGGGGCGCGTG





639
CAGCGCACCAACGCAGGCGAGG





640
TCGGCTGGCCCCGCCCACTC





641
CGGGGTTGCCGTCGCAGCCA





642
TCCGCACTCCCGCCCGGTTCC





643
ggaccccctgggcagcaccctg





644
cgaggcagccggatcacg





645
GGCGCGTGCGGGCGTTGTCC





646
CCAGGATGCGGCAGCGCCCAC





647
cgATGCGGCCCGCGGAGGAG





648
CGTTCTGCGCGCGCCCGACTC





649
CCCCGCCGTGGGCGTAGTAAccg





650
AACCCGCCCGGGCAGCTCCA





651
GCAGCGGTCGCGCCTCGTCG





652
CGCAATCGCGCTGTCTCTGAAAGGGG





653
GGAGCGCCCGCCGTTGATGCC





654
CCATGGCCCGCTGCGCCCTC





655
TGGGGGCGGGGTGCAGGGGT





656
CCGACCCTGCGCCCGGCAGT





657
CGGCTTCAAGTCCACGGCCCTGTGATG





658
ACCCCACCTGCCCGCGCTGC





659
ggcgcgcggagacgcagcag





660
CGTGAGCCGGCGCTCCTGATGC





661
CTGCCGCGGGGGTGCCAAGG





662
CCTGCTGCGCGCGCTGGCTC





663
CCTGGCGGCCCAGGTCGCTCCT





664
GAGCGCCCCGGCCGCCTGAT





665
CGCCGCACGGGACAGCCAGG





666
GCCCGGACATGCCCCGCCAC





667
cgggggccgccgcctgactt





668
CCAGTGGCGGCCCTCGGCCT





669
CGCCCGGCGCGGATAACGGTC





670
TGCTCCGGGTGGGGAGGGAGGC





671
TGCCTGGGCGCAGAACGGGGTC





672
GGGTCCTAATCCCCAGGCTGCGCTGA





673
TCCGCGTCCCCGGCTGCTCC





674
GGGCAGGGCTGACGTTGGGAGCG





675
GCCGTGGGCGCAGGGGCTGT





676
cctgcgcacgcgggaagggc





677
CGCGGACGCAGCCGAGCTCAA





678
CGACCCATGGCGGGGCAGGC





679
tccgctccccgcccctggct





680
tgtgccgcgcggttgggagg





681
TCACTCACGCTCTCAGCCCGGGGA





682
CGGCAAGCGGGCTTCGGGAAGAA





683
CCCCGCGGGCCGGGTGAGAA





684
CGGCGGCGGCTGGAGAGCGA





685
CGGGCCCCGGGACTCGGCTT





686
GACGGAATGTGGGGTGCGGGCCT





687
TGCGGCTGCTGCCGAGGCTCC





688
ACCGCTGCGCGAGGGAgggg





689
GGGGGTGCGGCGTCTGGTCAGC





690
GGCCGGGGGAAATGCGGCCT





691
tgcctggtaggactgacggctgcctttg





692
AGCGCGGGCGCCTCGATCTCC





693
TCCCGGCTGGTCGGCGCTCCT





694
CCGGGGCTGGGACGGCGCTT





695
GGGCGGGGTGGGGCTGGAGC





696
GTGCGGTTGGGCGGGGCCCT





697
GGCGGTGCCTCCGGGGCTCA





698
GGCGGTGCCTCCGGGGCTCA





699
CGGGAGCCCGCCCCCGAGAG





700
TCCTGCCATCCGCGCCTTTGCA





701
AGGCACAGGGGCAGCTCCGGCAC





702
CGACCCCTCCGACCGTGCTTCCG





703
CCCGCAGGGTGGCTGCGTCC





704
GCGTCTGCCGGCCCCTCCCC





705
TAGGCCGCCGGGCAGCCACC





706
GGGGAGCGGGGACGCGAGCA





707
GCCGGCTGGCTCCCCACTCTGC





708
TCGCTCACGGCGTCCCCTTGCC





709
TCCCCGCTGCCCTGGCGCTC





710
GGCCAGAGGCAGGCCCGCAGC





711
TGCCCGGGTCATCGGACGGGAG





712
CCCAGTGCGCACGGCGAGGC





713
AGCGTCCCAGCCCGCGCACC





714
TGCTCCCCCGGGTCGGAGCC





715
CGCTCGCATTGGGGCGCGTC





716
TGCGGCAAGCCCGCCATGATG





717
TCTTGAGCCTCAGGAGTGAAAAGGCCCCTTG





718
GGACCATGAGTGTTTCCATGCTTGGCATCAGA





719
tcagccactgcttcgcaggctgacg





720
cggccagctgcgcggcgact





721
TCGGAGAAGCGCGAGGGGTCCA





722
GCCGGGTGGGGGCTGCCTTG





723
tcctcgcccggcgcgattgg





724
GGCCGTGCAGTTGGTCCCCTGGC





725
GCGAGCCTGCTGCTCCTCTGGCACC





726
gccagagctgtgcaggctcggcattt





727
tgcccagcaaatgccttcctctttccg





728
TGGCCTGACCACCAATGCAGGGGA





729
TCCACCTGGGCTTCTGGGCAGGGA





730
agctggcctgcgccccgctg





731
AGCCGCGGCAGCGCCAGTCC





732
GGGGCCGGGCCGCTCAGTCTCT





733
GCAGTGAGCGTCAGGAGCACGTCCAGG





734
cccgATCCCCCGGCGCGAAT





735
GGCGTGACCGTGGCGCGGAA





736
AGCGGCCCGCAGAGCTCCACCC





737
GGCAGGCGGGCGCAGGGAAG





738
tctgccccgggttcacgccat





739
CGGGCGGGCCCTGGCGAGTA





740
GCAAGCCCGCCACCCCAGGGAC





741
GGCCCAGGCGGATGGGGTTGG





742
TCCGAGAGGCGTGTGGTAGCGGGAGA





743
AGGCGGCCGCGGGCGTTAGC





744
aaggcagcgcgggccaccga





745
ggcatcctgcccgccgcctg





746
TGGGGCGGGGTCTCGCCGTC





747
TCGGGCTCGCGCACCTCCCC





748
CCAGGTGCGCGCTTCGCTCCC





749
ACCTGCGCCACCGCCCCACC





750
GCCGAGCAGAGGGGGCACCTGG





751
TCGCGCCGCTCTGCGTTGGG





752
CCGCCGGGGCAGAAGGCGAG





753
TCcactggacaggggtgggagcctctg





754
gcccaccggcgctgcgctct





755
GCGGTGCCAGCCCCGCTGTG





756
GACCCGCCTGCGTCCTCCAGGG





757
CCCATCACAGCCGCCCAACCAGC





758
GAGCggggcggagccgagga





759
TGCAATTGTGCAGTGGCTGCGTTTGTTTC





760
CCCGACCGGATGCTCCTTGACTTTGCC





761
GCGAGCGCGCGCACCGATTG





762
CACTCCGCCGGCCGCTCCTCA





763
TCGGGGGTCCCGGCCGAATG





764
GCTCTCCCAGCTGCACGCCAACTTCTTG





765
GGAGGAGCCTGGCGCTGGCGAGT





766
TGGCTCTGGACCGCAGCCGGGTA





767
ACGGCGGCGTCCCGGGTCAA





768
TGGCCAAGCGCTGCCACTCGGA





769
CGCAGGCCGCTGCGGTGGAG





770
GCGCCTGCGCCATGTCCACCA





771
TGGTGCCTCCCGCAACCCTTGGC





772
GCCCGGCTCCAGGCGGGGAA





773
GCAATGCTGGCTGACCTGGACC





774
CGCCCGCCCGTCGGGATGAG





775
TGCCCCCACCATCCCCCACCA





776
GGCGCGAGCGGCGGGAACTG





777
GGCGCCGCTCGCGCATGGT





778
CCCGCTCTGCCCCGTCGCAC





779
GTAGCGCGGGCGAGCgggga





780
AGCGCCGAGCAGGGCGCGAA





781
ggcggcggccacgcaggttc





782
ccctcccgcacgctgggttgc





783
TCACGGCCGCATCCGCCACA





784
CGGCGCCGGCCGCTCTTCTG





785
CCGGCAGAGAATgggagcgggagg





786
TCGGCCGGGGCGCCAGGTCT





787
TGGGGCTGCGGGCGATGCCT





788
GGCTGCGGGGACCGGGGTGT





789
CGGCCCAAGCCGCGCCTCAC





790
ccgcgcccggAACCGCTGCT





791
TCGGCCGGGAGCGTGGGAGC





792
TGCAGACATTGGCGCGTTCCTCCA





793
GGACCCACGCGCCGAGCCCAT





794
GGAGGGGGCGAGTGAGGGATTAGGTCCG





795
TCCCCTCACGCCGATGCCACG





796
CCATGCCCGCCCCAGCTCCTCA





797
CCGCCGTGATGTTCTGTTCGCCACC





798
CGTGGCTGCCCCTGCACTCGTCG





799
tctggccagtccgtgaaggcctctga





800
CCGGGGTGCAAGGGCCACGC





801
cgccgcgcTTCCTCCCGACG





802
AGCGACCCGGGGCGTGAGGC





803
TGCGGAAACCTATCACCGCTTCCTTTCCA





804
ggcagggcggggcagggttg





805
GGGTCTCCAGACTGATGGGCCGGTGA





806
CGCTGAAGCCGCTGCTGTCGCTGA





807
TCCCACGCTCCCGCCGAGCC





808
AAATATgccggacgcggtgg





809
CGCCTTTCCGCGGCGGGAGC





810
CCCAGCCCAGGCCGCAGGCA





811
ccccgcaggggacctcataacccaa





812
GAGTTGGCTCGGCGTCCCTGGCA





813
tccctccgcctggtgggtcccc





814
TGACCCCTGGCACATCAGGAAAGGGC





815
TGCCCCGCAAGAACGGCCCAG





816
GGCCTCGGAGTGCGACGCGAGC





817
GCGCCAACCCAGACCCGCGCTT





818
TGCAAGCGCGGAGGCTGCGA





819
AGCCGGGCCACGGGCAGACA





820
CCCGGGCGGCCACAAAGGGC





821
CCCCATCCCAGGTGACCGCCCTG





822
TGACTCTGGGGGAAGCACGCGACG





823
GGTGCGGCCGAAGCCGTCGC





824
TGCCCCTCGGGCCCTCGCTG





825
GGCCACGGGGACCGGGGACA





826
GGGCGCCGCAGGGCGACAAC





827
GCAGCGCGCTTTGGGAAGGAAGGC





828
GGGTTCCACCCGCGCCCACG





829
TCGCGGCCCAGACCCCCGAC





830
CGAGACCCGGTGCGCCTGGGAG





831
aggtgcccgccaccatgc





832
cgcccaggctggagtgcagtggc





833
GCCGGCGAGGTCTCCGCGGTCT





834
CGCAGGGCCACCGGCTCGGA





835
GCCCCGGAGCATGCGCGAGA





836
CCCCTGGGGACCCCTGCCATCCTT





837
TTAccccgcgccgcgccacc





838
GCGGGCCGAGCCCACCAACC





839
GCGCGGTGGCCGCTTGGAGG





840
CCCGCCAGCGGCctgtgcct





841
CGCGCATGCCAAGCCCGCTG





842
GGCGCAGGAGCAGTTGGGGTCCA





843
TGGGGTAGGCGGAACGCCAAGGG





844
CCCGCTTCACGCCCCCACCG





845
GCAGCCCGGGTGGGCAAGGC





846
TGCAGTTGCCCTTGCCCTGCGAC





847
TGGCCGGGCGCCTCCATCGT





848
GCCTGCGATGGGCTCGGTGGG





849
CCGCGGTTCGCATGGCGCTC





850
TGGGCCATCTCGAGCCGCTGCC





851
TGGGGGAGTGCGGGTCGGAGC





852
CTGCCGCGCCCCCAGCACCT





853
GGCTGCTGGCGGGGCCGTCT





854
GGGCGCGGCGACTTGGGGGT





855
aaactgcgactgcgcggcgtgag





856
TGCTGGGGCCGTGGGGGTGC





857
TCCGCGCTGCCCGGGTCCTT





858
GTGGCGGCCCCCGCGGATCT





859
GGGGAGGCGCCACCGCCGTT





860
GGAGCGGGAGGGCGCTGGGA





861
tgaaggctgtcagtcgtggaagtgagaagtgc





862
ggagaaaatccaattgaaggctgtcagtcgtgg





863
ggggacaaccggggcggatccc





864
CCCGGGAGGAGAGGCGAACAGCG





865
AGTGCGCGGGTGCCGGGTGG





866
TGGCATCCCCTACCCGGGCCCTA





867
GAGGCTGGTTCCTTGTCGTCGGTTGGG





868
GCGGGGTCAGGCCGGGGTCA





869
GGCAGCGGCTGGAGCGGTGTCA





870
GCCCGGGCACACGCCCCATC





871
gcaccgccacgcccactgcc





872
TGTCATGCTTCTTTCTCCCCACTGACTCA





873
gcccaggctggggtgcaatggc





874
CGCCTCGGGGGCCACGGCAT





875
CGTGGGTCCTGGCCCGGGGA





876
TCCCCGGGCGGCCATTAGGCA





877
GGCGGGGGTGGGAGTGATCCC





878
CGTCAGTCCCGGCTGCGAGTCCA





879
CCGGGGTCCGCGCCATGCTG





880
CATGGCGGGGCCCGAGCGAC





881
CCGCCTCCTTGCCCCGACACCC





882
TCGGACACGCCTTCGCCTCAGCC





883
CGAGCTGGGCGCAGGCGCAA





884
GCGGGGTTGTGTGTGGCGGAGG





885
accgcgcccggccTGCAAAG





886
GCGGGGCCAGAGAGGCCGGAA





887
GCCCCAAGGGAAGATGCAGGGAGGAA





888
gccccaagggaagatgcagggaggaa





889
GCCCGCACGTGCACCACCCA





890
GGGTGACGAAGTGGTGTCTTTACCGAgga





891
CCGCCGTGCGCCTGTGGGAA





892
ggctgctgcgggaggatcac





893
TGGGCATCCAGAAAAATGGTGGTGATGGC





894
gccgcgccgggccCTATGAG





895
CCGCCATGCGGGCAGGGACC





896
TGTTACAggctggacacggtggctc





897
cggaacttgcagggggccga





898
TGCAAAATCCTCCCCTTCCCGCACCC





899
GCGCTGGAGCCACGCGACGA





900
GGGGTCCGCTCCCGCGTTCG





901
CGCCCCGGGCTGAGAGCTGGGT





902
GGCCCTTCGGGGGCCGGGTT





903
TGGCCACAAAGGGGCCGGAATGG





904
ACCCCAGCGCGTGGGCGGAG





905
GGGCTGCGGGGCGCCTTGAC





906
GCACCGCGGCTGGAGCGGAC





907
AGGCGATCCCAAGGCTGTTGGAGGC





908
tccacccgccttggcctccca





909
cggcgggaaggcggggcaag





910
ggagccgcggcgtgagtgcg





911
GGCCGGCACCCCACGCCAAG





912
GCGGGGCGGAGCGCACACCT





913
GCGGCCAGCAGCGCGTCCTC





914
CCGACAGCCGGCAAGGCCCAA





915
ttgtttttgtttgtttgttttgaaagggag





916
CCCCGGTTTCCCCGCGCCTC





917
GGCTGGACGCGCCCTCCGACA





918
TCCCACGCGCCCGCCCCTAC





919
cggccacgccttccgcggtg





920
GGCTCCGCTGGGGCGCAGGT





921
GCCGCCCCGTGTCGTGCGTC





922
GGCGTCAGTTGGAGTGTGGGGTCGG





923
CCGAGCGGGGTGGGCCGGAT





924
CATCGCGCGGGACCCAACCCA





925
CAGTGGGTGGATCTCACCTGCCTTCGG





926
GAGGCCGCGGGGCTCCGACA





927
GAGCCTGCCCTATAAAATCCGGGGCTCG





928
TCCCGGCGGGTGGTGCCTGA





929
TCTGAGCGCCCGCCGCCTGC





930
GGCTGCCGGCGCGGGACCTA





931
TCCGGGGCATTCCCTCCGCGAT





932
TGGCGGCGGCCCCTGCTCGT





933
cggcgCGCGACTGGGAGGGA





934
GGCGCCAGCGCAACCAGAGCG





935
CGAAGGTGGCGCGGCCTGGA





936
CCCAGCGGGCTTCGCGGGAG





937
CCCGCTTGCCCCGCCCCCTA





938
CCCACACCTCCACCTGCTGGTGCCT





939
ATGCAGCCCCGCCGGCAACG





940
CCGGATGCCCGGTGTGCCTGG





941
GCGAGCAGGGACGCAGCTCTGGTG





942
CGCGCTCGGCCCGCTCAGTG





943
TGGTGCCGGCAGGGAGGGGA





944
GGGCGGTGGCGATGGCTGGC





945
GGCTGTTGGTCTTTTTCCCAGCCCCGAA





946
CCGGGCCGGCAGCGCAGATGT





947
CGGAGGGCGATGGGGCCCTG





948
GGGGCCGGGCTGCGAAGCTG





949
TGCCTGGGCACCCCACGGACG





950
GCCCTACGTCCGGGCAGCACGC





951
CTGTGCGCGTCCCCGCCGTG





952
TGCAGCGGCGCCTCGGACCC





953
ccgctgggcgcgctgggaag





954
GGCGCATGCTCTGCGCGTATTGGC





955
GGGTGGGCGGGCCGTTCTGAGG





956
GGGCTGCCGGGTTGGCGCAG





957
GGCGCGTGCGGAAAAGCTGCG





958
TCCAGGCCGCCCTCGGGTCA





959
GGGGAGGGGGCGCAGCCAGA





960
GGCAGCGTGGTCTTCCACTTCCCCCT





961
GGGATCGAGGGATCGAGGCAGGGGA





962
CGGCCATGAGCGCCTCCACGC





963
CCCGGTGTGCGGCAGCGACG





964
TTGGGGCGGCCGGAAGCCAG





965
CGCAGCGGCGGCGTCTCGGT





966
CCGCGACCTCCCCAAGCCACCC





967
GGCGGCCGACCGCGAACACC





968
CCCCATTTCCGAGTCCGGCAGCA





969
CCCAGCCTGGCCTCTCCTCTCAGGCA





970
cggctctttcctcctcaagagatgcggtg





971
CGCCGCCGTCCCTGGTGCAG





972
TGGGGACCCCTCGCCGCCTG





973
GCGCCCAGCCCGCCCCAAGA





974
caggggacgcgggcgtgcag





975
CCGGGCGGGGCCCAACTGCT





976
CCCGAGCAGGGCCGGAGCAGA





977
CCCCTCCACATTCCCGCGGTCCT





978
TCCTTTGTGGCCTGGGCAGGATGCAG





979
GCAGCGCGCGGTTTGGGGCT





980
GAGGCCTGCGGGCGCTGCTG





981
TCACGGTTGCTGGGCCGTCGC





982
CGGGGTGGGCCTCGCGGAGA





983
GCCTGCGCTCCTGGCGCCCT





984
CGCCTTCGGAGAGCAGAGTCAACACGGA





985
TGCCCCTAAATGAGAAAGGGCCCTTGAG





986
GCCACGCCCCGGGACCGGAA





987
TCCCGCCCAGGGGCCTCCCA





988
ccccgcgcccggccAAAGAA





989
GGACCGCCGCACAGCCCCAA





990
GGGCAGCGGTGGCCGTGCAT





991
TTCCTGCGCCGCCCCCTCCC





992
GGCGTCTCCCTGTCCCCGCCTG





993
GCCGGCCTCGCGCACCGTGT





994
CCCGGGACGTGCGCGCTTGG





995
TGTCCCCCGAGCCGCCCTGC





996
TCGCTCTCGTGCAGCGGCGTCA





997
CCCGCGCGCTGCAGCATCTCC





998
CCCCAGCTGCCGCCATCGCA





999
GCCCGGGCCCGCCTCAAGGA





1000
TGCCGGCGAGGCCTTTTCTCGG





1001
GGCGGGTGGGGAGCGCGAAC





1002
CCCGCCGCCGCTGGTCACCT





1003
ccggctgcctcggcctccca





1004
ggtgtgcaccaccacgcc





1005
GGCGCGTCCCGGCGGCTTCT





1006
AGTCCCTGCGCCCCGCCCTG





1007
TGCCCCCAAACTTTCCGCCTGCAC





1008
CTTGCGGCCACCCGGCGAGC





1009
TCGCGCGGAAACTCTGGCTCGG





1010
GCTGCGGCCCAGAGGGGGTGA





1011
CGGCGGGCTTGGGTCCCGTG





1012
TCCCCCGCCGCACCAGCACC





1013
GCGCGGTGCGGGGACCTGCT





1014
GCCGGACGCTCGCCCCGCAT





1015
GAGTGCTCTGCAGCCCCGACATGGG





1016
CCGCGCAGACGTCGGAGCCCAA





1017
TGGCCGAGGCGCGTGGCGAG





1018
GGCCGCGCTGCCCCAGGGAT





1019
CCGGGGGCGGACGCAGAGGA





1020
GGGGGCGGAGCCTGGGAATGGG





1021
GGGCGGGCCCTGTGGGTGGA





1022
CCGCTCCCCCATCTCCACGGACG





1023
GACCCAGGGAGGCGCGGGGA





1024
TGCCCGGCCGCAGGTGACCA





1025
GCGCCGGGAGTGGGCAGGGA





1026
ACCCAGGCCGGCGCGGGAAG





1027
ttcccgccgcccggtcctca





1028
CGCGCCGGTGACGGACGTGG





1029
AACCCTCCCAGCCAAAACGGGCTCA





1030
CGGGCGAGGCCGCCCTTTGG





1031
GGCCGCGGACGCCCAGGAAA





1032
CCGTTTGGAACGTGGCCCAAGAGGC





1033
CCCGCCTCCGCTCCCCGCTT





1034
ggtggcggcggcagaggagga





1035
CGCGGGGAGCAGAGGCGGTG





1036
gggcgcccgcgctgagggt





1037
GGGCCTGGCCTCCCGGCGAT





1038
CACCCGGCGTCCGCACCAGC





1039
CGGCGCTGGTTTGGCGGCCT





1040
ccaggagccccggaggccacg





1041
GCGATCTCCTGCCCAGGTGTGTGCTC





1042
ACTGCCCGGGCTCGCCGCAC





1043
TGCGGCAACGGTGGCACCCC





1044
GGAGCGAAGCTGGCGGAACCCACC





1045
GGCGGCCGACGGGGCTTTGC





1046
GGCCGCGGGTGCCTCGGTCT





1047
GCGCTCCAGCCATGGCGCGTT





1048
GCCGGACGGGCGTGGGGAGA





1049
TCCCCCGCGACTGCCCCTCC





1050
GGGTGGCAGCGGGTGCGGAA





1051
gctcgcccgctcgcagccaa





1052
CGAGGTTCCGCAGCCCGAGCCA





1053
GCGCGGGGGACCGAAACCGTG





1054
GCCGAGCCCGGCCCAAAGCC





1055
TGCCAACGTTCACCCGGCTGGC





1056
GACAGTGCGAGGGAAAACCACCTTCCCC





1057
GGGTCGGGCCGGGCTGGAGC





1058
GGGTCGGGCCGGGCTGGAGC





1059
GCGGGGCCGAGGGGCTGAGC





1060
GCCCGGCCACCTCGGGGAGC





1061
ACTGTCTGCCAAGCCAGCCCCAGGG





1062
GGATGGTGGCGCCGGGCTGC





1063
TCCAGGAGGGCCAGGTCACAGCTGC





1064
CGGCTGGCTCGCTTGGCTGGC





1065
TCCGGCGCTGTTGGGCAGCC





1066
CCTGCGCACGCGGGAAGGGC





1067
TCTTCCCTTCTTTCCCACGCTGCTCCG





1068
CAGCGCCCCCGCCTCCAGCA





1069
GCTGCGCGGCTGGCGATCCA





1070
GCCGACGACCGGAGGGCCCACT





1071
TGCCCAGGCTGGCCCCTCGG





1072
CGCGGCCCTCCCCAGCCCTC





1073
CCCCGCCCGGCAACTGAGCG





1074
AAGAGCCCGCGCGCCGAGCC





1075
TGCCCACTGCGGTTACCCCGCAT





1076
GCATGGTGGTGGACATGTGCGGTCA





1077
CATAGAAGAGGAAGGCAAAGGCTGTGACAGGCA





1078
TCATCCTAGACTTGCAGTCAAGATGCCTGCCC





1079
agccagcggtgccggtgccc





1080
gccccgctccgccccagtgc





1081
CACGGGGGCGGGGAGACGCGGGGTGCACTTCTCGCCCCGAGGGCCTCCGGCGAAGCAACCCGGCAGCCGCGGCGCCCGAGGGCCTGGCGCTGGTCTGGGGCTGCGCCGGGGGCGCCTGGCTCTGGGGTGCGGCCGGTCAGGAATCCCCATCCTGGAGCGCAGGCGGAGAGCCAGTGGCTGGGGGCGGGAAGGCTTCTTGGACCCCTCGCGCTTCTCCGA





1082
CACAGGGTGGGGCAGGGAGCATCAGGGGGCAGGCAGCCACACCCCCGACACATCAAGACACCTGAGTGGCAGGTTCAAGCCGGAGGCGCTGTATTTCCACACAGGAAGAAGGCCAAAAAAGGTGACACTGCCCCCTCCCAGTGGCTCCATGCTCCTCAGCTATGGCTGTCCGGGCCGCCTCACTCAAAGCCTTGCCCTCCGCTGCTGCCAGGCTCCTTGCATGCAAGGCAGCCCCCACCCGGC





1083
accgggccttccgcgcccctcgccccacgccgcgggtgcggtcctccctccagcagagggttccgggcgccggcgcggcccgcacggggccgggagcccttcctgccggccgggtgcgcgcggcgccgccgacagctgtttgccatcggcgccgctcccgcccgcgtcccggtgcgcgccccgcccccgccaacaaccgccgctctgattggcccggcgcttgtctcttctctccccgcagccaatcgcgccggg





1084
CCCACCTCCCCCAACATTCCAGTTCCTTCTTTTCCTTCTACTCTTCAGCGGCCTCAGCCTGCGCACCCCAGGAGCGTGGATGACTACGGCCACCCCGGGCGCGCACCCCTTTCCCACCACCCCAGCATCTCTGCAGCCCAGGACACCCGCCTCCCCCACACCCCGCATCCGGTGTGTCTCCGCCTGGCCCGGCCGGCGCGGCAGGCGGGCCAGGGGACCAACTGCACGGCC





1085
CACAGAGCCAGGCAAGCATGGGTGAGAGCTCAGACCATCCTTGTTGGACTAAAAGGAAGGGGCAGACTGCCCATGGGGGGCAGCCGAGAGGGTCAGGCCCCCATAGGTCCTCAGCCTGCTTCAACCTCAAAGGGGATGGGGGGCTGAGTGGTGCCAGAGGAGCAGCAGGCTCGC





1086
ggagcagcaggctcgctcggggagagtagggccttaggatagaagggaaatgaactaaacaaccagcttcctcccaaaccagtttcaggccagggctgggaatttcacaaaaaagcagaaggcgctctgtgaacatttcctgccccgccccagcccccttcctggcagcattaccacactgctcacctgtgaagcaatcttccggagacagggccaaagggccaagtgccccagtcaggagctgcctataaatgc





1087
gcccaaagtgcggggccaacccagacagtcccacttaccaggtcttctgaaagacagctgacaagagacatgcagggctgagaggcagctcctttttatagcggttaggcttggccagctgcccacagcttcaggccatcagagacagcttctccctgccagagttgctacagtctctggtttctcaaccaggtgaatgtggcaatcactgtgcagaatgaaaattttgggtggggaggtaggagaagcggaaag





1088
GGAAAGAGGAAGGCATTTGCTGGGCAATAGTGCCCAGAAGGAAAAAGCAGGTAGGGGGGCTCTTTTTCTGGGCTGCTGGCATCCACTTGCTTGATCCAGCCAGATTCCCACTCCCATGCCCTCTCCACTATTGCGATTGCTAATCCCCTGCATTGGTGGTCAGGCCA





1089
CAGCGGCCCCGCGGGATTTTGCCCAGCTGCTTCGTGCCCTCTGGTGGCTAAGGCGTGTCATTGCAGTGCCGGCCTCCTGTCATCCTCCCTTTCTTGTCCGCCAGACCCTCTGGCGCCCTGCTTACGACTCAAACAGGAGACAGTGCTGATTCATTTCCAAGCGGCCTTCCTACACCCACACCTGCTTCACATAGATGAGGTTTCCCGGACAGTCCCTGCCCAGAAGCCCAGGTGGA





1090
ccgacagcgcccggcccagatccccacgcctgccaggagcaagccgagagccagccggccggcgcactccgactccgagcagtctctgtccttcgacccgagccccgcgccctttccgggacccctgccccgcgggcagcgctgccaacctgccggccatggagaccccgtcccagcggcgcgccacccgcagcggggcgcaggccagct





1091
GGGCCAATCCCCGCGGCTGGGCAGAGCGACCCGAGGGCGGCGCCCTGCAGACCACGTGGCCCGGGAGGCGCCGAGGCCAGGTAGGTGGTGAGTTACTTGGCTCGGAGCGGGCGAGGGGACGCGTGGGCGGAGCGGGGCTGGCCAGCCTCGGCCCCCATGACCCGCTGTCCTGTGCCCTTTCCCAGCGATGGGCGTGCAGCCCCCCAACTTCTCCTGGGTGCTTCCGGGCCGGCTGGCGGGACTGGCGCTGCCGCG





1092
GGCGGCTGCGGGGAGCGATTTTCCAGCCCGGTTTGTGCTCTGTGTGTTTGTCTGCCTCTGGAGGGCTGGGTCCTCCTTATTCACAGGTGAGTCACACCCTGAAACACAGGCTCTCTTCCTGTCAGGACTGAGTCAGGTAGAAGAGTCGATAAAACCACCTGATCAAGGAAAAGGAAGGCACAGCGGAGCGCAGAGTGAGAACCACCAACCGAGGCGCCGGGCAGCGACCCCTGCAGCGGAGACAGAGACTGAGCG





1093
GCCAGGACCGCGCACAGCAGCAGGGCGCGGGCGAGCATCGCAGCGGCGGGCAGGGCGCGGCGCGGGGGTAGGCTTTGCTGTCTGAGGGCGTCTGGCTGTGGAGCTGAAGGAGGCGCTGCTGAGGAGTTCCTGGACGTGCTCCTGACGCTCACTGC





1094
CGGGCAAGAGAGCGCGggaggaggaggaggagaaaaaggaggaggaggaggaggaggaggCGGCCCCGCATCCCTAATGAGGGAATGAATGGAGAGGCCCCCTCGGCTggcgcccgcccacccggcggcggccgccAAGTGCCTCTGGGCGCTGCGTGCCGCGCCCGCTGCTCCGCGCGCAGCCGGCTCGGGCCGCTCCTCCTGACTGAGGCGCGGCGGCGGCGGTGGCTGTGACCGCGCGGACCGAGCCGAGAC





1095
GCGCGCAGCCAGGGGCGACGCTTCCGCTCCGAGCCGCGGCCCGGGGCCACGCGCTAAGGGCCCGAACTTGGCAGCTGACCGTCCCGGACAGGGAGGCCCTTCAGCCTCGACGCGGCCTGCGTCCTCCGGAGGGCCCTGCTCCGCCCGGGAAGCGTCCGCCTCCCGCCCGCCCGCCCGCAGATGTCGCTGCCCCTCTGGCTGTCCCGGCCTGACCGCCGCGCGCCGCCCTGCTGCTCACCTACTTCCGCGCCACGG





1096
GTGCGCTCACCCAGCCGCAGGCGCCTGAGCGGCCAGAGCCGCCACCGAACACGCCGCACCGGCCACCGCCGTTCCCTGATAGATTGCTGATGCCTGGCCGCGGGAACGCCCACGGAACCCGCGTCCAcggggcggggccggcggcgcgcgcgccccctgccggccggggggcggAGTTTCCCGGGCGCCTGCCGGGTGGAGCTCTGCGGGCCGCT





1097
GAGGGCCCGGGGTGGGGCTGCGCCCTGAGGGCCCTGCCCTGCCCTCCGCACGCCTCTGGCCACGGTCCCTTCCCCGGCTGTGGGTCTGCGGCCCCTGCGTGCGCAGCGCTCCTGGCCTCTGCGGCCAGCGCGGGGGCGGAGAGAGGAGAGTGCCCGGCAGGCGGCGGCTGGGCCGGCCCGGAACTGGGTCGTGGAAGGATCGCGGGGAGCGGCCCTCAGGCCTTCGGCCTCACTGCGTCCCCACTTCCCTGCGCC





1098
TATGCgcccggcgcggtggctcacgcctgtaatcccagcactttgggaggccgaggcgggcggatcacgaggtcaggagatcgagaccatcctgactaacacggtgaaaccccgtctctactaaaaatacaaaaattagccgggcgcggtggcgggtgcctgtagtcccagctacttgggaggctgaggcaggagaatggcgtgaacccggggcaga





1099
CGCAGGGGAAGGCCGGGGAGGGAGGTGTGAAGCGGCGGCTGGTGCTTGGGTCTACGGGAATACGCATAACAGCGGCCGTCAGGGCGCCGGGCAGGCGGAGACGGCGCGGCTTcccccgggggcggccggcgcgggcgccTCCTCGGCCGCCGCTGCCGCGAGAAGCGGGAAAGCAGAAgcggcggggcccgggcctcagggcgcagggggcggcgcccggccACTACTCGCCAGGGCCCGCCCG





1100
CCTGAGGCGGGGCCGTCCGGCACCCTGTGATGGGGCGTGGCCCCTGGGGAGGCTCCCACCAGCCCTCAGATTCCTCAGGGCCGCAGAGGTGTGGAGCTGGTTGGGCCGGTTCTTCACCCTCCTCCCCTGGTGCTTGCCTGTGCCCCAGCAGGGTGACAGTGATGTAGTAGCGGGTCCTCCTGGAAGAGGGACGCGTGTGTAGGGTCTGGGCAGGCTCTGGCAAGGCAGTCCCTGGGGTGGCGGGCTTGC





1101
GAGGCCGGGGACGCCGAGAGCCGGGTCTTCTACCTGAAGATGAAGGGTGACTACTACCGCTACCTGGCCGAGGTGGCCACCGGTGACGACAAGAAGCGCATCATTGACTCAGCCCGGTCAGCCTACCAGGAGGCCATGGACATCAGCAAGAAGGAGATGCCGCCCACCAACCCCATCCGCCTGGGCC





1102
CCGCCGGCTCCCCCGTATGAGGAGCTGCCATAGCTTTCGAATCCACCTGTTTTGAACAACAGGATTAGTGCCTGTGCCACGTCCCACGCCTCCGAGAAACCCGCAGGCTCCCGGAGGCTTCGCCCCTTCAAACACTGCCCGAGTCTCCCTAACCTTCCTCGCCGCCTTCCTGCGGGTGACCCCCAAACGCCCCAGCTCCGCTCCCGCCCTTCCTCTCCCGCTACCACACGCCTCTCGGA





1103
CAGGAGCGACGCGCGCCAAAAGGCGGCGGGAAGGAGGCGGGGCAGAGCGCGCCCGGGACCCCGACTTGGACGCGGCCAGCTGGAGAGGCGGAGCGCCGGGAGGAGACCTTGGCCCCGCCGCGACTCGGTGGCCCGCGCTGCCTTCCCGCGCGCCGGGCTAAAAAGGCGCTAACGCCCGCGGCCGCCT





1104
cgggggaaacgcaggcgtcgggcacagagtcggcaccggcgtccccagctctgccgaagatcgcggtcgggtctggcccgcgggaggggccctggcgccggacctgcttcggccctgcgtgggcggcctcgccgggctctgcaggagcgacgcgcgccaaaaggcggcgggaaggaggcggggcagagcgcgcccgggaccccgacttggacgcggccagctggagaggcggagcgccgggaggagaccttggcc





1105
ccccccaccctggacccgcaggctcaggagtccacgcggggagaggggatggagaactctcctcgcttcgtcctctctcccggggaatccctaaccccgcactgcgttacctgtcgctttggggaggccgctgccgggatccggccccgaacagcccgggggggcaggggcgggggtcgtcgaggggatgggggcagagagcaggcggcgggcaggatgcc





1106
GCCCGGCTTTCCGGCGCACTCCAGGGGGCGTGGCTCGGGTCCACCCGGGCTGCGAGCCGGCAGCACAGGCCAATAGGCAATTAGCGCGCGCCAGGCTGCCTTCCCCGCGCCGGACCCGGGACGTCTGAACGGAAGTTCGACCCATCGGCGACCCGACGGCGAGACCCCGCCCCA





1107
cgctgggccgccccTTGCTCTTAGCCAGAGGTAGCCCCTCACCCCGCGACTTACCCCACACCCCGCTCTCCAGAACCCCCATATGGGCGCTCACCGCCCGCCCGCACAGCTCGAACAGGGCGGGGGGAGCGTTGGGGCCCGAGGCCGAGCTCTTCGCTGGCGCCGCCTCCCGGGACGTGGCCTCCATGGTCGTTGCCGCCGCTACCTCACAGAACCAGCAACTCCGGGCGCGCCAGGCCTCGGGCGCCGCCATCT





1108
GCTTCTCCATAGCTCGCCACACACACACACACACGCCACGCACCGTATAAAAGCCTAAATGACACACCACTGCAGCGTTCAAACGCTGGGAAGAAGACTCCCTTGTGGCACCGGAAACCCACGAGGTTGGAAGTGGGAGGGGAAGAGGGCCAGATACTTCACCTGAAAATCCGCCAGGATCATCTCCCGGTCCATGTTGGACGCCATGGCGGCCGCCGAGTTCCGCGGCTCCGGGAGCGAAGCGCGCACCTGG





1109
CCGCGCACGCGCAAGTCCAGGCCGCCGCGGCCCTGGAATAGAGACTCGCCCTTGATGTCCCTCTCGAAGTAGTAGGCGGCATCGCCGATATCCACGTCACCGGCGGCCTTCTGAGACGTGTTCTGCCGCAGCTCGATCTGGATGGTGGGCTGCTCGTAGTGCACGGCCGCCACGAACTTGGGGTGCAGCCGATAGCGCTCGCGGAAGAGCCGCCTCAGCTCGGCGTCCAGGTCTGAGTGGTTGAAGGCGCCGGCG





1110
GTCTCAACTCACCGCCGCCACCGCCGCGCAGCCCCGCGGCCGCTGCTCCATAGCCCTCCGACGGGCGCCCAGGGGCTTCCCGGCTCCGTGCTCTCTGCCCGTCGTGGTTCCGCCTTCAgccccgcgcccgcagggcccgccccgcgccgtcgagaagggcccgcctggcgggcggggggaggcggggccgcccgAGCCCAACCGAGTCCGACCAGGTGCCCCCTCTGCTCGGC





1111
ACAAATGCGCTGCTCGGAGAGACTGCCGCGGCAACCAACTGGACACCCCAAGAGCTCACTCCTCCGCGGTTTTATATTCCGACTTGCGCACAGGAGCGGGGTGCGGGGGCGCAGGGAGTGTGGGTAACAGGCATAGATTCCGCTTGCGCAATACGTGGTAAGAAACCAGCTGTGAGGGGCTGGCCCAACGCAGAGCGGCGCGA





1112
GCGCCTGCGCAGTGCAGCTTAGTGCGTCGGCGCGCAGTTCTCCCGCCCGTTTCAGCGGCGCAGCTTCTGTAGTTGGGCTACTGGAGGGGTCGCTCAGAAACCTCATACTTCTCGGGTCAGGGAAGGTTTGGGAGGATGCTGAGGCCTGAGATCTCATCAACCTCGCCTTCTGCCCCGGCGG





1113
aagtcaagggctttcaacctcccctgccccattcatacagtggaaggtctaacccaggcttgtcagcctaagaacacgggatctcttcactgtggttcatgtgtagagtggagtttccatgctgagagagacaagcaaagaagaccagaggctcccacccctgtccagtgGA





1114
tggatcccgcacaggggctgcaggtggagctacctgccagtcccctgccgtgcgctcgcattcctcagcccttgggtggtccatgggactgggcgccatggagcagggggtggtgcttgtcggggaggctggggccgcacaggagcccatggagtgggtgggaggctcaggcatggcgggctgcaggtccggagccctgccctgcgggaacgcagctaaggctcggtgagaaatagagcgcagcgccggtgggc





1115
CCGCCTGTGGTTTTCCGCGCATTGTGAGGGATGAGGGGTGGAGGTGGTATTAGACGCAGCCGAATCCTCCCTCAGAGTCCGCCAGGTGGGCGTCTCAGGGGTGGGAGTGGCCGCGTCGTGAAGCGGAGAGAGGATTTCTCTCCTGGTCCTGGAGAAGGCCCCCGGCGGCCGGCGGCATCCCTCGCTGGCGAGTCCCGGGAGCGAGGTGGTCTCTGCAGGGGAGGAAGTTCCCGGGCGGCGCGGCCTGCGTCACAG





1116
cgcgctctcccgcgcctctgcccgcccccggcgcccgcccccgccgctcctcccgactccccgcccccggcccGGGTCACTTGCCGTCGCGGTGGGCGGCCCCCGGCGAGTCCACACCCCTGCCCCGCCTCCTCCCGGTAGGAAACTCCGGGACCCTGCAAGGGATGACTCACCCCAGTGATTCAACCGCGCCACCGAGCGCGGAGCTGCCCTGGAGGACGCAGGCGGGTC





1117
TCCGGCCCAGCCCCAACCCCGACCTAAGTAACCGGCTATCGGCCACCCATTGGCTGAAGTCCCTGAGCACCTGTTGGGAGGAAGGCTGCTGCGTGCAGCCGGAAAGTCCTGCGTCCCTCCGCTCTTACCGCGGCAGGAACCACAGCCTCCCCGAACCTCAGGGTTTGTATGGATTTCGCCCAGGGGAAAGCGCTCCAACGCGCGGTGCAAACGGAAGCCACTGGCTGGTTGGGCGGCTGTGATGGG





1118
CCGGGTCAGGCGCACAGGGCAGCGGCGCTGCCGGAGGACCAGGGCCGGCGTGCCGGCGTCCAGCGAGGATGCGCAGACTGCCTCAGGCCCGGCGCCGCCGCACAGGGCATGCGCCGACCCGGTCGGGCGGGAACAccccgcccctcccgggctccgccccagctccgcccccgcgcgccccggccccgcccccgcgcgctctcttgcttttctcaggtcctcggctccgccccGCTC





1119
GGGGCGGTGCCTGCGCCATATATGGGAgcggccgcccctcgccgcgcccctcgccgccgccgccgccgcgctcgccgactgactgcctgacggcgccgcgagccggcccgagccccgcgagccccgcgagccccgccgccgccgagcgccaccgagcgccgccgccgccccccgccacgcaccgcggcTCCTCGCGTCCAGCCGCGGCCAAGGAAGTTACTACTCGCCCAAATAAATCTTGAAAAGAAACAAACG





1120
GCGCGGGCCCTCAGGTTCTCCCTATCGAAGCGGTCTATGGAGATAGTTGGATACTCGGCCATCTGCCCCTCGAAAGAACTCATAGCGCCGCCGATCCCAGAGTCCGGGACCCCAAAACCGCAGCTGAAGCCAAGGCCAGCCCTGACCGCGCCGCCACTTCCGGGAAGCCGCGCGCTGCCTCGCCATTGGGCGGCCGAACGCAGCCACGTCCAATCAGAGGAGTCCGGAGACCGGGGGCAAAGTCAAGGAGCATCC





1121
cgtccgcggcTCCTCAGCGTCCCCCTTTACGGTCTGGGCGGACTGCGGGGGCTGGGGAGGTTCTGGGGACCGGGAGAGTGGCCACCTTCTTCCTCCTCGCGAAGAGCAGGCCGGGCCTACCCGTCCGCCCGCTCTGCCGTCCGCTGGCCGGCCGACTGCTGCCCGATCACTCCTGAGGCCGCCGTTGGGCGACAGGGCGGTGCGGGAGGAGGACTGCGCAGGCGCAGTGGGCCAGGCGGCCCGGCGACCAATCGG





1122
GGAGGCGCCCAGCGAGCCAGAGTGGTGGCTGGTCCCGCGCGGTGAGTGGGATTGGGGCACTTGGGGCGCTCGGGGCCTGCGTCGGATACTCGGGTCCGCTCGGGAGCGCGCTGGCCGCAACGAGGGCGGCGCGGGCCCGGGCGATGGCGTGGCTTGCGTCTCCCGCCTccgggcagggcctggccgccgggcgggggcgggagggccacgcgggcccagggtggggccgcggcctgcgcggcgggcgggccgggt





1123
CGCGCAGGGGGCCTTATACAAAGTCGGAGAAGTAGCTGGGTCGCTGGCCGGCCAGGGACTCAAGCCGCCTCAGGTGAGCGCTCCTTGGCGCTACTTCCGGTCTCAGGTGAGGCCGCCGGAAGCGGGCACTTGGCCCTAAGACCCGCTACAGTGCGTCCTCGCTGACAGGCTCAATCACCACGGCGAGGCCAAggcgcggggccgcggcccgcccgAGAAGCCTGAGCTGGGCCCCGACACCCCCTGCCCGACATT





1124
CCCCACCCCCTTTCTTTCTGGGTTTTGATGTGGATGTCTTTCTATTTGTTCAGGAAATTGTGACGTGTGTTCTGGGCAGGGTTTGAGGTTTTGGAACATTTTCTAAAAGGGACAGAGAGCACCCTGCTACATTTCCTAATCAAGAAGTTGGCGTGCAGCTGGGAGAGC





1125
GCGCGTTCCCTCCCGTCCGCCCCCAAgccccgcgggcctcgcccaccctgcccgccgcccctccgccggcggccgcccTCTGCGGCGCCCCTTTCCGGTCAGTGGAGGGGCGGGAGGAGGGGCGGGGGTGCGCGGGGCGGGGGGAGAAGTCCTGGAGCGGGTTTGGGTTGCAGTTTCCTTGTGCCGGGGATCCTGTCCCCTACTCGCCAGCGCCAGGCTCCTCC





1126
ccggcggAGGCAGCCGTTCGGAGGATTATTCGTCTTCTCCCCATTCCGCTGCCGCCGCTGCCAGGCCTCTGGCTGCTGAGGAGAAGCAGGCCCAGTCGCTGCAACCATCCAGCAGCCGCCGCAGCAGCCATTACCCGGCTGCGGTCCAGAGCCA





1127
GCCTGGTGCCCCGAGCGAGCCGGGAGTAGCTGCGGCGGTGCCCGCCCCCTCTCTCCGCCCCTCCAGCGGAGCTGGTCTCCGGCCGGGCACCGTCGCGGGCCCCCCTGGCCCGGCCACCTGGGACCGTGCTGGGGAGTCTGCCACTTCCCTCTCTCCCCTGGCCCGCAAAGTTTTGGCGGAGCCATCGCTGGGGCTGAGCGCGCCCCCGGGGGGAGATCGGGGAGCGCCCGATGCCGGGCGGCCGGAGCCATTGAC





1128
GGCGGCGGCGCTACCTGGAGGCGCGGTGGCGGGCAGGTGCCCGAACTGCACGGCGATGCAGAGGTCGTTGTCCAGGGGGAACTTGTGGCAGTGCAGCATCTCAGGCCAGGGGAAGCCGTAGGCCTCCATGAGCGGCGCGCAGCCGGCGCGCACGGCCTCGCACAGCGAGCGGCACGGGTAGATGGGCCGGTCGAGACAGACGGGCGCAAAGAGCGAGCACAGGAAGACCTGCGTATCCGAGTGGCAGCGCTTGGC





1129
TGGTGGCCAGCGGGGAGCGCCCGGGCGCCATCGGCGCGTCCTGCTCCACCAGGGCGACCCTGGGCGCTGAGAAGCGGGAATCTTCCTTGGGGACCAGGGCGACGCCTCCTGCTGCCGCCCCCGGCGGGACAGCCGCGGCTCCTCCTCCAGCCGCCGCGCCACCCAGAGCCCGAGGTTTGCCCTTCAGAAGCGGACCCGCAGACTCCTCGGACTCAGAGCCATCCTCCTCCTCAACCTCCACCGCAGCGGCCTGCG





1130
GCGGCACTGAACTCGCGGCAATTTGTCCCGCCTCTTTCGCTTCACGGCAGCCAATCGCTTCCGCCAGAGAAAGAAAGGCGCCGAAATGAAACCCGCCTCCGTTCGCCTTCGGAACTGTCGTCACTTCCGTCCTCAGACTTGGAGGGGCGGGGATGAGGAGGGCGGGGAGGACGACGAGGGCGAAGAGGGTGGGTGAGAGCCCCGGAGCCCGAGCCGAAGGGCGAGCCGCAAACGCTAAGTCGCTGGCCATTGGTG





1131
CTCGGCGATCCCCGGCCTGAACGGGTAGGAGGGGTTGGGGGATTCCGCCATCCCTTGTTTTGAGGCGGGAACGCAACCCTCGACCGCCCACTGCGCTCCCACCCACACCCAGAGTAATAAGCTGTGATTGCAGGCTGGGTCCTCACCGTCTGCTCGCCAGTCTTCTCCTTTGAGGACTCAGAAGCCAAGGGTTGCGGGAGGCACCA





1132
CGCAGGGAGCGCGCGGAGGCCCGCAGGGTGCCCGCCTGGCCGCAGAGGCCGCGACGCCCCCTCCGCCACCCTCGGGCCGCCGAAAGAACGGGCAGCCGGGAAATCCCGTGTCCCCACTCGTGGCAGAGGACGCTGTGGGGCGGGCGGGCTGCGGGCTCCCGGCGCCTTCCCGCAGAGGCGGCGACAgcggccgccccccccgcggggccgggccggggAACTTTCCCCGCCTGGAGCCGGGC





1133
GAAATACTCCCCCACAGTTTTCATGTGATCAGGAATTCAGCATAGGCTATAAGACGGAGTGCTCCATGTCAATAGAGAATATTTCCACAGGTGTGCTAGGCACTTGTGGTAGATGTTGCAGGGAAGTCAGGACTGGGGACAGCTTGGTCCCTACTTCAAGGTTACAGTCTAGGAGCTGAGAGTGGCAAAGTGACCTGATTCTACAGGGTAAAAGCCCCAGAGATAAATGACATAGGTCCAGGTCAGCCAGCATTG





1134
CCGGGCGCACGGGGAGCTGGGCGGACGGCGGCCCCCGCCTCCTCCGGGGACGCGGCACGAGACGCGGGGACGCGCGGACGCCACGCTCAGCGGCCGCCCCCGGCCTCCGCGCCGCCTTCCTCCCGGGAGCAGCCCCGACGCGCGCGGGCCCGGACCGCCGGGGTTGTCATGGCAGCAGCTCCATCCCTGACCGCCACTTTCTCCCGGTGCCGCCTCGGAGCGAGCGGGCTGGCGGGCGGCGCGGACTGCGCGCTC





1135
gcggcggcgTCCAGCCAGAGCCCTGTGGAAGCGGCGGCGACACTTGGGCTGGGCAGTGTCTCTGATGCCTCCCAGCGCCAGCGACTGCTCTTATTCCCGCCGCTGTGGGTCGGGAAAGTTCCGCCAGTGCACAGCAACCAATGGGCGGAGGGGTCCTTTGCCCCTGGGTTGCGTCACCCTCATGCTTCCAGAACCTGGAGGATCCAGCAGGACCGTCCCACTTGTATTTGCATTGAGGTCATTGATGGAAATGGT





1136
GGGTCGCCGAGGCCGTGCGCTTATAGCCGGGATGACGCCGCAGTTGGGCCGGATCAGCTGACCCGCGTGTTTGCACCCGGACCGGTCACGTgggcgcggccggcgtgcgcggggcggggcggagcggggcctggcctgggcggggcAACCTCGGCGCACGCGCACAgcgcccgggcggggggcggggTGGTGGTGCGCCTGCCGCGCCTACAGTTCCCGCCGCTCGCGCC





1137
CGCGCCTGATGCACGTGGGCGCGCTCCTGAAACCCGAAGAGCACTCGCACTTCCCCGCGGCGGTGCACCCGGCCCCGGGCGCACGTGAGGACGAGCATGTGCGCGCGCCCAGCGGGCACCACCAGGCGGGCCGCTGCCTACTGTGGGCCTGCAAGGCGTGCAAGCGCAAGACCACCAACGCCGACCGCCGCAAGGCCGCCACCATGCGCGAGCGGCGCC





1138
ccgggagcgggcggaggaagggccgggcgtccggcgcaagcccgcgccgccccagcccoggccccggcccggcccgcACACGCCGCTTACCTGGAAGCCGGCGACGCTGCCGCCCACCTCCCTGCTGCGTGTCGCAAACCGAACAGCGGGCGTTGGCCCTCCTGCCGGACACTCCTCTGCCAGCGCCGCTCTGGCCGAGTCGCGGGGGCCGAATGTGCGACGGGGCAGAGCGGG





1139
GGGGCGCACCGGGCTGGCTCCTCTGTCCGGCCCGGGAGCCCGAGGCGCTACGGGGTGCGCGGGACAGCGAgcgggcgggtgcgcccgggcgcggcggcggcAGCGTCGGGGACCCGGAGCTCCAGGCTGCGCCTTGCGCCCGGGTCAGACATTATTTAGCTCTTCGGTTGAGCTTCGATTGGTCAAACGGCGCCGcccccccccccccgccccccgccccccgctccccGCTCGCCCGCGCTAC





1140
GCCACGGGAGGAGGCGGGAACCCAGCGAGGCCCCCGAgggctggggggaccggccggccggacaaagcggggccgggccgggccggggcggggccgtgcggggcTCACCGGAGATCAGAGGCCCGGACAGCTTCTTGATCGCCGCGCCGTTGGCGCTGGCGGCCGCGGTGCCGGCCGCGGGACGTCCCGAAATCCCCGAGTGCAGCTGGTCAGCGAGAGGCTCCTGGCCGCGCTGCCCCTGGTTCGCGCCCTGCT





1141
cgggcatcggcgcgggatgagaaaccaacctgatacttatcgtgtgccgagttccctccttgtatcctgactaagcacagcgaataaccctgtccttgttctaaccccaggtcttgaagaaatactgtcccagctgagccccgcgtttacaagatgaagaggcgccccagatgcgctgaaagaaaggccaaagctcgtgcctccttccactgcctgcggtagaacctggtcccgcatagcttggactcggataag





1142
acaccgccggcgcccaccaccaccagcttatattccgtcatcgctcctcaggggcctgcggcccggggtcctcctacagggtctcctgccccacctgccaaggagggccctgctcagccaggcccaggcccagccccaggccccacagggcagctgctggcagggccatctgaagggcaaacccacagcggtccctgggccccaacgccaggcagcaaggactgcagcgtgcctacctgtgcagctgcaacccag





1143
CCCCAACAGCGCGCAGCGAACTCCACTGCCGCTGCCTCCGCCCCAGAGACACGTTGCAGGCCAGAGCGGCCGGGGCGCGGGGCATCACGGGACGGCCTCACCTGGCCTCTTGGAGGACTCCCGAAGCCCGAGGCCGCCAACCGAAGGAGGCCCCGCCCCCGGAGGCACCGCCTCGCCTCTTTCCGCCAGCGCCCGCAGGACCCGGATGAGAGCGCACGCTTCGGGGTCTCCGGGAAGTCGCGGCGCCTTCGGATG





1144
CCCCGCTGGGGACCTGGGAAAGAGGGAAAGGCTTCCCCGGCCAGCTGCGCGGCGACTCCGGGGACTCCAGGGCGCCCCTCTGCGGCCGACGCCCGGGGTGCAGCGGCCGCCGGGGCTGGGGCCGGCGGGAGTCCGCGGGACCCTCCAGAAGAGCGGCCGGCGCCG





1145
CCCGGGGGACCCACTCGAGGCGGACGGGGCCCCCTGCACCCCTCTTCCCTGGCGGGGAGAAAGGCTGCAGCGGGGCGATTTGCATTTCTATGAAAACCGGACTACAGGGGCAACTCCGCCGCAGGGCAGGCGCGGCGCCTCAGGGATGGCTTTTGGGCTCTGCCCCTCGCTGCTCCCGGCGTTTGGcgcccgcgccccctccccctgcgcccgcccccgcccccctcccgctcccATTCTCTGCCGG





1146
CCCGCGGAGGGGCACACCAggcgggtgttggggaggacgcagagggctggggctggagcccaggcggggcagggggcggggcggagctgggtccgaggccggCGGGGGCGCCTCCATCCCACGCCCTCCTCCCCCGCGCGCCCGCCCGCTCTCGGGTGACTCCGCAACCTGTCGCTCAGGTTCCTCCTCTcccggccccgccccggcccggccccgccgAGCGTCCCACCCGCCCGCGGGAGACCTGGCGCCCCG





1147
GCCCACGTGCTCGCGCCAACCCCTACGCCCCAGCGCGCCTTCTCCACCCACGCACGGGCCTCGGACGCATTTCCAGCCCCGGCGTTGGTTGTGGATGCTGGACATCCACCGCCTCCAGGCAGTTTCGCCGTCACACCGTCGCCATCTGTAGCCAAAGCAAAACATATCCTAACTGAGACTTTGCAGCTCTTGTGGCCACTCTGGGCTCACCGGGAACATGAGTGGAAGAGCCCGAGTGAAGGCCAGAGGCATCGC





1148
GGCGGAGCGGCGAGGAGGAGGAGCAGGAGCGCGCAGCCAGCGGGTCCACGCATCTCAGCACTTCCAGACCAACTCCGGCACCTTCCACACCCCTGCCCGGGCTGGGGGCTCCGAGAGCGGCCGCGAAGCGACTCCGATCCTCCCTCTGAGCCTTGCTCAGCTCTGCCCCGCGCCTCCCGGGCTCCGGTCCGCGCGGCGGGGTCCCTGCTCCTGCGCCCCGGGCGCGCTTCCCGGACACCCCGGTCCCCGCAGCC





1149
CCTCGCCGGTTCCCGGGTGGCGCGCGTTCGCTGCCTCCTCAGCTCCAGGATGATCGGCCAGAAGACGCTCTACTCCTTTTTCTCCCCCAGCCCCGCCAGGAAGCGACACGCCCCCAGCCCCGAGCCGGCCGTCCAGGGGACCGGCGTGGCTGGGGTGCCTGAGGAAAGCGGAGATGCGGCGGTGAGGCGCGGCTTGGGCCG





1150
caggcgcgccgATGGCGTTTCTGAGGTGACGCCGCCCACACCGGGCTTCTCCGGGGGCGGAGGAAACACCTATGAACCCTCCGGCAGCCTTCCTTGCCGGGCGCCAGGTAAGCAGCGGTTccgggcgcgg





1151
CTCCCGGCTTCTGCATCGAGGGCCTTCCAGGGCCAGCCCTTGGGGGCTCCCAGATGGGGCGTCCACGTGACCCACTGCCCCCACGCCCGCGCGCGGGCCCCAGCAGCCCCAGAGCTGCGCCAACTTCGTTCACTCCGCGCTCACCTTACGGGGGTCCCCGCGTGACCGCATGGGGTAGCCCCTGCTCCCACGCTCCCGGCCGA





1152
CGGTCCGCGAGTGGGAGCGGCTGCTTGTGGGCAGGGTGGACGCGGGGCCACGTCTTGGCCGGCGTTTTGCGGGGTCTTCCTGTTCTGAACGCGCGTAACTTTTGCCTCAGTATCTCACTTCTTGGAATCCGGCGGCGTTCACGTGTGTGCTCCAGAGAAGGGCGCCAGAGGGTATTCCCTGAAAGTGAAAGGTCGGCGAAAGAGGAGTAAAGACGGCGAGACGCGTCCACGCAGGGGGAGTCTGTGCGGTTTGGA





1153
GCAGCGCCGCCTCCCACCCCGGGCTTGTGCTGAATGGGTTCTGATTGTGCACGGGGTGCACACTGGGCATTTCTTGGAAGGGGCACACTGacgcgcgcacacacgcccccgacgcgcacgcgccccgcgcgcactcacactcacccccgcgcacactcacccccgcgcacactcacgcTGCCGCCGCGCTGAGGTGCAGCGCACGGGGCTTCACCTGCAACGTGTCGATTGGACGGATGGGCTCGGCGCGTGGGT





1154
CGACCGTGCTGGCGGCGACTTCACCGCAGTCGGCTCCCAGGGAGAAAGCCTGGCGAGTGAGGCGCGAAACCGGAGGGGTCGGCGAGGATGCGGGCGAAGGACCGAGCGTGGAGGCCTCATGCCTCCGGGGAAAGGAAGGGGTGGTGGTGTTTGCGCAGGGGGAGCGAGGGGGAGCCGGACCTAATCCCTCACTCGCCCCCTCC





1155
CCCGGGCTCCGCTCGCCAACCTGTTACTGCTGCAGAACGCCAGGAAGCTCAGCCTGATCCCACAGATTAGGGTAAAATATCCCGGGGGGCCGAAGTGGAAACCGGAGTTGCGTCATTGCTCCCACCCGATATCACCTTGGCAGCGACCGCGGCTGACCACGTTCCCGGCCTGTCGCGAATCTCACCCAAGGGAGCTGAGTCTCAGCTTCCCTGGTCCCTGGTCCCGAGTTCCGCCTTCCCCCCCCGCCCCGTGGC





1156
CATGGGGTGCTCATCTTCCCGGAGCTGAGGAGCTGGGGCGGGCATGGGGTGCTCATCTTCCTGGAGCTGAGGAGCTGGGACGGGCATGGGGTGCTCATCCTCCTGGAGCTGAGGATCTGGGGCGGGTGTGGGATGCTCATCCTCCTGGAGCTGAGGAGCTGGGGCGGGCATGGGGTGCTCATCTTCCCGGAGCTGAGGAGCTGGGGCGGGCATGGGGTGCTCATCTTCCCAGAGCTGAGGAGCTGGGGCGGGCAT





1157
CCGAGAGCCGGAGCGGGGAGGGCCCGCCAAGTCAGCATTCCAGCCGGTGATTGCAATGGACACCGAACTGCTGCGACAACAGAGACGCTACAACTCACCGCGGGTCCTGCTGAGCGACAGCACCCCCTTGGAGCCCCCGCCCTTGTATCTCATGGAGGATTACGTGGGCAGCCCCGTGGTGGCGAACAGAACATCACGGCGG





1158
CCGCTGCAGGGCGTCTGGGCTTCTGGGGGCAGAGAAGACTCACGCAGTGAGCAGTCCGCAAGCCCGCTGGCGGCAGCGGCGGTGCTCCGTCCAGGGCGAGAAGCTGCAGCGCTCGGGCCGGGGTCCCTCCTGTCGCAGCAGCTCCTCGACGAGTGCAGGGGCAGCCACG





1159
gcgctgccccaagctggcttccgctgcctgctctgggctgggctgggctgggctgggctggtaggacctgctcccagggcgggaggggacacacccacctcagcagatctcagcccatccctcccagctcagtgcactcacccaaccccacacgggccaaggagagagtgaagaggaagcattgccctcagaggccttcacggactggccaga





1160
CAGGATGCCAGCGTGACGGAAGCAAGTAACCACCAAGGCATCACCACTGGCGCTAAACTTCTCACTTCCGGAGTGCTGCAAGCGCAGAAAATATACGTCATGTGCGGAGGCGGAGCTTCCGCCCTGCGCGTCGTATTAGACGGAAACCGAGCGGGCCCATTTTTCATGGGTTTGCGGACCCACCAGCGAAGGCGGGAGGTGTCGCAGGGACATCTTCTGGCTGTTTCCGTCGCCTGCGTGGCCCTTGCACCCCGG





1161
GGCGGTGCCATCGCGTCCACTTCCCCGGCCGCCCCATTCCAGCTCCGGAGCTCGGCCGCAGAAACGCCCGCTCCAGAAggcggcccccgccccccggcccAAGGACGTGTGTTGGTCCAGCCCCCCGGTTCCCCGAGACCCACGCGGCCGGGCAACCGCTCTGGGTCTCGCGGTCCCTCCCCGCGCCAGGTTCCTGGCCGGGCAGTCCGGGGCCGGCGGGCTCACCTGCGTCGGGAGGAAgcgcggcg





1162
GTGGGTCGCCGCCGGGAGAAGCGTGAGGGGACAGATTTGTGACCGGCGCGGTTTTTGTCAGCTTACTCCGGCCAAAAAAGAACTGCACCTCTGGAGCGGGTTAGTGGTGGTGGTAGTGGGTTGGGACGAGCGCGTCTTCCGCAGTCCCAGTCCAGCGTGGCGGGGGAGCGCCTCACGCCCCGGGTCGCT





1163
GGCGGAGGGCCACGCAGGGGAGACAGAGGGCCTCCACAGGGGCCAGGGGGAAGTGTGGGAACTGAGTCTCCCCCAGACGAGGCTTCACTTGGACACGTGTATGTGGTCACCGGGGGAAACTGAGCAGTTCTGACTTCCCTTGGAAGGCGTGGAATTAGGAGAGAAATCCCTTAGTGGGCACACGAGTGAGTGCCCCTTGGAGTCCATCTGTGGAAAGGAAGCGGTGATAGGTTTCCGCA





1164
GTCCGGGGGCGCCGCTGATTGGCCGATTCAACAGACGCGGGTGGGCAGCTCAGCCGCATCGCTAAGCCCGGCCGCCTCCCAGGCTGGAATCCCTCGACACTTGGTCCTTcccgccccgcccttccgtgccctgcccttccctgcccttccccgccctgccccgcccggcccggcccggccctgcccaaccctgccccgccctgcc





1165
CGGCCTGCGGCTCGGTTCCCGCCTCTTCCCCACCCCCAGCCCCGCGCTGCCCTCTCGGTCCCCCT-



GCGCGACCCCAGGCTCGGCCCCTGCCCGGCCTGCCGGGGTGGCCCGGGGGTGGGGTGGGAGCCCTTTGTCT-



GCGTGGGTCGCCTCGCGTCTCTCTCTCCCACCCCACCTCTGAGATTTCTTGCCAGCACCTGGAGCCCGAAACCA



GAAGAGTTGTCAGCCCAACAAGAATATAGGATCACCGGCCCATCA





1166
GGGAACCGTGGCGGCCCCTCCTGGCCCTGGGAGGTGGTCCCGCTGCCCCCCTGACTTCCGTGCACT-



GAGCCCCTGGCCCTGCCCGCAGCCCCGGCCCTGGACTCGGCGGCCGCGGAGGACCTGTCGGACGCGCTGTGC-



GAGTTTGACGCGGTGCTGGCCGACTTCGCGTCGCCCTTCCACGAGCGCCACTTCCACTACGAGGAGCACCTGGA



GCGCATGAAGCGGCGCAGCAGCGCCAGTGTCAGCGACAGCAGC





1167
cggggaaggcggggaaggcggggaaggcggggaaggcggggaaggcggggaaggcggggatggT-



GAGACggtgaggcggggcggggcctggggcgcgggcggggcggggaggggtggggcggggcCCGGGGGCGCTG-



GACCGCGGTGCTGCGGGACGGATTCCCGGCGGCTGCGCGGGAGGCTGCGAGCCTGGGCTCCCAGGGAGTTCGAC



TGGCAGAGGCGGGTGCAGGGAACCCGCGGCTCGGCGGGAGCGTG





1168
cctcccggtttcaggccattctcctgcctcagcctcccaagtagctgggactacaggcgcctgccac-



cactcccggctaattttttgtatttttagtagagacgggggtttcaccgtgttagccaggatggtctcgatct-



gcttacctcgtgatccgcccgcctcggcctcccaaagtgctgggattacaggcgtgagccaccgcgtccggcAT



ATTT





1169
AGCCCGCGCACCGACCAGCGCCCCAGTTCCCCACAGACGCCGGCGGGCCCGGGAGCCTCGCGGACGT-



GACGCCGCGGGCGGAAGTGACGTTTTCCCGCGGTTGGACGCGGCGCTCAGTTGCCGGGCGGGGGAGG-



GCGCGTCCGGTTTTTCTCAGGGGACGTTGAAATTATTTTTGTAACGGGAGTCGGGAGAGGACGGGGCGTGCCCC



GACGTGCGCGCGCGTCGTCCTCCCCGGCGCTCCTCCACAGCTCGCTG





1170
CCCCAGCCACACCAGACGTGGGAGCTTAGGATGAGAGCGGCCTCCGAGCAGATGATCACCCTG-



GAACGACGCCAAACGCGACCCCTACCAGAGGACTCGCGCATGCGCAGCGCAGCCTGGGCCGGCGGCCTGG-



GCAGGATGTAGTCGCGAGCAGCGCACCGGGCCCACGCCAGCGGAATTGCGCATGCGCAGGGCCGCCTCTGCCTG



CGGCCTGGGCTGGG





1171
tgggcttcctgccccatggttccctctgttcccaaagggtttctgcagtttcacggagcttttca-



cattccactcggtttttttttttttgagactcgctctgtcgcccaggctggaatgcagtggcgcgatctcg-



gctcactgcaagctccgcctcccgggttcacgccattctgcttcagcctcccaagtagctgggattataggcgc



ccgccaccacgcccggctaatggctaattttttgtattttttttt





1172
CCGCGCTGGGCCGCAGCTTTCCGGAGCGCAGAGGAAGCTGGCCAGCCTGCAGATAGCACTGG-



GAAAGACACCGCGGAACTCCCGCGAGCGGAGACCCGCCAAGGCCCCTCCAGGGACCTGTCTTCCTAACTGC-



CAGGGACGCCGAGCCAACTC





1173
gcatggcccggtggcctgcactccagtgaggtggctgaactctgaccagccaagagaaaac-



ccccctctccgccccaaacagctccccactcccccagcctgcccccaccctccccacattccagtctttcact-



gtcgccccaggcaacttggctgcccaagaccaagccccaccaagaagctggagggccaggcaagtccaggatgg



gcaagcagggaagcacgagagggagaaacagaggtgaggaaggaagg





1174
GGGCAGGGGAGGGGAGTGCTTGAGTATTGGGGCTACACTCACCACAAGAGCAGCAAACAAAGCACT-



GGGTGTGGTAGAGGCTGTCCAGGGCCTGGCAGGCATTGCTCTGCCCATAGATGCCTTTGTTGCACT-



TGATACAGGTGCCTGAGAAGAGAAAAGTGTCACACTCTACTCCCCCAGGTCAAAACCAGG-



GATTCCCAAGCTTTCCTGACTGCCCTTTCCTGATGTGCCAGGGGTCA





1175
CCCCGGCGCCTTCCTCCTCCGGACTCCGCTGCATGCCTCGCTTGCGGTGGTCCGATCG-



GCTTTCTCCGGGAGCTTTCCTCTCCCCGCCACGCCCCCGTCTCCCCGGCCGTCCCCGCGCCTCTCG-



GCCTCCCTTTCATTAGCCCCACATCTGTCTTTCCCATGGGAGGGAGCGCGCGCCTTCCGCCCAGCGGGGCCCTT



AGCAGAGCCTCTCCAATCCTCGGCGCCTCCCCTACACAGGGTTCGCTGGGCCGTTCT





1176
CCACCGCGCTTCCCGGCTATGCGAAAGTGAAAACGAGGGGCGCCCAAGGCCCT-



GCTTCTTCCCCCTTCCTCTTCCCCTTGCCCAGCCGCGACTTCTTCCTCACTGATCTCCCGGGGGCG-



GAGACGCTGAGTTCCCCGGAGACGAGTTAGTCACCAAGAAGAGGCGGTGACAGAGAGCGCGGCTCGCGTCGCAC



TCCGAGGCC





1177
CCGCATCTGACCGCAGGACCCCAGCGCTACCAAGTGCCTGTTCTTGGACCCCCAGCCGAGCAGGGG-



GAAGCATCCCCAGCTCCCGCACCCAAGTCCCTGGCGCCGCTGCCGGGCCGCCCTCCCTGATGC-



CCAGCGCGCAGCCTGCCGGCGCCGCGCCTTCTGGACGGCTCTCGCCGCACCTCCTGAGCTCAGCCCGCGGCCCC



GCAGTGGGGCGGCCTCACTTACTGGCGGGGAAGCGCGGGTCTGGGTTGGCGC





1178
GCGGACACGTGCTTTTCCCGCATTAGGGGGGGTCTcccggcgcgcgccccgccgccACCTGTTGAG-



GAAAGCGAGCGCACCTCCTGCAGCTCAGGCTCCGGGCGCCAGCCCTGCCCCGCAGCCCCAGAGC-



CCGTCGCAGCTCGGGTGGTCCCTCCCCGGCCCAGCGCTCGCCGCCTGCTCTTCGCCCTGCAAGTTTCAAGAGGC



AGTTATTTCTCGCAGCCTCCGCGCTTGCA





1179
GAGCTGGAAGAGTTTGTGAGGGCGGTCCCGGGAGCGGATTGGGTCTGGGAGTTCCCAGAGGCG-



GCTATAAGAACCGGGAACTGGGCGCGGGGAGCTGAGTTGCTGGTAGTGCCCGTGGTGCTTGGTTCGAGGTGGC-



CGTTAGTTGACTCCGCGGAGTTCATCTCCCTGGTTTTCCCGTCCTAACGTCGCTCGCCTTTCAGTCAGGATGTC



TGCCCGTGGCCCGGCT





1180
GGCCGCCAACGACGCCAGAGCCGGAAATGACGACAACGGTGAGGGTTCTCGGGCGGGGCCTGG-



GACAGGCAGCTCCGGGGTCCGCGGTTTCACATCGGAAACAAAACAGCGGCTGGTCTGGAAGGAACCTGAGC-



TACGAGCCGCGGCGGCAGCGGGGCGGCGGGGAAGCGTATGTGCGTGATGGGGAGTCCGGGCAAGCCAGGAAGGC



ACCGCGGACATGGGCGGCCGCGGGCAGGGCCCGGCCCTTTGTGGCCG





1181
GCGCCCGGTCAGCCCGCAGCGCCCGGCCAGCCCGCAGCGCCGGAGCCCGCAGTGCGTGCGAGGG-



GCTCTCGGCAGGTCCAGACGCCTCGCCGAGCCCAGCCCGCAGCTccccgggccgcgccgcgcccgcccACAGG-



GCCCACAGCCCTGCTTCGGCTCTCAGGGCGGTCACCTGGGATGGGG





1182
CCCGCCAGGCCCAGCCCCTCCCTGGCCAGCCCCGTCCTTGTCCCCAAACTgggcccgcccggccgc-



caggccgccgggcctccggggcccTCGCGCATCCGGCTCCGAAAGCTGCGCGCAGCCATCATCAGGGC-



CCTTCTGGTGTTAGAAGAGACCCCGGCATCATCTTTTCGTCGCGTGCTTCCCCCAGAGTCA





1183
CGATTCTTCCCAGCAGATGGCCCCAAAGTTCAGTTCCTGAATTGCCTCGCGGAGCCGCGGGCT-



GCAACGTGAGGCGGCCGCTGCCAGTCGACTCAACCACCGGAGTGGCCCCTGCAGTTGGATAGCAAC-



GAGAATCCTCCAGGGGTGCAGGGCGACGGCTTCGGCCGCACC





1184
CGCACACCGCCCCCAAGCGGCCGGCCGAGGGAGCGCCGCGGCAGCGGGAGAGGCGTCTCTGTGGGC-



CCCCTGGCAGCCGCGGCAGGAAAGGGCCCGAAGGCAGCGAAGGCGAACGCGGCGCACCAACCTGCCGGC-



CCCGCCGACGCCGCGCTCACCTCCCTCCGGGGCGGGCGTGGGGCCAGCTCAGGACAGGCGCTCGGGGGACGCGT



GTCCTCACCCCACGGGGACGGTGGAGGAGAGTCAGCGAGGGCCCGA





1185
AGGCCCCGAGGCCGGAGCGGCGGAGGGGGCGGCCCCTCCCACAGGGTCTTCCCACCCACAGGGCAC-



CCAGGCGCAGCGGAGCCAGGAGGGGGCTTACCCGCGGGCAGGGACGGAGCACGCCGGGGCCCTGGAGGG-



GCGACGCTCGCTCGTGTCCCCGGTCCCCGTGGCC





1186
GGGTTCGCGCGAGCGCTTTGTGCTCATGGACCAGCCGCACAACTTTTGAAGGCTCGCCGGCCCATGT-



GGGGTCTTTCTGGCGGCGCGCCGCCTGCAGCCCCCCTAAAGCGCGGGGGCTGGAGTTGTTGAGCAGCCCCGC-



CGCTGTGGTCCATGTAGCCGCTGGCCGCGCGCGGACTGCGGCTCGGCGTGCGCGTGTTCCCGGCCGTCCCGCCT



CGGCGAGCTCCCTCATGTTGTCGCCCTGCGGCGCCC





1187
CCAGTCTCCCGCCCCCTGAGCATGCACGCACTTTGGTTGCAGTGCAATGCTCTGACTTCCAAATGG-



GAGAGACAAGTGGCGGAAAATAGGGTCTTCTCCCACCTCCCACCCCCCCATCCCGACTCTTTTGC-



CCTTCTTTTGGTCCAAGAGATTTTGAAACCGTGCAGAACGAGGGAGAGGGGCAGGCTGCAGCCGGGCAGATAAC



AAAACACACCCCAAAGTGGGCCTCGCATCGGCCCTCGCATTCCTGTAGAG





1188
GAGGAGGCAGCGGACCGGGGACACCCTGGGGGAACTTCCCGAGCTCCGCGACCTCGAAGCCTGGC-



CCTTCCTTCTCCCTGGTCCTACATGCCTCCCTCCCCCACTGTCCGGGGTCCTGGCCTCGACGCCGAGGGGT-



GTCCCTCTCCTCTCCTGGTCAGGGAACGCAGCAACTGAGGCGGCGCGGCCCAGATGAGACGGGAAGCGCCTGCG



GGCCGTGGGCGCGGGTGGAACCC





1189
CCGGCTCCACGGACCCACGGAAGGGCAAGGGGGCGGCCTCGGGGCGGCGGGACAGTTGTCGGAGG-



GCGCCCTCCAGGCCCAAGCCGCCTTCTCCGGCCCCCGCCATGGCCCGGGGCGGCAGTCAGAGCTG-



GAGCTCCGGGGAATCAGACGGGCAGCCAAAGGAGCAGACGCCCGAGAAGCCCAGGTGAGCGGCTGGGCCGCGCC



GGACGGGCGTCGGGGGTCTGGGCCGCGA





1190
CCGCCACCGCCACCATGCCCAACTTCGCCGGCACCTGGAAGATGCGCAGCAGCGAGAATTTCGAC-



GAGCTGCTCAAGGCACTGGGTAAGCTGGTGCAGAGGGCGCGCCCCGACGGGGAGATGCGGCCCGGAGGTGC-



CCTGGTCCCGGAAGTGCCCCGGTCCTGGAGGGGGTGGAAGTTGGGGAGCCCAGGCAGGAGGGAGTCCCCGGGGC



AATAGATCGCCTTGTCTCCCAGGCGCACCGGGTCTCG





1191
TGAGTAAGGATGATACCGAGAGGGAAGAAAAAAATACCCTCTTTGggccaggcacggtggctcac-



ccctgtaatcccagcactttgggaggctgaggcgagcggatcacgagatcagaagatcgagaccatcctg-



gctaacacagtgaaaccccatctctaccaaaaatacaaaaaattagccaggcatggtggcgggcacct





1192
tgggccaggcacggtggctcacccctgtaatcccagcactttgggaggctgaggcgagcggatcac-



gagatcagaagatcgagaccatcctggctaacacagtgaaaccccatctctaccaaaaatacaaaaaattagc-



caggcatggtggcgggcacctgtagtcccagctacttgggaggctgaggcaggagaatcctttgaacccaggag



gcggagcttgcagtgagctgagattgtgccactgcactccag





1193
CCGGCGAAGTGGGCGGCTCCCCAAGCGCCCAGGCTGCGCAGCACGATggccgcccccgccgcgcac-



cgcgtgtgcccgcacgcccgccccctgcgccccggggacgcctctccgcccctccccctgcccctccgcccac-



cgcgcggtcgccccacgccgcgggcgctgcttcgccgcccgggaggccgcctcccgccccgggACCGGATAACG



CCCTAAATCAGCGCAGCTGAGGCGAGGCCGTGGCCCCCGCAG





1194
GCGGCCTTACCCTGCCGCGAGCGCCTGTGACAGCGGCGCCGCTGTGCTCGCGACCCCGGCTCCGG-



GCCTCTGCCGACCTCAGGGGCAGGAAAGAGTCGCCCGGCGGGATGGGCGGGGAGGCTGGGTGCGCGGCGGC-



CGTGGGTGCCGAGGGCCGCGTGAAGAGCCTGGGTCTGGTGTTCGAGGACGAGCGCAAGGGCTGCTATTCCAGCG



GCGAGACAGTGGCCGGGCACGTGCTGCTGGAGGCGTCCGAGCCGG





1195
gtggggccggcgAGGGTCAGGGGCATCGCGGCCGCGACCCCATTCTGCAGCCCCCGAGGCTCGC-



CCGACTCCTGGCTGCCCTGGACTCCCCTCCCTCCTCCCTCCCGCCTCCTCGCCCAGGGCCCGGCTCACCTg-



gcggcggggcgcgggacgccgcgggcgggacggcggggggctccggggcgctccggggcggcTCTCGCGCATGC



TCCGGGGC





1196
CGGCGCGGACCGGCTCCTCTACCACTTTCTCCAGCTGCACTGCCACCCAGCCTGCCTGGTGCTGGT-



GCTCAACACGCAGCCGGCCGAGGAGGTGCGGCCGCGCTGGCGCGGGAGTGAGGGGACTCCGAGAGTGTTGAGG-



GCCTCCTGAGCGGATGCGAGGCCTCTGACAGGGATGGAGGGGCTCTGAGGGGGATTCAGGCCCCTGACACTACG



CGATGACACAGAGAAGGATGGCAGGGGTCCCCAGGGG





1197
GCCCATGCGGCCCCGTCACGTGATGCAAGGATCGCCGGCCTTTCCGCCAGAGGGCGGCACAGAAC-



TACAACTCCCAGCAAGCTCCCAAGGCGGCCCTCCGCGCAATGCCGCTACCGGAAGTGCGGGTCGCGCTTCCG-



GCGGCGTCCCGGGGCCAGGGGGGTGCGCCTTTCTCCGCGTcggggcggcccggagcgcggtggcgcggcgcggg



gTAA





1198
GGGATTGCCAGGGGCTGACCGGAGTGTTGCTGGGAAGGAGCCTCAGCTCCGCTCCAGGTCCTCCAC-



CAGGTAGGACTGGGACTCCCTTAGGGCCTGGAGGAGCAAGTCCTTGCAGGTCCAGTTCCAGGCTGGTGT-



GAAACTGAAGAGCTTCCGCATCTTGCTTGGGTTGGTGGGCTCGGCCCGC





1199
GCCGGAGCACGCGGCTACTCAGGCCGAACCCCGACCCGGACCCGGCACGCGGCCTCGGCGAGGGCGG-



GCGGGAGTGTCCTCCTCCGGGACAGCCGGACTCCCGCCGACTTCTGGGCGGCGGGGAGGGCTCCAGGCCCG-



GCTCTCCCGGGCCCCCGCACGCGATGCGCGGCCCCTGCAGCTGCTCCGTGCCCCGAGACGCGCCCGAGGCCTCG



GACCTCCAAGCGGCCACCGCGC





1200
CCTCGGCGCCGGCCCGTTAGTTgcccgggcccgagccggccgggcccgcgggTTGCCGAGCCCGCT-



GACGTCAGCCCGGGTTTCCCCCCCCCACCGGGGCTTCCCCATCCCCCGAGGCTTCCCGGGAGGGCTGC-



GAGTCCGGGGAGCGTGCGGGGTCGCCACCATCGGGACCCCCAGAGGAGAGAGGACTTGGGGCGGGAGCCGCGCG



GGACGCTGTCCCCCTCCCGCCCCCCAccccatttacagattgggaga





1201
CACAGCGGCGGCGAGTGGGTCGTGCACGCGGATGCGGGGTGGGAGTGGGGGCGCACGCGCGGGCGT-



GGGCGAGCGGGCCCCGGCAGTGCACACACACGGCAGGGGCGGGCGACAGATGCAGTGCGTGCGCCGGAGC-



CCAAGCGCACAAACGGAAAGAGCGGGCGCGGTGCGCAGGGGCGGGCGCCCAGCGGGCTTGGCATGCGCG





1202
CACCTCGGGCGGGGCGGACTCGGCTGGGCGGACTCAGCGGGGCGGGCGCAGGCGCAGGGCGG-



GTCCTTTGCGTCCGGCCCTCTTTCCCCTGACCATAAAAGCAGCCGCTGGCTGCTGGGCCCTAC-



CAAGCCTTCCACGTGCGCCTTATAGCCTCTCAACTTCTTGCTTGGGATCTCCAACCTCACCGCGGCTCGAAATG



GACCCCAACTGCTCCTGCGCC





1203
AGACGGGGCCGGGCGCAGACGCCCCGCCCCGCCCTTGCACCCAGCCCGCTGAGTCCGCACCGC-



CCGCGGTCCCGGCCTGGGCTGTGCGCAGGAGATGGGCCAAGTGCAAGGTCCCTTGAGCGCAGCTGG-



GCGCACACCGCAGGACGGCCCCTTTCGCACCGGCTCGCGAGGGAGGCGCTGTGCCCCCCGTGTGCGGCTTCTCT



CACCCTGCCAGGCCTTCCCAGCTTCCCTGAGGTTGCCTGCTACACCCGCCCC





1204
GCATTCGGGCCGCAAGCTCCGCGCCCCAGCCCTGCGCCCCTTCCTCTCCCGTCGTCAC-



CGCTTCCCTTCTTCCAAGAAAGTTCGGGTCCTGAGGAGCGGAGCGGCCTGGAAGCCTCGCGCGCTCCGGAC-



CCCCCAGTGATGGGAGTGGGGGGTGGGTGGTGAGGGGCGAGCGCGGCTTTCCTGCCCCCTCCAGCGCAGACCGA



GGCGGGGGCGTCTGGCCGCGGAGTCCGCGGGGTGGGCTCGCGCGGGCGGTGG





1205
GCCCGAAAGGGCCGGAGCGTGTCCCCCGCCAGGGCGCAGGCCCCAGCCCCCCGCACCCCTAT-



TGTCCAGCCAGCTGGAGCTCCGGCCAGATCCCGGGCTGCCGCCTCTGCTGCCTTCCCTGAGCGGGAGCG-



GAGCGCAGAGAAAAGTTCAAGCCTTGCCCACCCGGGCTGC





1206
CGGCGGCCGGGTGACCGACCACTGCTTACCAGGAGGGGAGACTGGCAGGGGGGGCTCAAGGAA-



CATCTGGTGGGTGTCCCCTTCACAAGACTCGGCCTGCAGAGTTCGTGCAGGGAGTTCGCACATAGGAGAGCAC-



CGGTCCGGGAGTGCCAGGCTCGTGCCCGGCCGGGGAGAGGAGTGGGAGACTAAGTCGCAGGGCAAGGGCAACT-



GCA





1207
CCACCGGCGGCCGCTCACCTCCTGCTCCTTCTCCTGGTCCGGGCGGGCCGGCCTGG-



GCTCCCACTCCAGAGGGCAGCCGGTCCTTCGCCGGTGCCCAGGCCGCAGGGCTGATGCCCCCGCTCAGCT-



GAGGGAAGGGGAAGTGGAGGGGAGAAGTGCCGGGCTGGGGCCAGGCGGCCAGGGCGCCGCACGGCTCTCACCCG



GCCGGTGTGTGTCCCCGCAGGAGAGTGTGCTGGGCAGACGATGCTGGACACGATG





1208
CGGTCAGGGACCCCCTTCCCCCTTCAAGCTGACTCCCTCCCACAAGGCTCTTCAGATCTCGT-



TGTATTTTGGGATTGATGGGGGAAAAATCCAAATTTGTTTGTTTGCTTCCCTTTTTTCGGTGGTGGGGAAAG-



GTGGCAGGCTTTTTGGGACAACCATGGAGGGGTCCTCCGTCTCGGCCTCTTCGCATATCCCCCTCCGTGATCCT



GCCTTCCCCCCCCACCGAGCCCATCGCAGGC





1209
GGCCGAAGCTGCCGCCCCTCCTCCCAACCGGCGGGTCAGATCTCGCTCCCTTTCGGACAACT-



TACCTCggagaggagtcaaggggagaggggaggggagggggggagggggcaagagagagaggggggagaa-



gaggGATCTTCTCGCTTATTTCATTGTTCCCCCATCTTCAGGGAGCGGGGGCAGCGGCTCCTCAAGGCGGCGGG



CGCCGGCGTCTTCAGAGCGCCATGCGAACCGCGG





1210
GCGGCCTTGTGCCGCTGGGGGCTCCTCCTCGCCCTCTTGCCCCCCGGAGCCGCGAGCACCCAAGGT-



GGGTCTGGTGTGGGGAGGGGACGGAGCAGCGGCGGGACCCTGCCCTGTGGATGCCCCGCCGAGGTCCCGCGGC-



CGGCGGGGCCAGAGGGGCCCGGACGAGCTCTCCTATCCCGAAGTTGTGGACAGTCGAGACGCTCAGGGCAGCCG



GGCCCTGGGGCCCTCGGGCGGGAGGGGGCAGTTACACGGCAG





1211
CGCGGGAGGAGCGGCGAGGCCCTCACCTGGCGCCTTTTATGCCCGCGGCCGGTGGAGGGGGGAAGG-



GAGGAATGGTGTCAGGGGCGGATATCTGAGCCCTGAGGAATTTGCAGGCTCCTGAGAGCAAATATGG-



GCTCTCTCCCCATTGGTCAATTCCCTCCCCTCCCAGAGACCAGAGGCCCCTGCCCTCCAGAGGTGCCCCGCCCC



GGTCCGCGCAGAAGCTCCGACCCGCACTCCCCCA





1212
GCCCACCAGAAGCccatcaccaccagcaaagccaccaccaaagccaccacccaagccagcaccaag-



gccaccaccatatcctcccccaaagccactaccaAAGCTGCTGCTGCTGCTGCTGAAGCCACCGCCATAGC-



CGCCCCCCAGCCCGCAGGCTCCCCCAGAGGAGAAGCGGGAGGATGAGACAGACAGGCCGCCCCCGTAGGTGCTG



GGGGCGCGGCAG





1213
GGGCCATGTGCCCCACCCCACAGCCCCACCCTGCCCTGCCCACCACCCCAAGCCCGGCCCTGG-



GTCCCAGGGTCCCGCCAGGCCCGCTGGGTGGAATGTGGTCATGTTTCAGACTGCCGATG-



GCTTCCACTTCCCAGACAGGCCCAGACGGCCCCGCCAGCAGCC





1214
CCGCCAGCCCAGGGCGAGAGTCAGGGACGCGGCGTCGGGCGAGCTGCGCGGGCCCCGGGGGAG-



GCGCGACCCCGGAGGCACCTGTCCGGATCCCTCCCCGCCTTGCTCAGATCTCTGGTTCGCGGAGCTCCGAG-



GCGCGCTCGGCCCGAACCGCGCGACCCCCAAGTCGCCGCGCCC





1215
gccccctgtccctttcccgggactctactacctttacccagagcagagggtgaaggcctcct-



gagcgcaggggcccagttatctgagaaaccccacagcctgtcccccgtccaggaagtctcagcgagctcacgc-



cgcgcagtcgcagttt





1216
GTGGGGGTCCGCACCCAGCAATAACCCGGGTCTTCCCGCTCCGGCTCCTGCCCCAGTAAGCGTTG-



GACCGGGAGACGCAGTGCTCAGCATCGGTCAGCAGGGGGCGCAAGGACCCCGCCCCGCCGAGTCCGCGC-



CAAAGTTTCTCATCCTCCACCCGCCCACGCTCCGCACCCCCTCCGCGGCTGCCCAGCACCCCCACGGCCCCAGC



A





1217
gggcccccgggTTGCGTGAGGACACCTCCTCTGAGGGGCGCCGCTTGCCCCTCTCCGGATCGC-



CCGGGGCCCCGGCTGGCCAGAGGATGGACGAGGAGGAGGATGGAGCGGGCGCCGAGGAGTCGGGACAGCCCCG-



GAGCTTCATGCGGCTCAACGACCTGTcgggggccgggggccggccggggccggggTCAGCAGAAAAGGAC-



CCGGGCAGCGCGGA





1218
GCCTGCACAGACGACAGCACCCCCGGCGGGGGAGAGCGGCCCCAGCGGAGACTCGGCAGGGCTCAG-



GTTTCCTGGACCGGATGACTGACCTGAgcccggggcccgggcggcgctggccgggcACAGGATGCGCGGCCCG-



GAGAGCGCATCCCGGCCATCCGCCCGCGCTCGGCCCCGCAGCGCAGCTGCTGCAGATCCGCGGGGGCCGCCAC





1219
GGCCGCGCCGGGCTCAGGTTCCACCCCCGGGAGCGCGGGGCGGAGCCAGGCCGGCGCCGAGGCT-



CAGTGCCCTCCCCGCTCCGCGGCGCCGGCTGCGAAGTTGAGCGAAAAGTTTGAGGCCGGAGGGAGCGAGGC-



CGGGGAGTCCGCTCCAGCGGGGCGCTCCAGTCCCTCAGACGTGGGCTGAGCTTGGGACGAGCTGCGTTCCGCCC



CAGGCCACTGTAGGGAACGGCGGTGGCGCCTCCCC





1220
GGGGTAGTCGCGCAGGTGTCGGGCGCGGAGCCGCTTGGCCTCCTCCACGAAGGGC-



CGCTTCTCGTCCTCGTCCAGCAGCTTCCACTGCGCGCCCAGGCGCTTGGAGATCTCGGAGTTGTGCATCT-



TGGGGTTCTGCTGCGCCATCTGGCGGCGCTGAGCGGAGCTCCACACCATGAACGCGTTCATCGGCCGCTTCACC



TTCTCCAGGGGCAGCGTCCCGGGGGCCGCGGGGCTCCCAGCGCCCTCCCGCTCC





1221
tgcaggcggagaatagcagcctccctctgccaagtaagaggaaccggcctaaagga-



cattttctctctctctcctcccctctcatcgggtgaatagtgagctgctccggcaaaaagaaaccggaaat-



gctgctgcaagaggcagaaatgtaaatgtggagccaaacaataacagggctgccgggcctctcagattgcgacg



gtcctcctcggcctggcgggcaaacccctggtttagcacttctcacttccacga





1222
ccggaaatgctgctgcaagaggcagaaatgtaaatgtggagccaaacaataacagggctgccgg-



gcctctcagattgcgacggtcctcctcggcctggcgggcaaacccctggtttagcacttct-



cacttccacgactgacagccttcaattggattttctcc





1223
gcgtcggatccctgagaacttcgaagccatcctggctgaggctaatctccgctgtgcttcctct-



gcagtatgaagactttggagactcaaccgttagctccggactgctgtccttcagaccaggacccagctccagc-



ccatccttctccccacgcttccccgatgaataaaaatgcggactctgaactgatgccaccgcctcccgaaaggg



gggatccgccccggttgtcccc





1224
CCGGCTCCGCGGGTTCCGTGGGTCGCCCGCGAAATCTGATCCGGGATGCGGCGGCCCAATCGGAAG-



GTGGACCGAAATCCCGCGACAGCAAGAGGCCCGTAGCGACCCGCGGTGCTAAGGAACACAGT-



GCTTTCAAAAGAATTGGCGTCCGCTGTTCGCCTCTCCTCCCGGG





1225
CGTCGCCGGGGCTGGACGTTCGCAGCGGCGCTTCGGAAGGGGGCCCCGCGGGAGCAGCCGC-



CCGCGTCTCCAGCAGCTTCCCCTTGCCAGGCGCCGCGCGCGCCCGGTATCCCCGGGTGTCCACCTGTGCGT-



GGGGGGCTGTTTCCCGTCTGTCCAGCCGCGCCCACTTCTCAGGCCCAAAGGCCAGCAGGAAGGGTCCCGGAGGT



GGCTGGGGGCGTCCACCTGAGAAGCTCCGCTCTCGCTCAGACACCCCAC





1226
GGGCCTGCCGCCTCGTCCACCGTCCGTCGTGAGGCCGGCAGCGGACACGTGCTCATCCCACGGGGAG-



GCCCCGCGCAGCGCGGAGGACGCGCCTGAGAGAGAAAAGGGGTTCGGGAGAAGCCCGAGGACCCGGCCCGT-



GACTGGGCGCGCCCTATGCAAATGAGCGGGCGGGGCCCTCGTGTTGCTGAACGAGGGCGGGTTCGCGATGTAAA



TAAGCCCAGAGGTGGGGTCTTTGGAGAGCACTTAGGGCCCGGG





1227
GCACACCGCTGGCGGACACCCCAGTAACAAGTGAGAGCGCTCCAC-



CCCGCAGTCCCCCCCGCCTCTCCTCCCTGGGTCCCCTCGGCTCTCGGAAGAAAAAC-



CAACAGCATCTCCAGCTCTCGCGCGGAATTGTCTCTTCAACTTTACCCAACCGACGACAAGGAACCAGCCTC





1228
GCAAACCATCTTCCCCGACGCCTTCCACATAAGATGCCCTCCTGCGGGCCCTCACCTTTTGACACT-



GCCTCCCACCGCACTGGGGTCAACTCTCACCCAAGGGTTCCGCCACCTTCCACCACCAAACCAGCCTGTCCCT-



GCCACATGCCCCCCGGGCCCCAGCGCTCATCCTCTGCCCAGGCCCGCTCTTGACCCCTGACCCCGGCCTGAC-



CCCGC





1229
GGCCCTCCGCCGCCTCCAACCGCGCACCAGGAGCTGGGCAcggcggcagcggcggcagcggcg-



gcgTCGCGCTCGGCCATGGTCACCAGCATGGCCTCGATCCTGGACGGCGGCGACTACCGGC-



CCGAGCTCTCCATCCCGCTGCACCACGCCATGAGCATGTCCTGCGACTCGTCTCCGCCTGGCATGGGCATGAGC



AACACCTACACCACGCTGACACCGCTCCAGCCGCTGCC





1230
CACCACCGTGGCAAAGCGTCCCCGCGCGGTGAAGGGCGTCAGGTGCAGCTGGCTGGACATCTCG-



GCGAAGTCGCGGCGGTAGCGGCGGGAGAAGTCGTCGCCGGCCTGGCGGAGGGTCAGGTGGACCACAGGTG-



GCACCGGGCTGAGCGCAggccccgcggcggcgccgggggcagccggggTCTGCAGCGGCGAGGTCCTGGCGACC



GGGTCCCGGGATGCGGCTGGATGGGGCGTGTGCCCGGGC





1231
CACAGCCCCTTCCTGCCCGAACATGTTGGAGGCCTTTTGGAAGCTGT-



GCAGACAACAGTAACTTCAGCCTGAATCATTTCTTTCAATTGTGGACAAGCTGCCAAGAGGCTTGAGTAGGA-



GAGGAGTGCCGCcgaggcggggcggggcggggcgtggagctgggctggcagtgggcgtggcggtgc





1232
GCTTGATGCTCACCACTGTTCTTGCTGCTCAAGGGAAACCAAGTATATATTTGTGGATAG-



ATCCTAACTCAGATGATACTGTCAGAATATATAAGATTCCTATACCACATCCTGAACTCTGAAAGT-



TGCAGTTCTACGTAGAAGTTCACTGAGGGTTGTAAGAGTCAGAATGGACTCCATGGAAGTTATGGGGTGTGAAT



CAAACCTCACAGGTGAGTCAGTGGGGAGAAAGAAGCATGACA





1233
ggccaggcccggtggctcacacctgtaatcccagcactttgggaggccgaggtgggcggattgcct-



gaggtcaggagtttgagaccagcctggccaacatggtgaaaccccgtctctactaaaaataccaaaaattagc-



cagtcgtagtggtgggcacctgtaatcccagctattcaggaggctgaggcaggaggatcacttgaacccaagag



gcgggagttgcagtgagcagagatcacgccattgcaccccag





1234
GCGGGACGGGTGGCGGGAAGGAGGGAGGCGCGGCTGGGGAGAGCGCTCGGGAGCTGCCGGGCGCT-



GCGGaccccgtttagtcctaacctcaatcctgcgagggaggggacgcatcgtcctcctcgccttacagacgc-



cgaaacggagggtcccattagggacgtgactggcgcgggcaacacacacagcagcgacagccgggaGGTAAGCC



GCGTCCCAGCGGCTCCGCGGCCGGGCTCGCAGTCGCCCCAGTGA





1235
GCTTGGCCCCGCCACCCAGACCCCTCCCCCGGGGGCGCCCAGCTTGGCCTCTGGGTCCCG-



GCGCACGCGGACCCCAAGTCGGGGAGGCCGGGCTGACCGCGGCCGCCTCCCCGGCTCCGGGTAGGAGGTGG-



GCAGAGAAGGTGGGCTGAGGGGAGGAGAAACTGGGCTGCGGGGGTCCGGGAGGGTGGATTCCGAGAAACTATGT



GCCCAGCTGACCCTGCCCGCCCCGCCGCGGCCCTGCAGTCCCCGGGCCAG





1236
GCGGGGAAGGCGACCGCAGCCCACCTACCGCTGGACGCGGGTTGGGGACCCCGCCGCCCGGC-



CAGCTTTGTTcgggggcccgcggcccctcccgggcccccgcACCGCCTCGGGTGACCCGCGGT-



GTCCCAGCGCGTTGACGCAGCCTGTGATCCCTCGCGAGGCGAGGAGAAGGTCGGGGGCTTGGCTCTGCCTAATG



GCCGCCCGGGGA





1237
gcgcccaaccaccacgcccgcctaatttttgtatttttagtagagacgggttttcaccattttggc-



caggctggtctcgaaccccgacctcaggtgatctgcccaaaagtgctgggattacaggcgtcagccaccgcgc-



ccggccGGGACCCTCTCTTCTAACTCGGAGCTGGGTGTGGGGACCTCCAGTCCTAAAACAAGGGATCACTCCCA



CCCCCGCC





1238
AAAAGCCCCGGCCGGCCTCCCCAGGGTCCCCGAGGACGAAGTTGACCCTGACCGGGC-



CGTCTCCCAGTTCTGAGGCCCGGGTCCCACTGGAACTCGCGTCTGAGCCGCCGTCCCGGACCCCCGGTGC-



CCGCCGGTCCGCAGACCCTGCACCGGGCTTGGACTCGCAGCCGGGACTGACG





1239
CGCAGGTGCGGGGGAGCGTGCGGCCGGGTCCATGCGCCTGCGGGCGGCGGGGGGAGACGCGT-



TGCCTTCGGCCGGGACCACTGCACCTGCCCGCGTGGGTAATGCGCCCGC-



CGCAGACTCCGCGCACGACTCCGCCTGGGAGCGCGTTGGGGGCCGTTGGAGTCCAGCATGGCGCGGACCCCGG





1240
CCCGCCCACAGCGCGGAGTTTAGTCTGCGCGTGCCTCGCTCGAGAACGCGCTCGTGCGCATGC-



CCACAAAGGCCAAGGAGGGAGTGCGCAGGTCACGTGCGCCGGTGGTCAGCGCGCGCATTGCCTGCCCCG-



GAAGTGGTcggcgcgcggcgcggcgcgccTGGGCGCTAAGATGGCGGCGGCGTGAGTTGCATGTTGTGTGAGGA



TCCCGGGGCCGCCGCGTCGCTCGGGCCCCGCCATG





1241
GCAGGGGCCCGGGGGCGATGCCACCCGGTGCCGACTGAGGCCACCGCACCATGGCCCGCTCGCT-



GACCTGGCGCTGCTGCCCCTGGTGCCTGACGGAGGATGAGAAGGCCGCCGCCCGGGTGGACCAGGAGAT-



CAACAGGATCCTCTTGGAGCAGAAGAAGCAGGACCGCGGGGAGCTGAAGCTGCTGCTTTTGGGTGAGTCCAGGG



TCGGTGGGCGGTGGGTGGTGGGCAGTGGGCGGTGGCCAGCCGGCAGGG





1242
CATGACCGCGGTGGCTTGTGGGAAAAGTGGCTCGGAACCCCAAATCCCGGTTAGATTGCAGGCAC-



CGCCGGACGCTGGCTCCCGGAGGTTTTAGTTTTCCCTCTACCAGGAGTGTGAAGACACAGAGACTTAT-



TGCGCTGGCGAAGATGGCTGAGGCGAAGGCGTGTCCGA





1243
GCAGGTGCTCAGCGGGCAGACGCCCCGCCCCGCCCCGCCAGGTTCTGTTGGGGGCGAGGC-



CCGCGCAAGCCCCGCCTCTTCCCCGGCACCAGGGGCGGGCCCAGGTGCGCCCAGGGCCGGGGAGCGGC-



CGCGCAGGTGCCTGCCCTTTGCGCCTGCGCCCAGCTCG





1244
GGTGCGCCCTGCGCTGGCTAAAGTGCGCAAGCGCGCGAGGCTCGGGCCTTTCAAAccccggcgcgc-



cggcgccggcgTCGACACTGCGCAAGCCCAGTCGCGCCTCTCCAGAGCGGGAAGAGCGCTGCGTTCCT-



TAGCAACGAGCGTTTCCTCCAGCCCCGCCTCCCTCCGCCACACACAACCCCGC





1245
AATTTGGTCCTCCTGCGCCTGCCAAGATTGTCTgagtattgatcgaacccaggagttcgagat-



cagcttgagcaagatagcgagaacccccgcccctccacctcgtctcaaaaaaaaaaaaaaaTCGTCT-



CAGTAGCGAATAGTCTAACGGAGAATGACAGGGAAATTGGTGATCCTTTCTGGGCCCAAGAGTTAGAAATGGCT



TTGCAggccgggcgcggt





1246
GGCTTCCGCGGCGCCAATCTCCACCCGCAGTCTCCGCCTCCCGCACCTGTGGTCCGGGCCTCACG-



GTTTCAGCGCCGCGAGGCCTCACCTGCTGGTCTTGGAGCCTCAAGGGAAAGACTGCAGAGGGATCGAGGCGGC-



CCACTGCCAGCACGGCCAGCGTGGCCCAGGGCTCGCAGCACTTCCGGCCTCTCTGGCCCCGC





1247
GCCAGGAGAGGGGCCGAGCCTGCACAGGAGCTTCCTCGGTTTTCCGAGCGCCGGCCCCCCTTCTCT-



GCCTGGGAGGAGGTGGTTAGAGTCCCCTGGGTGTGTGCCCCGCAGAGGGAGCTCTGGCCTCAGTGCCCAGTGT-



GCAGACCAATGAGAGCCCCAGAGAGAAAGACGGTCATTTCCTCCCTGCATCTTCCCTTGGGGC





1248
cgagcgccggccccccttctctgcctgggaggaggtggttagagtcccctgggtgtgtgccccgca-



gagggagctctggcctcagtgcccagtgtgcagaccaatgagagccccagagagaaagacggt-



catttcctccctgcatcttcccttggggc





1249
GGTTGCGAGGGCACCCTTTGGCCCGGGGGCGCGCAGGAGAGGGCAGGGGCCAGGGGTTTCCTGGGC-



GAGGGCGCGGGGACGAGCAGGAAAAGGCCGGGGTGGGGGTGGAATTCCTCGGCGGGCAGGGGGCGCATGCGC-



CGGGCACCGTGGGGCGGGACGTGGCCCGGGAGGAGCTGGGGGGACTGGGTGGTGCACGTGCGGGC





1250
acccggacgcggtggcgcgcgcctgtaatcccagctactcgggagcctgaggcaggagaatcgct-



tgaatccgggaggcggaggttgcagtaagccgagatcgcgccactgcaccccagcctgggcgaca-



gagcaagactccTCGGTAAAGACACCACTTCGTCACCC





1251
CGCCGCCGAGCCTCAGCCACGCCTCTGTGCAGCGGGGAAGACTCCTCTCGCGCCTTCTCAGTCAGT-



CACGGATGATGCTGACCCAGCGCTCCGGGGCTTTCTACCAAGTAATCAGTCCAGACAAATGCCAAAACGAC-



CGCCACAAGGAGGACAACGGAAGTCCCGCCGCGACCGCGCGTGCGCTTACGGAAACACCACCTTTCGGAGGCCT



CATTGGCTGAAGGTCGCCGTCGCCCAACGCAGGCCATTCTGGGT





1252
gcagcctcaacctcctggggtcaagtgatcatcctggctcaaccacccaagtagccgggactacgg-



gtggccgccaccatgcccggataatttttttatttttgtggagatgggggtcccacgatgttgc-



ccagtccagtcttgaactcctgggctcaagtgatcctcccgcagcagcc





1253
CTTGCCGACCCAGCCTCGATCCCCTGCGGCGTCCAGGTCCCAATGCCCCAACGCAGGCCACCCCCG-



GCTCCTCTGTGGACTCACGAAGACAAGGTCCGGCCGCTCGGGCCGCGAGAGTCGCGCCATCACCAC-



CATTTTTCTGGATGCCCA





1254
GCGGCGTTCGGTGGTGTCCCGGTGCAGCCACGCGAGAGTAGAAGGGTGGAAAGGGGAGGTGCCCAGT-



GAAATGGAGCCTGTCCCGTGCACTTTCGGGCATTTCGAGCATCTTGTGGGCTCTCCCAAGTCGCGGC-



CCCTCCTCTGAGAGCCACAGTCAGGTCTGTCCTCAGGGGTCGAGGCGGCTGCGCTGGGGCCTCGGCCCGGGAGG



AGGCGGGGGGCACGGCCTTTCCATTTTCCCTGCTCCCCTCTGCAGAA





1255
CCGGACTCCCCCGCGCAGACCACCGTGCCAGGACAGCCCGCTCGGGAGTCGGGCCTGGAAGCAGGCG-



GACAGCGTCACCTCCCCGCAGCCGCCGGCTGGGACCCGCGGCCAGCCTTTACCCAGGCTCGCCCGGTCCCTGC-



CCGCATGGCGG





1256
ggccccctgcaagttccgcctcccgggttcacaccattctcctgcctcagcctccccagcagctgg-



gactacaggcacctgccgccacgcccggctaattttttgtatttttagtagagacagggtttcaccatgt-



tagccaggatggtctcgatctcctgaccttgtgatctgcccgcctcggcctcccaaagtgttgggattacaggc



gtgagccaccgtgtccagccTGTAACA





1257
GCCCAGGGGAGCCCTCCATTTGTAGAATGAATGAGAGTCCAGGTTATGAACAGTGCCTGGAGTGTAG-



GAACACCCTCCTTTGCCTCTTTGACAGGTCTGCATCATAACACtttttttttttttttgagacagagtct-



cactctgtcgcccaggctggagtgcagtggcacgatctcggccccctgcaagttccg





1258
CCGGCTGCAGGCCCTCACTGGTTGGGTCCGCCCGCGAGGGTGCCCTGGGCCCGGT-



GTCTCTCCTCCTTCTGAAGTTTGTTCCCATCCACCCGGCATCACCGACCGGTTTTATCCCGCTGAGGCCCTGG-



GAGATGGGTCTGGCGAGGCTCGTAGGCCGCGGATTGGCTGGCTGGGTGCAGGGGGGTGCGGGAAGGGGAGGATT



TTGCA





1259
GTCACACCTGCCGATGAAACTCCTGCGTAAGAAGATCGAGAAGCGGAACCTCAAATTGCGGCAGCG-



GAACCTAAAGTTTCAGGGTGAGATGCGTTGACTCGCGGTGGCTCAGAAGACCCACGCGCGAGCCCTG-



GCGCGTTCGGGCGGCCGGGGGCCCAGCTGCTCTGTGTGACGGAGGCAGCTTCCCCTGCAGCGTGTGTGATTGGG



GAGAGTGAAAAGGCAGCTTCCACTCGGGACCCGCGCTGCTGCCCACTC





1260
CCCTGCGCACCCCTACCAGGCAGGCTCGCTGCCTTTCCTCCCTCTTGTCTCTCCAGAGCCG-



GATCTTCAAGGGGAGCCTCCGTGCCCCCGGCTGCTCAGTCCCTCCGGTGTGCAGGACCCCG-



GAAGTCCTCCCCGCACAGCTCTCGCTTCTCTTTGCAGCCTGTTTCTGCGCCGGACCAGTCGAGGACTCTGGACA



GTAGAGGCCCCGGGACGACCGAGCTGATGGCGTCTTCGACCCCATCTTCGTCCGCAACC





1261
CCTGGGGGAGCGCGGTGGGGGTAAGATAAGGGATGGGGGCTCCGAGGGCTGGGAACTGCAGGAAG-



GAAAGAAGCGGCGGGGCCGCCCGGGTCAAGGGGCCACGTGGGGGAGGGCGGGCAGGCGGGACCGGGAGGT-



CAATAACTGCAGCGTCCGAGCTGAGCCCAGGGGAGCGGGCGAGGAGAAAGAAGCCTCAGAGCGCCCGGGAAGCC



TCGCGCGCCTGGGAGGCTTCCATCTCCCGGGACCCAGCTCTCAGCC





1262
GTGGGGCCGGGCGAGTGCGCGGCATCCCAGGCCGGCCCGAACGCTCCGCCCGCGGTGGGC-



CGACTTCCCCTCCTCTTCCCTCTCTCCTTCCTTTAGCCCGCTGGCGCCGGACACGCTGCGCCTCATCTCT-



TGGGGCGTTCTTCCCCGTTGGCCAACCGTCGCATCCCGTGCAACTTTGGGGTAGTGGCCGTTTAGTGTTGAATG



TTCCCCACCGAGAGCGCATGGCTTGGGAAGCGAGGCGCGAACCCGGCCCCC





1263
CGTCCAGGCTGTGCGctccccgttctcccctcctccccacttctccccacgcct-



tgctcgtctcccgccctcctccgacaaccgctcccctcaccctccacccctacccccgc-



ccctcctccttcctccccGGCATGCGCCATATGGTCTTCCCGGTCCAGCCAAGAGCCTGGAACCACGTGACCTG



CCCATTTGTATGCCGCGGAGCGCTCCATTCCGGCCCCTTTGTGGCCA





1264
GCGCGGCGGTGCAGCCTCTCCCGAGCGCGCTGGGTCGCCTCTGCTCGGTCTGGGGTCTGCCAG-



GCGCGATCCCCCCGGTGCAGCCGAGCCCCTCCGCAGACTCTGCGCAGGAAAGCGAAACTACCCGGCAG-



GAGAAAAGGCAGCGCTGGCGCCCGGCCCCCTTCCGCCCCCACCAATCACCGGGCGGCTCCGCGCTCAGCCAATT



AGACGCGGCTGTTCCGTGGGCGCCACCGCCTCCCTCTGCGGGCCGCTGCT





1265
aggcggcggcggtggcagtggcacccggcggggaagcagcagcCAAACCCGCGCATGATCTCGA-



GAGTTTCAGCAACATCCAGGGACTGGGCTCAGCCCCGGAGCGAGAGGGTCGTCCGCTGAGAAGCTGCGCCG-



GAGACGCGGGAAGCTGCTGCCATAAGGAGGGAGCTCTGGGAAGCCGGAGGACAGGAGGAGACGGGAGTCCAGGG



GCAGACGAGTGGAGCCCGAGGAGGCAGGGTGGAGGGAGAGTCAAGG





1266
GCGCGACCCGCCGATTGTGTCGAGTCAGCAGCGGCAGCGGGGACGCGCGAAGCCATGGCTCCCGC-



CCGCGCTCGGGAGGGCGCCGGGGGTCCTGCGCCTCCGGGAGGTTTGTGGCCGAgcgcggcgcggccccgagcg-



gccccgcagcgcccggctccccgccgcTCGCTCTCCAGGCGCCGACCCGCCTGCGTCGCCACCCTCTCGCCGCT



CCCTGCCGCCACCTTCCTCCCGCCCGGGTGCCGGGCGTCCGCT





1267
CGCGGACGCCGCTCTGCACCTGTTGCCGCCGTCACTCATCCCGCCAGGCGGGCGGGGCCGCGCGGGT-



GGCTTGGTCAGGACCTGCCATTCAGCCCAGTCGGGCTCCGGTGCTCGCCCCGGACGGCGCCCCAAGCGG-



GTCCCGGCCCCGCTGAGCACCTCCAGCAGTGGCACAGCCTCTGGAGGGGTCCGGGACGAAGCCACCCGCGCGGT



AGGGGGCGACTTAGCGGTTTCAGCCTCCAACAGCCTTGGGATCGC





1268
tgaacccgggaggcggaggttgctgtgagccgagatggcaccattgcactccagcctgggcaacaa-



gagcgaaactccgtcccccgaacaaaaaattcaaatgggaaagagaggcagatggcagagaacaggggaggg-



gctgggcaccgtggctcatgcctgtaatcccagcactttgggaggccaaggcgggtgga





1269
CTCGGCGGCGCGGGGAGTCGGAGGACGCAGCCAAGCGGCGGCGGCGAGGAGGGTCACAGCCGGAAA-



GAGGCAGCGGTGGCGCCTGCAGACGCCGCGCAGCCCGGGCAGCCCCACAGCGCAAGCTGGCTGCCGCGGCG-



GCGGGGGCTTTATCGGCGGCGCCGCGCGGGCccccgccccttcctgccgcccccgcccccggcccgccttgccc



cgccttcccgccg





1270
aggcggccacgggagggggaggggctggcaacggcgccgtgggggcggggctcgctttgtgcaag-



gtccgcgctgattgggccgtgggcgcgcgggtcccggcctgcgtcgtgggactggcgtttttggcgccggct-



gtgaggggagcgcgggggtggtggaatcgggcggtctccggttcgccaatgtggctgggtccgtaggcttgggc



agccttggagttcctcagagaccccgcgctcggtcccggcacgc





1271
GACCCGAGCGGGGCGGAGAGTGGCAGGAGGAGGCGAATCTCCGCGCTCCGGCGAACTTTATCGGGT-



TGAAGTTTCTGCTGTCGCCTCCCCTTTGCGTGCGGAGCTGGGCTTTGCGTGCGCCGCTTCTGGAAAGTCG-



GCTCCAGTCATATCCCTGGGCGCTGCCTGCGGCCGCTCCTCCCGCGCTTCTCACGGCACCTGACACGCGGAGGC



GGCGGCCGAGGGTGGGGTGCCGGCCACCACCACCCTTGGCGTGGG





1272
AGCACCTggggcggggcggagcggggcgcgcgggcccACACCTGTGGAGAGGGCCGCGCCCCAACT-



GCAGCGCCGGGGCTGGGGGAGGGGAGCCTACTCACTCCCCCAACTCCCGGGCGGTGACTCATCAACGAGCAC-



CAGCGGCCAGAGGTGAGCAGTCCCGGGAAGGGGCCGAGAGGCGGGGCCGCCAGGTCGGGCAGGTGT-



GCGCTCCGCCCCGC





1273
CCGCTCGGGGGACGTGGGAGGGGAGGCGGGAAACAGCTTAGTGGGTGTGGGGTCGCGC-



ATTTTCTTCAACCAGGAGGTGAGGAGGTTTCGACATGGCGGTGCAGCCGAAGGAGACGCTGCAGTTGGA-



GAGCGCGGCCGAGGTCGGCTTCGTGCGCTTCTTTCAGGGCATGCCGGAGAAGCCGACCACCACAGTGCGCCTTT



TCGACCGGGGCGACTTCTATACGGCGCACGGCGAGGACGCGCTGCTGGCCGC





1274
ACCGCCAGCGTGCCAGCCCCGCCCCTACCCACCAGTGTGCCAGCCCCGCCCTTCCCCACGTcgc-



cgcgcgcccgggggcggggcctggcgcgcaccgcccgcgcACGGCGAGGCGCCTGTTGATTGGCCACTGGGGC-



CCGGGTTCCTCCGGCGGAGCGCGCCTCCCCCCAGATTTCCCGCCAGCAGGAGCCGCGCGGTAGATGCGGTGCTT



TTAGGAGCTCCGTCCGACAGAACGGTTGGGCCTTGCCGGCTGTC





1275
ATTCTTggccgggtgcggtggctcacgcctgtaatcccagcactttgggaggctgaggtgggtggat-



cacctgaggtcaagagttcgagaccagcctggccaacatggtgaaaccccgtctc-



tactaaaaatacaaaaattagccgggcgtggtggtgggcacctgtaatcccagctactcagaaggttgaggcag



gagaatcgcttgaacccgggagaaggaggttgcagtgagccgagatcgcgccattgcac





1276
CGCTTCCCGCGAGCGAGCCGCCCAGAGCGCTCTGCTGGCGGCAGAGGCGGCGGCGAGGCTG-



GCGCGCTTGCCGCCGTCTGCTCGCCCCGCGGAGGCGACCTGGGCAGACGCTGCTGG-



GAACTTTGAAAAACTTTCCTGGAGCCAGGCTTGCCGCAGATTCGAGGGGAAGCCTCGGCCGCGTCCCACCCCCT



CCCAAATCCGAGTCTGCGGAGCCTGGGAGGGCTCCCAGCTTCCTATCCAAACCGCGCCGGGGCA





1277
AGCCGGCGCTCCGCACCTGCCCCTCAGCGCCTGCCGTCCGCCCCACCGCCGCGGCGC-



CCCGCACTCCTGGGCGGGCCAGGGGAGCGGGCTGGGCGGGCGATCGGGCACGCGGGATCCCTGGTCGAGC-



CCCCTTTCCTCCCGGGTCCACAGCGAGTCCCCTGAGGAAGGAGGGACCTGGGAGGAAACCACCCTCTGGGGCGG



CTCCGGCCTCCAGCCCCCGCCCCGTCTCATCGCGCCGGGCGCCCGGTGCGCCTG





1278
CGGAGCGCGCTTGGCCTCACAGGACAGTGGGTGTGGCTGGGGTGACGGGGCAGGGTGGGGAAGACTG-



GCCTAACACCAGCGCCCTCTGCCCCATGGCTGGCCAGGGACCCGCGAGTCCCTGGACACGCACTGGCCAACGC-



CAGACCCCATCTCATCGGGTGGGGAAGTCGCGGGGACACTGTCAGGGCGCCGAAGTCCGGACCCGGCTCAGAGG



CGGTGGCAGGTGAATTGCTGCGGCGCCGGGTAGGGGCGGGC





1279
ggcctcgagcccacccagacttggccaagcagccctcggccagaccaagcacactccctcggag-



gcctggcagggcccctgctttaccctgccccccacgccccgccccgacccgaccctcccaggcagcccct-



cagcgtctgccgcccgcccttgggcctttccggccagcccctccctccgcccacgcccagaacagcccatgctc



ttggaggagagcaggtgggcttgaccgggactggcccctcaccgcgg





1280
GCCGCGCCGTAAGGGCCACCCCCAGAGGCCGAGGAGGTGGGGCTGGCCTGGCTTTCTGGCCAGGT-



GGGGCTTGTCCAACCCCACAAACATCAGGGCTCACCCTGGATGTGGAAGAGAAGGAGCGAC-



CCCCAAAACGAAGCGGCTGGATCTGACCTTCCAAGGCCTGTTGGCGACGCAGGGCCCCCAGGAGGCAGAGCGCG



CGCCTGGCCCGGGCGATGGGCCTCCCGTCCCCCCAGGGCTGCCTCCCCGCCGGTG





1281
CGGCGGTGGCGGTGGGTCGGCGACCGGCGGGCCGAAGACTGGAAGCCCGGGCCGCTGAG-



GCTCCGCAgccccctccgcgccgccccggcccgcccccgccgcgccgccccttccctccccgcgcccgc-



cccTTCTTCCCCGCAGGGTCAGCGCTGGGGCTCCGGCCGTAGAGCCACGTGACCCTGGCAGGCCCTGCTCGCGG



GGCTTGGCGACAAGGACGCACGACACGGGGCGGC





1282
ACCTGCCCAGTTACTGCCCCACTCCGCGGAATAAGCTCTTACCCAC-



CGCTCCTCTTCTTCAATTCATTTCTGTTATGGAACTGTCGCGGCACTACAAAGTCTCTATGTAGT-



TATAAATAAACGTTATCTGGAAGAGCAGCCGACAACAACTTTCAAGATCTCCAATTCCCCGAC-



CCCACACTCCAACTGACGCC





1283
CCAGCGCCCGAGCCGTCCAGGCGGCCAGCAGGAGCAGTGCCAAACCGGGCAGCATCGCGACCCT-



GCGCGGGGCACCGAGTGCGCTGCTGTGCGAGTGGGATCCGCCGCGTCCTTGCTCTGCCCGCGCCGCCACCGC-



CGCCGTCTCCCGGGGCCCCCGCGCACGCTCCTCCGCGTGCTCTCGCCTACCGCTGCCGAGGAAACTGACGGAGC



CCGAGCGCGGCGGCGGGGCTCAGAGCCAGGCGAGTCAGCTGATCC





1284
CTGCTGCTGCCCGCGTCCGAGGCTCGCGGGCGGCGGGCCCGGGTGAGTGCACACCCGGCGCGCTGC-



CGGGCTCCCGGATGTGTCACCTTGTCCCGCTGCAGCCGAGATGCCGGGGGAGCGGGGCCTTCCACAC-



CCCCTCCGTGGGTGTGTGGTGAGTGTGGGTGTGTGCGCGTCTCCTCGCGTCCCTCGCTGAGGTGCCTACTGTGT



CTGCATGGGTTGGGTCCCGCGCGATG





1285
ACTGCTTAGGCCACACGATCCCCCAAGCCTGGGCTGCCAGACGTCGCCATCATTGTTCCATGCAGAT-



CATGCCCATCCTGTGCAGAAGGTCACTATAGGAACACATGGCACAGGGAAGAAAACGCCCATAGAAATTCA-



CATGGTGCTTGTCTAAACCGAAGGCAGGTGAGATCCACCCACTG





1286
GCCGGACGCGCCTCCCAAGGGCGCGGGTCCGAGGCGCAAGGCGAGCTGGAGACCCCGAAAACCAGG-



GCCACTCGGGGAGTGTCAGGAAGCACGACTGGGCGCCTTAGGACGTCCGGGCAGACGCGGCCCCCGAGGAGC-



CCCAGAGGAGCCCCAGAGGAGCCGCCTGACCCGGCCCCGACGTGCGCGATCGAGCCCGGGCTCGCCAAAGCCCC



CGCGCCCCTCCGGCCCGGACAGGCCGAGTGGACATTGTCGGAG





1287
CGGCCAGGGTGCCGAGGGCCAGCATGGACACCAGGACCAGGGCGCAGATCACCTTGTTCTCCATGGT-



GGCCATTGCCTCCTCTCTGCTCCAAAGGCGACCCCGAGTCAGGGATGAGAGGCCGCCCGAGCCCCG-



GATTTTATAGGGCAGGCTC





1288
ccgcccgcccCACAGCCAGCGGCTCCGCGCCCCCTGCAGCCACGATGCCCGCGGCCCGGCCGCCCGC-



CGCGGGACTCCGCGGGATCTCGCTGTTCCTCGCTCTGCTCCTGGGGAGCCCGGCGGCAGCGCTGGAGCGAG-



GTAAGCGCCCCGAGGGGCGGGGCGGGCAGGGGGCAAAGTTGCCGGGAGAGCGGGGCAGCCAGGGGTCGGGGCTG



ACCAGGGCGACTCAGGCACCACCCGCCGGGA





1289
GCGCCCCAGCCCACCCACTCGCGTGCCCACGGCGGCATTATTCCCTATAAGGATCTGAACG-



ATCCGGGGGCGGCCCCGCCCCGTTACCCCTTGCCCCCGGCCCCGCCCCCTTTTTGGAGGGCCGATGAGGTAAT-



GCGGCTCTGCCATTGGTCTGAGGGGGCGGGCCCCAACAGCCCGAGGCGGGGTCCCCGGGGGCCCAGCGCTATAT



CACTCGGCCGCCCAGGCAGCGGCGCAGAGCGGGCAGCAGGCAGGCGG





1290
cgtgctgggcgcaggggaaacagcgacgcacgggacaaaACAAGCTTGCAGAACAGCAGGGGGCAGA-



GAGGCTGTAAACAAGCCAACGGGCTGCACTTGTAGCGGTTCTGTTGCCAATGCCATTCAGACCCCAGTCCGG-



GATTCCGCGCTCGGGGTGCGAGAGGCCGCTCCcggggaggggcgggacccgggcggggcgggaggggcggggcg



CCCGGGCCTATTAGGTCCCGCGCCGGCAGCC





1291
GCGCACGCGCACAGCCTCCGGCCGGCTATTTCCGCGAGCGCGTTCCATCCTCTAC-



CGAGCGCGCGCGAAGACTACGGAGGTCGACTCGGGAGCGCGCACGCAGCTCCGCCCCGCGTCCGACCCGCG-



GATCCCGCGGCGTCCGGCCCGGGTGGTCTGGATCGCGGAGGGAATGCCCCGGA





1292
GGTGAGTGCGGCCCGGGGAGGGGAGGGGACCAGGGCGACCGGAGCCCCCAGCGATCCCGCCTG-



GAGCGGCCGCCAAGCTCCCTCGGGCACCCGGGTTCAGCGGGTCCCGATCCGAGGGCGTGCGAGCT-



GAGCCTCCTGGACCGGGTCCGCCGCGGACCTCGGCCTGTCACCTGAAGGTGCCGCGTGGTCTCTGAGGACGTCT



GTCGACGAGCAGGGGCCGCCGCCA





1293
GGCCGAGAGGGAGCCCCACACCTCGGTCTCCCCAGACCGGCCCTGGCCGGGG-



GCATCCCCCTAAACTTCGGATCCCTCCTCGGAAATGGGACCCTCTCTGGGCCGCCTCCCAGCGGTGGTGGC-



GAGGAGCAAACGACACCAGGTAGCCTGCCGCGGGGCAGAGAGTGGACGCGGGAAAGCCGGTGGCTCCCGCCGTG



GGCCCTACTGTgcgcgggcggcggccgagcccgggccgcTCCCTCCCAGTCGCGcgcc





1294
CCAGCGCCGCAACGCCCAGGGTGTGGGGCGGAGTAAGATGTGAAACCTCTTCAGCTCACGGCACCGG-



GCTGCAACCGAGGTCTGAATGTTGCGAAAGCGCCCCAGACGCCGCCGCTGCTTTCCGGCCGCCCCCTCGGC-



TACAGCCGCCATTTCCACGCTCCACCAATCAAATCCATTCTCGAGGAAGACGCACCGCCCCCACACGCCCCGAC



CAATCGCTCGCGCTCTGGTTGCGCTGGCGCC





1295
CCACAAGCGGGCGGGACGGCTGGAGACTGCCGGGACAGCGGCTGCCGGTGCTACGCGGGTGGTGG-



GCGGCCCGGAAATGAGCGCCCTCCGGGGACAGGGGGCTCTGCGGGGCGGCGACAGCTG-



GATTCCCAGCGCGCACAAAGCCTGCGGGAGGATCCATTGTAGCGGTCGCTCCTCCCCGCTTAGCGAGGGCGGGC



GCAGGGGCGGGGGATGTCGAAGGGTCAGGTTTGTCCAGGCCGCGCCACCTTCG





1296
CCTCTGGACAACGGGGAGCGGGAAAAAAGCTACGCAGGAGCTTGGATCGGGCGAAGCTCGCGG-



GAAACCGCTCTGGGTGCGCAGGACAAAGACGCGGGGACAGCGGGGAGGGCCGGCCGCAGCCTGCCGGGCTGC-



CCCCACGGCGCGGAACGCGCGCAGCAACCTCCACCAGGCCTCCGCGTCTGGACTCCCGCCCTGCCTCTGGGCCT



CCTCCGCCCACCGGCGGCGTCTCCCGCGAAGCCCGCTGGG





1297
GCGGGTTCCCGGCGTCTCCAAAGCTACCGCTGCCGGAAGAGCGCGGCGCCCGACGGAGCCGTGTG-



GAGGCCAAAACTCCTCCCGGAAGCCGCTACTGGCCCCGCTTGCCAGGCCCAGCGTCTTTTCTGCATAGGAC-



CCGGGGGAAGCCGGGAAGCCGTTAGGGGGCGGGGCAAGCGGG





1298
CGCCGCCCGTCCTGCTTGCTGCTGGGTCCGGTTGCCGAGGCGGAAAAGTCGCAAGCTCCTTCAGT-



CAGTCTTCTTCCTCAGCTCCTTCCGACTCCGGAAGCTGCTGTTTGGGCCCAGGCTCCCTGCATCCGAGAGC-



CCTGGGCTGACTGCTTCTGAGGCCCCGCCCCACTACTGCCTGCAGCGGGCTTCCTTACTCCGCCTGCTGGTTCC



TACTGGAGGAGAGGCCAGCATGCTTGTCAGGCACCAGCAGGTGGA





1299
CGCGCGGCCCTCCTGCACCTCGGCCAGCACTCGTAGCGCGCTGGGCGAGCCGGACCGGAAGT-



TGAAGAAGTGAAGCGCCGCGCGCGCCGCCTGCTGCAGGAGCCTGCGCGGGACCCCAGCATCCTGAGGCTGC-



CCAGGGTCGTCGGGGTCCCCGGACCCCGCGGGCGCCGCCACCGGGGCGAGCAACAGCAGCAGCGCGAGCAGCGG



GGCGGTGGGGCGCGGGCCCCTGGGCCCGGACCAGGGAGCAGGCAGCCG





1300
GGCGGGGCAAGCCCTCACCTGCGCCAATCAGGGTGCGGAGTAGGCCCCGCAGGCGCCTCACCCAT-



TGAGGGGGCGGGCTGACAGAGCAGAGGAAGGAAGGGGGTGAGGGGCCTGTGGTGGGGATCCTGGGGCTGTCGG-



GCTGAGTATGCCGTGTGGGTGGAGAGGAAGCCTCGGGGAAATCGCCCAGGTGAAGGGAGGGCTTGGTGTGGGGA



CTTGCACTGGGCAGAGGGGCAGCTTCCCTGAGAGCAGCTAAGC





1301
GGAGCGCCCCCTGGCGGTTTCAGGGCGGCTCACCGAGAGGGCGCCGGGAGCGCCCGGTTGGG-



GAACGCGCGGCTGGCGGCGTGGGGACCACCCGGCAGGACCAGGCACCAGAGCTGCGTCCCTGCTCGC





1302
CGAATGGTTCGCGCCGGCCTATATTTACCCGAGATCTTCCTCCCGGACGGCAAGGATGTGAGGCAG-



GCGAGCCGGACGCCGCTCGCAGCACCGGAGAGGGCGCACTGCAAAGGCGGGCAGCAGACCGTGGAGAGCCCGG-



GAGCGGAGCTGGACACCGCCTCGGAGGGAAGAAATGAGGTAGCGGCGGTTCCCGGACCCGGCCATGCCCGTCCC



CTGTTCTCGGAGCCCAGCGCCGTCTCGGCCAGGCCAGCCCGG





1303
TTCCGCCGGCTGGGCCCTCCGTCTACCCCCAGCGGCGAggggcggggccggcgcgggcgcAGAG-



GCGTCACGCACTCCATGGTAACGACGCTCGGCCCGAAGATGGCGGCCGAATGGGGCGGAGGAGTGGGT-



TACTCGGGCTCAGGCCCGGGCCGGAGCCGGTGGCGCTGGAGCGGGTCTGTGTGGGTCCGAAGCGTTTTACTCCT



GTTGGGCGGGCTCCGGGCCAGCGCCACATCTACTCCCGTCTCCTTGGGC





1304
CTCCGGGTcccccgcgtgcccggcccgccccggcccgcTTCCCGGGCGCTGTCTTACTCCGGGC-



CCGGGGCGCCTGCTCCGCGCCGCGTCTGCGAACCGGTGACCTGGTTTCCCCTCCAGCCCTCACGGCT-



GTCCGACTTGCGCGGCGGTGGCGGCGGCGGCCAAGAGCAGGCAAACCCGGCTCCGCCAGGGGCGCAGCGAGGAA



ATGGCCTCCTGGCGCACACCCCGCCGCCGCCGCCAGCCATCGCCACCGCC





1305
CAGCCCGGGTAGGGTTCACCGAAAGTTCACTCGCATATATTAGGCAATTCAATCTTTCATTCTGTGT-



GACAGAAGTAGTAGGAAGTGAGCTGTTCAGAGGCAGGAGGGTCTATTCTTTGCCAAAGGGGGGAC-



CAGAATTCCCCCATGCGAGCTGTTTGAGGACTGGGATGCCGAGAACGCGAGCGATCCGAGCAGGGTTTGTCTGG



GCACCGTCGGGGTAGGATCCGGAACGCATTCGGAAGGCTTTTTGCAAGC





1306
GGCGGAGAGAGGTCCTGCCCAGCTGTTGGCGAGGAGTTTCCTGTTTCCCCCGCAGCGCTGAGT-



TGAAGTTGAGTGAGTCACTCGCGCGCACGGAGCGACGACACCCCCGCGCGTGCACCCGCTCGGGACAGGAGC-



CGGACTCCTGTGCAGCTTCCCTCGGCCGCCGGGGGCCTCCCCGCGCCTCGCCGGCCTCCAGGCCCCCTCCTGGC



TGGCGAGCGGGCGCCACATCTGGCCCGCACATCTGCGCTGCCGGCC





1307
CCTCACCCCAGCCGCGACCCTTCAAGGCCAAGAGGCGGCAGAGCCCGAGGCCTGCAC-



GAGCAGCTCTCTCTTCAGGAGTGAAGGAGGCCACGGGCAAGTCGCCCTGACGCAGACGCTCCACCAGGGC-



CGCGCGCTCGCCGTCCGCCACATACCGCTCGTAGTATTCGTGCTCAGCCTCGTAGTGGCGCCTGACGTCGCGTT



CGCGGGTAGCTACGATGAGGCGGCGACAGACCAGGCACAGGGCCCCATCGCCCT





1308
CGATGACGGGATCCGAGAGAAAGGCAAGGCGGAAGGGGTGAGGCCGGAAGCCGAAGTGCCGCAGG-



GAGTTAGCGGCGTCTCGGTTGCCATGGAGACCAGGAGCTCCAAAACGCGGAGGTCTTTAGCGTCCCGGAC-



CAACGAGTGCCAGGGGACAATGTGGGCGCCAACTTCGCCACCAGCCGGGTCCAGCAGCCCCAGCCAGCCCACCT



GGAAGTCCTCCTTGTATTCCTCCCTCGCCTACTCTGAGGCCTTCCA





1309
CCGCAGGCCGCGGGAAAGGCGCGCCGAGTCCTGCAGCTGCTCTCCCGGTTCGGGAAACGCGCGGG-



GCGGGGGCGTCGGGCTTGGGACAGGGGAGGATACCAGGGCCACCTTCCCCAACCCAGGCCGCGGGGGCCCG-



GCCTCCCCGATGCAGACCACAGCGCCCTCACGGGCTGCCCTCAGGCCGCGCAGCGGGCAGCCGCCAGCCGTCAC



CCCGGGGAGCGTCCGTGGGGTGCCCAGGCA





1310
GCCCCAGTCCACCTCTGGGAGCGCCTGCGCCGCTCCGCGGAGAGTCCGTGGATCTCACAGTGAGC-



GAGTTGGGACCCAGGGAGGGGAAAAGAGAGGACCCCGGCGAGCCATTGCTGGGGCGGCGGGCTGGAGGGT-



TATCTGGGAAGTCAGCCCCGGCCTCGGTCCTCTCCACGTTGCTGCCTACGCGTGCTGCCCGGACGTAGGGC





1311
CTTGGCCGCCCCCGGGATGGGGCGAGGGGTTCCCGAGGGCTtgggagggcggcttgggaga-



gagctccggctccggaacgaggtgtcctgggaacactcccgggtctgtaacttcggacaaat-



cacgctcgctttcccggcctcagtgtgccgttctgtaacttgggtctaaCCCCGGCTCGCACACACGGCGGGGA



CGCGCACAG





1312
CCTCCATGCGCAATCCCAAGGGCGGAGAGGAATTTCAGCAGCTACGAGCAACAGAAAGGAAACGAGA-



GAGTAGCCAGACTCTCCGCGCATGGAGCCGACGGCACCCACCAGCACACCGCCGGCGCCCCCAGCCACTACT-



GCACGTCCGCccccgccccgccccgctccgcccGGCGCACCTGATGCCCAAACTGGTTGCACGGGAAGCCGAGC



ACCACCAGGCCCCGGGGTCCGAGGCGCCGCTGCA





1313
gcggcgactgcgctgccccttggctgccccttccgctctcgtaggcgcgcggggccactact-



cacgcgcgcactgcaggcctttgcgcacgacgccccagatgaagtcgccacagaggtcgcaccacgtgtgcgt-



ggcgggccccgcgggctggaagcggtggccacggccagggaccagctgccgtgtggggttgcacgcggtgcccc



gcgcgatgcgcagcgcgttggcacgctccagccgggtgcggccctt





1314
GGGCTTGCCTCCCCGCCCCTACCTTCCAGGATGTTGACAGCTGGGAATGAAAGGCAGAGGGAGG-



GAGCGCGGGGCCGGAGCGCCGCCTGGGAGTGTGCCCACTGGGTGGCCGCCTGAGGGACCCGGGAACAGAGG-



GCAAAAAGTCCTGTGACCGGACAGAGCAGAGCGGGGACTGCAATTCCCAGAAGACCCCACGGTAGGGGCGGGAC



CCAAGATGGCCGCTTGTCTGGGGACAGGAGCGGAGGCCAATACGCG





1315
GCGGCCCAAGGAGGGCGAACGCCTAAGACTGCAAAGGCTCGGGGGAGAACGGCTCTCGGAGAACGG-



GCTGGGGAAGGACGTGGCTCTGAAGACGGACAGCCCTGAGGAACCGCGGGGCGCCCAGATGGAACTCGT-



TAGCGCCCCGAGTGCAGACAATCCCGGAGGGGGAAAGGCGAGCAGCTGGCAGAGAGCCCAGTGCCGGCCAACCG



CGCGAGCGCCTCAGAACGGCCCGCCCACCC





1316
ctgcgcggcTGGCGATCCAGGAGCGAGCACAGCGCCCGGGCGAGCGCCGGGGGGAGCGAGCAGGG-



GCGACGAGAAACGAGGCAGGGGAGGGAAGCAGATGCCAGCGGGCCGAAGAGTCGGGAGCCGGAGCCGGGA-



GAGCGAAAGGAGAGGGGACCTGGCGGGGCACTTAGGAGCCAACCGAGGAGCAGGAGCACGGACTCCCACTGTGG



AAAGGAGGACCAGAAGGGAGGATGGGATGGAAGAGAAGAAAAAGCA





1317
CACCGCCTCCGGACCCCTCCCTCATCAGAAAGCCCAGGCTCCGCTCGTAGAAGTGCGCAGGCGTCAC-



CGCGCATCCAGGAGCCACGTGTCAGGAGTCACGTGTCAGGTGTCACGTGTCAGGCGTCACGTGGCTGGAGGC-



CGTTGGAGCGCCTGCGCAGCTTTTCCGCACGCGCC





1318
CCTTCCAGCCACCCCGCCCTGGGCGCCTCTGGCGCGCTCTGATGACGCTCCAAGGGAAGAGGAAGT-



GGGGATCGGCGAGCGGGTGGGTGCGCCTCGGGCCGCGGGACTCGCAGCCGCCACCGCCGCTGCCGCCTCTACG-



GCCGCGTCAGAACTGAAGAGAGGAAGGGGAGGAGCCGAGTCGAGCCTAAGCTGCCGCCCGATCTTACCCCTGAC



CCGAGGGCGGCCTGGA





1319
CGGGACACCGGGAGGACAGCGCGGGCGAGGCGCTGCAAGCCCGCGCGCAGCTCCGGGGGGCTCCGAC-



CCGGGGGAGCAGAATGAGCCGTTGCTGGGGCACAGCCAGAGTTTTCTTGGCCTTTTTTATGCAAATCTGGAGG-



GTGGGGGGAGCAAGGGAGGAGCCAATGAAGGGTAATCCGAGGAGGGCTGGTCACTACTTTCTGGGTCTGGTTTT



GCGTTGAGAATGCCCCTCACGCGCTTGCTGGAAGGGAATTC





1320
CCTGGGTTCCCGGCTTCTCAGCCACTGGAGCTGCCAGTCTCAAATTACCGGAGGGGAGGGAGGGCAG-



GCCTGGATCTCAGGATCTCGGTCCTGCATGCAATGCAAGCCTGAGCTCTCCCGCCATAAGGCTGCAGCGGTGT-



GGGCTCCTTGTGCCCAGATCCTTTGTATTCATAGGGGGAAGTGGAAGACCACGCTGCC





1321
GGCGGTGATGGGCggaggaggaggaagaggaggaggaggaagaggaggagggggaAAACGATGACAG-



GAGCTGGGGCCGGGGGGGGAAATTGGGGGGACGCGGGCGGAGGCGCGGTGCGCGCCGGCGGTGGCGGGCAC-



GAGCCCCGCGCCTGGAGGAGGAGGAGTCAGGCCGGGTAGGAGGGCTAAGGAGGTTCCCGGGAAGGCAGGGcccc



ccctcccccccctcccccccccccACACACACACACTCCCCTG





1322
CAGCCCGCCCGGAGCCCATGCCCGGCGGCTGGCCAGTGCTGCGGCAGAAGGGGGGGCCCGGCTCTGC-



ATGGCCCCGGCTGCTGACATGACTTCTTTGCCACTCGGTGTCAAAGTGGAGGACTCCGCCTTCGGCAAgccg-



gcggggggaggcgcgggccaggcccccagcgccgccgcggccACGGCAGCCGCCATGGGCGCGGACGAGGAGGG



GGCCAAGCCCAAAGTGTCCCCTTCGCTCCTGCCCTTCAGCGT





1323
GCCCGCGGGGGAATCGCAGTGAGCAGCGCGGGGCGAGGCCGCCGCGGACGCCCCGTCGGATGTGC-



CCTTCGCTGGGCCGAGCGGCGCAGGGTTGGAGAGGGAAGCGCTCGTGCCCACCTTGCTCGCAGGTGCCCT-



TGCTGACCTGGGTGATGGCCTTCTCCCCGCGGCTCTCGGCCCTCTGGCTGGCGGCGCGCAGCTGGCAGCCGCTC



GGGTAGGTGGTGCCGTCGCTGCCGCACACCGGG





1324
GCCGCGAGCCCGTCTGCTCCCGCCCTGCCCGTGCACTCTCCGCAGCCGCCCTCCGCCAAGC-



CCCAGCGCCCGCTCCCATCGCCGATGACCGCGGGGAGGAGGATGGAGATGCTCTGTGCCGGCAGGGTCCCT-



GCGCTGCTGCTCTGCCTGGGTAAGTTCTCCCCCTCTGGCTTCCGGCCGCCCCAA





1325
GCGGCCCCCTCCCGGCTGAGCCTATAAAGCGGCAGGTGCGCGCCGCCCTACAGACGTTCGCACACCT-



GGGTGCCAGCGCCCCAGAGGTCCCGGGACAGCCCGAggcgccgcgcccgccgccccgAGCTCCCCAAGCCTTC-



GAGAGCGGCGCACACTCCCGGTCTCCACTCGCTCTTCCAACACCCGCTCGTTTTGGCGGCAGCTCGTGTCCCAG



AGACCGAGTTGCCCCAGAGACCGAGACGCCGCCGCTGCG





1326
CAGCAGGGCGCGGCTTCCCTTTCCCGGGGCCTGGGGCCGCAATCAGGTGGAGTCGAGAGGCCGGAG-



GAGGGGCAGGAGGAAGGGGTGCGGTCGCGATCCGGACCCGGAGCCAGCGCGGAGCACCTGCGCCCGCGGCT-



GACACCTTCGCTCGCAGTTTGTTCGCAGTTTACTCGCACACCAGTTTCCCCCACCGCGCTTTGGGTAAGTTCAG



CCTCCCGGCGCGTCCCCGCGAGCCTCGCCCACAGCCGCCTGCTG





1327
CCGCAGCACGCTCGGACGGGCCAGGGGCGGCGACCCCTCGCGGACGCCCGGCTGCGCGCCGGGC-



CGGGGACTTGCCCTTGCACGCTCCCTGCGCCCTCCAGCTCGCCGGCGGGACCATGAAGAAGTTCTCTCGGAT-



GCCCAAGTCGGAGggcggcagcggcggcggagcggcgggtggcggggctggcggggccggggccggggccggct



gcggctccggcggcTCGTCCGTGGGGGTCCGGGTGTTCGCGGTCG





1328
GCGGAGTGCGGGTCGGGAAGCGGAGAGAGAAGCAGCTGTGTAATCCGCTGGATGCGGACCAGG-



GCGCTCCCCATTCCCGTCGGGAGCCCGCCGATTGGCTGGGTGTGGGCGCACGTGACCGACATGTGGCTGTAT-



TGGTGCAGCCCGCCAGGGTGTCACTGGAGACAGAATGGAGGTGCTGCCGGACTCGGAAATGGGG





1329
GCGCGGGGGCAGGTGAGCATGCGAAGGTTGGAGGCCGCGCCCCTTGCTGAGGCGCAGCTGGCT-



GCTCTTTTCGGGCCGGCATACGCGCGCAGCCGCAGCTGAGGTCACCCCGCTGAGGTGGTGGGGAGGGGAATG-



GTTATTCTTGAGGCACCGCATCTCTTGAGGAGGAAAGAGCCGGAAACACCTGGTCTCTCAAGCAGGTACAGCCC



GCTTCTCCCCAGCACCCCGGTGTGGGCTTCCCAAGGTCCTGCCTGA





1330
ggcgcgggggcaggtgagcatgcgaaggttggaggccgcgccccttgctgaggcgcagctggct-



gctcttttcgggccggcatacgcgcgcagccgcagctgaggtcaccccgctgaggtggtggggaggggaatg-



gttattcttgaggcaccgcatctcttgaggaggaaagagccg





1331
AGTGACGGGCGGTGGGCCTGGGGCGGCCAGCGGTGACTCCAGATGAGCCGGCCGTCCGCGTTCGCGC-



CGCGGCGGTGCGGTTGTCGCGGATCAGCAGGATCGGAGTGCGGGGCTGCTGGGCGGAGGCGTTGGCTGCAC-



CAGGGACGGCGGCG





1332
GGCGACCCTTTGGCCGCTGGCCTGATCCGGAGACCCAGGGCTGCCTCCAGGTCCGGACGCGGG-



GCGTCGGGCTCCGGGCACCACGAATGCCGGACGTGAAGGGGAGGACGGAGGCGCGTAGACGCGGCTGGG-



GACGAACCCGAGGACGCATTGCTCCCTGGACGGGCACGCGGGACCTCCCGGAGTGCCTCCCTGCAACACTTCCC



CGCGACTTGGGCTCCTTGACACAGGCCCGTCATTTCTCTTTGCAGGTTC





1333
CGCGGCAGCCCGGGTGAATGGAGCGAGGCGGCAGGTCATCCCCGTGCAGCGCCCGG-



GTATTTGCATAATTTATGCTCGCGGGAGGCCGCCATCGCCCCTCCCCCAACCCGGAGTGTGCCCGTAATTAC-



CGCCGGCCAATCGGCGGCGTCGCGCGGCCCCGGGAGTCGGCTCGGGCTAAGCTGGCCAGGGCGTCTCCAGGCAG



TGAAACAGAGGCGGGGTCGGCGGGCGATTAGCGGCCGAGGCACGCTCCTCTTG





1334
GGCGAGCGAGCGGGACCGAGCGGGGAGCGGGTGGAGGCGGCGCCACG-



GCGCGCACACACTCGCACACACGCGCTCCCACTCCAcccccggccgctccccgcccgaggggccgcgcggcg-



gccgcggggAACGATGCAACCTGTTGGTGACGCTTGGCAACTGCAggggcgcccgcggtccctgcccccacgcc



ctccgcgcgggccccgccaccccggccccgacggcgcctgcacgcccgcgtcccctg





1335
GGGGCAGTGCCGGTGTGCTGCCCTCTGCCTTGAGACCTCAAGCCGCGCAGGCGCCCAGGGCAGGCAG-



GTAGCGGCCACAGAAGAGCCAAAAGCTCCCGGGTTGGCTGGTAAGGACACCACCTCCAGCTTTAGCCCTCT-



GGGGCCAGCCAGGGTAGCCGGGAAGCAGTGGTGGCCCGCCCTCCAGGGAGCAGTTGGGCCCCGCCCGG





1336
CGCTGGCATTCGGGCCCCCTCCAGACTTTAGCCCGGTgccggcgccccctgggcccggcccgg-



gcctcctggcgcagcccctcgggggcccgggcACACCGTCCTCGCCCGGAGCGCAGAGGCCGACGCCCTAC-



GAGTGGATGCGGCGCAGCGTGGCGGCCGGAGGCGGCGGTGGCAGCGGTAAGGACCCTTCCCTCGCCCTGCGCCT



CTGGACCTGCAGGTGCTCGGGCGCGGCCCAGGCCGCCCCCTGTCTGA





1337
GAGCCGTGATGGAGCCGGGAGGAGAGGCGCATCCTCAGCAGAGCTTCCCTCCCTTGCACACGAGCT-



GACGGCGTGAACGGGGGTGTCGGGGTTGGTGCAACTATAGAAGGGAAAGGCTGGGCGGGGGTCACACATACCT-



CAGTGGCAGGCAGGCAGGCGGCAGGCAGAGCGCGCTCTCCGGGCAGTCTGAAGGACCGCGGGAATGTGGAGGGG





1338
GCCAGGGTGTCTTGGCTCTGGCCTGAGTCGGGTATGTGAAAGCCTTTTGGGGCAGGAAGGG-



GCAAAGTGATACCTGGCCGTCCCACCCTCTGGTCCCAGAAGGAGCTCTCGCTGGAGCCAG-



GCAGCCTCCAGTCCCCCTCCTTTCAGCCTTGTCATTCTCTGCATCCTGCCCAGGCCACAAAGGA





1339
CGGCTCCGGCGGGGAAGGAGGCgggctgcggctgcggctggggctgaagctggggctggggttgggg-



GACTGCCCGGGGCTTAGATGGCTCCGAGCCCGTTTGAGCGTGGTCTCGGACTGCTAACTGGACCAACG-



GCAACTGTCTGATGAGTGCCAGCCCCAAACCGCGCGCTGC





1340
GCCAGGGTGCCGTCGCGCTTGGCGCCGTCCAGGGCGGCGCTGCGCTCGTCCAGCAACACCACGGCGT-



GGTAGGCGCCGGCCAGCAGGCGGCCGCGGAGCTCGGCGTTGGGCACGATGTGCTCCAGGCCCATGGCGCCCT-



TGGCCCGGCGCCGCACGATGGTGCTGAAGCGCACGTTGACAGAGCCGGCGATGTGGCCGGCGTTGAAAGCGAAG



AAGGAGCGGCAGTCCAGCAGCAGGCATTGCGCCGCTCGCTCC





1341
CGGCTCGGTCCTGAGGAGAAGGACTCAGCCGCGGCTGCGGGACCCGGGCACCGGGAggcggtggcg-



gcggcggcggcggcagcagcggcgacagcagaggaggaagaggaggaagaaggaaagaaaaagaagaaCCAG-



GAGGAGTCCTCAACAACGACAGCGGGGACTGCGGGACCAGGGTAAAGCGGCGACGGCGGCGACGGCCCAGCAAC



CGTGA





1342
CGCGGGGAACCTGCGGCTGCCCGGGCAAGGCCACGAGGCTTCTTATACCCGGTCCTCGC-



CCCTCCAGCGCCGGCCTCGCCCGCGCTCCTGAGAAAGCCCTGCCCGCTCCGCTCACGGCCGTGCCCTGGC-



CAACTTCCTGCTGCGGCCGGCGGGCCCTGGGAAGCCCGTGCCCCCTTCCCTGCCCGGGCCTCGAGGACTTCCTC



TTGGCAGGCGCTGGGGCCCTCTGAGAGCAGGCAGGCCCGGCCTTTGTCTCCG





1343
CCGCCGCTGCTTTGGGTGGGGGGCTGACAGGGCTGCGCGCGTCGCGCTCTTGGCTGGGGCTGCGCGG-



GCCCGGGGCGCTGCGGGCGGCTCAGCGGCAGCTGCCGCGCTCTGCGCCTCCTCTGGGCGCACTGCCTGG-



GAGCACGAGACTGGTTTGTCTGATGCTGCTGCCGGAGCTGAGGTCTTGCCTGGAGATCCGAACGAGACACCACG



TCAACCGGCGCGGGGAGTCCCGTGAAGACATGAGGGCGCCAGGAG





1344
ACCTGAGCCCGCGGGGGAAccccccccccaccccoggggaaccccccccacccccgccgc-



cccccgccTGCAAGTTGTTACCAGTAAATAAAAGGGATCCTATTTTAGCAAGCCACACAGCATTAGAGG-



GCAAATAATAGTTTGGTGGCAGGAGAGCGATGAGACGGGAAAGTGTGGGGCAAAGCTTACAGTCATTGGTCCAG



ATTCTAACTGGCCTGTTAGCCAAAAAGTAAGGTTTTCTTTACCTCCGTGTTG





1345
AACGCCGGCCTCACCGGCAGACGCGCGCCCTCCTCCCAGATGCGCAGGTGACCCCGGCGGGCG-



GCGCGGGAAAGGGAAGAGCTCCGCGAGGCCGCGCGGGGGGGAAGCGGGAGAAGC-



CGCTCTTCCTATTCCACTCGCAGTCTGCGTGTGGGGGAAACGAGTGCCCGGCGTATGAAACGCCTAACTTCGCG



AAATAAAGAGAGACGTATAAAAGTTCAAGAATTCTGTCCAGACTCAAGGGCCCTTTCTCATTTA





1346
CCGTGGTCCCAGCGCTCCTGCTATTTGCATTCCAAAGCAGACACCTCATGCGCTCAACCCCGC-



CCGCAGGCGGCTCCCGCAGTCTAAGGGACCTGGCGCGAGTCCGGGAAGCGGAGGGCGCAGCTGCGCAGG-



GAAGGGGGCCGGGGGCGGGACCAGGGCGCGCGTTCCGGTCCCGGGGCGTGGC





1347
TGCGACCCGGCGCCCAAGCAGCCTGGGACCTTGCGCGGACCTGACCCCTTCAGACCGCAGGCAGTCT-



GGGAGGAGGTCCGGCCGGGGGAGGTGCAGGATCCCCGCCGTGTCTCTTTGACGACTTGGGGACTGTCACG-



GTTCTCTCCCGGCGCCCCTGGGTTCTTTTGTCCTGCACGCGGTGCGAAGGGGCCAGCAGGGAAGGAGCAGAGGA



TGGGGGGTGGGGTTGTTGGAGCCCCGCGGAGGTCTGGGAGGCCC





1348
GGCTCTGCGCTGCCTTTGGTGGCTCCTCCCTGGTCCTCTAAATGTGACACCAGGCGGATGCGGGGC-



CACAGGACCCTGGGGCTTGAGTCACACAAGAATGTCTCTGGGAGACCCGAGAGACTCACAGTTATGAAACAG-



GACCATGGTTCTTTggccgggcgcgggg





1349
gcgcgggcggcTCCTTTGTGTCCAGCCGCCGCCACCGGAGCTCCCGGGGCCTCCGCGGG-



GAGCGCGTCCCCCGCATCCGCCCGACCCCCGGGGCTGGCACGTGCTGCGCCCGGTCCGCTGAGGGGGCGGAG-



GCCCCGATCTCCCCGACCCCCCTTCTCTGCTTAGAGGAGGAGGAGCAGCGGCAGCGGCAGCAGGAGGCGACAGC



TGCCAGCCGAGGAGGCGCGGCGGAGAGGGGACTGCGGTCAGCTGCGTCCA





1350
GGCCCGTTGGCGAGGTTAGAGCGCCAGGTTGTAAGAATCGGGTCTGTGGACCTCATACCAGATAG-



GCGCGAACGCCTCTGGCAGCGGCGTCCAGGGGGTCCGGCGGCACTCGCGGTGGGGCTGCCTGGGTTGCGGGT-



GACGATCTGCGGGGTCCCGCACCCGGCCCCGCGGAGCCCGGACCCGCACGTAGGCGGCGCGGCAAAGGCACACC



CTCCTCGCGGCCGCGAACCCAGCGCCGTCCTCGCAGCGCGGCAA





1351
acccggcatccgggcaggctgcgcgcgggtgcggggcgagggcgccgcggggACTGGGACGCACGGC-



CCGCGCGCGGGACACGGCCATGGAGGACGCGGGAGCAGCTGGCCCGGGGCCGGAGCCTGAGCCCGAGC-



CCGAGCCGGAGCCCGAGCCCGCGCCGGAGCCGGAACCGGAGCCCAAGCCGGGTGCTGGCACATCCGAGGCGTTC



TCCCGACTCTGGACCGACGTGATGGGTATCCTGGTAAGTTACCTGG





1352
CCCGGACTGTAATCACGTCCACTGGGAACTGGCGCAGTAGTGGAGGGGACGCGATCAGGCCCGTG-



GCTGCGCCCAGAGCATGATAAGCCAGGGACCTCGCGGCGCAGGCGGAGGGAGGGAGAGCGTCGCGGACCCAG-



GCGGGGACAGGGAGACGCC





1353
CGCCGCCAACGCGCAGGTCTACGGTCAGACCGGCCTCCCCTACGGCCCCGGGTCTGAGGCTGCG-



GCGTTCGGCTCCAACGGCCTGGGGGGTTTCCCCCCACTCAACAGCGTGTCTCCGAGCCCGCTGATGCTACT-



GCACCCGCCGCCGCAGCTGTCGCCTTTCCTGCAGCCCCACGGCCAGCAGGTGCCCTACTACCTGGAGAACGAGC



CCAGCGGCTACACGGTGCGCGAGGCCGGC





1354
GCTGCCAGCTGCCGCTCCGGCTCCCACTTCCCACCTGCTGCCCGAGGAAGACTTCCGG-



GAGAAACGCTGTCTCCGAGCCCCCGCGCCGCCGCGCTCCCTCCGCTGCAGCAGCGGCCACCGGGTGCGCCCG-



GAGCCCTGGGACGGCCTAAACCAGTATCTCGCGGGCCCCGCGCCGGGCTCCGGGAATGGCCGCAGCAGCCCTGG



CGACCCGGGCCCCTCGGAGCTCCCCTTCAGGATCGTGCACCAAGCGCGCAC





1355
GCGCCCACCTGCGCCTCGCGGGGTCCCCGAGGTCCCGCCACCGAGCGCCCAAGGCGG-



GATCCCAGCGCGTCCTGCAGCCCGCCCAGCTTCAGGGCCGGCCCGGCGCGCGCAGGTGCGGCACTCACCGGC-



CAGGTGAAGCCGAAGGGGAAGCGGATGGGGTTGCTGAACGCGGAGTCGGCGCCCCCGCCGTCGGGCAGACTGAA



GGAGTCGACGCCCAGCACGGGGGTGACGGCGCTGCCGTAGGTGCAGGGCGGC





1356
CGGGCCAGGGCGGCATGAAGAAGTCCCGCCGCTACGTGCCCGGCACAGTGGCCCTGCGCGACGTTCG-



GCGCTACCAGAACTCCGAGCTGCTGATCAGCAAGCTGCCGCTCCTGCGAGAGCTCGGCGGTGACGCCGCT-



GCACGAGAGCGA





1357
GCTGCGACCTGGGGTCCGACGGACGCCTCCTCCGCGGGTATGAACAGTATGCCTACGATGGCAAG-



GATTACCTCGCCCTGAACGAGGACCTGCGCTCCTGGACCGCAGCGGACACTGCGGCTCAG-



ATCTCCAAGCGCAAGTGTGAGGCGGCCAATGTGGCTGAACAAAGGAGAGCCTACCTGGAGGGCACGTGCGTG-



GAGTGGCTCCACAGATACCTGGAGAACGGGAAGGAGATGCTGCAGCGCGCGGG





1358
GTTAGGAGGGCGGGGCGCGTGCGCGCGCACCTCGCTCACGCGCCGGCGCGCTCCTTTTGCAG-



GCTCGTGGCGGTCGGTCAGCGGGGCGTTCTCCCACCTGTAGCGACTCAGGTTACTGAAAAGGCGG-



GAAAACGCTGCGATGGCGGCAGCTGGGG





1359
AGCGCACCAACGCAGGCGAGGGACTGGGGGAGGAGGGAAGTGCCCTCCTGCAGCACGCGAG-



GTTCCGGGACCGGCTGGCCTGCTGGAACTCGGCCAGGCTCAGCTGGCTCGGCGCTGGGCAGCCAGGAGCCTGG-



GCCCCGGGGAGGGCGGTCCCGGGCGGCGCGGTGGGCCGAGCGCGGGTCCCGCCTCCTTGAGGCGGGCCCGGGC





1360
CGGCTGGCCCCGCCCACTCTCCGCGGCCGGAAGTGGCGGCGCCGAGTGAGGTAAATGCGTGCCCG-



GAAGCGCGACCTCGGGCGGTTGGAGGGGCTACCGGGTCTTACCAGTCCGTGGCGGGAGTCCCGGAGGAC-



CCTCGACGGGGGAGTTGCCGAGAAAAGGCCTCGCCGGCA





1361
GGGGTTGCCGTCGCAGCCAGCTGAGTGTTGCGCCAGGGGGACAGGTATGTTCCAGGCAGTGGCAAGC-



CCAACCCGAGCAAGACCTGCGCTGAAACGGATTGGCTGCCCTCCGCCCGGAGTCCGTTCTCCCTGCAGCGGC-



CAGTGCAGAGCTCAGAGGCTCAGAAACTCGCTCTCAGCCCCCTGGAGGCGGAGCCCGGGAGATAAGGTTCGCGC



TCCCCACCCGCC





1362
CCGCACTCCCGCCCGGTTCCCCGGCCGTCCGCCTATCCTTGGCCCCCTCCGCTTTCTCCGCGCCGGC-



CCGCCTCGCTTATGCCTCGGCGCTGAGCCGCTCTCCCGATTGCCCGCCGACATGAGCTGCAACGGAG-



GCTCCCACCCGCGGATCAACACTCTGGGCCGCATGATCCGCGCCGAGTCTGGCCCGGACCTGCGCTACGAGGTG



ACCAGCGGCGGCGGG





1363
ggaccccctgggcagcaccctggccacccttccatccacaacatccagaccacacggccaagg-



gcacctgaccctgtcaaaaccccaaatccagctgggcgcggtggctcatgcctgtaatcccagcatttgggag-



gccgaggcagccgg





1364
gaggcagccggatcacgaagtcaggagttcgagaccagcctgaccaacatggtgaaaccccgtctc-



tactaaaatacaaaaattagccgggcgtggtggtgcacacc





1365
GCGCGTGCGGGCGTTGTCCCGGCAACCAGGGGGCGGGGCTGGGCGTGGCACCGCCCCGCGCTCCGCT-



GCCAGGGGCGGGAGGGAGGAATGGTTGCTTCACGCCCCGGGGGAAGAGACGGGAAGCTCGGCTCTGGGT-



TGCGGGCCCCGGCGTCTCCGCGTGGGGCGCACCGTCCGACCCCCCCCTCCCGGTGTGCAGCGCCCCGCACCGCC



CCGCCTCGCCTGGGAGAAGCCGCCGGGACGCGCC





1366
CAGGATGCGGCAGCGCCCACCCGCGCGGCGTGGAGGGGGCCGGGGGCGGCGCTCGGCGCAGATG-



GCGCTCGCTGCGAGATGGATGCTCCAGGGCGGGTAATCACTCCTGGCTCAACACAGCATCCCGGGCGGAGCG-



GATGCCAGATCCCACCGCTAAGAGCCTGGGCTGGGAAAGCAATCTTTCCAGGCAGCCCCCAGCCCGGTGCGCCG



GCCCCGACAAGTCCCAGCCCTCGGAGGCAGGGCGGGGCGCAGGGA





1367
gATGCGGCCCGCGGAGGAGAGAGCAGGAGGACGGACGGGAGGGACCTCCGCGGGGAGG-



GCGCGCgggggaggcggggagggaggcgggagggggaggggACGGTGTGGATGGCCCCGAG-



GTCCAAAAAGAAAGCGCCCAACGGCTGGACGCACACCCCGCCAGGCCTCCTGGAAACGGTGCCGGTGCTGCAGA



GCCCGCGAGGTGTCTGGGAGTTGGGCGAGAGCTGCAGACTTGGAGGCTCTTATACCTCCGTG





1368
GTTCTGCGCGCGCCCGACTCCGCTGCCCGCCCCGCCAGGCCTCCGGGAGGTGGGGGCTGGGAG-



GCGTCCCCCGCTCCCGCCCCCTCCCCACCGTTCAATGAAAGATGAACTGGCGAGAGGTGAGAAGGGAAGAGG-



GCTCCCGGCTCTCTCGGGGCGGGAATCAGTGGGCCAGAGCTCGCCGGGTGGCCGCAAG





1369
CCCGCCGTGGGCGTAGTAAccgccaccgccgccgccccccgcgccaccaccaccgccgccT-



GCCTCGCCTCTGCCCGAGCTGATGAGCGAGTCGACCAAAAAAGAGTTCGCGGCGGGGCTCTCCGAGCATGA-



CATTGTTGTGGGATAATTTGGCGAAGGGAGCAGATAGCCCTTTCTGGCTGACATTTCTTGTGCAAAACATGCT-



GAATACGATTAGCAATCCCCCCGCACCGCGGCGGGCGCCCGCAGCCAATC





1370
ACCCGCCCGGGCAGCTCCAGTCCCGGACTCCGCAGCTCGGAGCGCAGCCAGCCACGGCCATTGCGG-



GACCCTATTTATCCCGACACCTCCCCTGACGTGGGCTCGGAACGCTCCCTTGGCAGCTGCAGCCGCGGCGCGG-



GCTCCCCCTCGGCCGCCCCACCCCCAGGCCCGTCGGTGCAGAAGCGGTGACATCACCCCCTCTGGGCCGCAGC





1371
CAGCGGTCGCGCCTCGTCGGGCGACGGCTGGCAGCGAAGGCCGGAGCCACAGCGCTCGGTGTAGAT-



GCCGCACGGCTGGCCCTCGCTCAGTGCGCACGTCAGGCAGCAGCCGCAGCCCGGCTCGCGCAC-



CAGCTCCGCGCACACGGCGGGCGGAGGCGCGCACTGGGCCAGTGCACGCGCGTCGCACGGCTCGCAGCGCACCA



CGGGACCCAAGCCCGCCG





1372
GCAATCGCGCTGTCTCTGAAAGGGGTGGAGAAGGGGCTGGATGAGTCCGGAAGTGGAGATTGGCT-



GCTTAGTGACGCGCGGCGTCCCGGAAGTTGACAGATACAGGGCGAGAGGCAGTGGAGGCGGGACTTG-



GATAGGGGCGGAACCTGAGACTACCTTTCTGCGATCACAGGATTCCCGGCGGTGACTTGACCCCGGAAGTGGGG



TGTGAAGCTCCGGTGCTGGTGCGGCGGGGGA





1373
GAGCGCCCGCCGTTGATGCCCCAGCTGCTCTGGCCGCGATGGGCACTGCAGGGGCTTTCCTGT-



GCGCGGGGTCTCCAGCATCTCCACGAAGGCAGAGTTGGGGGTCTGGCAGCGCGTTCTGGACTTTGCCCGCCGC-



CAGTGCGATTCTCCCTCCCGGTTCCAGTCGCCGCGGACGATGCTTCCTCCCACCCACCGCCCGCGGGCTCAGAG



AGCAGGTCCCCGCACCGCGC





1374
CATGGCCCGCTGCGCCCTCTCCGCCGGTTGGGGAGAGAAGCTCCTGGAGCGGCCAGATACCTGTTG-



GCTCCTGAGCAGCATCGCCCAGTGCAGCCTCCGTCAGGAAAAGCAGCAGAATCGACAGCCCCAGGGGGC-



GAGCGGGGTCCATGGTGCAGGGGGTCGGGCGGCCCGCTGGGCAAGGCGTCCGAGAAAGCGCCTGGCGGGAGGAG



GTGCGCGGCTTTCTGCTCCAGGCGGCCCGGGTGCCCGCTTTATGCG





1375
GGGGGCGGGGTGCAGGGGTGGAGGGGCGGGGAGGCGGGCTCCGGCTGCGCCACGCTATC-



GAGTCTTCCCTCCCTCCTTCTCTGCCCCCTCCGCTCCCGCTGGAGCCCTCCACCCTACAAGTGGCCTACAGG-



GCACAGGTGAGGCGGGACTGGACAGCTCCTGCTTTGATCGCCGGAGATCTGCAAATTCTGCCCATGTCGGGGCT



GCAGAGCACTC





1376
CGACCCTGCGCCCGGCAGTCCCCGGGGGCCGTGCGCCCGGCCCAGGCTCGGAGGTCCAGCCCAGCG-



GCGGCTCAGGCTGCGCGCCTGGCTCCCAGCCTCAGTTTCCCCATTGGTAAAGCATTGACGGTGGTTGCGGACG-



GCTTCTGCGGACAGAGCCTTGGGCTCCGACGTCTGCGCGG





1377
GGCTTCAAGTCCACGGCCCTGTGATGGGATGTGGGCAGGGCCTGAGACAGGCCGAAC-



CCAACTCTTCACAGGGCCGAATTCTTTGCCCGCAGCCCAGCACCCCGAAGGAGCTTGCCTCGGCTTCAAG-



GCGCACCTAATGGGCACCGGATCGCTGGGGCGCTGAGGATGCCGCTCCGGGGCCTCCACGAGGCGGCCTCGCCA



CGCGCCTCGGCCA





1378
CCCCACCTGCCCGCGCTGCTTCTACCTGAAACTGGCCAAGGGCCCGAGCCCGGACCGGAGCCGT-



GACTTCCCTCCGCCGGCCACGGGGCTGCCCGGATCCGCCGGGTTATGTCGCTTGGCTTTGGGCTCAGGGGT-



CACCGTGGGCAGAGGGGGGTGCCGGGGTCGCGGACTGCCACCAGGTTGAGGAAAGGAGGGGCCTTTTGGCTGGG



GAAAGAGCGTGGTGGGGGACCCGCGGCCGATGGAATCCCTGGGGCA





1379
gcgcgcggagacgcagcagcggcagcggcagcATGTCGGCCGGCGGAGCGTCAGTCCCGCCGC-



CCCCGAACCCCGCCGTGTCCTTCCCGCCGCCCCGGGGTCACCCTGCCCGCCGGCCCCGACATCCTGCG-



GACCTACTCGGGCGCCTTCGTCTGCCTGGAGATTGTAAGTGGGGCCGCCGGAGCGAGGGTCGCGCGGGGAGCGA



GGACAGGCGGCGGCATCCTTGTCCCCCGGGCTGTCTTCCTCTGCGTCCGC





1380
GTGAGCCGGCGCTCCTGATGCGGAGAGGTGCGGCCATGTCCTGGCTGGGAGCGAAGCGC-



CCTCGCTCGGGCAGTCGGAGCGAACTGTCTCCCGCGCGCTCCGCCAGCCGGGCCCTCCCGCTGGGCCCAC-



CCCCCGAGGGGCGGGGCCAGAGCGGGCGGCACCGCCTCCTCCCCGCTGTCTGGGTCGCAGGCCTTAGCGACGGG



CTGTTCTCCGGCCCCGCCCCATTCCCAGGCTCCGCCCCC





1381
TGCCGCGGGGGTGCCAAGGGAAGTGCCAGCTCAGAGGGACCATGTGGGCGCAGGCACCCAGGCG-



GCGCCGGGAGGCCTCTCGGGACTCCAGGGCTGTCCCTCCCGCAGGCTGTCCTTCCACCTCCACCCCAGGC-



CAACGCCCTCCCGCCAGCCCAGGGTCCTGTGTCCTCGAGTCCTTCCTGGGCACCCTGGTCCCATCCTTAGCCCT



GCCCGAGGGGCCCAGCCCTGCTCCAAAAGGGCTGTGGCTCCACCCAC





1382
CTGCTGCGCGCGCTGGCTCTTCTGCGAGGCCTGCTTGAGCTTGTTGCCGCCTTTGGGCTCCGGGC-



CCTCCAGCTCGTCCCTGCAGCGCCGCGGCCGCTCCTCGTAGGCCAGGCTGGAGGCAAGCTCCTTCTCCT-



CAAAGCTGCGCTGCAGCTTCTGGAGGGCGCCCTCCCTCTCCAACAGCTTCTGCTCCAGCTCCTGGATGCTGCAC



TCGTCCGTGGAGATGGGGGAGCGG





1383
CTGGCGGCCCAGGTCGCTCCTGCCCAACCCGGGGACCCATCTCTTCCCCCGACTCCGACGACTGGT-



GCGTCTTGCCCGGACATGCCCGGCCGCAGGCGACCCGGGCCACGCACCCCCGCCGTGTCCCCCTCTCTCCCT-



GCCCTCTCCAGGCGCCAGGCACGCTCTTCCCCAGCCAGGGACCGCGGCGGGGACTCACCAACAGCAGGACCGCG



GCGACAACGAGCACAAGGGTCTTGGGGACCCGGGGCCCAGGCC





1384
AGCGCCCCGGCCGCCTGATGGCCGAGGCAGGGTGCGACCCAGGACCCAGGACGGCGTCGGGAAC-



CATACCATGGCCCGGATCCCCAAGACCCTAAAGTTCGTCGTCGTCATCGTCGCGGTCCTGCTGCCAGT-



GAGTCCCGGCCGCGGTCCCTGGCTGGGGAAGAGCGCACCTGGCGCCGGGAGGGGGCAGGGAGACGGGGACACGG



CAGGGATGCCTGGCCCTGGTCACCTGCGGCCGGGCA





1385
GCCGCACGGGACAGCCAGGGGGAGCGCGCGCTCTGCTCCCTCGCGGCCCGGTCGCTCCTGCCCAGC-



CCGGGCACCCCACTCTTCCCCTGACTCCGACGGCGGGTTCGTCCTGCCCAGACATGCCCGGCCGCAGGCGAC-



CCGGGCCAAGCATCCCCACCGTGTCCCCCTCTCTCCCTGCCCACTCCCGGCGC





1386
CCCGGACATGCCCCGCCACAAGTGACCCGGGCCAGGCACCCCCGCCGCGTCCCCCTCTCTCTCTGC-



CCCCTCCCGGTGCCAGGCGCGCTTTTCCCCAGGCAGGACCGCGGTGGGGACTCACCTGCAGCAGGAC-



CCCGACGACGACAAACTTGAAGGTCTTGTGGACCCGGAGCCGAGGGCTGGCTTCCCGCGCCGGCCTGGGT





1387
cgggggccgccgcctgacttcggacaccggccccgcacccgccaggaggggagggaaggggag-



gcggggagagcgacggcggggggcgggcggtggaccccgcctcccccggcacagcctgctgaggggaa-



gagggggtctccgctcttcctcagtgcactctctgactgaagcccggcgcgtggggtgcagcgggagtgcgagg



ggactggacaggtgggaagatgggaatgaggaccgggcggcgggaa





1388
CAGTGGCGGCCCTCGGCCTGCGGTCGGAGGCGGCGCGGGCGGGGAGGCGGCGCTGCGGGCTGGGT-



GCGCCCCGGCTCCCGGAGGTGCGGCGAGCAGGAAggcgcggggcggcgggcgcgcggcACTGACTCCGGAG-



GCTGCAGGGCTGGAGTGCGCGGGGCTCCTACGGCCGAGCCCTCGGAGCCGCCCCGCGCAGCCAATCAGCTCCCG



GCGGGGCGAGCCGCACTCGTTACCACGTCCGTCACCGGCGCG





1389
GCCCGGCGCGGATAACGGTCCGGCGGGAGGACACGGCGGTCCCTACAGCATCGCGGCGGGCCAG-



GCTCGGGCAGGGGCCGTGCTCAGGTGCGGCAGACGGACGGGCCGGCGCCTCTGAAGTCACCCG-



GCTCCTTTACGAACTGAGCCCGTTTTGGCTGGGAGGGTT





1390
GCTCCGGGTGGGGAGGGAGGCTGGCAGCTCACCCCCGGGGGCGAGGGGTCTGCGTTAGCCGTAGC-



CACGGGAGCCCGGGCTTCTGGGACGCTCAGCCGTGCGCTACCCGGTGCAGCTGCTTTCTCACCAGCTCGCGG-



GTGGGTCCTGCCGCGGCTCGGCGACCCGCGCCCCCTTGCGAGCGACCCAGCGTGAAACCAGCCCAAAGGGCG-



GCCTCGCCCG





1391
GCCTGGGCGCAGAACGGGGTCCCTCGGCAGGACCCTCGCCGCGACAGCCTCAGCAGGGGATCGTC-



GAGCAAAAGCCCGCAGGAATGCTCCTTTCTGGGGCCCCGCCCTCCCGGCCGACAGCTTTTAGGTAGACGTG-



GAGGCGACTCAGATCGCCTCGCGGTTCCCGGGATGGCGCGGTCGCCCCCAACGCGAGGCTGCCTGGGGCACCCG



GCTCTTTTCCTGGGCGTCCGCGGCC





1392
GGTCCTAATCCCCAGGCTGCGCTGACAGGATTAGGCTCCGTTCCTCCCCATAATGTTCCCAGGAC-



GAGCCTCATGGGGACGAACTACAAATCCCAGCATGCACCAGTCTTCGCCCGCCCGGCGGGAGGGCAACGGCT-



GACCAGGACCGCAGGCAAGCACCGCGGCGACGGTTCCAGCCAGGAAAATGAGAGCCTCTTGGGCCACGTTCCAA



ACGG





1393
CCGCGTCCCCGGCTGCTCCTCCTCGTGCTggcggcggcggcggcggcggcggcggcgCT-



GCTCCCGGGGGCGACGGGTGAgcggcggcgcggcgggcgggcgactgcggggcgcgcgggccggacccg-



gccTCTGGCTCGCTCCTGCTCTTTCTCAAACATggcgcggggccgggggcgcaggtggcggcgccggggcccgg



gccgggctctcgtggcgccgcgcggctcggcggctgccgggcgAACCGCAAGC





1394
GGCAGGGCTGACGTTGGGAGCGCTATGAGCTGCCGGGCAGGGTCCTCACCGGGGGCTTCCTCTGCGG-



GCCAGGGCTGCCGGGCGCCACCGGGACGCGAGCGCGCACGCCTCGGCCCGGCGGCCGCGCTCCTCGCAC-



CGCCTTCTCCGCAGGTCTTTATTCATCATCTCATctccctcttccccttctccttctcctttgcctccttctcc



tttgcctccttctcctcctcttcctccccctcctccaccaccacc





1395
CCGTGGGCGCAGGGGCTGTGGCCGGGGCGGTGGGCGGGCGGTGCCGCCAGGTGAGACTGGCTGCCGT-



GGCGCGGAGCTGCGAACTGGTCGGCGGCGCAAGGCGCGGACTCCGGTGAGTTGTGTGGAGCGCGCGCGGCCAT-



GGGCGCGGGCCACGGGCGGGTGGGAGGGTGGGGGGCCAGAGGGGCGGGGGAGGGTCACTCGGCGGCTCCCGGTG



CCGCCGCCGCCCGCCACCGCCTCTGCTCCCCGCG





1396
cctgcgcacgcgggaagggctgccggaggcgcccgtagggaggcgcgcgcgcgggcggctcagggc-



ccgcgttcctctccctcccgcctaccgccactttcccgccctgtgtgcgcccccacccccaccac-



catcttcccaccctcagcgcgggcgccc





1397
GCGGACGCAGCCGAGCTCAAAGCCGCTCTGGCCGCAGGGTGCGGACGCGTCGCGGAGTCCTCACTGC-



CCCGCCTCGCTCTGGCAGAGTGGGGAGCCAGCCGGCAAAGAATTCCGTTTTCAGCTGGGCCAAGGGGCCG-



GCGTCTCCCCACCCCCTTAGGCTCCGCCCCCTGTCCGCTGTGATCGCCGGGAGGCCAGGCCC





1398
GACCCATGGCGGGGCAGGCGGCGGCGCTGTCGGGCGGGCAGGGGTGGCGGGAGGCGGTGGCGCAGC-



GAGCAGCGGCCTCCAGCGCTGGTGGCTCCCTTTATAGGAGCGCTGGAGACACGGGCCCCGCCCGCCCTGCAGC-



CCCGCCCTGCAGTCCCGGAGCGCCGAGGAGTGCGCGCCCCCTCGCCCCCGCCCCACCTCGGCTGGGAGGCTGGT



GCGGACGCCGGGTG





1399
ccgctccccgcccctggctccgcctggc-



cccactcccctccgcgcgccttccctcttctcccccgctccccGCGGACGCTCCTCTCTTTCCCAGTGGGC-



CAACTTTATGCTGAAATTTCTTTTCTGCCCTTTTTTGGGATGTTTCCCCATTGGGAGGCGGAGCCGGGCTGCGG



CGGGGAAGGCGGAGGGCGAGGGGAAGAGTCACTGAGCTGCGGGGCATAGGGGGTCCGGGGCGAGGT-



GCCTTCTCCCACCCAG





1400
tgtgccgcgcggttgggaggagggtcgtgagcgtgagcgtgggagcgctgggggctctgctcgcgt-



gctgctctgaagttgttccccgatgcgccgtaggaagctgggattctcccatccggacgtgggacgcaggg-



gaggggtaggtttcaccgtccgggctgatgactcgtggcctccggggctcctgg





1401
CACTCACGCTCTCAGCCCGGGGAATCCCAGCGGGGAGGAGGGAGGGAG-



GTCGTTTTCTTCAGCTCCCCAGGTGGTCTGTGCTGGGTGTGCTGACGGTCCTTTTGGGAAAACAG-



GTCCACCTTTGCCAGCGTAATTCAGAAAGAGATGTAATTTTCTGAGAGCACACACCTGGGCAGGAGATCGC





1402
GGCAAGCGGGCTTCGGGAAGAATGCAGTTGGTGAGGAAGCTCGGCGAGGCGTGCCCGTGCAGCTGC-



CCCTGGCCCTGACTGCTGGTGCGAGGCAGTGCACGACTCAGCTGGCCGGGGCCTGCTGTCCCGCCGGTGC-



CACGCACCTGCAGACGCCCGGGCTGTGCCATCTCCTGGGCCGGTCCGGGGGCTGGGGCGGGGCGAAAAAGAAAA



AGCTCTGATCTCTGCCTTCGCCTCGCGCAGCTGTGCGGCGAGCCC





1403
CCCGCGGGCCGGGTGAGAACAGGTGGCGCCGGCCCGACCAGGCGCTTTGTGTCGGGGCGCGAG-



GATCTGGAGCGAACTGCTGCGCCTCGGTGGGCCGCTCCCTTCCCTCCCTTGCTCCCCCGGGCGGCCGCACGC-



CGGGTCGGCCGGGTAACGGAGAGGGAGTCGCCAGGAATGTGGCTCTGGGGACTGCCTCGCTCGGGGAAGGGGAG



AGGGTGGCCACGGTGTTAGGAGAGGCGCGGGAGCCGAGAGGTGGCG





1404
GGCGGCGGCTGGAGAGCGAGGAGGAGCGGGTGGCCCCGCGCTGCGCCCGCCCTCGCCTCACCTG-



GCGCAGGTAGGTGTGGCCGCGTCCCCTACCCGGCCGGGACTTTCTGGTAAGGAGAGGAGGTTACGGG-



GAACGACGCGCTGCTTTCATGCCCTTTCTTGTTCTACCTTCATCGGCCGAGGTAAAAGTGCTGAAACCATGTGA



ATAAAATACAGGTGGGTTCCGCCAGCTTCGCTCC





1405
GGGCCCCGGGACTCGGCTTGCACGAGCCAGTCTGGGGACCGGGGAGGCGGGGAGAGGGAAGGG-



GAAAGCGCGGACGCGGCCCAAACCTCCAGTAGCCGCAGCCGCCGTCGCCGAGTAGGGCCGGGCAGCCAGCCGG-



GCCTGGCGCAGCATCAGTGCCCGCTGCCGCTTCCGCTCGATACTCGCCCGCACCGAGGCAGGCAGCTCCGCGGG



TTGCTCTAAAGCCGCCGCCTCCGGCAAAGCCCCGTCGGCCGCC





1406
ACGGAATGTGGGGTGCGGGCCTGAATATTATAAACAAAACCAAAAAACACTGGCTGGAAAG-



GAAGTAAGCGGATTCTTCGTAAAGTCTATCAAAAGTCTTTTCGTTTCCCCCTCCCCCTTTCCCCACCGCCCAC-



CAAAATGAGCCGCGTTTGAGCACCTCAGGTCTGGAAAGCCGGCCAGGAGTGGGGGAGACCGAGGCACCCGCGGC



C





1407
GCGGCTGCTGCCGAGGCTCCTGGTTTCCACCGCCGCCCTCGGGGATCATGCCGCCATCGCGGTTCAT-



GCCGTTCTCGTGGTTCACACCGCCCTCAGGGTTCATATTACCCATGAGGCCTGGAGCTCCTTGGCCAACATG-



GCCTTCTGCGCTTGATGCTGCCCCCAGCTGAGGTGTGGGGCTTATTTTTACCTGGTATACACTCAGGCAGTAGA



ACACGGTGTCGTGGACGAGCGAACGCGCCATGGCTGGAGCGC





1408
CCGCTGCGCGAGGGAgggggcccgaggcgcccccggcccgcccTCCTCCCGGTCTTCGGATCCGAGC-



CGGTCCTCGGGAAAGAGCCTGCCACCGCGTCCCCGCAGCCACCCTCTCCGCGTGCCCGGCCCTCTCCAGTG-



GCGGGGGCACGTGGGCGCGCGGGGTGCGTGGCAAGCCGCCCCTCTCCCCACGCCCGTCCGGC





1409
GGGGTGCGGCGTCTGGTCAGCCAGGGGTGAATTCTCAGGACTGGTCGGCAGTCAAGGTGAGGACCCT-



GAGTGTAAACTGAAGAGACCACCCCCACCTGTAACAAAGAGGGCCCCACTAAGTCCCGCTTCTGCATTTG-



GTCCTGAGAGGCTCCGGTAAAGCCGTCCGGCAATGTTCCACCTGGAAAGTTCCAGGGCAGGGGAAGGGTGGGGG



GAGGGGCAGTCGCGGGGGA





1410
GCCGGGGGAAATGCGGCCTCTAAGCTCTCCGCTGAGGCGGCTTGGAAGGAATAGTGACTGACGTg-



gaggtgggggaggtggctggcccgggcgaggcccagggagagggagaggaggcgggtgggagaggaggagggT-



GTATCTCCTTTCGTCGGCCCGCCCCTTGGCTTCTGCACTGATGGTGGGTGGATGAGTAATGCATCCAGGAAGCC



TGGAGGCCTGTGGTTTCCGCACCCGCTGCCACCC





1411
tgcctggtaggactgacggctgcctttgtcctcctcctctccaccccgcctccccccaccct-



gccttccccccctcccccgtcttctctcccgcagctgcctcagtcggctactctcagccaacccccctcac-



cacccttctccccacccgcccccccgcccccgtcggcccagcgctgccagcccgagtttgcagagag-



gtaactccctttggctgcgagcgggcgagc





1412
GCGCGGGCGCCTCGATCTCCCGCGCGCGCGCGTGCGCGAGACCCCCCTTTGGCCCCCTACCCT-



GCAGCAAGGGTAGCGTGACGTAATGCAACCTCAGCATGTCAGCAGCAATATAAAGGAGAATGAGGCG-



GCGCGCCTCCCAGACGCAGAGTAGATTGTGATTGGCTCGGGCTGCGGAACCTCG





1413
CCCGGCTGGTCGGCGCTCCTCGCAGGCGGTGTCCCGGTCCGGAGCGATCTGCGCGCTCGGCCCCGCG-



GCCGCGCCCTCCCCGAAGCCCTTGCTTTGTTCTGTGAGCGCCTCGTGTCAGCCAGGCGCAGTGAGCT-



CACGGGGGCGTCCCGGGTCCGCATCCTCCCAGGAGCTGGGGAGCCGCTCGCTGGGCGCGGACCCGCTGCCTGAC



GCTGCAAACTACACGGTTTCGGTCCCCCGCGC





1414
CCGGGGCTGGGACGGCGCTTccaggcggagaaagacctccgcgggccgcgcgcggccttccccctgc-



gaggatcgccattggcccgggttggctttggaaagcggcggtGGCTTTGGGCCGGGCTCGGC





1415
GGGCGGGGTGGGGCTGGAGCtcctgtctcttggccagctgaatggaggcccagtggcaacacag-



gtcctgcctggggatcaggtctgctctgcaccccaccttgctgcctggagccgcccacctgacaacctct-



catccctgctctgcagatccggtcccatccccactgcccaccccacccccccagcactccacccagttcaacgt



tccacgaacccccagaaccagccctcatcaacaggcagcaagaaggg





1416
GTGCGGTTGGGCGGGGCCCTgtgccccactgcggagtgcgggtcgggaagcggagagagaagcagct-



gtgtaatccgctggatgcggaccagggcgctccccattcccgtcgggagcccgccgattggctgggtgtgg-



gcgcacgtgaccgacatgtggctgtattggtgcagcccgccagggtgtcactggagacagaatggaggtgctgc



cggactcggaaatggggtaggtgctggagccaccatggccagg





1417
GGCGGTGCCTCCGGGGCTCAcctggctgcagccacgcaccccctctcagtggcgtcggaact-



gcaaagcacctgtgagcttgcggaagtcagttcagactccagcccGCTCCAGCCCGGCCCGACCC





1418
GGCGGTGCCTCCGGGGCTCAcctggctgcagccacgcaccccctctcagtggcgtcggaact-



gcaaagcacctgtgagcttgcggaagtcagttcagactccagcccGCTCCAGCCCGGCCCGACCC





1419
CGGGAGCCCGCCCCCGAGAGgtgggctgcgggcgctcgaggcccagccgccgccgccgccgccgc-



cgccgccgcctccgccgccgccgccgccgccgccgccgccgcgctgccgcacgccccctggcagcg-



gcgcctccgtcaccgccgccgcccgcgctcgccgtcggcccgccgcccgctcagaggcggccctccaccggaag



tgaaaccgaaacggagctgagcgcctgactgaggccgaacccccggcccg





1420
TCCTGCCATCCGCGCCTTTGCActtttctttttgagttgacatttcttggtgctttttg-



gtttctcgctgttgttgggtgctttttggtttgttcttgtccctttttcgtttgctcatcctttttg-



gcgctaactcttaggcagccagcccagcagcccgaagcccgggcagccgcgctccgcggccccggggcagcgcg



gcgggaaccgcagccaagccccccgacacggggcgcacgggggccgggcagcccg





1421
AGGCACAGGGGCAGCTCCGGCACggctttctcaggcctatgccggagcctcgagggctggagagcgg-



gaagacaggcagtgctcggggagttgcagcaggacgtcaccaggagggcgaagcggccacgggaggggggc-



cccgggacattgcgcagcaaggaggctgcaggggctcggcctgcgggcgccggtcccacgaggcactgcggccc



agggtctggtgcggagagggcccacagtggacttggtgacgct





1422
CGACCCCTCCGACCGTGCTTCCGgtgagggtcctgggcccctttcccactctctagagaca-



gagaaatagggcttcgggcgcccagcgtttcctgtggcctctgggacctcttggccagggacaaggacccgt-



gacttccttgcttgctgtgtggcccgggagcagctcagacgctggctccttctgtccctctgcccgtggacatt



agctcaagtcactgatcagtcacaggggtggcctgtcaggtcaggcgg





1423
CCCGCAGGGTGGCTGCGTCCttccagggcctggcctgagggcaggggtg-



gtttgctcccccttcagcctccgggggctggggtcagtgcggtgctaacacggctctctctgtgctgtgg-



gacttccaggcaggcccgcaagccgtgtgagccgtcgcagccgtggcatcgttgaggagtgct-



gtttccGCAGCTGTGACCTGGCCCTCCTGGA





1424
GCGTCTGCCGGCCCCTCCCCttgtccgtcccctccgcgccgctggcgcgcgccttctgaatgc-



caagcattgccataaactccggggacaaaagcctgggtcacaaaagccccctctagaagttcacaccctgag-



gcttccctggcaaggctgggggccgtttggcccttccatgtggactgcaaaaacagtgttggaatgcaggactc



tgggtatgttctcgaaagttgttacaaccccaacccagggttgacc





1425
TAGGCCGCCGGGCAGCCACCgcgctcctctggctctcctgctccatcgcgctcctccgcgcccttgc-



cacctccaacgcccgtgcccagcagcgcgcGGCTGCCCAACAGCGCCGGA





1426
GGGGAGCGGGGACGCGAGCAgcaccagaatccgcgggagcgcggctgttcctggtagggccgtgt-



caggtgacggatgtagctagggggcgagctgcctggagttgcgttccaggcgtccggcccctgggccgtcac-



cgcggggcgcccgcgctgagggtgggaagatggtggtgggggtgggggcgcacacagggcgggaaagtggcggt



aggcgggagggagaggaacgcgggccctgagccgcccgcgcgcg





1427
GCCGGCTGGCTCCCCACTCTGCcagagcgaggcggggcagtgaggactccgcgacgcgtccgcac-



cctgcggccagagcggctttgagctcggctgcgtccgcgctaggcgctttttcccagaagcaatccag-



gcgcgcccgctggttcttgagcgccaggaaaagcccggagctaacgaccggccgctcggccactgcacggggcc



ccaagccgcagaaggacgacgggagggtaatgaagctgagcccaggtc





1428
TCGCTCACGGCGTCCCCTTGCCtggaaagataccgcggtccctccagaggatttgagggacagg-



gtcggagggggctcttccgccagcaccggaggaagaaagaggaggggctggctggtcaccagagggtggggcg-



gaccgcgtgcgctcggcggctgcggagagggggagagcaggcagcgggcggcggggagcagcatggagccggcg



gcggggagcagcatggagccttcggctgactggctggccacggc





1429
TCCCCGCTGCCCTGGCGCTCcccctttgatttattagggctgccgggttggcgcagat-



tgctttttcttctcttccatcccatcctcccttctggtcctcctttccacagtgggagtccgtgctcct-



gctcctcggttggctcctaagtgccccgccaggtcccctctcctttcgctctcccggctccggctcccgactct



tcggcccgctggcatctgcttccctcccctgcctcgtttctcgtcgcccctgct





1430
GGCCAGAGGCAGGCCCGCAGCtccctgccccgcctctgtgcctccgccaacccgacaacgct-



tgctcccaccccgatccccgcacccgcgcgaAGTGGGCCCTCCGGTCGTCGGC





1431
TGCCCGGGTCATCGGACGGGAGgccgcgccacgtgagggcggcaagagggcactggccctgcggc-



gaggccccagcgaggggcgcttccCCGAGGGGCCAGCCTGGGCA





1432
CCCAGTGCGCACGGCGAGGCagtagcccggccccgcactgctgataggtgcaggcag-



gacagtccctccaccgcggctcggggcgtcctgattggtgcggagccacgtcagtcgcacccggagaagg-



gtctgggaggaggcggaggcggaGAGGGCTGGGGAGGGCCGCG





1433
AGCGTCCCAGCCCGCGCACCgaccagcgccccagttccccacagacgccggcgggcccgg-



gagcctcgcggacgtgacgccgcgggcggaagtgacgttttcccgcggttggacgcggCGCTCAGTTGCCGG-



GCGGGG





1434
TGCTCCCCCGGGTCGGAGCCccccggagctgcgcgcgggcttgcagcgcctcgcccgcgct-



gtcctcccggtgtcccgcttctccgcgccccagccgccggctgccagcttttcggggccccgagtcgcac-



ccagcgaagagagcgggcccgggacaagctcgaactccggccgcctcgcccttccccggctccgctccctctgc



cccctcggggtcgcgcgcccacgatgctgcagggccctggctcgctgctg





1435
CGCTCGCATTGGGGCGCGTCccccatccgcccccaactgtggtgtcgcgacaggtcctattgcgggt-



gtctgcggtgggaagggcggtggtgactgggagcATGCGGGGTAACCGCAGTGGGCA





1436
TGCGGCAAGCCCGCCATGATGtccacgtgacaaaagccatgatatacatatgacaacgcctgccata-



ttgtccctgcggcaaaacccaacacgaaaagcacacagcaaagacaaagaggcccgccatgttttacactgcg-



gcaagaccttcagccgccatcttttcctgtgTGACCGCACATGTCCACCACCATGC





1437
TCTTGAGCCTCAGGAGTGAAAAGGCCCCTTGggaaaccctcacccaggagatacacaggagcactg-



gctttggcagcagctcacaatgagaaagaTGCCTGTCACAGCCTTTGCCTTCCTCTTCTATG





1438
GGACCATGAGTGTTTCCATGCTTGGCATCAGAcatgtcttctacccctattcagtctgtcatccact-



ggtcaagaatcccaaacattctaaaactgtgtccacatctcttctgggtaactcttatgattggagg-



gcttcctgaggtgtgaagtctatcacagatccagtgactaacttctagcttcatcttattctcacttaggggag



aagagttgaggcccaagcaaacctcttcttaccattggcttagggaa





1439
tcagccactgcttcgcaggctgacgttactgacgtggtgccagcgacggagggcgagaacgc-



cagcgcggcgcagccggacgtgaacgcgcagatcaccgcagcggttgcggcagaaaacagccgcattatggg-



gatcctcaactgtgaggaggctcacggacgcgaagaacaggcacgcgtgctggcagaaacccccggtatgaccg



tgaaaacggcccgccgcattctggccgcagcaccacagagtgcacag





1440
cggccagctgcgcggcgactccggggactccagggcgcccctctgcggccgacgcccggggt-



gcagcggccgccggggctggggccggcgggagtccgcgggaccctccagaagagcggccggcgccgtgact-



cagcactggggcggagcggggc








Claims
  • 1.-15. (canceled)
  • 16. A nucleic acid primer or hybridization probe set specific for at least one potentially methylated region of at least one marker gene suitable to diagnose or predict lung cancer or a lung cancer type.
  • 17. The set of claim 16, wherein the at least one marker gene is further defined as WT1, SALL3, TERT, ACTB, or CPEB4.
  • 18. The set of claim 16, wherein the lung cancer is adenocarcinoma or squamous cell carcinoma.
  • 19. The set of claim 16, further comprising a nucleic acid primer or hybridization probe specific for at least one additional marker gene defined as ABCB1, ACTB, AIM1L, APC, AREG, BMP2K, BOLL, C5AR1, C5orf4, CADM1, CDH13, CDX1, CLIC4, COL21A1, CPEB4, CXADR, DLX2, DNAJA4, DPH1, DRD2, EFS, ERBB2, ERCC1, ESR2, F2R, FAM43A, GABRA2, GAD1, GBP2, GDNF, GNA15, GNAS, HECW2, HIC1, HIST1H2AG, HLA-G, HOXA1, HOXA10, HSD17B4, HSPA2, IRAK2, ITGA4, JUB, KCNJ15, KCNQ1, KIF5B, KL, KRT14, KRT17, LAMC2, MAGEB2, MBD2, MSH4, MT1G, MT3, MTHFR, NEUROD1, NHLH2, NKX2-1, ONECUT2, PENK, PITX2, PLAGL1, PTTG1, PYCARD, RASSF1, S100A8, SALL3, SERPINB5, SERPINE1, SERPINI1, SFRP2, SLC25A31, SMAD3, SPARC, SPHK1, SRGN, TERT, THRB, TJP2, TMEFF2, TNFRSF10C, TNFRSF25, TP53, ZDHHC11, ZNF256, ZNF711, F2R, HOXA10, KL, SALL3, SPARC, TNFRSF25, or WT1.
  • 20. The set of claim 16, further defined as a nucleic acid primer or hybridization probe set comprising nucleic acid primers or hybridization probes being specific for potentially methylated regions of at least 50% of the marker genes in at least one of the following combinations: WT1, DLX2, SALL3, TERT, PITX2, HOXA10, F2R, CPEB4, NHLH2, SMAD3, ACTB, HOXA1, BOLL, APC, MT1G, PENK, SPARC, DNAJA4, RASSF1, HLA-G, ERCC1, ONECUT2, APC, ABCB1, ZNF573, KCNJ15, ZDHHC11, SFRP2, GDNF, PTTG1, SERPINI1, and TNFRSF10C; WT1, PITX2, SALL3, F2R, DLX2, TERT, HOXA10, MSH4, NHLH2, GNA15, PENK, RASSF1, BOLL, HOXA1, ONECUT2, ABCB1, SPARC, MT1G, HSPA2, SFRP2, PYCARD, GAD1, C5orf4, C5AR1, GNDF, ZDHHC11, SERPINE1, NKX2-1, PITX2, C5AR1, GDNF, ZDHHC11, SERPINE1, NKX2-1, PITX2, C5AR1, ZNF256, FAM43A, SFRP2, MT3, SERPINE1M, CLIC4, TNFRSF10C, GABRA2, MTHFR, ESR2, NEUROG1, PITX2, PLAGL1, TMEFF2, PTTG1, CADM1, S100A8, EFS, JUB, ITGA4, MAGEB2, ERBB2, SRGN, GNAS, TJP2, KCNJ15, SLC25A31, ZNF573, TNFRSF25, APC, KCNQ1, LAMC2, SPHK1, DNAJA4, APC, MBD2, ERCC1, HLA-G, CXADR, TP53, ACTB, KL, SMAD3, HIST1H2AG, and CPEB4; WT1, DLX2, SALL3, TERT, TNFRSF25, ACTB, SMAD3, and CPEB4; WT1, DLX2, SALL3, TERT, PITX2, TNFRSF25, KL, ACTB, SMAD3, and CPEB4; WT1, PITX2, SALL3, DLX2, TERT, HOXA10, RASSF1, SPARC, IRAK2, ZNF711, DNAJA4, HLA-G, CXADR, TP53, ACTB, and CPEB4; WT1, PITX2, SALL3, F2R, TERT, HOXA10, RASSF1, SPARC, IRAK2, ZNF711, DRD2, DNAJA4, CXADR, TP53, ACTB, and CPEB4; WT1, ACTB, DLX2, PITX2, SALL3, HOXA10, TERT, CPEB4, HLA-G, SPARC, RASSF1, DNAJA4, CXADR, TP53, IRAK2, and ZNF711; F2R, ZNF256, CDH13, SERPINB5, KRT14, DLX2, AREG, THRB, HSD17B4, SPARC, HECW2, and COL21A1; KL, HIST1H2AG, TJP2, SRGN, CDX1, TNFRSF25, APC, HIC1, APC, GNA15, ACTB, WT1, KRT17, AIM1L, DPH1, PITX2, PITX2, KIF5B, BMP2K, GBP2, NHLH2, GDNF, and BOLL; WT1, DLX2, SALL3, TERT, PITX2, HOXA10, F2R, CPEB4, NHLH2, SMAD3, ACTB, HOXA1, BOLL, APC, MT1G, PENK, SPARC, DNAJA4, RASSF1, HLA-G, ERCC1, ONECUT2, APC, ABCB1, ZNF573, KCNJ15, ZDHHC11, SFRP2, GDNF, PTTG1, SERPINI1, and TNFRSF10C; HOXA10 and NEUROD1; WT1, PITX2, SALL3, F2R, TERT, HOXA10, RASSF1, SPARC, IRAK2, ZNF711, DRD2, DNAJA4, CXADR, TP53, ACTB, CPEB4, DLX2, TNFRSF25, KL, and SMAD3; TNFRSF25, SALL3, RASSF1, TERT, SPARC, F2R, HOXA10, ZNF711, and PITX2; SALL3, PITX2, SPARC, F2R, TERT, RASSF1, HOXA10, CXADR, and KL; SALL3, SPARC, PITX2, F2R, TERT, RASSF1, HOXA10, and KL; SALL3, PITX2, SPARC, F2R, HOXA10, DRD2, ACTB, DNAJA4, CXADR, and KL; SALL3, SPARC, PITX2, F2R, TERT, RASSF1, HOXA10, TNFRSF25, DNAJA4, TP53, CXADR, and KL; SPARC, SALL3, F2R, PITX2, RASSF1, HOXA10, TERT, KL, and TNFRSF25; SALL3, SPARC, PITX2, F2R, TERT, RASSF1, HOXA10, KL, TNFRSF25, and CXADR; and HOXA10, RASSF1, and F2R.
  • 21. The set of claim 16, further defined as comprising not more than 100000 probes or primer pairs.
  • 22. The set of claim 16, further defined as comprising not more than 100000 probes or primer pairs.
  • 23. The set of claim 22, further defined as comprising immobilized probes on a solid surface.
  • 24. The set of claim 22, wherein the primer pairs and probes are specific for a methylated upstream region of an open reading frame of the marker genes.
  • 25. The set of claim 22, wherein the probes or primers are specific for methylation in the genetic regions defined by any of SEQ ID NOs 1081 to 1440 including the adjacent up to 500 base pairs corresponding to any of gene marker IDs 1 to 359.
  • 26. The set of claim 25, wherein the probes or primers are of SEQ ID NOs 1 to 1080.
  • 27. A method of identifying or predicting a lung cancer or a lung cancer type in a patient, comprising: obtaining a set of nucleic acid primers or hybridization probes of claim 16; using the set to determine the methylation status of genes for which the members of the set are specific in a sample of DNA from the patient; and comparing the methylation status of the genes with the status of a confirmed lung cancer type positive and/or negative state, thereby identifying lung cancer or lung cancer type, if any, in the patient.
  • 28. The method of claim 27, wherein the methylation status is determined by methylation specific PCR analysis, methylation specific digestion analysis and either or both of hybridization analysis to non-digested or digested fragments or PCR amplification analysis of non-digested or digested fragments.
  • 29. A method of determining a subset of diagnostic markers for potentially methylated genes from the genes of gene marker IDs 1-359 of Table 1, suitable for the diagnosis or prognosis of lung cancer or lung cancer type, comprising: a) obtaining data of the methylation status of at least 50 random genes selected from the 359 genes of gene marker IDs 1-359 in at least 1 sample of a confirmed lung cancer or lung cancer type state and at least one sample of a lung cancer or lung cancer type negative state; b) correlating the results of the obtained methylation status with the lung cancer or lung cancer type; c) optionally repeating the obtaining a) and correlating b) steps for a different combination of at least 50 random genes selected from the 359 genes of gene marker IDs 1-359; and d) selecting as many marker genes which in a classification analysis have a p-value of less than 0.1 in a random-variance t-test, or selecting as many marker genes which in a classification analysis together have a correct lung cancer or lung cancer type prediction of at least 70% in a cross-validation test.
  • 30. The method of claim 29, wherein a) is further defined as comprising obtaining data of the methylation status of at least 50 random genes selected from the 359 genes of gene marker IDs 1-359 in at least 5 samples of a confirmed lung cancer or lung cancer type state.
  • 31. The method of claim 29, wherein the correlated results for each gene b) are rated by their correct correlation to the disease or tumor type positive state, preferably by p-value test, and selected in step d) in order of the rating.
  • 32. The method of claim 29, wherein not more than 40 marker genes are selected in step d) for the subset.
  • 33. The method of claim 29, wherein the step a) of obtaining data of the methylation status comprises determining data of the methylation status by methylation specific PCR analysis, methylation specific digestion analysis, or hybridization analysis to non-digested or digested fragments, or PCR amplification analysis of non-digested or digested fragments.
  • 34. A method of identifying or predicting a lung cancer or a lung cancer type in a patient, comprising: providing a set of a diagnostic subset of markers identified by a method of claim 29; using the set to determine methylation status of genes for which the members of the set are specific in a sample comprising DNA from the patient; and comparing the methylation status of the genes with the status of a confirmed lung cancer type positive and/or negative state, thereby identifying lung cancer or lung cancer type, if any, in the patient.
  • 35. The method of claim 34, wherein the methylation status is determined by methylation specific PCR analysis, methylation specific digestion analysis and either or both of hybridization analysis to non-digested or digested fragments or PCR amplification analysis of non-digested or digested fragments.
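
The marker selection of claim 29, steps a) to d), can be illustrated by the following minimal sketch. It is not the applicants' implementation. The gene names (GENE_A, GENE_B, GENE_C), the methylation values and all function names are hypothetical. Claim 29 names a random-variance t-test; the sketch substitutes an exact permutation test on the difference of group means, which needs only the Python standard library, and it assumes a nearest-centroid classifier with leave-one-out cross-validation as one possible classification analysis. Three genes are used for brevity, whereas step a) calls for at least 50.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(cancer, normal):
    """Exact two-sided permutation p-value for a difference in group means.

    Stand-in for the random-variance t-test named in claim 29 (assumption).
    """
    pooled = list(cancer) + list(normal)
    n_cancer = len(cancer)
    n_rest = len(pooled) - n_cancer
    observed = abs(mean(cancer) - mean(normal))
    total = sum(pooled)
    hits = 0
    count = 0
    # Enumerate every way of relabelling n_cancer samples as "cancer".
    for idx in combinations(range(len(pooled)), n_cancer):
        group_sum = sum(pooled[i] for i in idx)
        diff = abs(group_sum / n_cancer - (total - group_sum) / n_rest)
        if diff >= observed - 1e-12:  # tolerance for float rounding
            hits += 1
        count += 1
    return hits / count


def select_markers(cancer, normal, alpha=0.1, max_markers=40):
    """Steps b) and d): rank genes by p-value, keep those below alpha."""
    p_values = {g: permutation_p_value(cancer[g], normal[g]) for g in cancer}
    ranked = sorted(p_values, key=p_values.get)
    return [g for g in ranked if p_values[g] < alpha][:max_markers]


def sample_rows(data, genes):
    """Turn {gene: [value per sample]} into one feature row per sample."""
    n_samples = len(data[genes[0]])
    return [[data[g][i] for g in genes] for i in range(n_samples)]


def centroid(rows):
    """Column-wise mean of a list of feature rows."""
    return [mean(col) for col in zip(*rows)]


def loo_accuracy(cancer, normal, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    labelled = [(r, "cancer") for r in sample_rows(cancer, genes)]
    labelled += [(r, "normal") for r in sample_rows(normal, genes)]
    correct = 0
    for k, (row, label) in enumerate(labelled):
        train = labelled[:k] + labelled[k + 1:]
        cents = {
            lab: centroid([r for r, l in train if l == lab])
            for lab in ("cancer", "normal")
        }
        predicted = min(
            cents,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(row, cents[lab])),
        )
        correct += predicted == label
    return correct / len(labelled)


# Hypothetical methylation fractions (0 = unmethylated, 1 = fully methylated),
# four confirmed-positive and four negative samples per gene.
cancer = {
    "GENE_A": [0.9, 0.8, 0.85, 0.95],
    "GENE_B": [0.5, 0.4, 0.6, 0.5],
    "GENE_C": [0.7, 0.9, 0.8, 0.75],
}
normal = {
    "GENE_A": [0.1, 0.2, 0.15, 0.05],
    "GENE_B": [0.5, 0.6, 0.4, 0.5],
    "GENE_C": [0.2, 0.3, 0.1, 0.25],
}

selected = select_markers(cancer, normal)
accuracy = loo_accuracy(cancer, normal, selected)
print(selected, accuracy)
```

In this sketch the default `max_markers=40` mirrors the upper bound of claim 32, and the selected subset would be accepted under step d) if `accuracy` reaches 0.7. Note that selecting markers on all samples before cross-validating, as done here for brevity, tends to overestimate accuracy; a more careful analysis would repeat the selection inside each cross-validation fold.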
Priority Claims (1)
Number Date Country Kind
09450020.4 Jan 2009 EP regional
Continuations (1)
Number Date Country
Parent 13146901 Jul 2011 US
Child 15096848 US