COMPOSITIONS AND METHODS FOR DIAGNOSING LUNG CANCERS USING GENE EXPRESSION PROFILES

Information

  • Patent Application
  • Publication Number
    20200123613
  • Date Filed
    June 21, 2017
  • Date Published
    April 23, 2020
Abstract
Methods and compositions are provided for diagnosing lung cancer in a mammalian subject by use of a gene expression profile of 10 or more selected genes, measured in the blood of the subject, that is characteristic of disease. The gene expression profile includes 10 or more genes of Table I or Table II herein.
Description
BACKGROUND OF THE INVENTION

Lung cancer is the most common cause of cancer mortality worldwide. In the United States, lung cancer is the second most prevalent cancer in both men and women and accounts for more than 174,000 new cases and more than 162,000 cancer deaths per year. In fact, lung cancer accounts for more deaths each year than breast, prostate and colorectal cancers combined.


The high mortality (80-85% within five years), which has shown little or no improvement in the past 30 years, emphasizes the need for new and effective tools that facilitate early diagnosis, before metastasis to regional nodes or beyond the lung.


High-risk populations include smokers, former smokers, and individuals with markers associated with genetic predispositions. Because surgical removal of early-stage tumors remains the most effective treatment for lung cancer, there has been great interest in screening high-risk patients with low-dose spiral CT (LDCT). This strategy identifies non-calcified pulmonary nodules in approximately 30-70% of high-risk individuals, but only a small proportion of detected nodules (0.4 to 2.7%) are ultimately diagnosed as lung cancers. Currently, the only ways to differentiate subjects with lung nodules of benign etiology from subjects with malignant nodules are invasive biopsy, surgery, or prolonged observation with repeated scanning. Even using the best clinical algorithms, 20-55% of patients selected to undergo surgical lung biopsy for indeterminate lung nodules are found to have benign disease, and those who do not undergo immediate biopsy or resection require sequential imaging studies. The use of serial CT in this group of patients runs the risk of delaying potentially curative therapy, in addition to the costs of repeat scans, the not-insignificant radiation doses, and patient anxiety.


Ideally, a diagnostic test would be easily accessible and inexpensive, demonstrate high sensitivity and specificity, and result in improved patient outcomes (medically and financially). Others have shown that classifiers based on airway epithelial cells have high accuracy. However, harvesting these cells requires an invasive bronchoscopy. See Silvestri et al., N Engl J Med. 2015 July 16; 373(3): 243-251, which is incorporated herein by reference.


Efforts are in progress to develop non-invasive diagnostics using sputum, blood or serum, analyzing for products of tumor cells, methylated tumor DNA, single nucleotide polymorphisms (SNPs), expressed messenger RNA, or proteins. This broad array of molecular tests with potential utility for the early diagnosis of lung cancer has been discussed in the literature. Although each of these approaches has its own merits, none has yet passed the exploratory stage in the effort to detect patients with early-stage lung cancer, even in high-risk groups or in patients who have a preliminary diagnosis based on radiological and other clinical factors. A simple blood test, a routine event associated with regular clinical office visits, would be an ideal diagnostic test.


SUMMARY OF THE INVENTION

In one aspect, a composition or kit for diagnosing or evaluating a lung cancer in a mammalian subject includes ten (10) or more polynucleotides or oligonucleotides, wherein each polynucleotide or oligonucleotide hybridizes to a different gene, gene fragment, gene transcript or expression product in a patient sample. Each gene, gene fragment, gene transcript or expression product is selected from the genes of Table I or Table II. In one embodiment, at least one polynucleotide or oligonucleotide is attached to a detectable label. In one embodiment, the composition or kit includes polynucleotides or oligonucleotides which detect the gene, gene fragment, gene transcript or expression product of each of the 559 genes in Table I. In another embodiment, the composition or kit includes polynucleotides or oligonucleotides which detect the gene, gene fragment, gene transcript or expression product of each of the 100 genes in Table II.


In another aspect, a composition or kit for diagnosing or evaluating a lung cancer in a mammalian subject includes ten (10) or more ligands, wherein each ligand binds to a different gene expression product in a patient sample. Each gene expression product is selected from the genes of Table I or Table II. In one embodiment, at least one ligand is attached to a detectable label. In one embodiment, the composition or kit includes ligands which detect the expression products of each of the 559 genes in Table I. In another embodiment, the composition or kit includes ligands which detect the expression products of each of the 100 genes in Table II.


The compositions described herein enable detection of changes in the expression of the genes in the subject's gene expression profile relative to a reference gene expression profile. The various reference gene expression profiles are described below. In one embodiment, the composition provides the ability to distinguish a cancerous tumor from a non-cancerous nodule.


In another aspect, a method for diagnosing or evaluating a lung cancer in a mammalian subject involves identifying changes in the expression of three or more genes in a sample of a subject, said genes selected from the genes of Table I or Table II, and comparing the subject's gene expression levels with the levels of the same genes in a reference or control, wherein changes in the expression of said genes correlate with a diagnosis or evaluation of a lung cancer. In one embodiment, the changes in expression of said genes provide the ability to distinguish a cancerous tumor from a non-cancerous nodule.


In another aspect, a method for diagnosing or evaluating a lung cancer in a mammalian subject involves identifying a gene expression profile in the blood of a subject, the gene expression profile comprising 10 or more gene expression products of 10 or more informative genes as described herein. The 10 or more informative genes are selected from the genes of Table I or Table II. In one embodiment, the gene expression profile contains all 559 genes of Table I. In another embodiment, the gene expression profile contains all 100 genes of Table II. The subject's gene expression profile is compared with a reference gene expression profile from a variety of sources described below. Changes in expression of the informative genes correlate with a diagnosis or evaluation of a lung cancer. In one embodiment, the changes in expression of said genes provide the ability to distinguish a cancerous tumor from a non-cancerous nodule.


In another aspect, a method of detecting lung cancer in a patient is provided. The method includes obtaining a sample from the patient; and detecting a change in expression in at least 10 genes selected from Table I or Table II in the patient sample as compared to a control by contacting the sample with a composition comprising oligonucleotides, polynucleotides or ligands specific for each different gene transcript or expression product of the at least 10 genes of Table I or Table II, and detecting binding between the oligonucleotide, polynucleotide or ligand and the gene product or expression product.


In yet another aspect, a method of diagnosing lung cancer in a subject is provided. The method includes obtaining a blood sample from a subject; detecting a change in expression in at least 10 genes selected from Table I or Table II in the patient sample as compared to a control by contacting the sample with a composition comprising oligonucleotides, polynucleotides or ligands specific for each different gene transcript or expression product of the at least 10 genes of Table I or Table II, and detecting binding between the oligonucleotide, polynucleotide or ligand and the gene product or expression product; and diagnosing the subject with cancer when changes in expression of the subject's genes from those of the reference are detected.


In another aspect, a method of diagnosing and treating lung cancer in a subject having a neoplastic growth is provided. The method includes obtaining a blood sample from a subject; detecting a change in expression in at least 10 genes selected from Table I or Table II in the patient sample as compared to a control by contacting the sample with a composition comprising oligonucleotides, polynucleotides or ligands specific for each different gene transcript or expression product of the at least 10 genes of Table I or Table II, and detecting binding between the oligonucleotide, polynucleotide or ligand and the gene product or expression product; diagnosing the subject with cancer when changes in expression of the subject's genes from those of the reference are detected; and removing the neoplastic growth. Other appropriate treatments may also be provided.


Other aspects and advantages of these compositions and methods are described further in the following detailed description of the preferred embodiments thereof.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 is a table showing patient characteristics for the samples used in Example 1.



FIGS. 2A and 2B are graphs showing the cross-validated support vector machine classifier (CV SVM) using the 559 Classifier on all 610 samples (FIG. 2A: accuracy = 0.75, ROC area = 0.81; according to the curve, a sensitivity of 0.91 corresponds to a specificity of 0.46, and a sensitivity of 0.72 to a specificity of 0.77) and on a balanced set of 556 samples (FIG. 2B: accuracy = 0.76, ROC area = 0.81; according to the curve, a sensitivity of 0.90 corresponds to a specificity of 0.48, and a sensitivity of 0.76 to a specificity of 0.77). The full and balanced sets show similar performance.
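The ROC areas reported for these cross-validated classifiers summarize how well a continuous classifier score separates cancers from nodules across all thresholds; each quoted sensitivity/specificity pair is one threshold on that curve. The sketch below is a minimal, stdlib-only illustration, assuming each sample has a real-valued score (for example, an SVM decision value) where larger means more likely cancer, with label 1 for cancer and 0 for benign nodule. The function name and toy data are hypothetical; this is not the inventors' analysis code.

```python
from typing import List


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """Area under the ROC curve in its Mann-Whitney form: the probability
    that a randomly chosen cancer sample (label 1) scores higher than a
    randomly chosen nodule sample (label 0). Ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical scores for 4 cancers and 4 benign nodules.
    scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.6, 0.1]
    labels = [1, 1, 1, 0, 0, 0, 1, 0]
    print(roc_auc(scores, labels))  # 0.875
```

The pairwise loop is quadratic, which is fine for a few hundred samples; a rank-sum formulation gives the same value in O(n log n) for larger cohorts.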



FIG. 3 is a bar graph showing the sensitivity of the classifier by nodule size group (x-axis). The data show that larger nodules are more likely to be misclassified (p=1.54×10⁻⁴).



FIGS. 4A to 4C show the classification of sample groups (cancer, FIG. 4B, n=204; nodule, FIG. 4C, n=331) stratified by lesion size. For cancers >5 mm, r=0.95. For nodules of all sizes, r=0.97. The chart (FIG. 4A) shows the sensitivity and specificity of the classification of cancers and nodules based on lesion size. These numbers are shown in bar-graph form below.



FIGS. 5A and 5B are graphs showing the cross-validated support vector machine classifier (CV SVM) using all cancer samples (n=278) vs. small nodules (<10 mm) (n=244) (FIG. 5A: accuracy = 0.79, ROC area = 0.85; according to the curve, a sensitivity of 0.90 corresponds to a specificity of 0.54, and a sensitivity of 0.77 to a specificity of 0.82) and a 10-fold CV SVM using all cancer samples (n=278) vs. large nodules (≥10 mm) (n=88) (FIG. 5B: accuracy = 0.76, ROC area = 0.71; according to the curve, a sensitivity of 0.90 corresponds to a specificity of 0.24, and a sensitivity of 0.87 to a specificity of 0.42).
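In 10-fold cross-validation, each sample is scored by a model trained on the other nine folds. With unequal classes (for example 278 cancers vs. 88 large nodules), splitting each class separately keeps the class mix similar in every fold. The sketch below is a minimal stratified k-fold splitter using only the Python standard library; stratification, the seed, and the function name are illustrative assumptions, since the source states only that 10-fold CV was used.

```python
import random
from typing import List, Tuple


def stratified_kfold(labels: List[int], k: int = 10,
                     seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs in which each class is
    spread as evenly as possible across the k test folds."""
    if k < 2:
        raise ValueError("k must be at least 2")
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        # Deal this class round-robin, continuing where the previous class
        # stopped so that overall fold sizes also stay balanced.
        for j, i in enumerate(members):
            folds[(offset + j) % k].append(i)
        offset += len(members)
    splits = []
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(labels)) if i not in held_out]
        splits.append((train, sorted(test)))
    return splits


if __name__ == "__main__":
    # Class sizes taken from FIG. 5B: 278 cancers (1) vs. 88 large nodules (0).
    labels = [1] * 278 + [0] * 88
    for train, test in stratified_kfold(labels):
        print(len(train), len(test), sum(labels[i] for i in test))
```

Every sample lands in exactly one test fold, so pooling the held-out scores from all ten folds yields one cross-validated score per sample, which is what a cross-validated ROC curve is drawn from.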



FIG. 6 is a graph showing the performance of the cross-validated support vector machine classifier (CV SVM) for the 100 Classifier on a testing set consisting of 25% of the data set used for the 559 Classifier (ROC area = 0.82). According to the curve, a sensitivity of 0.90 corresponds to a specificity of 0.62, a sensitivity of 0.79 to a specificity of 0.68, and a sensitivity of 0.71 to a specificity of 0.75.
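Reserving a quarter of the samples as a testing set, as described for FIG. 6, can be sketched as a seeded random holdout split. This is a minimal illustration under assumptions: the seed, the floor rounding of the test-set size, and the function name are hypothetical, and the 610-sample example size is borrowed from FIG. 2A purely for demonstration.

```python
import random
from typing import List, Tuple


def holdout_split(n_samples: int, test_fraction: float = 0.25,
                  seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly reserve a fraction of sample indices as an independent
    test set. Returns (train_indices, test_indices), each sorted."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(n_samples * test_fraction)  # floor of the requested size
    return sorted(indices[n_test:]), sorted(indices[:n_test])


if __name__ == "__main__":
    train, test = holdout_split(610)
    print(len(train), len(test))  # 458 152
```

In a workflow like this, any gene selection or model fitting would use only the training indices, so the held-out quarter remains an independent check of the smaller classifier.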





DETAILED DESCRIPTION OF THE INVENTION

The methods and compositions described herein apply gene expression technology to blood screening for the detection and diagnosis of lung cancer. The compositions and methods described herein provide the ability to distinguish a cancerous tumor from a non-cancerous nodule by determining a characteristic RNA expression profile of genes in the blood of a mammalian, preferably human, subject. The profile is compared with the profile of one or more subjects of the same class (e.g., patients having lung cancer or a non-cancerous nodule) or a control to provide a useful diagnosis.
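Comparing a subject's profile with the profiles of reference classes can be illustrated with a nearest-centroid rule: average each class's profiles gene by gene, then assign the subject to the closest average. This is a swapped-in, deliberately simple technique, not the support vector machine classifier described for the Examples. It assumes expression values are already normalized (for example, on a log2 scale), and the gene names GENE_A to GENE_C and all values are hypothetical placeholders rather than entries from Table I or Table II.

```python
import math
from typing import Dict, List

# Gene symbol -> normalized expression level (assumed log2 scale).
Profile = Dict[str, float]


def centroid(profiles: List[Profile], genes: List[str]) -> Profile:
    """Per-gene mean expression across the profiles of one reference class."""
    return {g: sum(p[g] for p in profiles) / len(profiles) for g in genes}


def distance(a: Profile, b: Profile, genes: List[str]) -> float:
    """Euclidean distance between two profiles, restricted to the panel genes."""
    return math.sqrt(sum((a[g] - b[g]) ** 2 for g in genes))


def nearest_class(subject: Profile, references: Dict[str, List[Profile]],
                  genes: List[str]) -> str:
    """Return the name of the reference class whose centroid is closest."""
    centroids = {name: centroid(ps, genes) for name, ps in references.items()}
    return min(centroids, key=lambda name: distance(subject, centroids[name], genes))


if __name__ == "__main__":
    genes = ["GENE_A", "GENE_B", "GENE_C"]  # hypothetical panel
    references = {
        "cancer": [{"GENE_A": 5.0, "GENE_B": 2.0, "GENE_C": 7.0},
                   {"GENE_A": 5.4, "GENE_B": 2.2, "GENE_C": 6.8}],
        "nodule": [{"GENE_A": 3.0, "GENE_B": 4.0, "GENE_C": 6.0},
                   {"GENE_A": 3.2, "GENE_B": 3.8, "GENE_C": 6.2}],
    }
    subject = {"GENE_A": 5.0, "GENE_B": 2.5, "GENE_C": 7.1}
    print(nearest_class(subject, references, genes))  # cancer
```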


These methods of lung cancer screening employ compositions suitable for conducting a simple, cost-effective and non-invasive blood test using gene expression profiling that could alert the patient and physician to obtain further studies, such as a chest radiograph or CT scan, in much the same way that prostate-specific antigen is used to help diagnose and follow the progress of prostate cancer. The application of these profiles provides overlapping and confirmatory diagnoses of the type of lung disease, beginning with the initial test for malignant vs. non-malignant disease.


“Patient” or “subject” as used herein means a mammalian animal, including a human, a veterinary or farm animal, a domestic animal or pet, and animals normally used for clinical research. In one embodiment, the subject of these methods and compositions is a human.


“Control” or “Control subject” as used herein refers to the source of the reference gene expression profiles, as well as to the particular panel of control subjects described herein. In one embodiment, the control or reference level is from a single subject. In another embodiment, the control or reference level is from a population of individuals sharing a specific characteristic. In yet another embodiment, the control or reference level is an assigned value that correlates with the level of a specific control individual or population, although it is not necessarily measured at the time the test subject's sample is assayed. In one embodiment, the control subject or reference is from a patient (or population) having a non-cancerous nodule. In another embodiment, the control subject or reference is from a patient (or population) having a cancerous tumor. In other embodiments, the control subject can be a subject or population with lung cancer, such as a current or former smoker with malignant disease, a subject with a solid lung tumor prior to surgery for removal of same, a subject with a solid lung tumor following surgical removal of said tumor, a subject with a solid lung tumor prior to therapy for same, or a subject with a solid lung tumor during or following therapy for same. In other embodiments, the controls for purposes of the compositions and methods described herein include any of the following classes of reference human subjects with no lung cancer. Such non-healthy controls (NHC) include the classes of smokers with non-malignant disease, former smokers with non-malignant disease (including patients with lung nodules), non-smokers who have chronic obstructive pulmonary disease (COPD), and former smokers with COPD. In still other embodiments, the control subject is a healthy non-smoker with no disease or a healthy smoker with no disease.
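When the reference level comes from a population of control subjects, one common way to express how far a test subject deviates is a per-gene z-score against the control mean and standard deviation. The source does not specify this summary; it is an assumption used here for illustration, and the gene names and values are hypothetical. The same `z_scores` function would work with stored, assigned reference values in place of freshly computed ones.

```python
import statistics
from typing import Dict, List, Tuple


def reference_levels(controls: List[Dict[str, float]]) -> Dict[str, Tuple[float, float]]:
    """Per-gene (mean, sample standard deviation) across a control population.
    Requires at least two control subjects with non-identical values."""
    genes = controls[0].keys()
    return {
        g: (statistics.mean([c[g] for c in controls]),
            statistics.stdev([c[g] for c in controls]))
        for g in genes
    }


def z_scores(subject: Dict[str, float],
             reference: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Standardized deviation of each gene in the subject from the control mean."""
    return {g: (subject[g] - mu) / sd for g, (mu, sd) in reference.items()}


if __name__ == "__main__":
    # Hypothetical control population, e.g., patients with benign nodules.
    controls = [{"GENE_A": 4.0, "GENE_B": 10.0},
                {"GENE_A": 6.0, "GENE_B": 12.0},
                {"GENE_A": 5.0, "GENE_B": 11.0}]
    ref = reference_levels(controls)
    print(z_scores({"GENE_A": 7.0, "GENE_B": 10.0}, ref))
    # {'GENE_A': 2.0, 'GENE_B': -1.0}
```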


“Sample” as used herein means any biological fluid or tissue that contains immune cells and/or cancer cells. The most suitable sample for use in this invention is whole blood. Other useful biological samples include, without limitation, peripheral blood mononuclear cells, plasma, saliva, urine, synovial fluid, bone marrow, cerebrospinal fluid, vaginal mucus, cervical mucus, nasal secretions, sputum, semen, amniotic fluid, bronchoscopy sample, bronchoalveolar lavage fluid, and other cellular exudates from a patient having cancer. Such samples may further be diluted with saline, buffer or a physiologically acceptable diluent. Alternatively, such samples may be concentrated by conventional means.


As used herein, the term “cancer” refers to or describes the physiological condition in mammals that is typically characterized by unregulated cell growth. More specifically, as used herein, the term “cancer” means any lung cancer. In one embodiment, the lung cancer is non-small cell lung cancer (NSCLC). In a more specific embodiment, the lung cancer is lung adenocarcinoma (AC or LAC). In another more specific embodiment, the lung cancer is lung squamous cell carcinoma (SCC or LSCC). In another embodiment, the lung cancer is a stage I or stage II NSCLC. In still another embodiment, the lung cancer is a mixture of early and late stages and types of NSCLC.


The term “tumor,” as used herein, refers to all neoplastic cell growth and proliferation, whether malignant or benign, and all pre-cancerous and cancerous cells and tissues. The term “nodule” refers to an abnormal buildup of tissue which is benign. The term “cancerous tumor” refers to a malignant tumor.


By “diagnosis” or “evaluation” it is meant a diagnosis of a lung cancer, a diagnosis of a stage of lung cancer, a diagnosis of a type or classification of a lung cancer, a diagnosis or detection of a recurrence of a lung cancer, a diagnosis or detection of a regression of a lung cancer, a prognosis of a lung cancer, or an evaluation of the response of a lung cancer to a surgical or non-surgical therapy. In one embodiment, “diagnosis” or “evaluation” refers to distinguishing between a cancerous tumor and a benign pulmonary nodule.


As used herein, “sensitivity” (also called the true positive rate) measures the proportion of positives that are correctly identified as such (e.g., the percentage of sick people who are correctly identified as having the condition).


As used herein, “specificity” (also called the true negative rate) measures the proportion of negatives that are correctly identified as such (e.g., the percentage of healthy people who are correctly identified as not having the condition).
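Both definitions, together with the accuracy values quoted for the figures, follow from a 2×2 confusion matrix of actual versus predicted calls. The minimal sketch below assumes 1 denotes cancer (positive) and 0 denotes benign nodule (negative); the function name and example labels are hypothetical.

```python
from typing import Dict, List


def diagnostic_metrics(actual: List[int], predicted: List[int]) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy from paired actual/predicted
    labels, with 1 = cancer (positive) and 0 = benign nodule (negative)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "accuracy": (tp + tn) / len(actual),
    }


if __name__ == "__main__":
    actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
    print(diagnostic_metrics(actual, predicted))
    # sensitivity 0.75, specificity ~0.667, accuracy 0.7
```

Because sensitivity and specificity are each computed within one class, they do not depend on the cancer-to-nodule ratio of the cohort, whereas accuracy does.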


By “change in expression” is meant an upregulation of one or more selected genes in comparison to the reference or control, a downregulation of one or more selected genes in comparison to the reference or control, or a combination of certain upregulated and downregulated genes.
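Calling each gene upregulated, downregulated or unchanged relative to the reference can be sketched with a fold-change cutoff on the log2 ratio. The 2-fold default below is an assumption chosen to echo the 2-3 fold example given for statistically significant changes; the source does not fix a cutoff, and the gene names and values are hypothetical.

```python
import math
from typing import Dict


def change_in_expression(subject: Dict[str, float], reference: Dict[str, float],
                         min_fold: float = 2.0) -> Dict[str, str]:
    """Label each reference gene 'up', 'down' or 'unchanged' in the subject.
    Values are assumed to be positive, linear-scale (not log) expression levels."""
    cutoff = math.log2(min_fold)
    calls = {}
    for gene, ref in reference.items():
        lfc = math.log2(subject[gene] / ref)  # log2 fold change vs. reference
        if lfc >= cutoff:
            calls[gene] = "up"
        elif lfc <= -cutoff:
            calls[gene] = "down"
        else:
            calls[gene] = "unchanged"
    return calls


if __name__ == "__main__":
    reference = {"GENE_A": 100.0, "GENE_B": 100.0, "GENE_C": 100.0}
    subject = {"GENE_A": 250.0, "GENE_B": 40.0, "GENE_C": 120.0}
    print(change_in_expression(subject, reference))
    # {'GENE_A': 'up', 'GENE_B': 'down', 'GENE_C': 'unchanged'}
```

Working on the log2 scale makes the cutoff symmetric: a 2-fold increase and a 2-fold decrease sit at +1 and -1 respectively.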


By “therapeutic reagent” or “regimen” is meant any type of treatment employed in the treatment of cancers with or without solid tumors, including, without limitation, chemotherapeutic pharmaceuticals, biological response modifiers, radiation, diet, vitamin therapy, hormone therapies, gene therapy, surgical resection, etc.


By “informative genes” as used herein is meant those genes whose expression changes (either in an up-regulated or down-regulated manner) characteristically in the presence of lung cancer. A statistically significant number of such informative genes thus forms suitable gene expression profiles for use in the methods and compositions. Such genes are shown in Table I and Table II below and make up the “expression profile”.


The term “statistically significant number of genes” in the context of this invention differs depending on the degree of change in gene expression observed. The degree of change in gene expression varies with the type of cancer and with the size or spread of the cancer or solid tumor. The degree of change also varies with the immune response of the individual and is subject to variation with each individual. For example, in one embodiment of this invention, a large change, e.g., a 2-3 fold increase or decrease, in a small number of genes, e.g., about 10 to 20 genes, is statistically significant. In another embodiment, a smaller relative change in about 15 or more genes is statistically significant.


Thus, the methods and compositions described herein contemplate examination of the expression profile of a “statistically significant number of genes” ranging from 5 to about 559 genes in a single profile. In one embodiment, the genes are selected from Table I. In another embodiment, the genes are selected from Table II. In various embodiments, the gene profile is formed by a statistically significant number of 5 or more, 10 or more, 15 or more, 20 or more, 25 or more, 30 or more, 35 or more, 40 or more, 45 or more, 50 or more, 60 or more, 65 or more, 70 or more, 75 or more, 80 or more, 85 or more, 90 or more, 95 or more, 100 or more, 200 or more, 300 or more, or 350 or more genes. In still other embodiments, the gene profile is formed by 400 or more genes, by 539 genes, or by 559 genes. In still other embodiments, the gene profiles examined as part of these methods contain, as statistically significant numbers of genes, from 10 to 559 genes, and any numbers therebetween. In another embodiment, the gene profile is formed by any number of genes from 1 through 558, or by all 559 genes, of Table I. In another embodiment, the gene profile is formed by any number of genes from 1 through 99, or by all 100 genes, of Table II.


Table I and Table II below refer to a collection of known genes useful in discriminating between a subject having a lung cancer, e.g., NSCLC, and subjects having benign (non-malignant) lung nodules. The sequences of the genes identified in Table I and Table II are publicly available from conventional sources, such as GenBank, and the GenBank accession number for each gene is provided. One skilled in the art may therefore readily reproduce the compositions and methods described herein by use of these sequences.


The term “microarray” refers to an ordered arrangement of hybridizable array elements, preferably polynucleotide or oligonucleotide probes, on a substrate.


The term “polynucleotide,” when used in singular or plural form, generally refers to any polyribonucleotide or polydeoxyribonucleotide, which may be unmodified RNA or DNA or modified RNA or DNA. Thus, for instance, polynucleotides as defined herein include, without limitation, single- and double-stranded DNA, DNA including single- and double-stranded regions, single- and double-stranded RNA, and RNA including single- and double-stranded regions, hybrid molecules comprising DNA and RNA that may be single-stranded or, more typically, double-stranded or include single- and double-stranded regions. In addition, the term “polynucleotide” as used herein refers to triple-stranded regions comprising RNA or DNA or both RNA and DNA. The strands in such regions may be from the same molecule or from different molecules. The regions may include all of one or more of the molecules, but more typically involve only a region of some of the molecules. One of the molecules of a triple-helical region often is an oligonucleotide. The term “polynucleotide” specifically includes cDNAs. The term includes DNAs (including cDNAs) and RNAs that contain one or more modified bases. Thus, DNAs or RNAs with backbones modified for stability or for other reasons are “polynucleotides” as that term is intended herein. Moreover, DNAs or RNAs comprising unusual bases, such as inosine, or modified bases, such as tritiated bases, are included within the term “polynucleotides” as defined herein. In general, the term “polynucleotide” embraces all chemically, enzymatically and/or metabolically modified forms of unmodified polynucleotides, as well as the chemical forms of DNA and RNA characteristic of viruses and cells, including simple and complex cells.


The term “oligonucleotide” refers to a relatively short polynucleotide, including, without limitation, single-stranded deoxyribonucleotides, single- or double-stranded ribonucleotides, RNA:DNA hybrids and double-stranded DNAs. Oligonucleotides, such as single-stranded DNA probe oligonucleotides, are often synthesized by chemical methods, for example using automated oligonucleotide synthesizers that are commercially available. However, oligonucleotides can be made by a variety of other methods, including in vitro recombinant DNA-mediated techniques and by expression of DNAs in cells and organisms.


The terms “differentially expressed gene”, “differential gene expression” and their synonyms, which are used interchangeably, refer to a gene whose expression is activated to a higher or lower level in a subject suffering from a disease, specifically cancer, such as lung cancer, relative to its expression in a control subject, such as a subject having a benign nodule. The terms also include genes whose expression is activated to a higher or lower level at different stages of the same disease. It is also understood that a differentially expressed gene may be either activated or inhibited at the nucleic acid level or protein level, or may be subject to alternative splicing to result in a different polypeptide product. Such differences may be evidenced by a change in mRNA levels, surface expression, secretion or other partitioning of a polypeptide, for example. Differential gene expression may include a comparison of expression between two or more genes or their gene products, or a comparison of the ratios of the expression between two or more genes or their gene products, or even a comparison of two differently processed products of the same gene, which differ between normal subjects, non-healthy controls and subjects suffering from a disease, specifically cancer, or between various stages of the same disease. Differential expression includes both quantitative, as well as qualitative, differences in the temporal or cellular expression pattern in a gene or its expression products among, for example, normal and diseased cells, or among cells which have undergone different disease events or disease stages. For the purpose of this invention, “differential gene expression” is considered to be present when there is a statistically significant (p<0.05) difference in gene expression between the subject and control samples.
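The source sets the significance threshold at p<0.05 but does not name the statistical test. As one illustration, a two-sided permutation test on the difference in mean expression of a single gene can be written with the standard library alone. The permutation test is a swapped-in choice, and the number of permutations, the seed, and the example values are hypothetical.

```python
import random
import statistics
from typing import List


def permutation_p_value(cases: List[float], controls: List[float],
                        n_perm: int = 2000, seed: int = 0) -> float:
    """Two-sided permutation p-value for the difference in mean expression
    of one gene between cases (e.g., cancer) and controls (e.g., nodules)."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(cases) - statistics.mean(controls))
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = abs(statistics.mean(pooled[:n_cases]) - statistics.mean(pooled[n_cases:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return (extreme + 1) / (n_perm + 1)


def is_differentially_expressed(cases: List[float], controls: List[float],
                                alpha: float = 0.05) -> bool:
    """Apply the p < alpha criterion (alpha = 0.05 as stated in the text)."""
    return permutation_p_value(cases, controls) < alpha


if __name__ == "__main__":
    cancer = [8.1, 8.4, 7.9, 8.6, 8.2, 8.5]  # hypothetical log2 levels
    nodule = [6.0, 6.3, 5.8, 6.1, 6.4, 5.9]
    print(permutation_p_value(cancer, nodule), is_differentially_expressed(cancer, nodule))
```

When hundreds of genes are screened this way, some will pass p<0.05 by chance alone, so a per-gene threshold is usually paired with a multiple-testing adjustment.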


The term “over-expression” with regard to an RNA transcript is used to refer to the level of the transcript determined by normalization to the level of reference mRNAs, which might be all measured transcripts in the specimen or a particular reference set of mRNAs.
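Normalizing a transcript to a set of reference mRNAs, either all measured transcripts or a chosen reference set, can be sketched as dividing each raw signal by the summed reference signal. The per-million scale factor and the reference gene name HK1 are hypothetical choices for illustration; with all transcripts as the reference, the result resembles counts per million.

```python
from typing import Dict, Iterable, Optional


def normalize(raw: Dict[str, float],
              reference_genes: Optional[Iterable[str]] = None,
              scale: float = 1e6) -> Dict[str, float]:
    """Express each transcript relative to the summed signal of the reference
    transcripts. By default every measured transcript is the reference."""
    ref = list(raw) if reference_genes is None else list(reference_genes)
    total = sum(raw[g] for g in ref)
    if total <= 0:
        raise ValueError("reference signal must be positive")
    return {g: scale * v / total for g, v in raw.items()}


if __name__ == "__main__":
    raw = {"GENE_A": 300.0, "GENE_B": 100.0, "HK1": 600.0}  # hypothetical signals
    print(normalize(raw)["GENE_A"])           # 300000.0 (all transcripts as reference)
    print(normalize(raw, ["HK1"])["GENE_A"])  # 500000.0 (single reference mRNA)
```

The normalized values, not the raw signals, are what would then be compared with the reference level to judge over-expression, so differences in total RNA input between samples do not masquerade as expression changes.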


The phrase “gene amplification” refers to a process by which multiple copies of a gene or gene fragment are formed in a particular cell or cell line. The duplicated region (a stretch of amplified DNA) is often referred to as an “amplicon.” Usually, the amount of messenger RNA (mRNA) produced, i.e., the level of gene expression, also increases in proportion to the number of copies made of the particular gene.


In the context of the compositions and methods described herein, reference to “10 or more”, “at least 10”, etc. of the genes listed in Table I or Table II means any one or any and all combinations of the genes listed. For example, suitable gene expression profiles include profiles containing any number from 5 through 559 genes from Table I. In another example, suitable gene expression profiles include profiles containing any number from 5 through 100 genes from Table II. In one embodiment, gene profiles formed by genes selected from a table are used in rank order; e.g., genes ranked at the top of the list demonstrated more significant discriminatory results in the tests, and thus may be more significant in a profile than lower-ranked genes. However, in other embodiments the genes forming a useful gene profile do not have to be in rank order and may be any genes from the table. As used herein, the term “100 Classifier” or “100 Biomarker Classifier” refers to the 100 genes of Table II. As used herein, the term “559 Classifier” or “559 Biomarker Classifier” refers to the 559 genes of Table I. However, subsets of the genes of Table I or Table II, as described herein, are also useful, and, in another embodiment, the terms may refer to those subsets as well.
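For the rank-order embodiment, building a panel amounts to taking the top k genes of a ranked table. The sketch below enforces a minimum of 10 genes to mirror the “10 or more” language; that default, the function name, and the placeholder gene names GENE_1 to GENE_100 (standing in for a ranked table such as Table II) are hypothetical. It covers only the rank-order case; the text also allows panels drawn from any genes in the table.

```python
from typing import List, Sequence


def top_ranked_panel(ranked_genes: Sequence[str], k: int,
                     minimum: int = 10) -> List[str]:
    """Return the k highest-ranked genes (index 0 = most discriminatory)."""
    if len(set(ranked_genes)) != len(ranked_genes):
        raise ValueError("ranked gene list contains duplicates")
    if not minimum <= k <= len(ranked_genes):
        raise ValueError(f"k must be between {minimum} and {len(ranked_genes)}")
    return list(ranked_genes[:k])


if __name__ == "__main__":
    ranked = [f"GENE_{i}" for i in range(1, 101)]  # placeholder 100-gene ranking
    print(top_ranked_panel(ranked, 10))
```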


As used herein, “labels” or “reporter molecules” are chemical or biochemical moieties useful for labeling a nucleic acid (including a single nucleotide), polynucleotide, oligonucleotide, or protein ligand, e.g., amino acid or antibody. “Labels” and “reporter molecules” include fluorescent agents, chemiluminescent agents, chromogenic agents, quenching agents, radionuclides, enzymes, substrates, cofactors, inhibitors, magnetic particles, and other moieties known in the art. “Labels” or “reporter molecules” are capable of generating a measurable signal and may be covalently or noncovalently joined or bound to an oligonucleotide or nucleotide (e.g., a non-natural nucleotide) or ligand.


Unless defined otherwise in this specification, technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs and by reference to published texts, which provide one skilled in the art with a general guide to many of the terms used in the present application.


I. GENE EXPRESSION PROFILES

The inventors have shown that the gene expression profiles of the whole blood of lung cancer patients differ significantly from those seen in patients having non-cancerous lung nodules. For example, changes in the gene expression products of the genes of Table I and/or Table II can be observed and detected by the methods of this invention in the normal circulating blood of patients with early stage solid lung tumors.


The gene expression profiles described herein provide new diagnostic markers for the early detection of lung cancer and could spare patients unnecessary surgery or biopsy for a benign nodule. Because the test requires only a blood sample, its risks are very low and its benefit-to-risk ratio is very high. In one embodiment, the methods and compositions described herein may be used in conjunction with clinical risk factors to help physicians make more accurate decisions about how to manage patients with lung nodules. Another advantage of this invention is that diagnosis may occur early, because it does not depend on detecting circulating tumor cells, which are present in only vanishingly small numbers in early-stage lung cancers.


In one aspect, a composition is provided for classifying a nodule as cancerous or benign in a mammalian subject. In one embodiment, the composition includes at least 10 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In another embodiment, the composition includes at least 100 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. In one embodiment, the polynucleotide or oligonucleotide or ligand hybridizes to an mRNA.
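
The requirement that at least 10 polynucleotides, oligonucleotides or ligands each target a different listed gene can be expressed as a simple consistency check on a panel design. This is a hypothetical sketch: the probe IDs, the probe-to-gene mapping, and the function name are illustrative, and it compares targets by gene symbol only, not by the sequence a probe actually hybridizes to.

```python
def validate_panel(probe_targets, table_genes, min_probes=10):
    """Check a probe panel against the composition requirements.

    probe_targets: mapping of probe ID to target gene symbol
    table_genes: gene symbols listed in Table I or Table II
    Returns a list of problems; an empty list means the panel passes.
    """
    problems = []
    if len(probe_targets) < min_probes:
        problems.append(f"need at least {min_probes} probes, got {len(probe_targets)}")
    allowed = set(table_genes)
    first_probe = {}  # gene -> first probe that targets it
    for probe, gene in probe_targets.items():
        if gene not in allowed:
            problems.append(f"{probe}: {gene} is not in the table")
        elif gene in first_probe:
            problems.append(f"{probe}: {gene} is already targeted by {first_probe[gene]}")
        else:
            first_probe[gene] = probe
    return problems


# First twelve gene symbols of Table I.
TABLE_GENES = ["PLEKHG4", "SLC25A20", "LETM2", "GLIS3", "LOC100132797",
               "ARHGEF5", "TCF7L2", "SFRS2IP", "CFD", "AZI2", "STOM", "CD1A"]

# Hypothetical panel: one probe for each of the first ten genes.
panel = {f"probe_{i:02d}": gene for i, gene in enumerate(TABLE_GENES[:10], start=1)}
print(validate_panel(panel, TABLE_GENES))  # an empty list means the panel passes
```

Returning a list of problems rather than raising on the first one lets a panel designer see every violation (too few probes, off-table targets, duplicate targets) in a single pass.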


TABLE I

Rank
Gene
Sequence ID#
Class Name


1
PLEKHG4
NM_015432.3
Endogenous


2
SLC25A20
NM_000387.5
Endogenous


3
LETM2
NM_144652.3
Endogenous


4
GLIS3
NM_001042413.1
Endogenous


5
LOC100132797
XR_036994.1
Endogenous


6
ARHGEF5
NM_005435.3
Endogenous


7
TCF7L2
NM_030756.4
Endogenous


8
SFRS2IP
NM_004719.2
Endogenous


9
CFD
NM_001928.2
Endogenous


10
AZI2
NM_022461.4
Endogenous


11
STOM
NM_004099.5
Endogenous


12
CD1A
NM_001763.2
Endogenous


13
PANK2
NM_153640.2
Endogenous


14
CNIH4
NM_014184.3
Endogenous


15
EVI2A
NM_014210.3
Endogenous


16
BATF
NM_006399.3
Endogenous


17
TCP1
NM_030752.2
Endogenous


18
BX108566
BX108566.1
Endogenous


19
ANXA1
NM_000700.2
Endogenous


20
PSMA3
NM_152132.2
Endogenous


21
IRF4
NM_002460.1
Endogenous


22
STAG3
NM_012447.3
Endogenous


23
NDUFS4
NM_002495.2
Endogenous


24
HAT1
NM_003642.3
Endogenous


25
ANXA1 b
NM_000700.1
Endogenous


26
LOC148137
NM_144692.1
Endogenous


27
LDHA
NM_001165416.1
Endogenous


28
PSME3
NM_005789.3
Endogenous


29
REPS1
NM_001128617.2
Endogenous


30
CDH5
NM_001795.3
Endogenous


31
NAT5
NM_181528.3
Endogenous


32
PLAC8
NM_001130715.1
Endogenous


33
GSTO1
NM_004832.2
Endogenous


34
DGUOK
NM_080916.2
Endogenous


35
OLR1
NM_002543.3
Endogenous


36
MYST4
NM_012330.3
Endogenous


37
TIMM8B
ENST00000504148.1
Endogenous


38
LY96
NM_015364.4
Endogenous


39
CCDC72
NM_015933.4
Endogenous


40
ATP5I
NM_007100.2
Endogenous


41
WDR91
NM_014149.3
Endogenous


42
MAGEA3
NM_005362.3
Endogenous


43
AK093878
AK093878.1
Endogenous


44
EYA3
NM_001990.3
Endogenous


45
ACAA2
NM_006111.2
Endogenous


46
ETFDH
NM_004453.3
Endogenous


47
CCT6A
NM_001762.3
Endogenous


48
HSCB
NM_172002.3
Endogenous


49
EMR4
NM_001080498.2
Endogenous


50
USP5
NM_003481.2
Endogenous


51
SIK1
NM_173354.3
Endogenous


52
SYNJ1
NM_003895.3
Endogenous


53
KLRB1
NM_002258.2
Endogenous


54
CLK2
XM_941392.1
Endogenous


55
SNORA56
NR_002984.1
Endogenous


56
TP53BP1
NM_005657.2
Endogenous


57
RBX1
NM_014248.3
Endogenous


58
CNPY2
NM_014255.5
Endogenous


59
RELA
NM_021975.2
Endogenous


60
LOC732371
XM_001133019.1
Endogenous


61
TMEM218
NM_001080546.2
Endogenous


62
LOC91431
NM_001099776.1
Endogenous


63
GZMB
NM_004131.3
Endogenous


64
CAMP
NM_004345.4
Endogenous


65
RBM16
NM_014892.4
Endogenous


66
MID1IP1
NM_021242.5
Endogenous


67
LOC399942
XM_934471.1
Endogenous


68
COMMD6
NM_203497.3
Endogenous


69
PPP6C
NM_002721.4
Endogenous


70
BCOR
NM_017745.5
Endogenous


71
PDCD10
NM_145859.1
Endogenous


72
HLA-DMB
NM_002118.3
Endogenous


73
DNAJB1
NM_006145.2
Endogenous


74
KYNU
NM_001032998.1
Endogenous


75
TM2D2
NM_078473.2
Endogenous


76
FAM179A
NM_199280.2
Endogenous


77
FAM43A
NM_153690.4
Endogenous


78
QTRTD1
NM_024638.3
Endogenous


79
MARCKSL1
NM_023009.5
Endogenous


80
FAM193A
NM_003704.3
Endogenous


81
AK026725
AK026725.1
Endogenous


82
SERPINB10
NM_005024.1
Endogenous


83
OSBP
ILMN_1706376.1
Endogenous


84
ST6GAL1
NM_003032.2
Endogenous


85
NDUFAF2
NM_174889.4
Endogenous


86
UBE2I
NM_194259.2
Endogenous


87
CTAG1B
NM_001327.2
Endogenous


88
TRAF6
NM_145803.1
Endogenous


89
REPIN1
NM_014374.3
Endogenous


90
LAMA5
NM_005560.4
Endogenous


91
TBC1D12
NM_015188.1
Endogenous


92
TGIF1 b
NM_173208.1
Endogenous


93
LOC728533
XR_015610.3
Endogenous


94
CLN8
NM_018941.3
Endogenous


95
COX7B
NM_001866.2
Endogenous


96
DYNC2LI1
NM_016008.3
Endogenous


97
ANP32B
NM_006401.2
Endogenous


98
PTGDR2
NM_004778.1
Endogenous


99
MRPS16
NM_016065.3
Endogenous


100
NIPBL
NM_133433.3
Endogenous


101
PPP2R5C
NM_178588.1
Endogenous


102
DPF2
NM_006268.4
Endogenous


103
RAB10
NM_016131.4
Endogenous


104
MYADM
NM_001020820.1
Endogenous


105
CCND3
NM_001760.2
Endogenous


106
CC2D1B
NM_032449.2
Endogenous


107
HLA-G
NM_002127.4
Endogenous


108
CKS2
NM_001827.1
Endogenous


109
HPSE
NM_006665.5
Endogenous


110
UBE2G1
NM_003342.4
Endogenous


111
MED16
NM_005481.2
Endogenous


112
LOC339674
XM_934917.1
Endogenous


113
RNF114
NM_018683.3
Endogenous


114
KIR2DS3
NM_012313.1
Endogenous


115
AMD1
NM_001634.4
Endogenous


116
S100A8
NM_002964.4
Endogenous


117
NFATC4
NM_001136022.2
Endogenous


118
RPL39L
NM_052969.1
Endogenous


119
LOC399753
XM_930634.1
Endogenous


120
FKBP1A
NM_054014.3
Endogenous


121
CHMP5
NM_016410.5
Endogenous


122
CABC1
NM_020247.4
Endogenous


123
HLA-B
NM_005514.6
Endogenous


124
TRIM39
NM_021253.3
Endogenous


125
LOC645914
XM_928884.1
Endogenous


126
CD79A
NM_021601.3
Endogenous


127
GLRX
ILMN_1737308.1
Endogenous


128
RPL26L1
NM_016093.2
Endogenous


129
USP21
NM_012475.4
Endogenous


130
CD70
NM_001252.2
Endogenous


131
SPINK5
NM_006846.3
Endogenous


132
HUWE1
NM_031407.6
Endogenous


133
STK38
NM_007271.3
Endogenous


134
SEMG1
NM_003007.2
Endogenous


135
NDUFA4
NM_002489.3
Endogenous


136
MYADM b
NM_001020820.1
Endogenous


137
SGK1 b
NM_005627.3
Endogenous


138
SLAMF8
NM_020125.2
Endogenous


139
LOC653773
XM_938755.1
Endogenous


140
RPS24
NM_001026.4
Endogenous


141
LOC338799
NR_002809.2
Endogenous


142
MAP3K7
NM_145333.1
Endogenous


143
KLRD1
NM_002262.3
Endogenous


144
LOC732111
XM_001134275.1
Endogenous


145
CD69
NM_001781.2
Endogenous


146
DDIT4
NM_019058.2
Endogenous


147
C1orf222
NM_001003808.1
Endogenous


148
PFAS
NM_012393.2
Endogenous


149
USP9Y
NM_004654.3
Endogenous


150
COLEC12
NM_130386.2
Endogenous


151
VPS37C
NM_017966.4
Endogenous


152
SAP130
NM_024545.3
Endogenous


153
CDC42EP2
NM_006779.3
Endogenous


154
LOC643319
XM_927980.1
Endogenous


155
ASF1B
NM_018154.2
Endogenous


156
AK094576
AK094576.1
Endogenous


157
BANP
NM_079837.2
Endogenous


158
TBK1
NM_013254.2
Endogenous


159
GNS
NM_002076.3
Endogenous


160
IL1R2
NM_173343.1
Endogenous


161
CLEC4C
NM_203503.1
Endogenous


162
TM9SF1
NM_006405.6
Endogenous


163
PTGDR
NM_000953.2
Endogenous


164
GOLGA3
NM_005895.3
Endogenous


165
CLEC4A
NM_194448.2
Endogenous


166
TSC1
NM_000368.4
Endogenous


167
SFMBT1
NM_001005158.2
Endogenous


168
GLT25D1
NM_024656.2
Endogenous


169
LOC100130229
XM_001717158.1
Endogenous


170
PHF8
NM_015107.2
Endogenous


171
PUM1
NM_001020658.1
Endogenous


172
SMARCC1
NM_003074.3
Endogenous


173
AK126342
AK126342.1
Endogenous


174
ACSL5
NM_203379.1
Endogenous


175
TGIF1
NM_003244.2
Endogenous


176
BF375676
BF375676.1
Endogenous


177
SPA17
NM_017425.3
Endogenous


178
FLNB
NM_001457.3
Endogenous


179
FAM105B
NM_138348.4
Endogenous


180
CPPED1
NM_018340.2
Endogenous


181
TRIM32
NM_012210.3
Endogenous


182
RNF34
NM_025126.3
Endogenous


183
SLC45A3
NM_033102.2
Endogenous


184
P2RY10
NM_198333.1
Endogenous


185
AKR1C3
NM_003739.4
Endogenous


186
NME1-NME2
NM_001018136.2
Endogenous


187
AMPD3
NM_000480.2
Endogenous


188
HSP90AB1
NM_007355.3
Endogenous


189
RBM4B
NM_031492.3
Endogenous


190
DMBT1
NM_007329.2
Endogenous


191
TMCO1
NM_019026.3
Endogenous


192
CASP2
NM_032983.3
Endogenous


193
C1orf103
NM_018372.3
Endogenous


194
ARHGAP17
NM_018054.5
Endogenous


195
IFNA17
NM_021268.2
Endogenous


196
CTSZ
NM_001336.3
Endogenous


197
DBI
NM_001079862.1
Endogenous


198
TXNRD1 b
NM_182743.2
Endogenous


199
KIAA0460
NM_015203.4
Endogenous


200
PDGFD
NM_033135.3
Endogenous


201
ATG5
NM_004849.2
Endogenous


202
ITFG2
NM_018463.3
Endogenous


203
HERC1
NM_003922.3
Endogenous


204
MEN1
NM_130799.2
Endogenous


205
IFI27L2
NM_032036.2
Endogenous


206
LOC729887
XR_040891.2
Endogenous


207
PI4K2A
NM_018425.3
Endogenous


208
RAG1
NM_000448.2
Endogenous


209
CREB5
NM_182898.3
Endogenous


210
SLC6A12
NM_003044.4
Endogenous


211
CDKN1A
NM_000389.2
Endogenous


212
AW173314
AW173314.1
Endogenous


213
SAP130 b
NM_024545.3
Endogenous


214
ABCA5
NM_018672.4
Endogenous


215
SLC25A37
NM_016612.2
Endogenous


216
MYLIP
NM_013262.3
Endogenous


217
GATA2
NM_001145662.1
Endogenous


218
ATP5L
NM_006476.4
Endogenous


219
RPS27L
NM_015920.3
Endogenous


220
DB338252
DB338252.1
Endogenous


221
FRAT2
NM_012083.2
Endogenous


222
CCL4
NM_002984.2
Endogenous


223
CD79B
NM_000626.2
Endogenous


224
MBD1
NM_015844.2
Endogenous


225
TIAM1
NM_003253.2
Endogenous


226
HSD11B1
NM_181755.1
Endogenous


227
TPR
NM_003292.2
Endogenous


228
EID2B
NM_152361.2
Endogenous


229
PDSS1
NM_014317.3
Endogenous


230
C9orf164
NM_182635.1
Endogenous


231
ARHGEF18
NM_015318.3
Endogenous


232
TXNRD1
NM_001093771.2
Endogenous


233
HNRNPAB
NM_004499.3
Endogenous


234
TTN
NM_133378.4
Endogenous


235
EP300
NM_001429.2
Endogenous


236
CCDC97
NM_052848.1
Endogenous


237
HK3
NM_002115.2
Endogenous


238
CRKL
NM_005207.3
Endogenous


239
NCOA5
NM_020967.2
Endogenous


240
AK124143
AK124143.1
Endogenous


241
LBA1
NM_014831.2
Endogenous


242
SLC9A3R1
NM_004252.3
Endogenous


243
CRY2
NM_021117.3
Endogenous


244
ATG4B
NM_178326.2
Endogenous


245
CD97
NM_078481.3
Endogenous


246
TTC9
NM_015351.1
Endogenous


247
BMPR2
NM_001204.6
Endogenous


248
LPIN2
NM_014646.2
Endogenous


249
UBA1
NM_003334.3
Endogenous


250
SETD1B
XM_037523.11
Endogenous


251
PRPF8
NM_006445.3
Endogenous


252
RNASE2
NM_002934.2
Endogenous


253
KIAA0101
NM_014736.4
Endogenous


254
ARG1
NM_000045.3
Endogenous


255
UBTF
NM_001076683.1
Endogenous


256
MFSD1
NM_022736.2
Endogenous


257
IDO1
NM_002164.3
Endogenous


258
MS4A6A
NM_022349.3
Endogenous


259
C22orf30
NM_173566.2
Endogenous


260
HNRNPK
NM_031263.2
Endogenous


261
ARL8B
NM_018184.2
Endogenous


262
SETD2
NM_014159.6
Endogenous


263
NCAPG
NM_022346.4
Endogenous


264
EEF1B2
NM_001037663.1
Endogenous


265
TRIM39 b
NM_172016.2
Endogenous


266
EHD4
NM_139265.3
Endogenous


267
IRF1
NM_002198.1
Endogenous


268
LOC100129022
XM_001716591.1
Endogenous


269
TRAF3IP2
NM_147686.3
Endogenous


270
PSMA6
NM_002791.2
Endogenous


271
RHOG
NM_001665.3
Endogenous


272
CN312986
CN312986.1
Endogenous


273
PSMB8
NM_004159.4
Endogenous


274
ZNF239
NM_001099283.1
Endogenous


275
CLPTM1
NM_001294.3
Endogenous


276
NADK
NM_023018.4
Endogenous


277
C8orf76
NM_032847.2
Endogenous


278
LIF
NM_002309.3
Endogenous


279
EGR1
NM_001964.2
Endogenous


280
ARG1 b
NM_000045.2
Endogenous


281
MERTK
NM_006343.2
Endogenous


282
RHOU
NM_021205.5
Endogenous


283
PFDN5 b
NM_145897.2
Endogenous


284
MAGEA1
NM_004988.4
Endogenous


285
SEC24C
NM_198597.2
Endogenous


286
SLC11A1
NM_000578.3
Endogenous


287
TCF20
NM_181492.2
Endogenous


288
AHCYL1
NM_001242676.1
Endogenous


289
TPT1
NM_003295.3
Endogenous


290
KIR2DL5A
XM_001126354.1
Endogenous


291
IRAK2
NM_001570.3
Endogenous


292
C17orf51
XM_944416.1
Endogenous


293
C14orf156
NM_031210.5
Endogenous


294
ATP2C1
NM_014382.3
Endogenous


295
SOCS1
NM_003745.1
Endogenous


296
JAK1
NM_002227.1
Endogenous


297
RSL24D1
NM_016304.2
Endogenous


298
AP2S1
NM_021575.3
Endogenous


299
PHRF1
NM_020901.3
Endogenous


300
GPI
NM_000175.2
Endogenous


301
NCR1
NM_004829.5
Endogenous


302
AKAP4
NM_139289.1
Endogenous


303
CD160
NM_007053.3
Endogenous


304
DDX23
NM_004818.2
Endogenous


305
GNL3
NM_014366.4
Endogenous


306
NFKB2
NM_002502.2
Endogenous


307
CSK
NM_004383.2
Endogenous


308
PELP1
NM_014389.2
Endogenous


309
KLRF1 b
NM_016523.2
Endogenous


310
CS
NM_004077.2
Endogenous


311
PHCA
NM_018367.6
Endogenous


312
LOC644315
XR_017529.2
Endogenous


313
NUDT18
NM_024815.3
Endogenous


314
XCL2
NM_003175.3
Endogenous


315
KLRC1
NM_002259.3
Endogenous


316
ARHGAP18
NM_033515.2
Endogenous


317
CTDSP2
NM_005730.3
Endogenous


318
P2RY5
NM_005767.5
Endogenous


319
CREB1
NM_004379.3
Endogenous


320
RHOB
NM_004040.3
Endogenous


321
DCAF7
NM_005828.4
Endogenous


322
NUP153
NM_005124.3
Endogenous


323
AFTPH
NM_017657.4
Endogenous


324
EWSR1
NM_005243.3
Endogenous


325
LYN
NM_002350.1
Endogenous


326
CYBB
NM_000397.3
Endogenous


327
TMEM70
NM_017866.5
Endogenous


328
PPP1R3E
XM_927029.1
Endogenous


329
PSMB1
NM_002793.3
Endogenous


330
RERE b
NM_012102.3
Endogenous


331
RXRA
NM_002957.5
Endogenous


332
GZMA
NM_006144.3
Endogenous


333
ERLIN1
NM_006459.3
Endogenous


334
KRTAP10-3
NM_198696.2
Endogenous


335
SAMSN1
NM_022136.3
Endogenous


336
LRRC47
NM_020710.2
Endogenous


337
MARCKS
NM_002356.6
Endogenous


338
HOPX
NM_139211.4
Endogenous


339
KLRF1
NM_016523.1
Endogenous


340
NFAT5
NM_138713.3
Endogenous


341
SLC15A2
NM_021082.3
Endogenous


342
STK16
NM_003691.2
Endogenous


343
KIR_Activating_Subgroup_2
NM_014512.1
Endogenous


344
TBCE
NM_001079515.2
Endogenous


345
BAG3
NM_004281.3
Endogenous


346
SFRS4
NM_005626.4
Endogenous


347
AW270402
AW270402.1
Endogenous


348
CCL3L1
NM_021006.4
Endogenous


349
HERC3
NM_014606.2
Endogenous


350
RPL34
NM_000995.3
Endogenous


351
ALAS1
NM_000688.4
Endogenous


352
CCR9
NM_031200.1
Endogenous


353
CORO1C
ILMN_1745954.1
Endogenous


354
FAIM3
NM_005449.4
Endogenous


355
SFPQ
NM_005066.2
Endogenous


356
HOOK3
NM_032410.3
Endogenous


357
CD36
NM_000072.3
Endogenous


358
IL7
NM_000880.2
Endogenous


359
CBLL1
NM_024814.3
Endogenous


360
HVCN1
NM_032369.3
Endogenous


361
HMGB1
NM_002128.4
Endogenous


362
SIN3A
NM_015477.2
Endogenous


363
CASP3
NM_032991.2
Endogenous


364
BQ189294
BQ189294.1
Endogenous


365
NDRG2
NM_016250.2
Endogenous


366
BX400436
BX400436.2
Endogenous


367
IFNAR2
NM_000874.3
Endogenous


368
MS4A6A b
NM_152851.2
Endogenous


369
KLRC2
NM_002260.3
Endogenous


370
S100A12 b
NM_005621.1
Endogenous


371
ATM
NM_000051.3
Endogenous


372
NLRP3
NM_001079821.2
Endogenous


373
HAVCR2
NM_032782.3
Endogenous


374
C4B
NM_001002029.3
Endogenous


375
CTSW
NM_001335.3
Endogenous


376
TMEM170B
NM_001100829.2
Endogenous


377
EIF4ENIF1
NM_019843.2
Endogenous


378
CCL3
NM_002983.2
Endogenous


379
CHCHD3
NM_017812.2
Endogenous


380
CST7
NM_003650.3
Endogenous


381
SFRS15
NM_020706.2
Endogenous


382
STIP1
NM_006819.2
Endogenous


383
MPDU1
NM_004870.3
Endogenous


384
DHX16 b
NM_001164239.1
Endogenous


385
INTS4
NM_033547.3
Endogenous


386
USP16
NM_001032410.1
Endogenous


387
IFNAR1
NM_000629.2
Endogenous


388
ITCH
NM_001257138.1
Endogenous


389
FOXK2
NM_004514.3
Endogenous


390
LOC642812
XR_036892.1
Endogenous


391
KIAA1967
NM_021174.5
Endogenous


392
LOC440928
XM_942885.1
Endogenous


393
NDUFV2
NM_021074.4
Endogenous


394
IL4
NM_000589.2
Endogenous


395
CIAPIN1
NM_020313.3
Endogenous


396
CXCL2
NM_002089.3
Endogenous


397
TXN
NM_003329.3
Endogenous


398
PRG2
NM_002728.4
Endogenous


399
MS4A2
NM_000139.3
Endogenous


400
YPEL1
NM_013313.4
Endogenous


401
POLR2A
NM_000937.4
Endogenous


402
C19orf10
NM_019107.3
Endogenous


403
IGFBP7
NM_001553.2
Endogenous


404
ITGAE
NM_002208.4
Endogenous


405
CXCR5 b
NM_001716.3
Endogenous


406
BID
NM_001196.2
Endogenous


407
LOC100133273
XR_039238.1
Endogenous


408
FNBP1
NM_015033.2
Endogenous


409
IFNGR1
NM_000416.1
Endogenous


410
STAT6
NM_003153.4
Endogenous


411
CR2
NM_001006658.2
Endogenous


412
CCL3L3
NM_001001437.3
Endogenous


413
RFWD2
NM_022457.6
Endogenous


414
SP2
NM_003110.5
Endogenous


415
BAT2D1
NM_015172.3
Endogenous


416
CX3CL1
NM_002996.3
Endogenous


417
GPATCH3
NM_022078.2
Endogenous


418
CASP1
NM_033294.3
Endogenous


419
NAGK
NM_017567.4
Endogenous


420
IER5
NM_016545.4
Endogenous


421
PHLPP2
NM_015020.3
Endogenous


422
RPL31
NM_000993.4
Endogenous


423
SPEN
NM_015001.2
Endogenous


424
TMSB4X
NM_021109.3
Endogenous


425
IL8RB
NM_001557.3
Endogenous


426
XPC
NR_027299.1
Endogenous


427
SNX11
NM_152244.1
Endogenous


428
SPN
NM_003123.3
Endogenous


429
ANKHD1
NM_017747.2
Endogenous


430
CCR6
NM_031409.2
Endogenous


431
DZIP3
NM_014648.3
Endogenous


432
MRPL27
NM_148571.1
Endogenous


433
SREBF1
NM_001005291.2
Endogenous


434
CD14
NM_000591.2
Endogenous


435
TNFSF8
NM_001244.3
Endogenous


436
C3
NM_000064.2
Endogenous


437
FAM50B
NM_012135.1
Endogenous


438
RASSF5
NM_182664.2
Endogenous


439
BU743228
BU743228.1
Endogenous


440
NFATC1
NM_172389.1
Endogenous


441
DOCK5
NM_024940.6
Endogenous


442
PACS1
NM_018026.3
Endogenous


443
CYP1B1
NM_000104.3
Endogenous


444
CLIC3
ILMN_1796423.1
Endogenous


445
PSMA4
NM_002789.3
Endogenous


446
ZNF341
NM_032819.4
Endogenous


447
PRPF3
NM_004698.2
Endogenous


448
PSMA6 b
NM_002791.2
Endogenous


449
LOC648927
XR_038906.2
Endogenous


450
KCTD12
NM_138444.3
Endogenous


451
LOC440389
XM_498648.3
Endogenous


452
U2AF2
NM_007279.2
Endogenous


453
CLEC5A
NM_013252.2
Endogenous


454
PRRG4
NM_024081.5
Endogenous


455
TNFRSF9
NM_001561.5
Endogenous


456
NDUFB3
NM_002491.2
Endogenous


457
BCL6
NM_001130845.1
Endogenous


458
SGK1
NM_005627.3
Endogenous


459
CIP29
NM_033082.3
Endogenous


460
CD160 b
NM_007053.2
Endogenous


461
ARCN1
NM_001655.4
Endogenous


462
LOC151162
NR_024275.1
Endogenous


463
GPR65
NM_003608.3
Endogenous


464
CCR1
NM_001295.2
Endogenous


465
TFCP2
NM_005653.4
Endogenous


466
SGK
NM_005627.3
Endogenous


467
RNF214
NM_207343.3
Endogenous


468
TMC8
NM_152468.4
Endogenous


469
RBM14
NM_006328.3
Endogenous


470
USP34
NM_014709.3
Endogenous


471
BACH2
NM_021813.3
Endogenous


472
LILRA5
NM_021250.3
Endogenous


473
C5orf21
NM_032042.5
Endogenous


474
LOC441073
XR_018937.2
Endogenous


475
TAX1BP1
NM_001079864.2
Endogenous


476
TNFSF13
NM_003808.3
Endogenous


477
PIM2
NM_006875.3
Endogenous


478
RNF19B
NM_153341.3
Endogenous


479
EPHX2
NM_001979.5
Endogenous


480
LILRA5 b
NM_181879.2
Endogenous


481
ABCF1
NM_001025091.1
Endogenous


482
C4orf27
NM_017867.2
Endogenous


483
PSMB7
NM_002799.2
Endogenous


484
LPCAT4
NM_153613.2
Endogenous


485
TRIM21
NM_003141.3
Endogenous


486
LOC728835
XM_001133190.1
Endogenous


487
NFKB1
NM_003998.3
Endogenous


488
CR2 b
NM_001006658.1
Endogenous


489
HMGB2
NM_002129.3
Endogenous


490
IL1B
NM_000576.2
Endogenous


491
C20orf52
NM_080748.2
Endogenous


492
DNAJB6
NM_058246.3
Endogenous


493
PFDN5
NM_145897.2
Endogenous


494
RPS6
NM_001010.2
Endogenous


495
LEF1
NM_016269.4
Endogenous


496
DKFZp761P0423
XM_291277.4
Endogenous


497
LOC647340
XR_018104.1
Endogenous


498
FTHL16
XR_041433.1
Endogenous


499
COX6C
NM_004374.2
Endogenous


500
BCL10
NM_003921.2
Endogenous


501
CD48
NM_001778.2
Endogenous


502
ZMIZ1
NM_020338.3
Endogenous


503
GZMH
NM_033423.4
Endogenous


504
TRRAP
NM_003496.3
Endogenous


505
SH2D3C
NM_170600.2
Endogenous


506
UBC
NM_021009.3
Endogenous


507
TXNDC17
NM_032731.3
Endogenous


508
ATP5J2
NM_004889.3
Endogenous


509
KIAA1267
NM_015443.3
Endogenous


510
RFX1
NM_002918.4
Endogenous


511
WDR1
NM_005112.4
Endogenous


512
LOC100129697
XM_001732822.2
Endogenous


513
TOMM7
NM_019059.2
Endogenous


514
ARHGAP26
NM_015071.4
Endogenous


515
HSPA6
NM_002155.4
Endogenous


516
FLJ10357
NM_018071.4
Endogenous


517
ITGAL
NM_002209.2
Endogenous


518
BX089765
BX089765.1
Endogenous


519
RERE
NM_001042682.1
Endogenous


520
C15orf39
NM_015492.4
Endogenous


521
BX436458
BX436458.2
Endogenous


522
RWDD1
NM_001007464.2
Endogenous


523
TMBIM6
NM_003217.2
Endogenous


524
SLC6A6
NM_003043.5
Endogenous


525
KIAA0174
NM_014761.3
Endogenous


526
IL16
NM_004513.4
Endogenous


527
EGLN1
NM_022051.1
Endogenous


528
LOC391126
XR_017684.2
Endogenous


529
TAPBP
NM_003190.4
Endogenous


530
NUMB
NM_001005744.1
Endogenous


531
CENTD2
NM_001040118.2
Endogenous


532
CLSTN1
NM_001009566.2
Endogenous


533
PSMA4 b
NM_002789.4
Endogenous


534
LOC648000
XM_371757.4
Endogenous


535
COX7C
NM_001867.2
Endogenous


536
PIK3CD
NM_005026.3
Endogenous


537
UQCRQ
NM_014402.4
Endogenous


538
IDS
NM_006123.4
Endogenous


539
C19orf59
NM_174918.2
Endogenous


540
MYL12A
NM_006471.3
Housekeeping


541
EIF2B4
NM_015636.3
Housekeeping


542
DGUOK b
NM_080916.2
Housekeeping


543
PSMC1
NM_002802.2
Housekeeping


544
CHFR
NM_018223.2
Housekeeping


545
ARPC2
NM_005731.2
Housekeeping


546
ATP5B
NM_001686.3
Housekeeping


547
RPL3
NM_001033853.1
Housekeeping


548
ZNF143
NM_003442.5
Housekeeping


549
PSMD7
NM_002811.4
Housekeeping


550
TBP
NM_003194.4
Housekeeping


551
DHX16
NM_003587.4
Housekeeping


552
TUG1
NR_002323.2
Housekeeping


553
GUSB
NM_000181.3
Housekeeping


554
HDAC3
NM_003883.3
Housekeeping


555
SDHA
NM_004168.3
Housekeeping


556
PGK1
NM_000291.3
Housekeeping


557
STAMBP
NM_006463.4
Housekeeping


558
MTCH1
NM_014341.2
Housekeeping


559
TUBB
NM_178014.2
Housekeeping


TABLE II

Rank
Gene
Sequence ID#
Class Name


1
TPR
NM_003292.2
Endogenous


2
DNAJB1
NM_006145.2
Endogenous


3
PDCD10
NM_145859.1
Endogenous


4
PSMB7
NM_002799.2
Endogenous


5
MERTK
NM_006343.2
Endogenous


6
AFTPH
NM_017657.4
Endogenous


7
BCOR
NM_017745.5
Endogenous


8
RASSF5
NM_182664.2
Endogenous


9
SNX11
NM_152244.1
Endogenous


10
ANP32B
NM_006401.2
Endogenous


11
C4B
NM_001002029.3
Endogenous


12
NME1-NME2
NM_001018136.2
Endogenous


13
DGUOK
NM_080916.2
Endogenous


14
CYP1B1
NM_000104.3
Endogenous


15
MPDU1
NM_004870.3
Endogenous


16
MED16
NM_005481.2
Endogenous


17
FAM179A
NM_199280.2
Endogenous


18
CPPED1
NM_018340.2
Endogenous


19
LOC648927
XR_038906.2
Endogenous


20
ANKHD1
NM_017747.2
Endogenous


21
CN312986
CN312986.1
Endogenous


22
PHCA
NM_018367.6
Endogenous


23
CD1A
NM_001763.2
Endogenous


24
NCOA5
NM_020967.2
Endogenous


25
SLC6A12
NM_003044.4
Endogenous


26
LOC728533
XR_015610.3
Endogenous


27
TRAF3IP2
NM_147686.3
Endogenous


28
TBCE
NM_001079515.2
Endogenous


29
CCT6A
NM_001762.3
Endogenous


30
P2RY5
NM_005767.5
Endogenous


31
RNASE2
NM_002934.2
Endogenous


32
CLN8
NM_018941.3
Endogenous


33
REPS1
NM_001128617.2
Endogenous


34
TPT1
NM_003295.3
Endogenous


35
LOC100129022
XM_001716591.1
Endogenous


36
KLRC1
NM_002259.3
Endogenous


37
AZI2
NM_022461.4
Endogenous


38
FAM193A
NM_003704.3
Endogenous


39
PLAC8
NM_001130715.1
Endogenous


40
LDHA
NM_001165416.1
Endogenous


41
GPATCH3
NM_022078.2
Endogenous


42
RBM14
NM_006328.3
Endogenous


43
KYNU
NM_001032998.1
Endogenous


44
PPP2R5C
NM_178588.1
Endogenous


45
S100A12 b
NM_005621.1
Endogenous


46
SFMBT1
NM_001005158.2
Endogenous


47
CCR6
NM_031409.2
Endogenous


48
TRIM39
NM_021253.3
Endogenous


49
AK126342
AK126342.1
Endogenous


50
SLC45A3
NM_033102.2
Endogenous


51
IL4
NM_000589.2
Endogenous


52
UBE2I
NM_194259.2
Endogenous


53
PRPF3
NM_004698.2
Endogenous


54
NDUFB3
NM_002491.2
Endogenous


55
CRKL
NM_005207.3
Endogenous


56
IDO1
NM_002164.3
Endogenous


57
PUM1
NM_001020658.1
Endogenous


58
BCL10
NM_003921.2
Endogenous


59
TMBIM6
NM_003217.2
Endogenous


60
C17orf51
XM_944416.1
Endogenous


61
BANP
NM_079837.2
Endogenous


62
HAVCR2
NM_032782.3
Endogenous


63
BAG3
NM_004281.3
Endogenous


64
DBI
NM_001079862.1
Endogenous


65
C4orf27
NM_017867.2
Endogenous


66
TSC1
NM_000368.4
Endogenous


67
LPCAT4
NM_153613.2
Endogenous


68
SAMSN1
NM_022136.3
Endogenous


69
SNORA56
NR_002984.1
Endogenous


70
ARG1
NM_000045.3
Endogenous


71
IL1R2
NM_173343.1
Endogenous


72
CCND3
NM_001760.2
Endogenous


73
USP9Y
NM_004654.3
Endogenous


74
ATP2C1
NM_014382.3
Endogenous


75
PSMB1
NM_002793.3
Endogenous


76
NDUFAF2
NM_174889.4
Endogenous


77
VPS37C
NM_017966.4
Endogenous


78
HAT1
NM_003642.3
Endogenous


79
LOC732371
XM_001133019.1
Endogenous


80
LOC148137
NM_144692.1
Endogenous


81
CCR1
NM_001295.2
Endogenous


82
CCDC97
NM_052848.1
Endogenous


83
PPP6C
NM_002721.4
Endogenous


84
GPI
NM_000175.2
Endogenous


85
PIM2
NM_006875.3
Endogenous


86
STAT6
NM_003153.4
Endogenous


87
BATF
NM_006399.3
Endogenous


88
EIF4ENIF1
NM_019843.2
Endogenous


89
HSP90AB1
NM_007355.3
Endogenous


90
U2AF2
NM_007279.2
Endogenous


91
CYBB
NM_000397.3
Endogenous


92
WDR1
NM_005112.4
Endogenous


93
PSMB8
NM_004159.4
Endogenous


94
TBC1D12
NM_015188.1
Endogenous


95
LOC648000
XM_371757.4
Endogenous


96
XCL2
NM_003175.3
Endogenous


97
PTGDR
NM_000953.2
Endogenous


98
ACSL5
NM_203379.1
Endogenous


99
CASP1
NM_033294.3
Endogenous


100
UBTF
NM_001076683.1
Endogenous


In one embodiment, a novel gene expression profile or signature can identify and distinguish patients having cancerous tumors from patients having benign nodules. See, for example, the genes identified in Table I and Table II, which may form a suitable gene expression profile. In another embodiment, a portion of the genes of Table I forms a suitable profile. In yet another embodiment, a portion of the genes of Table II forms a suitable profile. As discussed herein, these profiles are used to distinguish between cancerous and non-cancerous tumors by generating a discriminant score based on differences in gene expression profiles, as exemplified below. The validity of these signatures was established in a cohort of patients with undiagnosed lung nodules, using samples collected at different locations by different groups. See Example 7, FIGS. 2A-2B and FIG. 6. The lung cancer signatures or gene expression profiles identified herein (i.e., Table I or Table II) may be further optimized to reduce the number of gene expression products required and to increase diagnostic accuracy.
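
This passage says only that a discriminant score is generated from differences in expression profiles; the classifier itself is described in the examples, which are not reproduced here. Purely as an illustration of what such a score can look like, the sketch below uses diagonal linear discriminant analysis (DLDA), a swapped-in technique that is not necessarily the one used in the examples: each gene is weighted by the difference between the class means divided by the pooled within-class variance, and a positive score places a sample closer to the cancer group. The training values, the two-gene panel, and the zero decision threshold are hypothetical.

```python
from statistics import mean


def train_dlda(cancer, benign, var_floor=1e-6):
    """Fit per-gene DLDA weights.

    cancer, benign: lists of samples; each sample is a list of normalized
        expression values for the same ordered gene panel.
    Returns (weights, midpoints), one entry per gene.
    """
    n_genes = len(cancer[0])
    dof = len(cancer) + len(benign) - 2  # degrees of freedom for pooling
    weights, midpoints = [], []
    for j in range(n_genes):
        a = [s[j] for s in cancer]
        b = [s[j] for s in benign]
        mu_a, mu_b = mean(a), mean(b)
        ss = sum((x - mu_a) ** 2 for x in a) + sum((x - mu_b) ** 2 for x in b)
        var = max(ss / dof, var_floor)  # pooled variance, floored to avoid /0
        weights.append((mu_a - mu_b) / var)
        midpoints.append((mu_a + mu_b) / 2)
    return weights, midpoints


def discriminant_score(sample, weights, midpoints):
    """Positive scores lean toward 'cancer', negative toward 'benign'."""
    return sum(w * (x - m) for x, w, m in zip(sample, weights, midpoints))


def classify(sample, model, threshold=0.0):
    """Label a sample by comparing its discriminant score to a threshold."""
    weights, midpoints = model
    return "cancer" if discriminant_score(sample, weights, midpoints) > threshold else "benign"


# Hypothetical training data for a two-gene panel.
cancer_train = [[2.0, 1.0], [2.2, 0.8], [1.8, 1.2]]
benign_train = [[0.0, 1.0], [0.2, 1.1], [-0.2, 0.9]]
model = train_dlda(cancer_train, benign_train)
print(classify([2.1, 1.0], model), classify([0.1, 1.0], model))
```

In this toy data only the first gene separates the groups, so the second gene receives a weight of zero and contributes nothing to the score. Choosing the threshold, for example to trade sensitivity against specificity on held-out samples, is outside this sketch.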


In one embodiment, the composition includes 10 to 559 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. In another embodiment, the composition includes 10 to 100 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table II.
In another embodiment, the composition includes from 1 to 559 polynucleotides or oligonucleotides or ligands (i.e., any whole number in that range), wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. In another embodiment, the composition includes from 1 to 100 polynucleotides or oligonucleotides or ligands (i.e., any whole number in that range), wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table II.
In further embodiments, the composition includes at least 3, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, or at least 60 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II.
In one embodiment, the composition includes at least 65 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 70 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 75 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 80 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 85 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 90 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. 
In one embodiment, the composition includes at least 95 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 100 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 150 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. In one embodiment, the composition includes at least 200 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. In one embodiment, the composition includes at least 250 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. In one embodiment, the composition includes at least 300 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. 
In one embodiment, the composition includes at least 350 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. In one embodiment, the composition includes at least 400 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. In one embodiment, the composition includes at least 450 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. In one embodiment, the composition includes at least 500 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. In one embodiment, the composition includes polynucleotides or oligonucleotides or ligands capable of hybridizing to each different gene, gene fragment, gene transcript or expression product listed in Table I. In another embodiment, the composition includes polynucleotides or oligonucleotides or ligands capable of hybridizing to each different gene, gene fragment, gene transcript or expression product listed in Table II.


In other embodiments, the expression profile is formed by the first 3, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100 genes in rank order of Table I or Table II. In further embodiments, the expression profile is formed by the first 150, 200, 250, 300, 350, or 400 genes in rank order of Table I. In yet another embodiment, the expression profile is formed by the first 539 genes in rank order of Table I.
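Each of these profiles is simply a prefix of a rank-ordered list, so selecting one takes only a few lines of Python. This is an illustrative sketch. The function name `profile_from_rank` and the placeholder gene names are hypothetical. The actual genes and their rank order are those of Table I or Table II, which are not reproduced here.

```python
from typing import List, Sequence


def profile_from_rank(ranked_genes: Sequence[str], n: int) -> List[str]:
    """Return the first n genes of a rank-ordered gene list.

    `ranked_genes` is a hypothetical stand-in for Table I or Table II in
    rank order; the real gene names and ordering are not reproduced here.
    """
    if not 1 <= n <= len(ranked_genes):
        raise ValueError(f"n must be between 1 and {len(ranked_genes)}, got {n}")
    return list(ranked_genes[:n])


# Hypothetical placeholder names standing in for a 100-gene ranked table.
ranked = [f"GENE_{i}" for i in range(1, 101)]
top10 = profile_from_rank(ranked, 10)
print(top10[0], top10[-1], len(top10))  # GENE_1 GENE_10 10
```

Because the ranking is fixed, a larger profile always contains every smaller one (the first 10 genes are a subset of the first 50).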


As discussed below, the compositions described herein can be used with gene expression profiling methods known in the art and can be adapted to suit the method with which they are intended to be used. In one embodiment, at least one polynucleotide or oligonucleotide or ligand is attached to a detectable label. In certain embodiments, each polynucleotide or oligonucleotide is attached to a different detectable label, and each label can be detected independently. Such reagents are useful in assays such as the nCounter assay, as described below.


In another embodiment, the composition comprises a capture oligonucleotide or ligand that hybridizes to at least one polynucleotide or oligonucleotide or ligand. In one embodiment, such a capture oligonucleotide or ligand may include a nucleic acid sequence that is specific for a portion of the gene-specific oligonucleotide, polynucleotide or ligand. The capture ligand may be a peptide or polypeptide that is specific for the ligand to the gene of interest. In one embodiment, the capture ligand is an antibody, as in a sandwich ELISA.


The capture oligonucleotide also includes a moiety that allows binding to a substrate. Such substrates include, without limitation, a plate, bead, slide, well, chip or chamber. In one embodiment, the composition includes a capture oligonucleotide for each different polynucleotide or oligonucleotide that is specific to a gene of interest. Each capture oligonucleotide may contain the same moiety, allowing all of them to bind the same substrate. In one embodiment, the binding moiety is biotin.


Thus, a composition for such diagnosis or evaluation in a mammalian subject as described herein can be a kit or a reagent. For example, one embodiment of a composition includes a substrate upon which the ligands used to detect and quantitate mRNA are immobilized. In one embodiment, the reagent is an amplification nucleic acid primer (such as an RNA primer) or primer pair that amplifies and detects a nucleic acid sequence of the mRNA. In another embodiment, the reagent is a polynucleotide probe that hybridizes to the target sequence. In another embodiment, the target sequences are those shown in Table III. In another embodiment, the reagent is an antibody or antibody fragment. The reagent can include multiple such primers, probes or antibodies, each specific for at least one gene, gene fragment or expression product of Table I or Table II. Optionally, the reagent can be associated with a conventional detectable label.


In another embodiment, the composition is a kit containing the relevant multiple polynucleotides or oligonucleotide probes or ligands, optional detectable labels for same, immobilization substrates, optional substrates for enzymatic labels, as well as other laboratory items. In still another embodiment, at least one polynucleotide or oligonucleotide or ligand is associated with a detectable label. In certain embodiments, the reagent is immobilized on a substrate. Exemplary substrates include a microarray, chip, microfluidics card, or chamber.


In one embodiment, the composition is a kit designed for use with the nCounter Nanostring system, as further discussed below.


II. GENE EXPRESSION PROFILING METHODS

Methods of gene expression profiling that were used to generate the profiles useful in the compositions and methods described herein, or that can be used to perform the diagnostic steps with those compositions, are known and are well summarized in U.S. Pat. No. 7,081,340. These methods include hybridization analysis of polynucleotides, sequencing of polynucleotides, and proteomics-based methods. The methods most commonly used in the art to quantify mRNA expression in a sample include northern blotting and in situ hybridization, RNase protection assays, nCounter® Analysis, and PCR-based methods such as RT-PCR. Alternatively, antibodies may be employed that recognize specific duplexes, including DNA duplexes, RNA duplexes, DNA-RNA hybrid duplexes, or DNA-protein duplexes. Representative methods for sequencing-based gene expression analysis include Serial Analysis of Gene Expression (SAGE) and gene expression analysis by massively parallel signature sequencing (MPSS).


In certain embodiments, the compositions described herein are adapted for use in the methods of gene expression profiling described herein, and those known in the art.


A. Patient Sample


The “sample” or “biological sample” as used herein means any biological fluid or tissue that contains immune cells and/or cancer cells. In one embodiment, a suitable sample is whole blood. In another embodiment, the sample may be venous blood. In another embodiment, the sample may be arterial blood. In another embodiment, a suitable sample for use in the methods described herein includes peripheral blood, more specifically peripheral blood mononuclear cells. Other useful biological samples include, without limitation, plasma or serum. In still other embodiments, the sample is saliva, urine, synovial fluid, bone marrow, cerebrospinal fluid, vaginal mucus, cervical mucus, nasal secretions, sputum, semen, amniotic fluid, bronchoalveolar lavage fluid, or another cellular exudate from a subject suspected of having a lung disease. Such samples may further be diluted with saline, buffer or a physiologically acceptable diluent. Alternatively, such samples may be concentrated by conventional means. Any reference throughout this specification to a particular biological sample is exemplary only. For example, where the specification refers to whole blood, it is understood that other samples, e.g., serum or plasma, may also be employed in another embodiment.


In one embodiment, the biological sample is whole blood, and the method employs the PAXgene Blood RNA workflow (Qiagen). That workflow involves blood collection (e.g., single blood draws) and RNA stabilization, followed by transport and storage, followed by purification of total RNA and molecular RNA testing. The system provides immediate RNA stabilization and consistent blood draw volumes. The blood can be drawn at a physician's office or clinic, and the specimen can be transported and stored in the same tube. Short-term RNA stability is 3 days at 18-25° C. or 5 days at 2-8° C. Long-term RNA stability is 4 years at −20 to −70° C. This sample collection system enables the user to reliably obtain gene expression data from whole blood. Although the PAXgene system produces more noise than using PBMCs as the biological sample source, the benefits of PAXgene sample collection outweigh this drawback, and the noise can be subtracted bioinformatically by a person of skill in the art.
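The storage windows stated above can be expressed as a simple lookup. The following Python sketch is illustrative and hypothetical. The function `within_stated_stability` is not part of the PAXgene system, and it is not a validated sample-acceptance rule. It encodes only the figures given in this paragraph and approximates 4 years as 4 × 365 days.

```python
from datetime import timedelta

# Storage windows stated above for PAXgene-stabilized whole blood.
# Temperature ranges are inclusive, in degrees Celsius.
STABILITY_WINDOWS = [
    ((18.0, 25.0), timedelta(days=3)),          # short term, room temperature
    ((2.0, 8.0), timedelta(days=5)),            # short term, refrigerated
    ((-70.0, -20.0), timedelta(days=4 * 365)),  # long term, frozen (4 years, approximated)
]


def within_stated_stability(temp_c: float, elapsed: timedelta) -> bool:
    """Return True if storage at temp_c for `elapsed` falls inside a stated window.

    Temperatures outside every listed range return False because no stability
    figure is given for them, not because degradation is known to occur.
    """
    for (low, high), limit in STABILITY_WINDOWS:
        if low <= temp_c <= high:
            return elapsed <= limit
    return False


print(within_stated_stability(22.0, timedelta(days=2)))    # True
print(within_stated_stability(4.0, timedelta(days=6)))     # False
print(within_stated_stability(-80.0, timedelta(days=30)))  # False (no stated window)
```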


In one embodiment, the biological samples may be collected using the proprietary PAXgene Blood RNA System (PreAnalytiX, a Qiagen/BD company). The PAXgene Blood RNA System comprises two integrated components: the PAXgene Blood RNA Tube and the PAXgene Blood RNA Kit. Blood samples are drawn directly into PAXgene Blood RNA Tubes using standard phlebotomy technique. These tubes contain a proprietary reagent that immediately stabilizes intracellular RNA, minimizing ex vivo degradation or up-regulation of RNA transcripts. Because freezing can be eliminated, samples can be batched, and samples need not be processed urgently after collection, the system greatly enhances lab efficiency and reduces costs. Thereafter, the RNA is detected and/or measured using any of a variety of assays.


B. Nanostring Analysis


A sensitive and flexible quantitative method suitable for use with the compositions and methods described herein is the nCounter® Analysis System (NanoString Technologies, Inc., Seattle, Wash.). The nCounter Analysis System uses digital color-coded barcode technology based on direct multiplexed measurement of gene expression. It offers high levels of precision and sensitivity (<1 copy per cell). The technology uses molecular “barcodes” and single-molecule imaging to detect and count hundreds of unique transcripts in a single reaction. Each color-coded barcode is attached to a single target-specific probe (i.e., polynucleotide, oligonucleotide or ligand) corresponding to a gene of interest, e.g., a gene of Table I or Table II. Mixed together with controls, these probes form a multiplexed CodeSet. In one embodiment, the CodeSet includes all 559 genes of Table I. In another embodiment, the CodeSet includes all 100 genes of Table II. In other embodiments, the CodeSet includes at least 3, 5, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, or 90 genes of Table I or Table II. In further embodiments, the CodeSet includes at least 100, 200, 300, 400, or 500 genes of Table I. In yet another embodiment, the CodeSet is formed by the first 539 genes in rank order of Table I. In yet another embodiment, the CodeSet includes any subset of genes of Table I, as described herein. In another embodiment, the CodeSet includes any subset of genes of Table II, as described herein.


The NanoString platform employs two ~50-base probes per mRNA, which hybridize to the target in solution. The Reporter Probe carries the signal, and the Capture Probe allows the complex to be immobilized for data collection. The probes are mixed with the patient sample. After hybridization, the excess probes are removed, and the probe/target complexes are aligned and immobilized on a substrate, e.g., in the nCounter Cartridge.


The target sequences utilized in the Examples below for each of the genes of Table I and Table II are shown in Table III below, and are reproduced in the sequence listing. These sequences are portions of the published sequences of these genes. Suitable alternatives may be readily designed by one of skill in the art.
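The following sketch illustrates how a 100-base target from Table III relates to a pair of ~50-base probes. It splits a target into two adjacent halves and returns the antisense (reverse-complement) sequence of each, because a probe must be complementary to the mRNA it hybridizes to. This is a simplification and not NanoString's probe-design procedure. It assumes the Table III target is written 5′→3′ on the mRNA (sense) strand. Actual probe lengths and boundaries are chosen by the manufacturer, and assigning the 5′ half to the reporter probe is an arbitrary choice made here for illustration. The example uses the ABCA5 target (Sequence ID# 1).

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Antisense sequence (5'->3') of a DNA sequence given 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def split_probe_pair(target: str, half: int = 50) -> Tuple[str, str]:
    """Return antisense sequences for two adjacent binding regions of a target.

    Assumes `target` is the sense (mRNA) strand written 5'->3'. Treating the
    5' region as the reporter site is a hypothetical, illustrative choice.
    """
    if len(target) < 2 * half:
        raise ValueError("target is shorter than two probe-binding regions")
    region_a = target[:half]
    region_b = target[half:2 * half]
    return reverse_complement(region_a), reverse_complement(region_b)


# Sequence ID# 1 (ABCA5, NM_018672.4, 6839-6938) from Table III.
abca5 = ("AAGGAAGACTGTGTGTAGAATCTTACGTAATAGTCTGATTCTTTGA"
         "CTCTGTGGCTAGAATGACAGTTATCTATGGAGGTGGTAGAATTAAGCCATACCT")
reporter, capture = split_probe_pair(abca5)
print(len(abca5), len(reporter), len(capture))  # 100 50 50
```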


Sample Cartridges are placed in the Digital Analyzer for data collection. Color codes on the surface of the cartridge are counted and tabulated for each target molecule.
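The text does not specify how the tabulated counts are normalized before analysis. One common approach for nCounter count data scales each sample by the geometric mean of its control-probe counts, first using positive-control probes and then reference (housekeeping) genes. The minimal sketch below assumes that scheme and rescales each sample so that its control-probe geometric mean matches the cross-sample average. It is a generic illustration, not NanoString's software. The function name `normalize_counts`, the probe names (`POS_1`, `GENE_X`), and the counts are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, Iterable


def geometric_mean(values: Iterable[float]) -> float:
    vals = list(values)
    # log() is undefined for zero counts; this sketch assumes positive control counts.
    return math.exp(sum(math.log(v) for v in vals) / len(vals))


def normalize_counts(
    samples: Dict[str, Dict[str, float]], control_probes: Iterable[str]
) -> Dict[str, Dict[str, float]]:
    """Rescale every probe count in each sample so that the sample's control-probe
    geometric mean equals the average of those geometric means across samples."""
    controls = list(control_probes)
    gm = {s: geometric_mean(c[p] for p in controls) for s, c in samples.items()}
    target = mean(gm.values())
    return {
        s: {probe: count * target / gm[s] for probe, count in c.items()}
        for s, c in samples.items()
    }


# Hypothetical raw counts: sample B reads about twice as efficiently as sample A.
raw = {
    "A": {"POS_1": 100, "POS_2": 400, "GENE_X": 50},
    "B": {"POS_1": 200, "POS_2": 800, "GENE_X": 100},
}
norm = normalize_counts(raw, ["POS_1", "POS_2"])
print(round(norm["A"]["GENE_X"], 6), round(norm["B"]["GENE_X"], 6))  # 75.0 75.0
```

After scaling, the two samples report the same GENE_X level. This is the intended effect when the raw difference reflects only technical efficiency rather than biology.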


A benefit of the NanoString nCounter system is that mRNA does not need to be amplified before it is detected and quantified. See, e.g., Geiss et al., Direct multiplexed measurement of gene expression with color-coded probe pairs, Nat Biotechnol. 2008 March; 26(3):317-25. doi: 10.1038/nbt1385. Epub 2008 Feb. 17, which is incorporated herein by reference in its entirety. However, in alternative embodiments, other suitable quantitative methods are used.













TABLE III






Se-






quence

Posi-




ID#
Gene
tion
Target Sequence







  1
ABCA5
NM_018672.4
 6839-
AAGGAAGACTGTGTGTAGAATCT





 6938
TACGTAATAGTCTGATTCTTTGA






CTCTGTGGCTAGAATGACAGTTA






TCTATGGAGGTGGTAGAATTAAG






CCATACCT





  2
ABCF1
NM_00102509
 2875-
CCTAAACAAACAAGAGGTGACC




1.1
 2974 
ACCTTATTGTGAGGTTCCATCCA






GCCAAGTTTATGTGGCCTATTGT






CTCAGGACTCTCATCACTCAGAA






GCCTGCCTC





  3
ACAA2
NM_006111.2
 1605-
CTCACTGTGACCCATCCTTACTC





 1704
TACTTGGCCAGGCCACAGTAAAA






CAAGTGACCTTCAGAGCAGCTGC






CACAACTGGCCATGCCCTGCCAT






TGAAACAG





  4
PHCA
NM_018367.6
 3324-
AGCCAATAGTGATTTGTTTGCAT





 3423
ATCACCTAATGTGAAAAGTGCTC






ATCTGTGAACTCTACAGCAAATT






ATATTTTAGAAAATACTTTGTGA






GGCCGGGC





  5
ACSL5
NM_203379.1
 2701-
CTATCACTCATGTCAATCATATC





 2800
TATGAGACAAATGTCTCCGATGC






TCTTCTGCGTAAATTAAATTGTG






TACTGAAGGGAAAAGTTTGATCA






TACCAAAC





  6
CABC1
NM_020247.4
 2536-
TTCTAGAGTGAGATTTGTGTTTT





 2635
CTGCCCTTTTCCTCTCCAGCCGA






TGGGCTGGAGCTGGGAGAGGTGC






TGAGCTAACAGTGCCAACAAGT






GCTCCTTAA





  7
CD97
NM_078481.3
 3186-
GCCAGTACTCGGGACAGACTAA





 3285
GGGCGCTTGTCCCATCCTGGACT






TTTCCTCTCATGTCTTTGCTGCA






GAACTGAAGAGACTAGGCGCTGG






GGCTCAGCT





  8
AFTPH
NM_017657.4
 2741-
CTACCACCCGTCCAGTTTGACTG





 2840
GAGTAGCAGTGGCCTTACTAACC






CTTTAGATGGTGTGGATCCGGAG






TTGTATGAGTTAACAACTTCTAA






GCTGGAAA





  9
AHCYL1
NM_00124267
 2401-
CTACCCGGCAGGTAGGTTAGATG




6.1
 2500
TGGGTGGTGCATGTTAATTTCCC






TTAGAAGTTCCAAGCCCTGTTTC






CTGCGTAAAGGTGGTATGTCCAG






TTCAGAGA





 10
AK
AK026725.1
 1869-
AATGAAATTACTGTAGAGTCAGC



026725

 1968
AAAGAAGTAGAGAAGAAAAAAC






ACCAAGAATGAGGAGAACCTAG






CAAGGGCAGGCTTTTGGAAGCA






AGAGGTAGATA





 11
AK
AK093878.1
 1554-
AGAATTTCTTGGTAGCTTTACAC



093878

 1653
CGAAAAATGCGTGTAACTAAAT






ACCAGACATCTTGACCATTCAGC






TAGAACCCTGGCAGCAACAGAG






CTATTTAATT





 12
AK
AK094576.1
 1765-
CCCCTCCAGCCAGCCCTGCGTGG



094576

 1864
TTGTGGCCCCACTGCAGAAACGC






CTCCGCTTAACACTCCAGCCTCT






CTTCTATTCGGTCAGGCCACAGC






TGCTGACT





 13
AK
AK124143.1
 2252-
GTACCTGGTAGAAATTGTGTCTT



124143

 2351
GGAATGACCCTTTCGAGTTATTG






ACATGGCTCTGATGAATAGAACA






TGAGCCCCAAAACTAAATCCAA






AAGGAATTT





 14
AK
AK126342.1
 2906-
CTTATTGATTAGTGAATGTAGCT



126342

 3005
TAAGCCTTTGTATGTGTCCTCAG






GGGGCAGACCGACTTTAAGAGG






GACCAGATAACGTTTGAATGGA






GGGATTATAT





 15
AKAP4
NM_139289.1
  417-
CTGTAAGTGTCCTCAACTGGCTT





  516
CTCAGTGATCTCCAGAAGTATGC






CTTGGGTTTCCAACATGCACTGA






GCCCCTCAACCTCTACCTGTAAA






CATAAAGT





 16
AKR1C3
NM_003739.4
 1097-
GAGGACGTCTCTATGCCGGTGAC





 1196
TGGACATATCACCTCTACTTAAA






TCCGTCCTGTTTAGCGACTTCAG






TCAACTACAGCTGAGTCCATAGG






CCAGAAAG





 17
ALAS1
NM_000688.4
 1616-
GGGGATCGGGATGGAGTCATGC





 1715
CAAAAATGGACATCATTTCTGGA






ACACTTGGCAAAGCCTTTGGTTG






TGTTGGAGGGTACATCGCCAGCA






CGAGTTCTC





 18
AMD1
NM_001634.4
  572-
ACCACCCTCTTGCTGAAAGCACT





  671
GGTTCCCCTGTTGAAGCTTGCTA






GGGATTACAGTGGGTTTGACTCA






ATTCAAAGCTTCTTTTATTCTCG






TAAGAATT





 19
AMPD3
NM_000480.2
 3389-
GTGATGCTCAGGGGCTGTCAAAG





 3488
TGACTGCGTTCATCAGTTTTACA






CTGGGGCTGCTACATAATATTTT






CATTTGAACGAAGAACTTCAAAA






AGCACAGG





 20
ANKHD1
NM_017747.2
 7665-
CTTGGAACCCTATGATAAAAGTT





 7764
ATCCAAAATTCAACTGAATGCAC






TGATGCCCAGCAGATTTGGCCTG






GCACGTGGGCACCTCATATTGGA






AACATGCA





 21
ANP32B
NM_006401.2
  661-
CACCTTGGAACCTTTGAAAAAGT





  760
TAGAATGTCTGAAAAGCCTGGAC






CTCTTTAACTGTGAGGTTACCAA






CCTGAATGACTACCGAGAGAGT






GTCTTCAAG





 22
ANXA1 
NM_000700.1
  516-
GAAATCAGAGACATTAACAGGG



b

  615
TCTACAGAGAGGAACTGAAGAG






AGATCTGGCCAAAGACATAACCT






CAGACACATCTGGAGATTTTCGG






AACGCTTTGC





 23
ANXA1
NM_000700.2
 1191-
TGGATGAAACCAAAGGAGATTA





 1290
TGAGAAAATCCTGGTGGCTCTTT






GTGGAGGAAACTAAACATTCCCT






TGATGGTCTCAAGCTATGATCAG






AAGACTTTA





 24
AP2S1
NM_021575.3
  746-
CGAGTAACCGTGCCGTTGTCGTG





  845
TGATGCCATAAGCGTCTGTGCGT






GGAGTCCCCAATAAACCTGTGGT






CCTGCCTGGCCTTGCCGTCAAAA






AAAAAAAA





 25
CENTD2
NM_00104011
 4923-
AAACTCCAGAACAGCAGAAAGC




8.2
 5022
GGGTGCTGTAGAGGAGCACTCA






GCTCACGGGGAGGGAGCTCTTG






GCTGAGCTTCTACAGGGCTGAGA






GCTGCGCTTTG





 26
ARCN1
NM_001655.4
 3437-
CACTTTTAGCTGGTTGAAAAGTA





 3536
CCACTCCCACTCTGAACATCTGG






CCGTCCCTGCAAAGAGTGTACTG






TGCTTGAAGCAGAGCACTCACAC






ATAAATGG





 27
ARG1 
NM_000045.2
  506-
AAGGAACTAAAAGGAAAGATTC



b

  605
CCGATGTGCCAGGATTCTCCTGG






GTGACTCCCTGTATATCTGCCAA






GGATATTGTGTATATTGGCTTGA






GAGACGTGG





 28
ARG1
NM_000045.3
  989-
TTCGGACTTGCTCGGGAGGGTAA





 1088
TCACAAGCCTATTGACTACCTTA






ACCCACCTAAGTAAATGTGGAA






ACATCCGATATAAATCTCATAGT






TAATGGCAT





 29
ARHGAP
NM_018054.5
 3027-
CATGTATGGTCTGTGTCTCCCCA



17

 3126
GTCCCCTCAGAACCATGCCCATG






GATGGTGACTGCTGGCTCTGTCA






CCTCATCAAACTGGATGTGACCC






ATGCCGCC





 30
ARHGAP
NM_033515.2
 2499-
TTTTTGACCAAAAAGATAACAAA



18

 2598
TACCAGGTATGGCAAGTTGTGAA






GACAGCACATTAAAACATACCTA






ATTTCACAGTATTCCTGTCACGA






CAGAATGT





 31
ARHGAP
NM_015071.4
 6088-
TCCCTGAGCTTTCCCAGTAGCCT



26

 6187
CCAGTTTCCTTTGTAAGACCCAG






GGATCACTTAGCCATAGCCTGAA






TCTTTTAGGGGTATTAAGGTCAG






CCTCTCAC





 32
ARHGEF
NM_015318.3
 5128-
GATTACAACATTTCCTCACTGCG



18

 5227
GGATATTTCTGACCCGCTTTAGA






ACTTAAGACCTGATTCTAGCAAT






AAACGTGTCCGAGATGAGCGGT






GAAAAAAAA





 33
FLJ
NM_018071.4
 5402-
GAATGTGTCTCCTCCACAGTGGC



10357

 5501
TCCCAGAGGTTCCACACACTCTC






TGAAGCTCCTTCTCCCACACTGC






ACCTACTCCTTGAGGCTGAACTG






GTCACAGA





 34
ARHGEF
NM_005435.3
 5151-
GGGGGACCATTGGGGCCTGAGC



5

 5250
CAAGGAACTTTCCTTCTACTGCC






TTATAGTGCTTAAACATTCTCCG






CCTCCAGGGTGCAGATTCAGAGC






TGGCCAGAG





 35
ARL8B
NM_018184.2
 2491-
ACCATTACAAAGAATGTGGCAA





 2590
CTTGCTTGTGCCTAAAAGGAGGA






ATTGGAACTAGAATGTGTGACTC






TGTGGGGACTGCATAGGTTTGTT






AATTGACCT





 36
ARPC2
NM_005731.2
  951-
ACGGGGAAGACGTTTTCATCCCG





 1050
CTAATCTTGGGAATAAGAGGAG






GAAGCGGCTGGCAACTGAAGGC






TGGAACACTTGCTACTGGATAAT






CGTAGCTTTT





 37
ASF1B
NM_018154.2
 1476-
CTGTCTCCGGGCCAGGGTCAGGG





 1575
ACCCTCTGCCTCTGGCAGCCTTA






ACCTGTCCTCTGCTAGGACCAGG






GTGATTTCAAGCCAGGGAAGCA






ACTGGGACC





 38
ATG4B
NM_178326.2
  106-
GGACGCAGCTACTCTGACCTACG





  205
ACACTCTCCGGTTTGCTGAGTTT






GAAGATTTTCCTGAGACCTCAGA






GCCCGTTTGGATACTGGGTAGAA






AATACAGC





 39
ATG5
NM_004849.2
 1105-
TGCAGTGGCTGAGTGAACATCTG





 1204
AGCTACCCGGATAATTTTCTTCA






TATTAGTATCATCCCACAGCCAA






CAGATTGAAGGATCAACTATTTG






CCTGAACA





 40
ATM
NM_000051.3
   31-
ACGCTAAGTCGCTGGCCATTGGT





  130
GGACATGGCGCAGGCGCGTTTGC






TCCGACGGGCCGAATGTTTTGGG






GCAGTGTTTTGAGCGCGGAGACC






GCGTGATA





 41
ATP2C1
NM_014382.3
 4070-
TAAAAAGTCCCCAAACCCAAAC





 4169
AAATGGTTTATGAACCAGAGTAT






ATGTGGAAGATTCTTTGCTGGTC






TTGCTCTGTGTGCATCTGAAGCT






TCTTTGGCC





 42
ATP5B
NM_001686.3
 1626-
CTATATGGTGGGACCCATTGAAG





 1725
AAGCTGTGGCAAAAGCTGATAA






GCTGGCTGAAGAGCATTCATCGT






GAGGGGTCTTTGTCCTCTGTACT






GTCTCTCTC





 43
ATP5I
NM_007100.2
  256-
TTGCCAGAGAATTGGCAGAAGA





  355
TGACAGCATATTAAAGTGAGTGA






CCCTGCGACCCACTCTTTGGACC






AGCAGCGGATGAATAAAGCTTC






CTGTGTTGTG





 44
ATP5J2
NM_004889.3
  267-
GCTGGCATGCTACGTGCTCTTTA





  366
GCTACTCCTTTTCCTACAAGCAT






CTCAAGCACGAGCGGCTCCGCA






AATACCACTGAAGAGGACACAC






TCTGCACCCC





 45
ATP5L
NM_006476.4
  196-
GGGACGGGGTCCTGCAGCGGGT





  295
CCTTCCGGCGGGTGACATTCAGC






CGGCGGTTCGGGGCGACGGACT






CTCCATTCCAGAACCATGGCCCA






ATTTGTCCGT





 46
AW
AW173314.1
  419-
AGCAGAAGGCAGGGGAGTCCAC



173314

  518
ACAGGGCAAGCAGCAACCAGGC






TTCTGAGGACAGGAAAGGAGGG






AGCATCTGGTGGGAAGCTGGCG






AGGAGGGGCTGG





47  AW270402  AW270402.1  203-302  GATATCTCACACACGGAATAATCATTAAGAAACAACCACTGTTGAGCAAAGTTGATAGGCAGTAAGGAAATAAAGTGGACATAAACACAGCAGTACTAAT
48  AZI2  NM_022461.4  3031-3130  GAATTGGTGTCAGATGCTGGAATTTATTCTGACCAATGAACACAGCTGACTCAGGGGAGTACAATCTCCTGCCAAGTAATAGAACCAAACCCAATATGCA
49  BACH2  NM_021813.3  8696-8795  TCCAGAACCAGTCTGATGCAAGTGCACCTCTAATATATGCCTTACAAACTCCAGAGGCCATATTCAAAACAGGGTCTTCTCAGTGTATGCAAGGGGCTGC
50  BAG3  NM_004281.3  2304-2403  CCCCACCACCTGTTAGCTGTGGTTGTGCACTGTCTTTTGTAGCTCTGGACTGGAGGGGTAGATGGGGAGTCAATTACCCATCACATAAATATGAAACATT
51  BANP  NM_079837.2  2125-2224  GGAGCCCTTTGCTGTGTGCTCTGTCCAGTGTCATGAGGCAGGTGTTTGCAAAGCCAGCTCTCGGTTCCGATGGGGTATTGCTGACCTACTTTTCTAGGGG
52  BATF  NM_006399.3  294-393  CCTGGCAAACAGGACTCATCTGATGATGTGAGAAGAGTTCAGAGGAGGGAGAAAAATCGTATTGCCGCCCAGAAGAGCCGACAGAGGCAGACACAGAAGG
53  BCL10  NM_003921.2  1251-1350  TGAAAATACCATCTTCTCTTCAACTACACTTCCCAGACCTGGGGACCCAGGGGCTCCTCCTTTGCCACCAGATCTACAGTTAGAAGAAGAAGGAACTTGT
54  BCL6  NM_001130845.1  3401-3500  CCTCACGGTGCCTTTTTTCACGGAAGTTTTCAATGATGGGCGAGCGTGCACCATCCCTTTTTGAAGTGTAGGCAGACACAGGGACTTGAAGTTGTTACTA
55  BCOR  NM_017745.5  5794-5893  ATACAAAGCTCTGATGACAGGCCATGACTGTAGAGTGGTCAGAACTGTGTGGTTGGTTTGAGGGAGCGAATTCGGGGAAGGCACTTGGTGATATAACTTT
56  BF375676  BF375676.1  141-240  TGTATTTCTGTGCAATGAGAGAGGCTCTTTATGGTGGTGCTACAAACAAGCTCATCTTTGGAACTGGCACTCTGCTTGCTGTCCAGCCAAATATCCAGAA
57  BID  NM_001196.2  1876-1975  AAGCACGACAGTGGATGCTGGGTCCATATCACACACATTGCTGTGAACAGGAAACTCCTGTGACCACAACATGAGGCCACTGGAGACGCATATGAGTAAG
58  BMPR2  NM_001204.6  1164-1263  CAGCGGCCCTGGCGGGTGCCCTGGCTACCATGGACCATCCTGCTGGTCAGCACTGCGGCTGCTTCGCAGAATCAAGAACGGCTATGTGCGTTTAAAGATC
59  BQ189294  BQ189294.1  416-515  GCTGGAGTGATTGGCCCTGATGACCATGGAGAAAAGAGAGTAGGGAGAACAGTATAACCAGAAGTCAGGGGGGTCTCCTGGAATCCCTCCTCACAATACC
60  BU743228  BU743228.1  154-253  CCCTGTGGGCCTTGCAGGCCAGTCCAGGCAGGTCTTTCACACTGTTGTCCCACATAACAGAAAAAGCTGAGCAGACAGGGTAGGAAACACACTTGCATCT
61  BX089765  BX089765.1  106-205  TTAAGCAACTTGCTCCAGTGACGCAGCTGGTAAGCAGCAGAGCTGGGATTAAAACCCAGGCATTCTGATTCCACCACCTACACACTTAGCCATTCCGCCC
62  BX108566  BX108566.1  365-464  ATTTAGGGTGAGAGCTTCACAGCTGAAAATCTCCTTTAAAGAAAACGCGGCCCAAATGTGCTGGGAGGAGAAGCCAGTGGATCTAGGAGGGGGCCCGGCG
63  BX400436  BX400436.2  1-100  ATATTTTGGAGAGGGAAGTTGGCTCACTGTTGTAGAGGACCTGAACAAGGTGTTCCCACCCGAGGTCGCTGTGTTTGAGCCATCAGAAGCAGAGATCTCC
64  BX436458  BX436458.2  518-617  ATGCAGACAATTTGCCTGTGAGATGAGGAAAATTCTCTGGAAGATTTAGGCCCTGAGAGCTGAAAAGGGACCCTAAACATTACCTGGTGACAACTGCCCT
65  C15orf39  NM_015492.4  3535-3634  CCTGAGCTTTTAACGTGAGGGTCTTTATTGGATAGGACTACTCCCTATTTCTTGCCTAGAGAACACACATGGGCTTTGGAGCCCGACAGACCTGGGCTTG
66  C17orf51  XM_944416.1  4909-5008  AAGGATGGGGGTGGATTGACCAAGCTGGGCCAGAGGTGCGAGGAGCTGATCTGCGAGCCCTGTGTGCCTGTGAGTCCTGGCGGAGTGGCCGTGCGTGGTG
67  C3  NM_000064.2  4397-4496  CATCTACCTGGACAAGGTCTCACACTCTGAGGATGACTGTCTAGCTTTCAAAGTTCACCAATACTTTAATGTAGAGCTTATCCAGCCTGGAGCAGTCAAG
68  C4B  NM_001002029.3  4438-4537  GAGTCCAGGGTGCACTACACCGTGTGCATCTGGCGGAACGGCAAGGTGGGGCTGTCTGGCATGGCCATCGCGGACGTCACCCTCCTGAGTGGATTCCACG
69  C4orf27  NM_017867.2  682-781  GAACCGTGAAGATGAAACAGAGAGATAAGAAAGTTGTGACAAAGACCTTTCATGGTGCAGGCTTGGTTGTTCCAGTAGATAAAAATGATGTTGGGTACCG
70  C8orf76  NM_032847.2  1029-1128  TAAAAGATGAAGTTCACCCAGAGGTGAAGTGTGTTGGCTCCGTAGCCCTGACTGCCTTGGTGACTGTATCCTCAGAAGAATTTGAAGACAAGTGGTTCAG
71  C9orf164  NM_182635.1  529-628  CGCTGGCCATGGGGAAGCCACCTCCAGGGCAGTCCCAGGGACTGAATTGGAAGTTGTCCCAAGTCACTTCAGGTCCAACTGGGACAGCAGAGGTAACCCC
72  CAMP  NM_004345.4  623-722  TTGTCCAGAGAATCAAGGATTTTTTGCGGAATCTTGTACCCAGGACAGAGTCCTAGTGTGTGCCCTACCCTGGCTCAGGCTTCTGGGCTCTGAGAAATAA
73  CASP1  NM_033294.3  219-318  ATTTATCCAATAATGGACAAGTCAAGCCGCACACGTCTTGCTCTCATTATCTGCAATGAAGAATTTGACAGTATTCCTAGAAGAACTGGAGCTGAGGTTG
74  CASP2  NM_032983.3  3347-3446  CCCACCACTCTTGACTCAGGTGGTGTCCTTCTTCCTCAAGTCTTGACAATTCCCGGGCCCTTCAGTCCCTGAGCAGTCTACTTCTGTGTCTGTCACCACA
75  CASP3  NM_032991.2  686-785  ACTCCACAGCACCTGGTTATTATTCTTGGCGAAATTCAAAGGATGGCTCCTGGTTCATCCAGTCGCTTTGTGCCATGCTGAAACAGTATGCCGACAAGCT
76  CBLL1  NM_024814.3  1967-2066  ATGAGGGGGAAAAAAACTTATGTGTAGTCAATCTTTTAAGCTTTGACTGTTTTGGGAAGGAAGAGTACCTCTTATCGAGGTAGTATAAAACACATAGGGT
77  CC2D1B  NM_032449.2  4183-4282  TTGCATAAGCACAGCTCAAGAACTGAGCTTTGTATGTGTCCTTTTGGGGGATAACAGGGCTGGACCATGCTTCCCTGCCCTTAAACGCAGAGCTTTTAGT
78  KIAA1967  NM_021174.5  201-300  GGGAGAGGGCCCACACAGTCTCCTCGCCGGCACCGGCCTCCTCCATTTTTCCGGGCCTTGCGTGGAGGGTTTTGGCGGATGTTTTTGAACGAAGGAATGT
79  CCDC97  NM_052848.1  2867-2966  ATCCAGAGTGAGACAGCATTGGAGGGACAAGTGTGCATGCAGATGTCCTCAGACGGGAAGGTTTGAGAAGGGTCAGATGGTAGGCGGGCCTAACAAGGGC
80  CCL3  NM_002983.2  160-259  CAGTTCTCTGCATCACTTGCTGCTGACACGCCGACCGCCTGCTGCTTCAGCTACACCTCCCGGCAGATTCCACAGAATTTCATAGCTGACTACTTTGAGA
81  CCL3L1  NM_021006.4  422-521  GGAGCCTGAGCCTTGGGAACATGCGTGTGACCTCTACAGCTACCTCTTCTATGGACTGGTTATTGCCAAACAGCCACACTGTGGGACTCTTCTTAACTTA
82  CCL3L3  NM_001001437.3  402-501  GGGGAGGAGCAGGAGCCTGAGCCTTGGGAACATGCGTGTGACCTCCACAGCTACCTCTTCTATGGACTGGTTATTGCCAAACAGCCACACTGTGGGACTC
83  CCL4  NM_002984.2  36-135  TTCTGCAGCCTCACCTCTGAGAAAACCTCTTTGCCACCAATACCATGAAGCTCTGCGTGACTGTCCTGTCTCTCCTCATGCTAGTAGCTGCCTTCTGCTC
84  CCND3  NM_001760.2  1216-1315  GGCCAGCCATGTCTGCATTTCGGTGGCTAGTCAAGCTCCTCCTCCCTGCATCTGACCAGCAGCGCCTTTCCCAACTCTAGCTGGGGGTGGGCCAGGCTGA
85  CCR1  NM_001295.2  536-635  CATCATTTGGGCCCTGGCCATCTTGGCTTCCATGCCAGGCTTATACTTTTCCAAGACCCAATGGGAATTCACTCACCACACCTGCAGCCTTCACTTTCCT
86  CCR6  NM_031409.2  936-1035  CTTTAACTGCGGGATGCTGCTCCTGACTTGCATTAGCATGGACCGGTACATCGCCATTGTACAGGCGACTAAGTCATTCCGGCTCCGATCCAGAACACTA
87  CCR9  NM_031200.1  1096-1195  CCCTGTTCTCTATGTTTTTGTGGGTGAGAGATTCCGCCGGGATCTCGTGAAAACCCTGAAGAACTTGGGTTGCATCAGCCAGGCCCAGTGGGTTTCATTT
88  CCT6A  NM_001762.3  281-380  GCCCAAGGGCACCATGAAGATGCTCGTTTCTGGCGCTGGAGACATCAAACTTACTAAAGACGGCAATGTGCTGCTTCACGAAATGCAAATTCAACACCCA
89  CD14  NM_000591.2  886-985  GCCCAAGCACACTCGCCTGCCTTTTCCTGCGAACAGGTTCGCGCCTTCCCGGCCCTTACCAGCCTAGACCTGTCTGACAATCCTGGACTGGGCGAACGCG
90  CD160 b  NM_007053.2  501-600  TTGATGTTCACCATAAGCCAAGTCACACCGTTGCACAGTGGGACCTACCAGTGTTGTGCCAGAAGCCAGAAGTCAGGTATCCGCCTTCAGGGCCATTTTT
91  CD160  NM_007053.3  1286-1385  AAAGGAAGACAGCCAGATCCAGTGATTGACTTGGCATGAAAATGAGAAAATGCAGACAGACCTCAACATTCAACAACATCCATACAGCACTGCTGGAGGA
92  CD1A  NM_001763.2  1816-1915  CCTGTTTTAGATATCCCTTACTCCAGAGGGCCTTCCCTGACTTACAAGTGGGAAGCAGTCTCTTCCTGGTCTGAACTCCCGCCACATTTTAGCCGTACTT
93  CD36  NM_000072.3  1619-1718  TAAAGAATCTGAAGAGGAACTATATTGTGCCTATTCTTTGGCTTAATGAGACTGGGACCATTGGTGATGAGAAGGCAAACATGTTCAGAAGTCAAGTAAC
94  CD48  NM_001778.2  271-370  AATTTAAAGGCAGGGTCAGACTTGATCCTCAGAGTGGCGCACTGTACATCTCTAAGGTCCAGAAAGAGGACAACAGCACCTACATCATGAGGGTGTTGAA
95  CD69  NM_001781.2  1360-1459  TATACAGTGTCTTACAGAGAAAAGACATAAGCAAAGACTATGAGGAATATTTGCAAGACATAGAATAGTGTTGGAAAATGTGCAATATGTGATGTGGCAA
96  CD70  NM_001252.2  191-290  CCTATGGGTGCGTCCTGCGGGCTGCTTTGGTCCCATTGGTCGCGGGCTTGGTGATCTGCCTCGTGGTGTGCATCCAGCGCTTCGCACAGGCTCAGCAGCA
97  CD79A  NM_021601.3  617-716  TGAAGATGAAAACCTTTATGAAGGCCTGAACCTGGACGACTGCTCCATGTATGAGGACATCTCCCGGGGCCTCCAGGGCACCTACCAGGATGTGGGCAGC
98  CD79B  NM_000626.2  350-449  GAAGCTGGAAAAGGGCCGCATGGAAGAGTCCCAGAACGAATCTCTCGCCACCCTCACCATCCAAGGCATCCGGTTTGAGGACAATGGCATCTACTTCTGT
99  CDC42EP2  NM_006779.3  1779-1878  AGGGCTTTGTGGAGGACAGGCCTTGCCCTCAAGAACGTCGTACCTGACGCTGAGCCTGTCATGAGAATGCAACAGGAGCAAACCAAGTGTTGCTGTGACA
100  CDH5  NM_001795.3  3406-3505  TCTCCCCTTCTCTGCCTCACCTGGTCGCCAATCCATGCTCTCTTTCTTTTCTCTGTCTACTCCTTATCCCTTGGTTTAGAGGAACCCAAGATGTGGCCTT
101  CDKN1A  NM_000389.2  1976-2075  CATGTGTCCTGGTTCCCGTTTCTCCACCTAGACTGTAAACCTCTCGAGGGCAGGGACCACACCCTGTACTGTTCTGTGTCTTTCACAGCTCCTCCCACAA
102  CFD  NM_001928.2  860-959  CTGGTTGGTCTTTATTGAGCACCTACTATATGCAGAAGGGGAGGCCGAGGTGGGAGGATCATTGGATCTCAGGAGTTCGAGATCAGCATGGGCCACGTAG
103  CHCHD3  NM_017812.2  1173-1272  TCCACCCTAACAAAGTAGGATGGGGTTGGGGGCTAAATTAATTGGAGTGGGGCGAGGAGAGAGCCAGAAAACATAGATCCGAGGGCAGCAGTGCTGGGTG
104  CHFR  NM_018223.2  2836-2935  CGCCGCTCCCTCATGCTGCCCGGGCCCTTCCTCCAAGACCCTACAGAGCCTGAGGGGCACCTTGGCTTCCGCCTGTGCTAGCTTTGCCATGTCATCTGGA
105  CHMP5  NM_016410.5  1148-1247  ACTAAGGAAATGGAATCTTAAAAGTCTATGACAGTGTAACTCTACAGTCTCAAAATGACCTGATAAATTGATAAGACAAAGATGAGATTATTGGGGCTGT
106  CIAPIN1  NM_020313.3  1816-1915  GCATGTCTTGTAAAGAGAGGGGATGTGCATTTGTGTGTGATGTTGGATAGTCATCCACGCTCAGTTTGGACCATTGGAGGAACTTAGTGTCACGCACAAA
107  CKS2  NM_001827.1  228-327  AGACTTGGTGTCCAACAGAGTCTAGGCTGGGTTCATTACATGATTCATGAGCCAGAACCACATATTCTTCTCTTTAGACGACCTCTTCCAAAAGATCAAC
108  CLEC4A  NM_194448.2  389-488  ATTTCTACTGAATCAGCATCTTGGCAAGACAGTGAGAAGGACTGTGCTAGAATGGAGGCTCACCTGCTGGTGATAAACACTCAAGAAGAGCAGGATTTCA
109  CLEC4C  NM_203503.1  571-670  TACGAGAGTATCAACAGTATCATCCAAGCCTGACCTGCGTCATGGAAGGAAAGGACATAGAAGATTGGAGCTGCTGCCCAACCCCTTGGACTTCATTTCA
110  CLEC5A  NM_013252.2  3251-3350  CCCCATCCAACCCTTAGACTCACGAACAAATCCACCTGAGATCAGCAGAGCCACCCTAGATCAGCTGAAACTCTAAGCACAAAAATAAAAACTTATCACT
111  CLIC3  ILMN_1796423.1  99-198  CGTACGCCGCTACCTGGACAGCGCGATGCAGGAGAAAGAGTTCAAATACACGTGTCCGCACAGCGCCGAGATCCTGGCGGCCTACCGGCCCGCCGTGCAC
112  CLK2  XM_941392.1  552-651  GATTATAGCCGGGATCGGGGAGATGCCTACTATGACACAGACTATCGGCATTCCTATGAATATCAGCGGGAGAACAGCAGTTACCGCAGCCAGCGCAGCA
113  CLN8  NM_018941.3  4486-4585  GGCGCCAGAGCTGGGCTCTTCAACACGGCATTTAGCGCAGAAAGTCGTGGTTCAGGCAGTATGGGCCGCTGTGACAAAACACCTAAGACTGGGTAGTTTA
114  CLPTM1  NM_001294.3  2389-2488  TCTGTGTTTCCAGCCATCTCGCCCTGCCAGCCCAGCACCACTGGGAATCATGGTGAAGCTGATGCAGCGTTGCCGAGGGGGTGGGTTGGGCGGGGGTGGG
115  CLSTN1  NM_001009566.2  4990-5089  TTGAATACTGTTCTGTGACCCTGACTGCTAGTTCTGAGGACACTGGTGGCTGTGCTATGTGTGGCCATCCTCCATGTCCCGTCCCTGTAGCTGCTCTGTT
116  CN312986  CN312986.1  491-590  AGGAAACTAAGACATGGAAAGGTTAGGTAACTTGCCCAAGGTCGCACAGCTAGTAAGTGGCAGACATCCAGAGTCTCTGCTCTGCTCTTAACTCTCACCA
117  CNIH4  NM_014184.3  526-625  AATGACTGAAGCTGGAGAAGCCGTGGTTGAAGTCAGCCTACACTACAGTGCACAGTTGAGGAGCCAGAGACTTCTTAAATCATCCTTAGAACCGTGACCA
118  CNPY2  NM_014255.5  1038-1137  TTGCAGTAAGCGAACAGATCTTTGTGACCATGCCCTGCACATATCGCATGATGAGCTATGAACCACTGGAGCAGCCCACACTGGCTTGATGGATCACCCC
119  COLEC12  NM_130386.2  901-1000  ACACAAGCCAGGCTATCCAGCGAATCAAGAACGACTTTCAAAATCTGCAGCAGGTTTTTCTTCAAGCCAAGAAGGACACGGATTGGCTGAAGGAGAAAGT
120  GLT25D1  NM_024656.2  3067-3166  CTGTGTGCCAGGCCTCACAGACTCCCAGTTGGGTTGAAGAATGGTTGACTGAGTTTGATTCTTCCTGTACCCTCGGTCGTCTGAGCTGTGTGCGGACAAC
121  COMMD6  NM_203497.3  32-131  CTCTCGAGTCCGGGCCGCAAGTCCCAGACGCTGCCCATGGAGGCGTCCAGCGAGCCGCCGCTGGATGCTAAGTCCGATGTCACCAACCAGCTTGTAGATT
122  CORO1C  ILMN_1745954.1  98-197  AAGTAAAGTTGTTGATGGTGGTGAAACACCGTAGGGCATGTGGTTCAAAGAGAAGCAGGAGGGCAAGGGAAAGTTACCCTGATCTTAGTTTGTAGCTTAT
123  COX6C  NM_004374.2  70-169  GAAGTTTTGCCAAAACCTCGGATGCGTGGCCTTCTGGCCAGGCGTCTGCGAAATCATATGGCTGTAGCATTCGTGCTATCCCTGGGGGTTGCAGCTTTGT
124  COX7B  NM_001866.2  160-259  CAGAGCCACCAGAAACGTACACCTGATTTTCATGACAAATACGGTAATGCTGTATTAGCTAGTGGAGCCACTTTCTGTATTGTTACATGGACATATGTAG
125  COX7C  NM_001867.2  1-100  CAAGGTCGTGAAAAAAAAGGTCTTGGTGAGGTGCCGCCATTTCATCTGTCCTCATTCTCTGCGCCTTTCGCAGAGCTTCCAGCAGCGGTATGTTGGGCCA
126  CPPED1  NM_018340.2  2494-2593  TGTATTTGTTTCTTTACAACAGGTGTAGGTATAGGAGGTCAAGAAAAGGAGTTCGGTAAAGGGCATAGCTAATAACAACCACACATTGGGCCAGGCACAG
127  CR2 b  NM_001006658.1  486-585  GGTGTCAAGCAAATAATATGTGGGGGCCGACACGACTACCAACCTGTGTAAGTGTTTTCCCTCTCGAGTGTCCAGCACTTCCTATGATCCACAATGGACA
128  CR2  NM_001006658.2  3581-3680  AGCCCAGTTTCACTGCCATATACTCTTCAAGGACTTTCTGAAGCCTCACTTATGAGATGCCTGAAGCCAGGCCATGGCTATAAACAATTACATGGCTCTA
129  CREB1  NM_004379.3  4856-4955  TTTGATGGTAGGTCAGCAGCAGTGCTAGTCTCTGAAAGCACAATACCAGTCAGGCAGCCTATCCCATCAGATGTCATCTGGCTGAAGTTTATCTCTGTCT
130  CREB5  NM_182898.3  7898-7997  ACCTACTCACCTTTTTCCCTTCTAAGTTCTGCTAAATCACATCTGCCTCATAGAGAAAGGAATGTTGCCTTTGAGAACTGTCTTGGAGAACAGATAAGCT
131  CRKL  NM_005207.3  4901-5000  TTCTAAAGGAGCAGAAGGACAGGTCTCTGAGACAGGATCGTTGTCCCTACAGGAGGAACAGTGGCCTTGCTTCTTAGACGGTCTTCACTGTGTGTTTTAA
132  CRY2  NM_021117.3  4013-4112  CAGCTCAGGTGGCCCTGAGGGCTCCCTCGGAACAGTGCCTCAAATCCTGACCCAAGGGCCAGCATGGGGAAGAGATGGTTGCAGGCAAAATGCACTTTAT
133  CS  NM_004077.2  2080-2179  CCTCCTAGCAAGACCTGTTGGTTAGCTGGACATGCTTTGGCAATTTTTTTATACTACCAAGTGACCATAAAGGCATGGCATTTGTTGTGACTGGCACCCA
134  CSK  NM_004383.2  2501-2600  TCTAGGGACCCCTCGCCCCAGCCTCATTCCCCATTCTGTGTCCCATGTCCCGTGTCTCCTCGGTCGCCCCGTGTTTGCGCTTGACCATGTTGCACTGTTT
135  CST7  NM_003650.3  618-717  CAACCACACCTTGAAGCAGACTCTGAGCTGCTACTCTGAAGTCTGGGTCGTGCCCTGGCTCCAGCACTTCGAGGTGCCTGTTCTCCGTTGTCACTGACCC
136  CTAG1B  NM_001327.2  286-385  GCGGGGCCAGGGGGCCGGAGAGCCGCCTGCTTGAGTTCTACCTCGCCATGCCTTTCGCGACACCCATGGAAGCAGAGCTGGCCCGCAGGAGCCTGGCCCA
137  CTDSP2  NM_005730.3  4685-4784  GAGGTCGGGCCAGCTGCCCCATTCTTTTAACGTTGTAGGGCCTGCCCATGGAGCGGACCCTCCTCTTTGGGCCTCGTGAGCTTTTTTGCTTATCATGTTC
138  CTSW  NM_001335.3  1076-1175  TGCACCGAGGGAGCAATACCTGTGGCATCACCAAGTTCCCGCTCACTGCCCGTGTGCAGAAACCGGATATGAAGCCCCGAGTCTCCTGCCCTCCCTGAAC
139  CTSZ  NM_001336.3  1174-1273  CACTGGCTGCGAGTGTTCCTGAGAGTTGAAAGTGGGATGACTTATGACACTTGCACAGCATGGCTCTGCCTCACAATGATGCAGTCAGCCACCTGGTGAA
140  CX3CL1  NM_002996.3  141-240  AGCACCACGGTGTGACGAAATGCAACATCACGTGCAGCAAGATGACATCAAAGATACCTGTAGCTTTGCTCATCCACTATCAACAGAACCAGGCATCATG
141  CXCL2  NM_002089.3  855-954  ATCACATGTCAGCCACTGTGATAGAGGCTGAGGAATCCAAGAAAATGGCCAGTGAGATCAATGTGACGGCAGGGAAATGTATGTGTGTCTATTTTGTAAC
142  IL8RB  NM_001557.3  410-509  ACCTCAAAAATGGAAGATTTTAACATGGAGAGTGACAGCTTTGAAGATTTCTGGAAAGGTGAAGATCTTAGTAATTACAGTTACAGCTCTACCCTGCCCC
143  CXCR5 b  NM_001716.3  2619-2718  ACGTCCCTTTTTTCTCTGAGTATCTCCTCGCAAGCTGGGTAATCGATGGGGGAGTCTGAAGCAGATGCAAAGAGGCAAGAGGCTGGATTTT
144  CYBB  NM_000397.3  3787-3886  ACTGGAGAGGGTACCTCAGTTATAAGGAGTCTGAGAATATTGGCCCTTTCTAACCTATGTGCATAATTAAAACCAGCTTCATTTGTTGCTCCGAGAGTGT
145  CYP1B1  NM_000104.3  2361-2460  CTTACACCAAACTACTGAATGAAGCAGTATTTTGGTAACCAGGCCATTTTTGGTGGGAATCCAAGATTGGTCTCCCATATGCAGAAATAGACAAAAAGTA
146  DB338252  DB338252.1  436-535  GTTCTTGGTCTGTATGTGTAGGTGGAGGGAGGCAAAGTTGTGGTAATAAAGTGGGAAGGCCCGGGAAGAACAGCTAACTGTATAGGGGTGAAATGACGCT
147  DBI  NM_001079862.1  241-340  CATAAATACAGAACGGCCCGGGATGTTGGACTTCACGGGCAAGGCCAAGTGGGATGCCTGGAATGAGCTGAAAGGGACTTCCAAGGAAGATGCCATGAAA
148  DCAF7  NM_005828.4  6155-6254  TTAACACTGTGCTGTGAAACAACTATGGGGAATCTCCATTGAAGGCTACTTCATGGGCACCTGAAAGTGGAGTGTTATAGCTATGACTTTCTATTTCTTG
149  DDIT4  NM_019058.2  1414-1513  GACCTGTTGTAGGCAGCTATCTTACAGACGCATGAATGTAAGAGTAGGAAGGGGTGGGTGTCAGGGATCACTTGGGATCTTTGACACTTGAAAAATTACA
150  DDX23  NM_004818.2  2811-2910  ATTGCACTGGGCCATCAGCTCATGCCAGGCTATGGGGGCAGCCAGTTGGCATTGCTCCCCAGACTGAACAGAAACCTGGCCGCCGGATGGGACCTCCTTT
151  DGUOK  NM_080916.2  573-672  ACATCGAGTGGCATATCTATCAGGACTGGCATTCTTTTCTCCTGTGGGAGTTTGCCAGCCGGATCACATTACATGGCTTCATCTACCTCCAGGCTTCTCC
152  DGUOK b  NM_080916.2  903-1002  TTGTAAAGAATCTGTAACCAATACCATGAAGTTCAGGCTGTGATCTGGGCTCCCTGACTTTCTGAAGCTAGAAAAATGTTGTGTCTCCCAACCACCTTTC
153  DHX16 b  NM_001164239.1  2491-2590  CCCGTGTCAACTTCTTTCTCCCTGGCGGTGACCACCTGGTTCTGCTAAATGTTTACACACAGTGGGCTGAGAGTGGTTACTCTTCCCAGTGGTGCTATGA
154  DHX16  NM_003587.4  3189-3288  ACCAAAGAGTTCATGAGACAGGTACTGGAGATTGAGAGCAGTTGGCTTCTGGAGGTGGCTCCCCATTATTATAAGGCCAAGGAGCTAGAAGATCCCCATG
155  DKFZp761PO423  XM_291277.4  4192-4291  CTCCTGCAGCTTCTGTGAGCCAAGCCCCAGCCTGCACCGTCGCTGCCCCTTCCCTGCCTAACCCTTTCCTGTCTCGCCTTGGAAGCACCCATGTCTCCCT
156  DMBT1  NM_007329.2  3713-3812  CACAATGGCTGGCTCTCCCACAACTGTGGCCATCATGAAGACGCTGGTGTCATCTGCTCAGCTTCCCAGTCCCAGCCGACACCCAGCCCAGACACTTGGC
157  DNAJB1  NM_006145.2  1904-2003  GACCTCTGGCTCCAGTGAAGCTGAATGTCCTCACTTTGTGGGTCACACTCTTTACATTTCTGTAAGGCAATCTTGGCACACGTGGGGCTTACCAGTGGCC
158  DNAJB6  NM_058246.3  2087-2186  CTTCCCTGCATGCTCCCTCCCAGTGACTTTCCTTCCCTTTCACATGAGGATCTGCCGTTCATGTTGCTTTCTCCTTTGTCCTCTTGGACTTGAGGGCATT
159  DOCK5  NM_024940.6  7201-7300  AAAGAGATTTCCATTTCTGCTGCCAGAGCTGGTATTTGCCTGCCTGATTCTCTGTGTTTCCTGTTTCACCGCCACCCTTTCAGGAGAGAACTACACCAGT
160  DPF2  NM_006268.4  2249-2348  TCTCAGCTCATGGGGAAGCCACATAGACATCCCTTTCTTCCCTTGCACGCTCGCTAGCAGCTGGTAAGGTCTTCACACCCTGATTCCTCAAGTTTTCTGC
161  DYNC2LI1  NM_016008.3  351-450  TTTGGGAACTCGGTGGAGGAACCTCTTTATTGGACTTAATCAGCATACCCATCACAGGTGACACCTTACGGACGTTTTCTCTTGTTCTCGTTCTGGATCT
162  DZIP3  NM_014648.3  4323-4422  CCCAGTGTCTTGCCCAGTAGATACAAGATAAATATTGCCAGAATCAGATATCAGGAAGTAGTAAGAAAAGGAGTTAATATGCAAACTAAATCACTCGCTC
163  EEF1B2  NM_001037663.1  699-798  GGATACGGAATTAAGAAACTTCAAATACAGTGTGTAGTTGAAGATGATAAAGTTGGAACAGATATGCTGGAGGAGCAGATCACTGCTTTTGAGGACTATG
164  EGLN1  NM_022051.1  3976-4075  AGCAGCATGGACGACCTGATACGCCACTGTAACGGGAAGCTGGGCAGCTACAAAATCAATGGCCGGACGAAAGCCATGGT
165  EGR1  NM_001964.2  1506-1605  GAGGCATACCAAGATCCACTTGCGGCAGAAGGACAAGAAAGCAGACAAAAGTGTTGTGGCCTCTTCGGCCACCTCCTCTCTCTCTTCCTACCCGTCCCCG
166  EHD4  NM_139265.3  2605-2704  TCAAACATTAAATATCCCGAGGTCTCCTTGGTGGGTGGCAGGATTTAAATTCAATCAAATCCTGTCCTAGTGTGTGCAGTGTCTTCGGCCCTGTGGACAC
167  EID2B  NM_152361.2  628-727  GCCAGTTTAGTTAACTCAGTCATTAGGGGGAATGCAAACTGGAAGGGAATACGGCAATGTGCAATTGAAGGAGGAAGCACACTCCGAAATGGAAACAGAC
168  EIF2B4  NM_015636.3  1497-1596  GTCTCTAATGAGCTAGATGACCCTGATGATCTGCAATGTAAGCGGGGAGAACATGTTGCGCTGGCTAACTGGCAGAACCACGCATCCCTACGGTTGTTGA
169  EIF4ENIF1  NM_019843.2  3051-3150  CACACTGGGCAGGACCCTGCTTCATCTCGGGTTGGTTTATGGGCTTTTACTTTGGAGCACTCTGTGTGAAGCTGTTTGGTGGAACCCATGCATCTGGTGT
170  EMR4  NM_001080498.2  1719-1818  GGGAAGACGATTGGATCAATCATTGCATACTCATTCACCATCATCAACACCCTTCAGGGAGTGTTGCTCTTTGTGGTACACTGTCTCCTTAATCGCCAGG
171  EP300  NM_001429.2  716-815  CCAGCCAGGCCCAACAGAGCAGTCCTGGATTAGGTTTGATAAATAGCATGGTCAAAAGCCCAATGACACAGGCAGGCTTGACTTCTCCCAACATGGGGAT
172  EPHX2  NM_001979.5  1909-2008  CATCCTTCCACCTGCTGGGGCACCATTCTTAGTATACAGAGGTGGCCTTACACACATCTTGCATGGATGGCAGCATTGTTCTGAAGGGGTTTGCAGAAAA
173  ERLIN1  NM_006459.3  3197-3296  TGATGGCCCTGGAGGCGGGGCTGAGGAACAGGGAAATGCCGCTGTGAAGTCTTAAAGCACTTCTGCTTAAACTCCCATGTGTGAGGAGTGTGCCTCCCTG
174  ETFDH  NM_004453.3  1904-2003  TGACCTCTTGTCATCTGTGGCTCTGAGTGGTACTAATCATGAACATGACCAGCCGGCACACTTAACCTTAAGGGATGACAGTATACCTGTAAATAGAAAT
175  EVI2A  NM_014210.3  1410-1509  GAGAGAGCTAAACTGTGTAATTTAATGGTATCTTCCTTGCTGGATGTGGCAGAATCCACACCAGCTTATCAACCAACACAGCTAATTTTAGAATAGATCC
176  EWSR1  NM_005243.3  2248-2347  AAAAATGGATAAAGGCGAGCACCGTCAGGAGCGCAGAGATCGGCCCTACTAGATGCAGAGACCCCGCAGAGCTGCATTGACTACCAGATTTATTTTTTAA
177  EYA3  NM_001990.3  1551-1650  GATTCCTGGTTAGGAACTGCATTAAAGTCCTTACTTCTCATCCAGTCCAGAAAGAATTGTGTGAATGTTCTGATCACTACCACCCAGCTGGTTCCAGCCC
178  C5orf21  NM_032042.5  4058-4157  TTAGAACAAGTAGAATGGGAAAGGAGTGACTGATAAATCTAAGATTCAAAATAGTCCCGTCGAAACTTAAAGGCCAGATTATTGCTTTGGAGCTTTCTAT
179  FAM179A  NM_199280.2  3306-3405  ACTCTTAGACTCAGAGTCCTTGGGAGGCAGCCGCAAGGCCACTGACAGAGGGGTGGCCCCTGACAGCAAGACAACTGGCAGCTCATACCCTTTTCAGCTG
180  FAM193A  NM_003704.3  4523-4622  CCCTGACTTGTAGCCAGCTTGTGTAAGATCCCTTGCAGAACGAGAAAGTTAAAAACAAGCCCACCCAGTACTCACACCATCAAGTCTGTTATAGAGTGTA
181  FAM43A  NM_153690.4  2741-2840  AGACCCCTGAAATGTTGCCAAATTCTTCAAATAACTGTTTGGGGGGTGGGGGGAGATGAAAGAGAGTCGCGTTTTGTTTACAGTTAAAGACATCCAATAT
182  FAM50B  NM_012135.1  1273-1372  TTCTGAGTATTTTAGTGTTGCCACCTGGATTTGCTGCATTGCTCTGCTGAGCTGTATTGAAACCATGACTGGGCCCACTGTCAGACAGAAATTAGAATAG
183  FAIM3  NM_005449.4  1689-1788  CAGGCTCTAGATCACATGGCATCAGGCTGGGGCAGAGGCATAGCTATTGTCTCGGGCATCCTTCCCAGGGTTGGGTCTTACACAAATAGAAGGCTCTTGC
184  FKBP1A  NM_054014.3  301-400  AGAAACAAGCCCTTTAAGTTTATGCTAGGCAAGCAGGAGGTGATCCGAGGCTGGGAAGAAGGGGTTGCCCAGATGAGTGTGGGTCAGAGAGCCAAACTGA
185  FLNB  NM_001457.3  9148-9247  CAGACCTGAGCTGGCTTTGGAATGAGGTTAAAGTGTCAGGGACGTTGCCTGAGCCCAAATGTGTAGTGTGGTCTGGGCAGGCAGACCTTTAGGTTTTGCT
186  FNBP1  NM_015033.2  5237-5336  TGTGTGTTGCACTAATTCTAAACTTTGGAGGCATTTTGCTGTGTGAGGCCGATCGCCACTGTAAAGGTCCTAGAGTTGCCTGTTTGTCTCTGGAGATGGA
187  FOXK2  NM_004514.3  4387-4486  TTTTTTGCCGTAGGCACCATTCTGCATCTTGAACCCAGACTGAAGTGTGCCTCTCACAGATGGAAGGTGCACACGCTCCTGTCTCCTCCTCACTCTGCCA
188  FRAT2  NM_012083.2  1769-1868  CTTGTCCTCCCAGCTGAGCTTTCTTATTCCACCCTTTCTGGTGTCTATAGGAATGCATGAGAGACCCTGGACGTTTTTCTGCTCTCTTCTGGCCCTCCAT
189  FTHL16  XR_041433.1  255-354  GGACTCAGAGGCCGCCATCAACCGCCAGATCAACCTAGAGCTCTGTGCCTCCTACGTTTACCTGTCCATGTCTTACTGCTTTGACCGTGATGATGTGGCT
190  GATA2  NM_001145662.1  2573-2672  GTCCAGTTGATTGTACGTAGCCACAGGAGCCCTGCTATGAAAGGAATAAAACCTACACACAAGGTTGGAGCTTTGCAATTCTTTTTGGAAAAGAGCTGGG
191  GLIS3  NM_001042413.1  548-647  ACTCGCGCTGGCCGGCCGGGGGAAGGGACCCGCACGCCGGGCTTTGTTGTGGAAATCCCGGTTACCTGGCTTATAACCCACACCATGGATAACTTATTGG
192  GLRX  ILMN_1737308.1  119-218  AAAGCATAGTTGGTCTTGGTGTCATATGGATCAGAGGCACAAGTGCAGAGGCTGTGGTCATGCGGAACACTCTGTTATTTAAGATGGCTATCCAGATAAT
193  GNL3  NM_014366.4  1733-1832  TACAGCAGGTGAACAGTCTACAAGGTCTTTTATCTTGGATAAAATCATTGAAGAGGATGATGCTTATGACTTCAGTACAGATTATGTGTAACAGAACAAT
194  GNS  NM_002076.3  4988-5087  CCTGTGTTTGCATCCTCTGTTCCTATTCTGCCCTTGCTCTGTGTCATCTCAGTCATTTGACTTAGAAAGTGCCCTTCAAAAGGACCCTGTTCACTGCTGC
195  GOLGA3  NM_005895.3  8961-9060  CTCACTGACCGGAAGGTCCAGGTGAATCTCGTCATAAGTGATCTCAGGCTCTCACAGGATCCGGAGGGAAATGTGTTAGAGGGTCTGGAAAATTCAGTGC
196  GPATCH3  NM_022078.2  1686-1785  AGTCTGGGAGCAGCAGTCTTCGTGGCTGGTTCAGGGTGTTTTGTTCCGAGCCTGCCTGCCTGCCGGTTCTATACCTCAGGGGCATTTTTACAAAAAGCCC
197  GPI  NM_000175.2  1696-1795  CAGTGCTCAAGTGACCTCTCACGACGCTTCTACCAATGGGCTCATCAACTTCATCAAGCAGCAGCGCGAGGCCAGAGTCCAATAAACTCGTGCTCATCTG
198  GPR65  NM_003608.3  1899-1998  TATGATTTTTCTCACTCTTTCTTTGGACTCCAGGGTGTCAGCCATCAGGTCTCCTAATTTTGTGTACCGGTCTCCAACAACCCCAGCTACTGAATACTGC
199  GSTO1  NM_004832.2  897-996  AGAGCTCTACTTACAGAACAGCCCTGAGGCCTGTGACTATGGGCTCTGAAGGGGGCAGGAGTCAGCAATAAAGCTATGTCTGATATTTTCCTTCACTAAT
200  GUSB  NM_000181.3  2032-2131  GGTATCCCCACTCAGTAGCCAAGTCACAATGTTTGGAAAACAGCCTGTTTACTTGAGCAAGACTGATACCACCTGCGTGTCCCTTCCTCCCCGAGTCAGG
201  GZMA  NM_006144.3  636-735  GCCTCCGAGGTGGAAGAGACTCGTGCAATGGAGATTCTGGAAGCCCTTTGTTGTGCGAGGGTGTTTTCCGAGGGGTCACTTCCTTTGGCCTTGAAAATAA
202  GZMB  NM_004131.3  541-640  ACACTACAAGAGGTGAAGATGACAGTGCAGGAAGATCGAAAGTGCGAATCTGACTTACGCCATTATTACGACAGTACCATTGAGTTGTGCGTGGGGGACC
203  GZMH  NM_033423.4  718-817  GGCCCCTCGTGTGTAAGGACGTAGCCCAAGGTATTCTCTCCTATGGAAACAAAAAAGGGACACCTCCAGGAGTCTACATCAAGGTCTCACACTTCCTGCC
204  HAT1  NM_003642.3  1235-1334  AACCAAATAGAAATAAGCATGCAACATGAACAGCTGGAAGAGAGTTTTCAGGAACTAGTGGAAGATTACCGGCGTGTTATTGAACGACTTGCTCAAGAGT
205  HAVCR2  NM_032782.3  956-1055  TATATGAAGTGGAGGAGCCCAATGAGTATTATTGCTATGTCAGCAGCAGGCAGCAACCCTCACAACCTTTGGGTTGTCGCTTTGCAATGCCATAGATCCA
206  HDAC3  NM_003883.3  1765-1864  AAGATGAAGAGAGAGAGATTTGGAAGGGGCTCTGGCTCCCTAACACCTGAATCCCAGATGATGGGAAGTATGTTTTCAAGTGTGGGGAGGATATGAAAAT
207  HERC1  NM_003922.3  14664-14763  CAATCGACATGGACAACTACATGCTCTCGAGAAACGTGGACAACGCCGAGGGCTCCGACACTGACTACTGACCGTGCGGGTGCTCTCACCCTCCCTTCTC
208  HERC3  NM_014606.2  3796-3895  TAAGAATGATTTAGACTGACCTGTCCTTTTTTATCTGCGCATGCGAGAACATCACCTTCCTCTGTACACTTGGAAATGCCTCTGGCTTGTTGCAGCCCTC
209  HK3  NM_002115.2  2785-2884  AGTCAGAGGATGGGTCCGGCAAAGGTGCGGCCCTGGTCACCGCTGTTGCCTGCCGCCTTGCGCAGTTGACTCGTGTCTGAGGAAACCTCCAGGCTGAGGA
210  HLA-B  NM_005514.6  938-1037  CCCTGAGATGGGAGCCGTCTTCCCAGTCCACCGTCCCCATCGTGGGCATTGTTGCTGGCCTGGCTGTCCTAGCAGTTGTGGTCATCGGAGCTGTGGTCGC
211  HLA-DMB  NM_002118.3  21-120  CCCGTGAGCTGGAAGGAACAGATTTAATATCTAGGGGCTGGGTATCCCCACATCACTCATTTGGGGGGTCAAGGGACCCGGGCAATATAGTATTCTGCTC
212  HLA-G  NM_002127.4  1181-1280  AAGAGCTCAGATTGAAAAGGAGGGAGCTACTCTCAGGCTGCAATGTGAAACAGCTGCCCTGTGTGGGACTGAGTGGCAAGTCCCTTTGTGACTTCAAGAA
213  HMGB1  NM_002128.4  209-308  TATGCATTTTTTGTGCAAACTTGTCGGGAGGAGCATAAGAAGAAGCACCCAGATGCTTCAGTCAACTTCTCAGAGTTTTCTAAGAAGTGCTCAGAGAGGT
214  HMGB2  NM_002129.3  670-769  TGCTGCATATCGTGCCAAGGGCAAAAGTGAAGCAGGAAAGAAGGGCCCTGGCAGGCCAACAGGCTCAAAGAAGAAGAACGAACCAGAAGATGAGGAGGAG
215  HNRNPAB  NM_004499.3  1246-1345  CCCCATGGAAATCACTCTCCTGTTGACTATTTCCAGAGCTCTAGGTGTTTAGGCAGCGTGTGGTGTCTGAGAGGCCATAGCGCCATCATGGGCTGATTTT
216  HNRNPK  NM_031263.2  538-637  TCCCTACCTTGGAAGAGGGCCTGCAGTTGCCATCACCCACTGCAACCAGCCAGCTCCCGCTCGAATCTGATGCTGTGGAATGCTTAAATTACCAACACTA
217  HOOK3  NM_032410.3  2391-2490  GCAAGGTAGAGAAGTTGTGCCGCTCAATCACAGACACCTGCACCCACAACATACTTCTGTTACACACAAGAACATTTCAGGAAACTCAGCCAGCTTATTT
218  HOPX  NM_139211.4  590-689  AACAATAGGAAGCTATGTGTATCTTCTGTGTAAAGCAGTGGCTTCACTGGAAAAATGGTGTGGCTAGCATTTCCCTTTGAGTCATGATGACAGATGGTGT
219  HPSE  NM_006665.5  3920-4019  GAGGTTCCTATAATTGTCTCTGAGTAACCCTTTGGAATGGAGAGGGTGTTGGTCAGTCTACAAACTGAACACTGCAGTTCTGCGCTTTTTACCAGTGAAA
220  HSCB  NM_172002.3  343-442  TCCACCCAGATTTCTTCAGCCAGAGGTCTCAGACTGAAAAGGACTTCTCAGAGAAGCATTCGACCCTGGTGAATGATGCCTATAAGACCCTCCTGGCCCC
221  HSD11B1  NM_181755.1  156-255  GCCTACTACTACTATTCTGCAAACGAGGAATTCAGACCAGAGATGCTCCAAGGAAAGAAAGTGATTGTCACAGGGGCCAGCAAAGGGATCGGAAGAGAGA
222  HSP90AB1  NM_007355.3  1531-1630  GGCATTCTCTAAAAATCTCAAGCTTGGAATCCACGAAGACTCCACTAACCGCCGCCGCCTGTCTGAGCTGCTGCGCTATCATACCTCCCAGTCTGGAGAT
223  HSPA6  NM_002155.4  1990-2089  GTGGCACTCAAGCCCGCCAGGGGGACCCCAGCACCGGCCCCATCATTGAGGAGGTTGATTGAATGGCCCTTCGTGATAAGTCAGCTGTGACTGTCAGGGC
224  HUWE1  NM_031407.6  13637-13736  CCACCAACTCACCGTGTGTGTCCCAGCTGCCCCATCTTCCCCAGCGCATACCTGTTCCTCTTCTCATTCTCTCCCCGCCGCCTGTTTCCTCACCTTCTCT
225  HVCN1  NM_032369.3  747-846  TGTTCCAGGAGCACCAGTTTGAGGCTCTGGGCCTGCTGATTCTGCTCCGGCTGTGGCGGGTGGCCCGGATCATCAATGGGATTATCATCTCAGTTAAGAC
226  IDO1  NM_002164.3  51-150  CTATTATAAGATGCTCTGAAAACTCTTCAGACACTGAGGGGCACCAGAGGAGCAGACTACAAGAATGGCACACGCTATGGAAAACTCCTGGACAATCAGT
227  IDS  NM_006123.4  1016-1115  TGGATGGACATCAGGCAACGGGAAGACGTCCAAGCCTTAAACATCAGTGTGCCGTATGGTCCAATTCCTGTGGACTTTCAGCGGAAAATCCGCCAGAGCT
228  IER5  NM_016545.4  1802-1901  ACTTTACACCTACCCCTCACCGGAAAGCTAGACCCGCTTCAGGGCCAGGAGTGGCGTTTCCGCACAGGATTTCCTAAGACGAGAGGGATTTAGCCAAGAG
229  IFI27L2  NM_032036.2  305-404  GTCAGTGTTGGGGGCCTGCTTGGGGAATTCACCTTCTTCTTCTCTCCCAGCTGAACCCGAGGCTAAAGAAGATGAGGCAAGAGAAAATGTACCCCAAGGT
230  IFNA17  NM_021268.2  292-391  TGAGATGATCCAGCAGACCTTCAATCTCTTCAGCACAGAGGACTCATCTGCTGCTTGGGAACAGAGCCTCCTAGAAAAATTTTCCACTGAACTTTACCAG
231  IFNAR1  NM_000629.2  3124-3223  CTAATCAGCTCTCAGTGATCAACCCACTCTTGTTATGGGTGGTCTCTGTCACTTTGAATGCCAGGCTGGCTTCTCGTCTAGCAGTATTCAGATACCCCTT
232  IFNAR2  NM_000874.3  632-731  AAATACCACAAGATCATTTTGTGACCTCACAGATGAGTGGAGAAGCACACACGAGGCCTATGTCACCGTCCTAGAAGGATTCAGCGGGAACACAACGTTG
233  IFNGR1  NM_000416.1  1141-1240  CCCGGGCAGCCATCTGACTCCAATAGAGAGAGAGAGTTCTTCACCTTTAAGTAGTAACCAGTCTGAACCTGGCAGCATCGCTTTAAACTCGTATCACTCC
234  IGFBP7  NM_001553.2  584-683  ATCGGAATCCCGACACCTGTCCTCATCTGGAACAAGGTAAAAAGGGGTCACTATGGAGTTCAAAGGACAGAACTCCTGCCTGGTGACCGGGACAACCTGG
235  IL16  NM_004513.4  1263-1362  GGCATCTCCAACATCATCATCCAACGAAGACTCAGCTGCAAATGGTTCTGCTGAAACATCTGCCTTGGACACAGGGTTCTCGCTCAACCTTTCAGAGCTG
236  IL1B  NM_000576.2  841-940  GGGACCAAAGGCGGCCAGGATATAACTGACTTCACCATGCAATTTGTGTCTTCCTAAAGAGAGCTGTACCCAGAGAGTCCTGTGCTGAATGTGGACTCAA
237  IL1R2  NM_173343.1  114-213  TGCTTCTGCCACGTGCTGCTGGGTCTCAGTCCTCCACTTCCCGTGTCCTCTGGAAGTTGTCAGGAGCAATGTTGCGCTTGTACGTGTTGGTAATGGGAGT
238 | IL4 | NM_000589.2 | 626-725 | GACACTCGCTGCCTGGGTGCGACTGCACAGCAGTTCCACAGGCACAAGCAGCTGATCCGATTCCTGAAACGGCTCGACAGGAACCTCTGGGGCCTGGCGG
239 | IL7 | NM_000880.2 | 39-138 | AATAACCCAGCTTGCGTCCTGCACACTTGTGGCTTCCGTGCACACATTAACAACTCATGGTTCTAGCTCCCAGTCGCCAAGCGTTGCCAAGGCGTTGAGA
240 | INTS4 | NM_033547.3 | 652-751 | CCCACGTGTCAGAACAGCAGCTATAAAAGCCATGTTGCAGCTCCATGAAAGAGGACTGAAATTACACCAAACAATTTATAATCAGGCCTGTAAATTACTC
241 | IRAK2 | NM_001570.3 | 1286-1385 | GTGTTGGCCGAGGTCCTCACGGGCATCCCTGCAATGGATAACAACCGAAGCCCGGTTTACCTGAAGGACTTACTCCTCAGTGATATTCCAAGCAGCACCG
242 | IRF1 | NM_002198.1 | 511-610 | CTGTGCGAGTGTACCGGATGCTTCCACCTCTCACCAAGAACCAGAGAAAAGAAAGAAAGTCGAAGTCCAGCCGAGATGCTAAGAGCAAGGCCAAGAGGAA
243 | IRF4 | NM_002460.1 | 326-425 | GGGCACTGTTTAAAGGAAAGTTCCGAGAAGGCATCGACAAGCCGGACCCTCCCACCTGGAAGACGCGCCTGCGGTGCGCTTTGAACAAGAGCAATGACTT
244 | KIAA0174 | NM_014761.3 | 2187-2286 | ATGGATGGGACTCTTATGTCATAACTTCTGTTACTCCTTTGGCCCATAGCTAAGGTCATCCTTCCCCACAGGGGTGGCTTTGGGATTGGATGATACAGCT
245 | ITCH | NM_001257138.1 | 439-538 | GAGGTGACAAAGAGCCAACAGAGACAATAGGAGACTTGTCAATTTGTCTTGATGGGCTACAGTTAGAGTCTGAAGTTGTTACCAATGGTGAAACTACATG
246 | ITFG2 | NM_018463.3 | 1985-2084 | GTCTGGTCTTACCCATGTTCCTAGCAACCCTGAGATGATTTTCTTCCATTTACCAAAGCAGCCGGGTCAGTGCTTTCTCACGTTGCCGTATTCTTCAGGT
247 | ITGAE | NM_002208.4 | 3406-3505 | CTGAATGCAGAGAACCACAGAACTAAGATCACTGTCGTCTTCCTGAAAGATGAGAAGTACCATTCTTTGCCTATCATCATTAAAGGCAGCGTTGGTGGAC
248 | ITGAL | NM_002209.2 | 3906-4005 | GTGAGGGCTTGTCATTACCAGACGGTTCACCAGCCTCTCTTGGTTTCCTTCCTTGGAAGAGAATGTCTGATCTAAATGTGGAGAAACTGTAGTCTCAGGA
249 | JAK1 | NM_002227.1 | 286-385 | GAGAACACCAAGCTCTGGTAGCTCCAAATCGCACCATCACCGTTGATGACAAGATGTCCCTCCGGCTCCACTACCGGATGAGGTTCTATTTCACCAATT
250 | KIAA1267 | NM_015443.3 | 4402-4501 | CCTTCACATCCAGATCCCTGTCGGTGTTAGTTCCACTCTTGGTCTTTCACGCTCCCCTTGCCTGTGGAACATTGTCTGGTCCTAGCTGTGGTTCCCATTG
251 | MYST4 | NM_012330.3 | 6541-6640 | CCCAGACTGTAGCCATGCAGGGTCCTGCACGGACTTTAACGATGCAAAGAGGCATGAACATGAGTGTGAACCTGATGCCAGCGCCAGCCTACAATGTCAA
252 | KCTD12 | NM_138444.3 | 4208-4307 | ACAAGTAAAATAACTTGACATGAGCACCTTTAGATCCCTTCCCCTCCATGGGCTTTGGGCCACAGAATGAACCTTTGAGGCCTGTAAAGTGGATTGTAAT
253 | KIAA0101 | NM_014736.4 | 236-335 | CGACATCAGTTTCATCGAGGAAAGCTGAAAATAAATATGCAGGAGGGAACCCCGTTTGCGTGCGCCCAACTCCCAAGTGGCAAAAAGGAATTGGAGAATT
254 | SETD1B | XM_037523.11 | 7779-7878 | ATCGTGCCCAGTGTTAACCTCGGCTGGCCTTCACTAAGGGGACTAGACCTCCCTCTCCCCAGGAGCCCCAGCCCCAGAGTGGTTTGCAATAATCAAGATA
255 | KIR2DL5A | XM_001126354.1 | 265-364 | GAGGTGACATATGCACAGTTGGATCACTGCGTTTTCACACAGACAAAAATCACTTCCCCTTCTCAGAGGCCCAAGACACCTCCAACAGATACCACCATGT
256 | KIR_Activating_Subgroup_2 | NM_014512.1 | 719-818 | TCCGAAACCGGTAACCCCAGACACCTACATGTTCTGATTGGGACCTCAGTGGTCAAAATCCCTTTCACCATCCTCCTCTTCTTTCTCCTTCATCGCTGGT
257 | KIR2DS3 | NM_012313.1 | 1-100 | CCGGCAGCACCATGTCGCTCATGGTCATCAGCATGGCATGTGTTGGGTTCTTCTGGCTGCAGGGGGCCTGGCCACATGAGGGATTCCGCAGAAAACCTTC
258 | KLRB1 | NM_002258.2 | 357-456 | CAGCAACTCCGAGAGAAATGCTTGTTATTTTCTCACACTGTCAACCCTTGGAATAACAGTCTAGCTGATTGTTCCACCAAAGAATCCAGCCTGCTGCTTA
259 | KLRC1 | NM_002259.3 | 336-435 | ACCTATCACTGCAAAGATTTACCATCAGCTCCAGAGAAGCTCATTGTTGGGATCCTGGGAATTATCTGTCTTATCTTAATGGCCTCTGTGGTAACGATAG
260 | KLRC2 | NM_002260.3 | 943-1042 | TATGTGAGTCAGCTTATAGGAAGTACCAAGAACAGTCAAACCCATGGAGACAGAAAGTAGAATAGTGGTTGCCAATGTCTCAGGGAGGTTGAAATAGGAG
261 | KLRD1 | NM_002262.3 | 597-696 | CAATTTTACTGGATTGGACTCTCTTACAGTGAGGAGCACACCGCCTGGTTGTGGGAGAATGGCTCTGCACTCTCCCAGTATCTATTTCCATCATTTGAAA
262 | KLRF1 | NM_016523.1 | 544-643 | TATACAGAAAAACCTAAGACAATTAAACTACGTATGGATTGGGCTTAACTTTACCTCCTTGAAAATGACATGGACTTGGGTGGATGGTTCTCCAATAGAT
263 | KLRF1 b | NM_016523.2 | 849-948 | AAGTGCAATTAAATGCCAAAATCTCTTCTCCCTTCTCCCTCCATCATCGACACTGGTCTAGCCTCAGAGTAACCCCTGTTAACAAACTAAAATGTACACT
264 | KRTAP10-3 | NM_198696.2 | 213-312 | CTGCTGCCAGGCGGCCTGTGAGCCCAGCCCCTGCCAGTCAGGCTGCACCAGCTCCTGCACGCCCTCGTGCTGCCAGCAGTCTAGCTGCCAGCCAGCTTGC
265 | KYNU | NM_001032998.1 | 936-1035 | TTGCCTGCTGGTGTTCCTACAAGTATTTAAATGCAGGAGCAGGAGGAATTGCTGGTGCCTTCATTCATGAAAAGCATGCCCATACGATTAAACCTGCGAG
266 | LAMA5 | NM_005560.4 | 11163-11262 | CCAACCCCGGCCCCTGGTCAGGCCCCTGCAGCTGCCTCACACCGCCCCTTGTGCTCGCCTCATAGGTGTCTATTTGGACTCTAAGCTCTACGGGTGACAG
267 | LDHA | NM_001165416.1 | 1348-1447 | ATCTTGTGTAGTCTTCAACTGGTTAGTGTGAAATAGTTCTGCCACCTCTGACGCACCACTGCCAATGCTGTACGTACTGCATTTGCCCCTTGAGCCAGGT
268 | LEF1 | NM_016269.4 | 3136-3235 | AACACATAGTGGCTTCTCCGCCCTTGTAAGGTGTTCAGTAGAGCTAAATAAATGTAATAGCCAAACCCACTCTGTTGGTAGCAATTGGCAGCCCTATTTC
269 | LETM2 | NM_144652.3 | 1331-1430 | AAAGGACCCATCACTTCTTCTGAAGAACCTACACTCCAGGCCAAATCACAAATGACGGCCCAGAACAGCAAGGCTAGTTCAAAAGGAGCATAAAGGACTA
270 | LIF | NM_002309.3 | 1241-1340 | GGGATGGAAGGCTGTCTTCTTTTGAGGATGATCAGAGAACTTGGGCATAGGAACAATCTGGCAGAAGTTTCCAGAAGGAGGTCACTTGGCATTCAGGCTC
271 | LILRA5 | NM_021250.3 | 1044-1143 | TTGAATGCTGGAGCCTTGGAAGCGAATCTGATGGTCCTAGGAGGTTCGGGAAGACCATCTGAGGCCTATGCCATCTGGACTGTCTGCTGGCAATTTCTTT
272 | LILRA5 b | NM_181879.2 | 546-645 | CACCCTCTCAGCCCTGCCCAGTCCTGTGGTGACCTCAGGAGAGAACGTGACCCTCCAGTGTGGCTCACGGCTGAGATTCGACAGGTTCATTCTGACTGAG
273 | LOC338799 | NR_002809.2 | 471-570 | GCGGCAGCCAATCAGCGCGCGGCTTCTATAGGGCTTGAGTTATTAGACGCTGATCTCAAAACATCCTTCATCAGACACGAAGGAGAGGCCAACAGATGAG
274 | LOC100129022 | XM_001716591.1 | 568-667 | AGGGTCATGCAGCTACTGAGGTCACAGCCTGGATTCATACACAGGTCTGACTCCTGAGCACTTAGCCAGGTGGCTGTAACAGTGTTCCCAGAAACACAGG
275 | LOC100129697 | XM_001732822.2 | 1148-1247 | ACCTGTCTTCCGGGTCTGTTCACCCGTCCCCTGGACTGGCACCAGCACAGAGGGTCGAGTGTTGGCACCTGTCTTCTGGGTCTCCATCCCTCCCTTTGTT
276 | LOC100130229 | XM_001717158.1 | 1469-1568 | GAGAATGTCTGCGCGGAGACAGCATAGCTCTGTAGAAATGAGTGGCAGCGTATGTAACCTGGCATTTTGAACCCAGGAGCACAATTTTATTAAAGGAAAA
277 | LOC100132797 | XR_036994.1 | 15-114 | GAGTAGTAGGTGGACAGCCGTCCCACACAAGGGTTTGTATCTGGGCTACACAGATTCCCTTCAGAAAAGCACCAATGTAAGCAACTCCCTTACAGTTGCT
278 | LOC100133273 | XR_039238.1 | 342-441 | GAGATAGCTTCCTGAAATGTGTGAAGGAAAATGATCAGAAAAAGAAAGAAGCCAAAGAGAAAGGTACCTGGGTTCAACTAAAGTGCCAGCCTGCTCCACC
279 | LOC148137 | NM_144692.1 | 3367-3466 | GCTCTGTCCTTTGCCGCTCAGACCAAAAACCTTAGAGCTGTCTTTGACTTCTGTCTTTCCCTTCCACCCACAGTTAACCAGGAAATCCTGCCATCTCCGC
280 | LOC151162 | NR_024275.1 | 5062-5161 | GGTTACAGCCATTTTGTGTGATTCACTTCGGGGGTTAAGTAATGCAGGATTCTGCAAACAAGGTGTCGCCGTCCAAATGTACTGTCCTGGCATAGAGAGC
281 | C1orf222 | NM_001003808.1 | 2561-2660 | ACATGGCGCCACGGCCACTTCCTGCTGCCCTGGACCCCGCAAGCCCAGGGACATCCAAGAGCACCCCTCCTGAGACCCCAGACTCAGAAGCAGCGAGAAG
282 | LOC339674 | XM_934917.1 | 376-475 | CCCCTGGTGGACCGCGACCTCCGCAAGACGCTAATGGTGCGCGACAACCTGGCCTTCGGCGGCCCGGAGGTCTGAGCCGACTTGCAAAGGGGATAGGCGG
283 | LOC648000 | XM_371757.4 | 210-309 | GCAAAGCACTATCACAAGGAATATAGGCAAATGTACAGAACTGAAATTCGAGTGGCGAGGATGGCAAGAAAAGCTGGCAACTTCTATGTACCTGCAGAAC
284 | LOC391126 | XR_017684.2 | 82-181 | AAGATTATGTCTTCCCCTGTTTCCAAAGAGCTGAGACAGAAGTACAATGTGCAATCCATGCCCATCCGAAAGGATGATGAAGTTCAGGTTGTACGAGGGC
285 | LOC399753 | XM_930634.1 | 1448-1547 | ATGGGACCCACTCTACTGAGGCTTTATGTAGAACTCATAGAGGAAGCTGGCTTTGAGGAATGAACTACCCTGTGCTTTTCTTAGGACTAAAATCTCAGGA
286 | LOC399942 | XM_934471.1 | 21-120 | GACGGTAACCGGGACCCAGTGTCTGCTCCTGTCACCTTCGCCTCCTAATCCCTAGCCACTATGCGAGATGACTCCTTCAACACCTTCAGTGAGACGGGTG
287 | LOC440389 | XM_498648.3 | 552-651 | GAGTTTTCCAAACCCTGGATTTCCTTCGGAGAGAGCTAGATTCTATTCCATTCTTGGAATTCAGCTCCTTGCCCTTCTCTGTGACCCCGGATCGCGAATG
288 | LOC440928 | XM_942885.1 | 1533-1632 | TGTTGCAAAAGCCAACTACCACTGTCAAACTTAGCCCGTTTACAACATGGGGAAAGGCGTATTTCTTACTAATATCTCAACAACGATAACAATGCTGTAT
289 | LOC441073 | XR_018937.2 | 287-386 | CGGGTGCAGCGGGAAAAGGCTAATGGCACAACTGTCCACGTAGGCATTCACCCCAGCAAGGTGGTTATCACTAGGCTAAAACTGGACAAAGACTGTGAAA
290 | LOC642812 | XR_036892.1 | 591-690 | GGTGAAGAATTTGTTCTATTATGAAGATACTGTCTGGGCTAAAAAGCTTACAGTGAGTGGAAGATAGCAACTTGTAGGGTTGGTGGCTGAACAGGCCGAC
291 | LOC643319 | XM_927980.1 | 255-354 | CTGGCTCAAGGATGGCACGGTGTTATGTGAGCTCAATAATGCACTGTACCCCAAGGGGCAGGTCCCAGTAAAGAAGATCCAGGCCTCCACCATGGCCTTC
292 | LOC644315 | XR_017529.2 | 38-137 | CAGGCGCTGCAAGTTCTCCCAGGAGAAAGCCATGTTCAGTTCGAGCGCCAAGATCGTGAAGCCCAATGGCGAGAAGCCGGACGAGTTCGAGTCCGGCCAT
293 | LOC645914 | XM_928884.1 | 13-112 | GAAGCACTGGTAAATGTCTGCTGCATTAACTCACTCAGACCAAACTTTCTCTTATCTAGGTCCAAAAGGAAGCTGCTCGGCTGGAAGGAACCTGGTGAGG
294 | LOC647340 | XR_018104.1 | 670-769 | AGGTGCTGCAAAATTACCAGGAATACAGTCTGGCCAACAGCATCTACTACTCTCTGAAGGAGTCCACCACTAGTGAGCAGAGTGCCAGGATGACAGCCAT
295 | LOC648927 | XR_038906.2 | 1638-1737 | TGGAGAGAAGAATGAAGAGGTGGTGGTTCTGGGTTTGATTTGAGTTCACCTGTGGGCAGTGGGCAGTGTCTTGGTGAAAGGGAGCGGATACTACTTTTTG
296 | LOC653773 | XM_938755.1 | 38-137 | GCCCTTCTGCCATCAACGAGGTGGTGACCCAAGAACATACCATCAACATTCACAAGCGCATCCATGGAGAGGGCTTCAAGAAGCGTGCTCCTCGGGCACT
297 | LOC728533 | XR_015610.3 | 1861-1960 | GTAGTTGTCCACTGCTTTCCTGGATGGATGGGACTCTTATGTCATAACTTCTATACTCCTTTGGCCCATAGCTAAGGTCATCCTTCCCCACAGGGGTGGC
298 | LOC728835 | XM_001133190.1 | 510-609 | CCAAACCAAAAGAGGCAAGCAAGTCTGCGCTGACCCCAGTGAGTCCTGGGTCCAGGAGTACGTGTATGACCTGGAACTGAACTGAGCTGCTCAGAGACAG
299 | LOC729887 | XR_040891.2 | 625-724 | CCCTGGGTGCCCCTTAACCCGGGCGGTAGCTCGTTAAGATGGCGAAGTGTCCGGTCCGGAACACGCGAAACCCCAAATCCCGCCTGCCCGACCTCCTGAC
300 | LOC732111 | XM_001134275.1 | 765-864 | GCGCGGTTGCGGTTAGCGGGCGCGGTGCCAAAGCTGCCATCCCCAGCTCACAGCTCCTCATATCCACCCTGCCCTCATCTTTATGAATTGCGTGTAGACC
301 | LOC732371 | XM_001133019.1 | 182-281 | GCCCTTCAGAGCTGCGGGAGATCATTGATGAGTGCCGGGCCCATGATCCCTCTGTGCGGCCCTCTGTGGATGAGCAGAAGCGCAGACTTAATGATGTGTT
302 | LOC91431 | NM_001099776.1 | 2666-2765 | ATGTTGCATTGACTAGAGGAAAGAGGCATTTGTTGATTGTGGGAAATTTAGCCTGTTTGAGGAAAAATCAACTTTGGGGACGAGTGATCCAACACTGCGA
303 | P2RY5 | NM_005767.5 | 2026-2125 | AGATTGTTTGCACTGGCGTGTGGTTAACTGTGATCGGAGGAAGTGCACCCGCCGTTTTTGTTCAGTCTACCCACTCTCAGGGTAACAATGCCTCAGAAGC
304 | LPCAT4 | NM_153613.2 | 1560-1659 | CCCCACACACCTCTCGAGGCACCTCCCAGACACCAAATGCCTCATCCCCAGGCAACCCCACTGCTCTGGCCAATGGGACTGTGCAAGCACCCAAGCAGAA
305 | LPIN2 | NM_014646.2 | 5620-5719 | AGAAAAAACTTAAAAATGGGATGTCCTAAAATGAAAGCTGCTCAAAGTCACAGAACAACCGAGGGACAAAGGAGATTGGATGACTGGGAAGCGCTGGCCC
306 | C1orf103 | NM_018372.3 | 1543-1642 | TTCCAATACCCAGCTTGCTTCCATGGCCAATCTAAGGGCAGAGAAGAATAAAGTGGAGAAACCATCTCCTTCTACCACAAATCCACATATGAACCAATCC
307 | LRRC47 | NM_020710.2 | 2461-2560 | GGGTCAGTGACGGACACTTACCTGACAGCGGATCCACAATATTCTCGTGCAGTGTGTTTGGAATCCTGGTCTGGGCTCTCGTCGTTGGCCTTGTAGATCA
308 | LY96 | NM_015364.4 | 439-538 | AAGGGAGAGACTGTGAATACAACAATATCATTCTCCTTCAAGGGAATAAAATTTTCTAAGGGAAAATACAAATGTGTTGTTGAAGCTATTTCTGGGAGCC
309 | LYN | NM_002350.1 | 1286-1385 | TCCTGAAGAGCGATGAAGGTGGCAAAGTGCTGCTTCCAAAGCTCATTGACTTTTCTGCTCAGATTGCAGAGGGAATGGCATACATCGAGCGGAAGAACTA
310 | MAGEA1 | NM_004988.4 | 477-576 | AGGGGCCAAGCACCTCTTGTATCCTGGAGTCCTTGTTCCGAGCAGTAATCACTAAGAAGGTGGCTGATTTGGTTGGTTTTCTGCTCCTCAAATATCGAGC
311 | MAGEA3 | NM_005362.3 | 850-949 | ACTGTGCCCCTGAGGAGAAAATCTGGGAGGAGCTGAGTGTGTTAGAGGTGTTTGAGGGGAGGGAAGACAGTATCTTGGGGGATCCCAAGAAGCTGCTCAC
312 | MAP3K7 | NM_145333.1 | 671-770 | GCCATATTATACTGCTGCCCACGCAATGAGTTGGTGTTTACAGTGTTCCCAAGGAGTGGCTTATCTTCACAGCATGCAACCCAAAGCGCTAATTCACAGG
313 | MARCKS | NM_002356.6 | 1800-1899 | GTCAAAAAGGGATATCAAATGAAGTGATGGGGTCACAATGGGGAAATTGAAGTGGTGCATAACATTGCCAAAATAGTGTGCCACTAGAAATGGTGTAAAG
314 | MARCKSL1 | NM_023009.5 | 1117-1216 | TCCAAGTAGGTTTTGTTTACCCTACTCCCCAAATCCCTGAGCCAGAAGTGGGGTGCTTATACTCCCAAACCTTGAGTGTCCAGCCTTCCCCTGTTGTTTT
315 | MBD1 | NM_015844.2 | 2380-2479 | TGGCTGCAGGCCTGACTACTGCCCACACCAACGAGGTGATCTAGCAGATACATGGCAACGTGTGAACTGCAACAACGCCTGGTGCCCCAGCACCAACCTT
316 | C19orf59 | NM_174918.2 | 1062-1161 | CATACTAGAGTATACTGCGGCGTGTTTTCTGTCTACCCATGTCATGGTGGGGGAGATTTATCTCCGTACATGTGGGTGTCGCCATGTGTGCCCTGTCACT
317 | MED16 | NM_005481.2 | 2152-2251 | TCTGAAGCCCAGCTGCCTGCCCGTGTATACGGCCACCTCGGATACCCAGGACAGCATGTCCCTGCTCTTCCGCCTGCTCACCAAGCTCTGGATCTGCTGT
318 | MEN1 | NM_130799.2 | 2222-2321 | CCCAGCCCCTAGAAACCCAAGCTCCTCCTCGGAACCGCTCACCTAGAGCCAGACCAACGTTACTCAGGGCTCCTCCCAGCTTGTAGGAGCTGAGGTTTCA
319 | MERTK | NM_006343.2 | 666-765 | GAAGAGATCGTGTCTGATCCCATCTACATCGAAGTACAAGGACTTCCTCACTTTACTAAGCAGCCTGAGAGCATGAATGTCACCAGAAACACAGCCTTCA
320 | MFSD1 | NM_022736.2 | 2023-2122 | AAGGGCTGCGTTACACAAAATAAACAATGGCATTGTCATAGGCCTTCCTTTTACTAGTAGGGCATAATGCTAGGGAATATGTGAAGATGTTTTTATGAAG
321 | MID1IP1 | NM_021242.5 | 3472-3571 | AGCTGGCATTTCGCCAGCTTGTACGTAGCTTGCCACTCAGTGAAAATAATAACATTATTATGAGAAAGTGGACTTAACCGAAATGGAACCAACTGACATT
322 | MPDU1 | NM_004870.3 | 1226-1325 | CATTCAGCCAAGCCTCCTCCTCTAGCAGCAATTTCCAGCTGTGTAACACTATCCTGGGCAAATGTTTTACCCTGTCCTCCAGCCTCCCTGCTTCCCTTCT
323 | MRPL27 | NM_148571.1 | 2189-2288 | TCAAACTGGTAGCTATGCTTTGATGTCCTGTTGAGGCCATCGGACAGAGACTGGAGCCCAGGTGACAGGAGATGGTGATACCAGAAGTCAAGGGTTGGGG
324 | MRPS16 | NM_016065.3 | 1811-1910 | ATTCAAATGTGGCTGTGATTTCTGCATATATCATAGATGGGATCCTTCTGAGAATACTGGAATAGGGAATTAGGACACCAAGCCAATTCAGCTGTGAACC
325 | MS4A2 | NM_000139.3 | 662-761 | TTCTCACCATTCTGGGACTTGGTAGTGCTGTGTCACTCACAATCTGTGGAGCTGGGGAAGAACTCAAAGGAAACAAGGTTCCAGAGGATCGTGTTTATGA
326 | MS4A6A | NM_022349.3 | 1290-1389 | CTGGGAAGTTAAATGACTGGCCTGGCATTATGCTATGAGTTTGTGCCTTTGCTGAGGACACTAGAACCTGGCTTGCCTCCCTTATAAGCAGAAACAATTT
327 | MS4A6A b | NM_152851.2 | 880-979 | CTGCGGTGGAAACAGGCTTACTCTGACTTCCCTGGGAGTGTACTTTTCCTGCCTCACAGTTACATTGGTAATTCTGGCATGTCCTCAAAAATGACTCATG
328 | MTCH1 | NM_014341.2 | 2081-2180 | TCCTCCTCATCTAATGCTCATCTGTTTAATGGTGATGCCTCGCGTACAGGATCTGGTTACCTGTGCAGTTGTGAATACCCAGAGGTTGGGCAGATCAGTG
329 | MYADM | NM_001020820.1 | 2656-2755 | TCTTTTTCCTGGCCATGAGGACAAAAATTACTGAGTGGCCCTTAAAGAGGGAAGTTTGTTTTCAGCTGTTCTCTTTTGCCCGTAGGTGGGAGGGTGGGGA
330 | MYADM b | NM_001020820.1 | 2789-2888 | TGAATGTGTAGTGCACACGCACGGGTGTTTCTGTGTGCTAGTTGCTTCTTGCTGCTGCTTCCTGCTTGTCTGGGACTCACATACATAACGTGATATATAT
331 | C19orf10 | NM_019107.3 | 649-748 | TGTCCCTGAAAGGGCCAGCACATCACTGGTTTTCTAGGAGGGACTCTTAAGTTTTCTACCTGGGCTGACGTTGCCTTGTCCGGAGGGGCTTGCAGGGTGG
332 | MYL12A | NM_006471.3 | 305-404 | TCTCTGGGTAGCAGGGTGGTGTGATAGCGGCAGCGAGGGGCTCGGAGAGGTGCTCGGATTCTCGTAGCTGTGCCGGGACTTAACCACCACCATGTCGAGC
333 | MYLIP | NM_013262.3 | 2701-2800 | TTGGGCATTTTGGAAGCTGGTCAGCTAGCAGGTTTTCTGGGATGTCGGGAGACCTAGATGACCTTATCGGGTGCAATACTAGCTAAGGTAAAGCTAGAAA
334 | NAT5 | NM_181528.3 | 735-834 | AAACATACCACTCTCATGGTTCATAGTATTCACTGTATGTATGCTAGGGAAAAGACTTGCTCCAGTCTCCTCCTCAGTTCTGTGCCTGAGAACCACTGCT
335 | NADK | NM_023018.4 | 2449-2548 | TCCGGGGCTAGTGATCGTGATCCCTTTTATTTGCAACTGTAATGAGAATTTTTCACACTAACACAGCGAGGGACTCAACACGCTGATTCTCCTCCTGCCT
336 | NAGK | NM_017567.4 | 1362-1461 | GGGCCAGGCACATCGGGCACCTCCTCCCCATGGACTATAGCGCCAATGCCATTGCCTTCTATTCCTACACCTTTTCCTAGGGGGCTGGTCCCGGCTCCAC
337 | NCAPG | NM_022346.4 | 3080-3179 | ACCCAAGCATCAAAGTCTACTCAGCTAAAGACTAACAGAGGACAGAGAAAAGTGACAGTTTCAGCTAGGACGAACAGGAGGTGTCAGACTGCTGAAGCCG
338 | NCOA5 | NM_020967.2 | 2837-2936 | TGGACATGTTCTCGAGATGGGTGGCTGTTCGCGACTTTTGTACCAGAGTGAAATTGTTAGAAGGAGGGTTTCTGGCTGTGGTTCTAAATGGAGCCCCAGG
339 | NCR1 | NM_004829.5 | 603-702 | CGATGTTTTGGCTCCTATAACAACCATGCCTGGTCTTTCCCCAGTGAGCCAGTGAAGCTCCTGGTCACAGGCGACATTGAGAACACCAGCCTTGCACCTG
340 | NDRG2 | NM_016250.2 | 1516-1615 | TATGCATCCTCTGTCCTGATCTAGGTGTCTATAGCTGAGGGGTAAGAGGTTGTTGTAGTTGTCCTGGTGCCTCCATCAGACTCTCCCTACTTGTCCCATA
341 | NDUFA4 | NM_002489.3 | 262-361 | TGGGACAGAAATAACCCAGAGCCCTGGAACAAACTGGGTCCCAATGATCAATACAAGTTCTACTCAGTGAATGTGGATTACAGCAAGCTGAAGAAGGAAC
342 | NDUFAF2 | NM_174889.4 | 486-585 | TCCTGCCTCCACCAGTTCAAACTCAAATTAAAGGCCATGCCTCTGCTCCATACTTTGGAAAGGAAGAACCCTCAGTGGCTCCCAGCAGCACTGGTAAAAC
343 | NDUFB3 | NM_002491.2 | 383-482 | ACAATGGAAGATAGAAGGGACACCATTAGAAACTATCCAGAAGAAGCTGGCTGCAAAAGGGCTAAGGGATCCATGGGGCCGCAATGAAGCTTGGAGATAC
344 | NDUFS4 | NM_002495.2 | 326-425 | GAGTTTGATACCAGAGAGCGATGGGAAAATCCTTTGATGGGTTGGGCATCAACGGCTGATCCCTTATCCAACATGGTTCTAACCTTCAGTACTAAAGAAG
345 | NDUFV2 | NM_021074.4 | 687-786 | TTACTATGAGGATTTGACAGCTAAGGATATTGAAGAAATTATTGATGAGCTCAAGGCTGGCAAAATCCCAAAACCAGGGCCAAGGAGTGGACGCTTCTCT
346 | NFAT5 | NM_138713.3 | 3857-3956 | CCCAAGAAGCATTTTTTGCAGCACCGAACTCAATTTCTCCACTTCAGTCAACATCAAACAGTGAACAACAAGCTGCTTTCCAACAGCAAGCTCCAATATC
347 | NFATC1 | NM_172389.1 | 1985-2084 | CGAATTCTCTGGTGGTTGAGATCCCGCCATTTCGGAATCAGAGGATAACCAGCCCCGTTCACGTCAGTTTCTACGTCTGCAACGGGAAGAGAAAGCGAAG
348 | NFATC4 | NM_001136022.2 | 2297-2396 | ACAAGAGGGTTTCCCGGCCAGTCCAGGTCTACTTTTATGTCTCCAATGGGCGGAGGAAACGCAGTCCTACCCAGAGTTTCAGGTTTCTGCCTGTGATCTG
349 | NFKB1 | NM_003998.3 | 3606-3705 | CGGATGCATCTGGGGATGAGGTTGCTTACTAAGCTTTGCCAGCTGCTGCTGGATCACAGCTGCTTTCTGTTGTCATTGCTGTTGTCCCTCTGCTACGTTC
350 | NFKB2 | NM_002502.2 | 826-925 | ATCTCCGGGGGCATCAAACCTGAAGATTTCTCGAATGGACAAGACAGCAGGCTCTGTGCGGGGTGGAGATGAAGTTTATCTGCTTTGTGACAAGGTGCAG
351 | NIPBL | NM_133433.3 | 8755-8854 | GCGCCGTGATGGCCGCAAACTGGTGCCTTGGGTAGACACTATTAAAGAGTCAGACATTATTTACAAAAAAATTGCTCTAACGAGTGCTAATAAGCTGACT
352 | NLRP3 | NM_001079821.2 | 416-515 | AGTGGGGTTCAGATAATGCACGTGTTTCGAATCCCACTGTGATATGCCAGGAAGACAGCATTGAAGAGGAGTGGATGGGTTTACTGGAGTACCTTTCGAG
353 | NME1-NME2 | NM_001018136.2 | 484-583 | ACCTGGAGCGCACCTTCATCGCCATCAAGCCGGACGGCGTGCAGCGCGGCCTGGTGGGCGAGATCATCAAGCGCTTCGAGCAGAAGGGATTCCGCCTCGT
354 | NUDT18 | NM_024815.3 | 1369-1468 | CCCCAGTGGCATCTCCTCATCACGTTCTGTGCCGTCCTTGGGAAAGGCCTGCATTCTGATCCTTCCAGGCCCTTCGAGCATGGAGGGGCACTGGGGAAGG
355 | NUMB | NM_001005744.1 | 2833-2932 | CATAAGATTGATTTATCATTGATGCCTACTGAAATAAAAAGAGGAAAGGCTGGAAGCTGCAGACAGGATCCCTAGCTTGTTTTCTGTCAGTCATTCATTG
356 | NUP153 | NM_005124.3 | 5104-5203 | TTTATGATCCAGCAGATTATTCACTGATTTGACATAGTCTGGCTGTACCCAGGAATGGAGCCTGCACGGTGAATGGCTTTGTATAGAACCTCTTTGTCTA
357 | OLR1 | NM_002543.3 | 1524-1623 | ACACATTTTGGGACAAGTGGGGAGCCCAAGAAAGTAATTAGTAAGTGAGTGGTCTTTTCTGTAAGCTAATCCACAACCTGTTACCACTTCCTGAATCAGT
358 | OSBP | ILMN_1706376.1 | 130-229 | TTCTCTTCCTTCACCATCTGCACTACATTTCTGGCTGATCCCAATCAGATTCCCGCTAATGGAAGAAGTTTAGAATCTTTCAGGTGGAATAAAGTCACAT
359 | FAM105B | NM_138348.4 | 2537-2636 | TGCAGATGGTGTTCACATGAACCGGAGACATCACTCTTTAGGATTCTACTGGCAGCCCCTGAATTGGCTCAACGTTTGTGGAGGTGGTATTTCCCTGAAG
360 | P2RY10 | NM_198333.1 | 972-1071 | TTACACCATGGTAAAGGAAACCATCATTAGCAGTTGTCCCGTTGTCCGAATCGCACTGTATTTCCACCCTTTTTGCCTGTGCCTTGCAAGTCTCTGCTGC
361 | PACS1 | NM_018026.3 | 3830-3929 | CGCTGTCTTCGTGGCTTCCACCCTTGTTAATGATGCTCCTGCCTCTGCCTCCCAGCCCCTCACCCAGCACAGCTCTGCCTGGACTTGGAGAGATGGGAGG
362 | PANK2 | NM_153640.2 | 824-923 | AGTGGATAAACTAGTACGAGATATTTATGGAGGGGACTATGAGAGGTTTGGACTGCCAGGCTGGGCTGTGGCTTCAAGCTTTGGAAACATGATGAGCAAG
363 | PDCD10 | NM_145859.1 | 901-1000 | AAGAGATGTACTTCTCAGTGGCAGTATTGAACTGCCTTTATCTGTAAATTTTAAAGTTTGACTGTATAAATTATCAGTCCCTCCTGAAGGGATCTAATCC
364 | PDGFD | NM_033135.3 | 3394-3493 | CCTGTGAAAACATCAGTTTCCTGTACCAAAGTCAAAATGAACGTTACATCACTCTAACCTGAACAGCTCACAATGTAGCTGTAAATATAAAAAATGAGAG
365 | PDSS1 | NM_014317.3 | 1199-1298 | CATGAAGCAATAAGAGAGATCAGTAAACTTCGACCATCCCCAGAAAGAGATGCCCTCATTCAGCTTTCAGAAATTGTACTCACAAGAGATAAATGACAAC
366 | PELP1 | NM_014389.2 | 1989-2088 | TGGCCCCGTCTCCTCGCTGCCCACCTCCTCTTGCCTGTGCCCTGCAAGCCTTCTCCCTCGGCCAGCGAGAAGATAGCCTTGAGGTCTCCTCTTTCTGCTC
367 | PFAS | NM_012393.2 | 5109-5208 | CATCCCTAGATCCTAACCCTTTAGTATGCTGGAATTCTACTCTTCACTTACTGCATTGACTGTTGTTGATTAGTTATTATTGCAAAGCACTGTCACCGGC
368 | PFDN5 | NM_145897.2 | 232-331 | ATCGATGTGGGAACTGGGTACTATGTAGAGAAGACAGCTGAGGATGCCAAGGACTTCTTCAAGAGGAAGATAGATTTTCTAACCAAGCAGATGGAGAAAA
369 | PFDN5 b | NM_145897.2 | 331-430 | ATCCAACCAGCTCTTCAGGAGAAGCACGCCATGAAACAGGCCGTCATGGAAATGATGAGTCAGAAGATTCAGCAGCTCACAGCCCTGGGGGCAGCTCAGG
370 | PGK1 | NM_000291.3 | 1122-1221 | GTCCTGAAAGCAGCAAGAAGTATGCTGAGGCTGTCACTCGGGCTAAGCAGATTGTGTGGAATGGTCCTGTGGGGGTATTTGAATGGGAAGCTTTTGCCCG
371 | PHF8 | NM_015107.2 | 5704-5803 | ATCAAGGTTTAGAACACCATGAGATAGTTACCCCTGATCTCCAGTCCCTAGCTGGGGGCTGGACAGGGGGAAGGGAGAGAGGATTTCTATTCACCTTTAA
372 | PHLPP2 | NM_015020.3 | 7601-7700 | CCAGTTGGGTGTGGCAGATCTACTGAATATCAAATGATGCTCTTCTTCCCATGTAGACCTTCAGCAAAAGCCGGTACTTGGAAGCCACAGGCTCACCTTC
373 | PHRF1 | NM_020901.3 | 5239-5338 | GGGAAATGGGGGGCATCACCATGCCTGCCGTCGGGTTCCTGCGCTGACACCTGGTCTGTGCACCTGTGTTGCTCACAGTTGAAAACTGGACACTTTTGTA
374 | PI4K2A | NM_018425.3 | 3886-3985 | TCCATGGAATTGCTGAGACGTGGCTCCTGGGGCTATTTCTCCCTAATAAAGGATGATCCAGGTCCTCATTTCCAAAGTCCCAATGCTCTGAAAACCAAAA
375 | PIK3CD | NM_005026.3 | 4799-4898 | GAGCCAGAAGTAGCCGCCCGCTCAGCGGCTCAGGTGCCAGCTCTGTTCTGATTCACCAGGGGTCCGTCAGTAGTCATTGCCACCCGCGGGGCACCTCCCT
376 | PIM2 | NM_006875.3 | 1947-2046 | TTTTTGGGGGATGGGCTAGGGGAAATAAGGCTTGCTGTTTGTTCTCCTGGGGCGCTCCCTCCAACTTTTGCAGATTCTTGCAACCTCCTCCTGAGCCGGG
377 | PLAC8 | NM_001130715.1 | 289-388 | CTGATATGAATGAATGCTGTCTGTGTGGAACAAGCGTCGCAATGAGGACTCTCTACAGGACCCGATATGGCATCCCTGGATCTATTTGTGATGACTATAT
378 | PLEKHG4 | NM_015432.3 | 6365-6464 | CCAGTTGTGGGTTAAGAATAGGCTAGAGCAGACATTGGGTGTTTCCATGCTGTAGGCTGGTGGGGGACCATGTGCCTCTAGGCAGTGACTAGGGTGCCCC
379 | POLR2A | NM_000937.4 | 6539-6638 | CCCCTGCCTGTCCCCAAATTGAAGATCCTTCCTTGCCTGTGGCTTGATGCGGGGCGGGTAAAGGGTATTTTAACTTAGGGGTAGTTCCTGCTGTGAGTGG
380 | PPP1R3E | XM_927029.1 | 4342-4441 | CAGAACCTCCTCAGTTCCTTCACAGTGCAACCCTGTGTACTTGGCCCGCAACCCAATAGTATTGTGCCTCACTTCACCTTCCATGGGCAACTGCCCTCCC
381 | PPP2R5C | NM_178588.1 | 941-1040 | ACAGCACCCTCACGGAACCAGTGGTGATGGCACTTCTCAAATACTGGCCAAAGACTCACAGTCCAAAAGAAGTAATGTTCTTAAACGAATTAGAAGAGAT
382 | PPP6C | NM_002721.4 | 1536-1635 | TTAAGAAATTTCAGCAGCAAAGTTGTTATTCAGTGGGCACGATGGACTCCAAATGCCTCAAGTTATGTATACCTGTCCCAGATGTAAACTTCATTGTCCT
383 | PRG2 | NM_002728.4 | 257-356 | CTCTGGAAGTGAAGATGCCTCCAAGAAAGATGGGGCTGTTGAGTCTATCTCAGTGCCAGATATGGTGGACAAAAACCTTACGTGTCCTGAGGAAGAGGAC
384 | PRPF3 | NM_004698.2 | 2116-2215 | CCTACAGAGAACATGGCTCGTGAGCATTTCAAAAAGCATGGGGCTGAACACTACTGGGACCTTGCGCTGAGTGAATCTGTGTTAGAGTCCACTGATTGAG
385 | PRPF8 | NM_006445.3 | 7091-7190 | ACTCTGCGGATCGGGAGGACCTGTATGCCTGACCGTTTCCCTGCCTCCTGCTTCAGCCTCCCGAGGCCGAAGCCTCAGCCCCTCCAGACAGGCCGCTGAC
386 | C22orf30 | NM_173566.2 | 10495-10594 | CCCGTTGAGCTGGCCATCTAGTGCAGTGTGCTCTCAGATTCCATGTTTGTTGATTGTGTGTCTTCACAAGCCCCTCTCTGGTGCTGAATTGGATTTGAAT
387 | BAT2D1 | NM_015172.3 | 9620-9719 | AGAACAGTGAGTACCTAGAACTGTGCCACTAATTAAAGGAAATCCTAAGAAGGTGCATTTCTTTACAGAGCTGTGTCATGCCATCCTTTGGGCCCTCTGC
388 | PRRG4 | NM_024081.5 | 761-860 | GAAGACCTGAGGAGGCTGCCTTGTCTCCATTGCCGCCTTCTGTGGAGGATGCAGGATTACCTTCTTATGAACAGGCAGTGGCGCTGACCAGAAAACACAG
389 | PSMA3 | NM_152132.2 | 422-521 | CTTTGGCTACAACATTCCACTAAAACATCTTGCAGACAGAGTGGCCATGTATGTGCATGCATATACACTCTACAGTGCTGTTAGACCTTTTGGCTGCAGT
390 | PSMA4 | NM_002789.3 | 541-640 | GTACATTGGCTGGGATAAGCACTATGGCTTTCAGCTCTATCAGAGTGACCCTAGTGGAAATTACGGGGGATGGAAGGCCACATGCATTGGAAATAATAGC
391 | PSMA4 b | NM_002789.4 | 879-978 | GAGGAAGAAGAAGCCAAAGCTGAGCGTGAGAAGAAAGAAAAAGAACAGAAAGAAAAGGATAAATAGAATCAGAGATTTTATTACTCATTTGGGGCACCAT
392 | PSMA6 | NM_002791.2 | 218-317 | GGTCGGCTCTACCAAGTAGAATATGCTTTTAAGGCTATTAACCAGGGTGGCCTTACATCAGTAGCTGTCAGAGGGAAAGACTGTGCAGTAATTGTCACAC
393 | PSMA6 b | NM_002791.2 | 866-965 | GATGCTCACCTTGTTGCTCTAGCAGAGAGAGACTAAACATTGTCGTTAGTTTACCAGATCCGTGATGCCACTTACCTGTGTGTTTGGTAACAACAAACCA
394 | PSMB1 | NM_002793.3 | 687-786 | GCGGCTGGTGAAAGATGTCTTCATTTCTGCGGCTGAGAGAGATGTGTACACTGGGGACGCACTCCGGATCTGCATAGTGACCAAAGAGGGCATCAGGGAG
395 | PSMB7 | NM_002799.2 | 421-520 | GTTACATTGGTGCAGCCCTAGTTTTAGGGGGAGTAGATGTTACTGGACCTCACCTCTACAGCATCTATCCTCATGGATCAACTGATAAGTTGCCTTATGT
396 | PSMB8 | NM_004159.4 | 1216-1315 | ACTCACAGAGACAGCTATTCTGGAGGCGTTGTCAATATGTACCACATGAAGGAAGATGGTTGGGTGAAAGTAGAAAGTACAGATGTCAGTGACCTGCTGC
397 | PSMC1 | NM_002802.2 | 1487-1586 | CATCCTGTGTCTTTTGGAGTACGATGTGTAAGTGCCCATTGGGTGGCCTGTTGGTCACTGTGCAGCAGTCTGCTTCCCAATAAAGCGTGCTCTTTCACAA
398 | PSMD7 | NM_002811.4 | 1231-1330 | GAGCTCTCTGCCTCCGGTCACTCTTGCTGTGGTGCTACGTGGAAGTGAATGGAGACTGATCTCAAATCTGAACTGCAGCTTTCGCTGCTGTGAGTTGGGG
399 | PSME3 | NM_005789.3 | 3203-3302 | TCCCGAGTGATACCCATGAACTGCCAGTAGAGGCTGCTATCGTTCCATGTGTAAGGAATGAACTGGTTCAAGGCGCGTCCTACCCAGTCATTTTCTTTAC
400 | PTGDR | NM_000953.2 | 2341-2440 | TATGATGACTGAAAGGGAAAAGTGGAGGAAACGCAGCTGCAACTGAAGCGGAGACTCTAAACCCAGCTTGCAGGTAAGAGCTTTCACCTTTGGTAAAAGA
401 | PTGDR2 | NM_004778.1 | 1836-1935 | GCCAATGCTTACTGCGCTAGACGCTTCATCCCACAATCTTAAGGGGCAGCTTCTATTAGCCAGTCTTTACAGCTGAGCACATTCTGGCTCAGGGAGGTTA
402 | PUM1 | NM_001020658.1 | 3753-3852 | AAATGTTCTAGTGTAGAGTCTGAGACGGGCAAGTGGTTGCTCCAGGATTACTCCCTCCTCCAAAAAAGGAATCAAATCCACGAGTGGAAAAGCCTTTGTA
403 | QTRTD1 | NM_024638.3 | 2508-2607 | TTAGATTAGAGTCATAGCCTTAATAGCCCTAGTTGTCATCCTGGGAGACAGGCAACAGTAGAGATATTTGAGAGCCTAAAGAGAGGTTTGGCCTGTGGGT
404 | RAB10 | NM_016131.4 | 3593-3692 | AGGGCTTTGCCCCTTTTCTGTAAGTCTCTTGGGATCCTGTGTAGAAGCTGTTCTCATTAAACACCAAACAGTTAAGTCCATTCTCTGGTACTAGCTACAA
405 | RAG1 | NM_000448.2 | 2301-2400 | CAGTCTACATTTGTACTCTTTGTGATGCCACCCGTCTGGAAGCCTCTCAAAATCTTGTCTTCCACTCTATAACCAGAAGCCATGCTGAGAACCTGGAACG
406 | RASSF5 | NM_182664.2 | 3061-3160 | TCGTCCTGCATGTCTCTAACATTAATAGAAGGCATGGCTCCTGCTGCAACCGCTGTGAATGCTGCTGAGAACCTCCCTCTATGGGGATGGCTATTTTATT
407 | RBM14 | NM_006328.3 | 2661-2760 | TGGTATGTATCCAAGTCCCTGCTGACCACTAATGTTCTAGCTGATGGTGAGCGGCACAGTCCCACTTCCCCATCTCCCCAAGTAGGTGGTGTTAGAAAAC
408 | RBM4B | NM_031492.3 | 1557-1656 | TAGGAGTTGAATCCTTCTCCCTGCCTACCTGCAGCATCTCCTTTCCCTTTAAAATGACCATGTAGTGGCAAGCAGCCTTTTACTCTTCTGTTAGCTCTGG
409 | RBX1 | NM_014248.3 | 158-257 | GATATTGTGGTTGATAACTGTGCCATCTGCAGGAACCACATTATGGATCTTTGCATAGAATGTCAAGCTAACCAGGCGTCCGCTACTTCAGAAGAGTGTA
410 | RELA | NM_021975.2 | 361-460 | GATGGCTTCTATGAGGCTGAGCTCTGCCCGGACCGCTGCATCCACAGTTTCCAGAACCTGGGAATCCAGTGTGTGAAGAAGCGGGACCTGGAGCAGGCTA
411 | REPIN1 | NM_014374.3 | 2491-2590 | TGTGTCCAGGCTCTTGTCTGAACACCGCAGCCCCTCCTTCGCTCCTTCCAGAGCTCAGCATGTCACGGCAAGGACTGCCGCATTGGTGATGGAGGGCCAG
412 | REPS1 | NM_001128617.2 | 1289-1388 | CACCAACCAGTACTCTTTTAACCATGCATCCTGCTTCTGTCCAGGACCAGACAACAGTACGAACTGTAGCATCAGCTACAACTGCCATTGAAATTCGTAG
413 | RERE | NM_001042682.1 | 5916-6015 | AACCCTCGACCCGAAACCCTCACCAGATAAACTACAGTTTGTTTAGGAGGCCCTGACCTTCATGGTGTCTTTGAAGCCCAACCACTCGGTTT
414 | RERE b | NM_012102.3 | 7734-7833 | GCATTCTTGTTAGCTTTGCTTTTCTCCCCATATCCCAAGGCGAAGCGCTGAGATTCTTCCATCTAAAAAACCCTCGACCCGAAACCCTCACCAGATAAAC
415 | RFWD2 | NM_022457.6 | 2606-2705 | TTTTCTTTTCCCTCCTTTATGACCTTTGGGACATTGGGAATACCCAGCCAACTCTCCACCATCAATGTAACTCCATGGACATTGCTGCTCTTGGTGGTGT
416 | RFX1 | NM_002918.4 | 4187-4286 | ATAAAAATCACTATTTTGTGTGCTCCGCGTGCTATAGCTTTTGGGGCGGCCCTGCCCAGTCCCCGTGCCCACGGGGCTCCCTCTCCCGGTGGTGAAAGTG
417 | RHOB | NM_004040.3 | 1707-1806 | GGGAGGAGGGAGGATGCGCTGTGGGGTTGTTTTTGCCATAAGCGAACTTTGTGCCTGTCCTAGAAGTGAAAATTGTTCAGTCCAAGAAACTGATGTTATT
418 | RHOG | NM_001665.3 | 1045-1144 | CTTTCCACACAGTTGTTGCTGCCTATTGTGGTGCCGCCTCAGGTTAGGGGCTCTCAGCCATCTCTAACCTCTGCCCTCGCTGCTCTTGGAATTGCGCCCC
419 | RHOU | NM_021205.5 | 4174-4273 | TTGACAGACTCAAGAGAAACTACCCAGGTATTACACAAGCCAAAATGGGAGCAAGGCCTTCTCTCCAGACTATCGTAACCTGGTGCCTTACCAAGTTGTG
420 | RNASE2 | NM_002934.2 | 331-430 | TGACCTGTCCTAGTAACAAAACTCGCAAAAATTGTCACCACAGTGGAAGCCAGGTGCCTTTAATCCACTGTAACCTCACAACTCCAAGTCCACAGAATAT
421 | RNF114 | NM_018683.3 | 2246-2345 | AATTCAGATCATCTCAGAAGTCTGGAGGGAAATCTGGCGAAACCTTCGTTTGAGGGACTGATGTGAGTGTATGTCCACCTCACTGGTGGCACCGAGAAAC
422 | RNF19B | NM_153341.3 | 2222-2321 | CCCCAGAGCCCAAGGTGCACCGAGCCCAAGTGCCCATATGAACCTCTCTGCCCTAGCCGAGGGACAAACTGTCTTGAAGCCAGAAGGTGGAGAAGCCAGA
423 | RNF214 | NM_207343.3 | 2068-2167 | ACCTGTAAGCTATGTCTAATGTGCCAGAAACTCGTCCAGCCCAGTGAGCTGCATCCAATGGCGTGTACCCATGTATTGCACAAGGAGTGTATCAAATTCT
424 | RNF34 | NM_025126.3 | 1619-1718 | CTTCTGTCCTCTTTGGATGAGATCAGTGTCCACAAGTGGCCGACATGGAACATGCTGAGCAGTGGCTCCTCTGAATGTTCACTTTATTAGTCATGTATAT
425 | C20orf52 | NM_080748.2 | 274-373 | CTCAGGATCGGAATGCGGGGTCGAGAGCTGATGGGCGGCATTGGGAAAACCATGATGCAGAGTGGCGGCACCTTTGGCACATTCATGGCCATTGGGATGG
426 | RPL26L1 | NM_016093.2 | 4-103 | CACTCAGGGTCTGAGGCAGCTAGTAGCCGGAGGGTCACCATGAAGTTCAATCCCTTCGTTACCTCGGACCGCAGTAAAAACCGCAAACGTCACTTCAATG
427
RPL3
NM_00103385
 1072-
AGAAGAAAGCATTCATGGGACC




3.1
 1171
ACTGAAGAAAGACCGAATTGCA






AAGGAAGAAGGAGCTTAATGCC






AGGAACAGATTTTGCAGTTGGTG






GGGTCTCAATA





428
RPL31
NM_000993.4
   20-
CTTGCAACTGCGGCTTTCCTTCT





  119
CCCACAATCCTTCGCGCTCTTCC






TTTCCAACTTGGACGCTGCAGAA






TGGCTCCCGCAAAGAAGGGTGGC






GAGAAGAA





429
RPL34
NM_000995.3
  471-
ACCTCACCTCAGCTTGAGAGAGC





  570
CAGTTGTGTGCATCTCTTTCCAG






TTTTGCATCCAGTGACGTCTGCT






TGGCATCTTGAGATTGTTATGGT






GAGAGTAT





430
RPL39L
NM_052969.1
  139-
GCGGGTTCGGGTCGGTGACACGC





  238
AGACCTGAGGGAGCTGGGCCCG






CCTTTTCCGCCCGCGCCCCAGGC






CCTTGCAGATCGAGATTTGCGTC






CTAGAGTGG





431
KIAA0
NM_015203.4
 4795-
CCCCTTGGGTCCCTCACACAGAG



460

 4894
ACACCATCAGCCGGAGTGGTATA






ATCTTACGGAGTCCCCGGCCAGA






CTTTCGGCCTAGGGAACCTTTTC






TCAGCAGA





432
RPS24
NM_001026.4
  482-
ATGAAGAAAGTCAGGGGGACTG





  581
CAAAGGCCAATGTTGGTGCTGGC






AAAAAGCCGAAGGAGTAAAGGT






GCTGCAATGATGTTAGCTGTGGC






CACTGTGGAT





433
RPS27L
NM_015920.3
  241-
TAAAATGTCCAGGTTGCTACAAG





  340
ATCACCACGGTTTTCAGCCATGC






TCAGACAGTGGTTCTTTGTGTAG






GTTGTTCAACAGTGTTGTGCCAG






CCTACAGG





434
RPS6
NM_001010.2
  172-
GAATGGAAGGGTTATGTGGTCCG





  271
AATCAGTGGTGGGAACGACAAA






CAAGGTTTCCCCATGAAGCAGGG






TGTCTTGACCCATGGCCGTGTCC






GCCTGCTAC





435 RSL24D1 NM_016304.2 1232-1331 TGGAGTGACACTACACTCTAGAATTTCCACTTTGGAGAATACTCAGTTCCAACTTGTGATTCCTGATAGAACAGACTTTACTTTTCTAGCCCAGCATTGA
436 RWDD1 NM_001007464.2 998-1097 TGGAGGATGATGAAGATGATCCAGACTATAATCCTGCTGACCCAGAGAGTGACTCAGCTGACTAATGGACTGTCCCCATCTGCAGAGAGGCTTGACTGCC
437 RXRA NM_002957.5 5301-5400 AGTAATTTTTAAAGCCTTGCTCTGTTGTGTCCTGTTGCCGGCTCTGGCCTTCCTGTGACTGACTGTGAAGTGGCTTCTCCGTACGATTGTCTCTGAAACA
438 S100A12 b NM_005621.1 261-360 CAAGATGAACAGGTCGACTTTCAAGAATTCATATCCCTGGTAGCCATTGCGCTGAAGGCTGCCCATTACCACACCCACAAAGAGTAGGTAGCTCTCTGAA
439 S100A8 NM_002964.4 366-465 GTTAACTTCCAGGAGTTCCTCATTCTGGTGATAAAGATGGGCGTGGCAGCCCACAAAAAAAGCCATGAAGAAAGCCACAAAGAGTAGCTGAGTTACTGGG
440 SAMSN1 NM_022136.3 1024-1123 ACCTGAGCCCCTATCCTTGAGCTCAGACATCTCCTTAAATAAGTCACAGTTAGATGACTGCCCAAGGGACTCTGGTTGCTATATCTCATCAGGAAATTCA
441 SAP130 b NM_024545.3 3091-3190 GATCTCCACCGAATAAACGAACTGATACAGGGAAATATGCAGAGGTGTAAACTTGTGATGGATCAAATCAGTGAAGCCAGAGACTCCATGCTTAAGGTTT
442 SAP130 NM_024545.3 3720-3819 CGGTTCTTCTGCCTGACCTTCAAATGCCCATGTTGGCCTTTTACAGCAGTGCCACGGCACCAAGCGAGCTGCCACATCTCACACTCTAAAGGGTTTGAAC
443 CIP29 NM_033082.3 622-721 AACTGGAACCACAGAGGATACAGAGGCAAAGAAGAGGAAAAGAGCAGAGCGCTTTGGGATTGCCTGATGAAAAGTTCCTGATACTTTCTGTTCTCCAGTG
444 SFRS2IP NM_004719.2 4203-4302 AGTTCTTCTCATGTAAGTAATAACATGAGTACACCAGTTTTGCCTGCTCCGACAGCAGCCCCAGGAAATACGGGAATGGTTCAGGGACCAAGTTCTGGTA
445 SFRS15 NM_020706.2 3635-3734 GAGAGAAGGAAGAAGCCCGAGGAAAGGAAAAGCCTGAGGTGACAGACAGGGCAGGTGGTAACAAAACCGTTGAACCTCCCATTAGCCAAGTGGGAAATGT
446 RBM16 NM_014892.4 4111-4210 TGATTATTTTGAAGGGGCCACTTCTCAACGAAAAGGTGATAATGTGCCTCAGGTTAATGGTGAAAATACAGAGAGACATGCTCAGCCACCACCTATACCA
447 SDHA NM_004168.3 2042-2141 GTCACTCTGGAATATAGACCCGTGATCGACAAAACTTTGAACGAGGCTGACTGTGCCACCGTCCCGCCAGCCATTCGCTCCTACTGATGAGACAAGATGT
448 SEC24C NM_198597.2 4194-4293 AGGCAGAGGCAGCTGGAGCGCCGTTCTCTCCTGCTGGGACACCGCTTGGGCTTTGGTATTGACTGAGTGGCTGACAGTTATCTTCCAACCCCAACTGGCT
449 SEMG1 NM_003007.2 1291-1390 GGCAGACACCAACATGGATCTCATGGGGGATTGGATATTGTAATTATAGAGCAGGAAGATGACAGTGATCGTCATTTGGCACAACATCTTAACAACGACC
450 SERPINB10 NM_005024.1 891-990 AGACAGTTATGATCTCAAGTCAACCCTGAGCAGTATGGGGATGAGTGATGCCTTCAGCCAAAGCAAAGCTGATTTCTCAGGAATGTCTTCAGCAAGAAAC
451 SETD2 NM_014159.6 7956-8055 TGGTTAGAAGCCATCAGAGGTGCAAGGGCTTAGAAAAGACCCTGGCCAGACCTGACTCCACTCTTAAACCTGGGTCTTCTCCTTGGCGGTGCTGTCAGCG
452 SFMBT1 NM_001005158.2 2844-2943 AAGGATCGAAGTTGCTGAAAGGCTTCACCTGGACAGTAACCCCTTGAAGTGGAGTGTGGCAGACGTTGTGCGGTTCATCAGATCCACTGACTGTGCTCCA
453 SFPQ NM_005066.2 2800-2899 GGTTATGTAAGCAAAGCTGAACTGTAAATCTTCAGGAATATGTATTAAGATTGTGGAATGGGTGTAAGACAATTGGTAGGGGGTGAAAGTGGGTTTGATT
454 SGK1 NM_005627.3 1622-1721 ACGAGCGTTAGAGTGCCGCCTTAGACGGAGGCAGGAGTTTCGTTAGAAAGCGGACGCTGTTCTAAAAAAGGTCTCCTGCAGATCTGTCTGGGCTGTGATG
455 SGK NM_005627.3 173-272 GAAGCAGAGGAGGATGGGTCTGAACGACTTTATTCAGAAGATTGCCAATAACTCCTATGCATGCAAACACCCTGAAGTTCAGTCCATCTTGAAGATCTCC
456 SGK1 b NM_005627.3 1814-1913 GGATATGCTGTGTGAACCGTCGTGTGAGTGTGGTATGCCTGATCACAGATGGATTTTGTTATAAGCATCAATGTGACACTTGCAGGACACTACAACGTGG
457 SH2D3C NM_170600.2 2795-2894 AGCACCCCAAGGACACTGTGATCAACCCGAGAATGTTCTGGGTTCAACTCAAGCATCTCCCTTGCACCTCCAGGGTCCTGCGTGGACTCTGGGTTCCATC
458 SIK1 NM_173354.3 4185-4284 TCGCTCATAAAGAAGTTTTTGGGATGGGAGAGAATCCAGACCATCTTGGGGCAGCCAGGCCCTTGCCTTCATTTTTACAGAGGTAGCACAACTGATTCCA
459 SIN3A NM_015477.2 4666-4765 TTTATTCCTGACGATTCCCTTGCTGCCTACCCTTTTCTCTCCTCTGGTTCTCAACCTCAACGAGTTCAAATCAGTTGTCCTTTTTAGCTCCCGTGGAACT
460 SLAMF8 NM_020125.2 3173-3272 AACAAATATTGATTGAGGGCGCTGCATGTGCTGGGTACATTTCTTGGCACTTGGGAATCAGTAGTCAAGCGAAACCCTTGCCTTTGAGAGTTTATGGTCT
461 SLC11A1 NM_000578.3 2072-2171 GCAGGATAGAGTGGGACAGTTCCTGAGACCAGCCAACCTGGGGGCTTTAGGGACCTGCTGTTTCCTAGCGCAGCCATGTGATTACCCTCTGGGTCTCAGT
462 SLC15A2 NM_021082.3 2548-2647 AACTCATTAAAACTTGTGCAGTGTTGCTGGAGCTGGCCTGGTGTCTCCAAATGACCATGAAAATACACACGTATAATGGAGATCATTCTCTGTGGGTATG
463 SLC25A20 NM_000387.5 1511-1610 ATCTTCTTCAGTCCCTAGCCAGGAATACCCATTTGATTTCCAGGGTGCCATCTAATCCTGGGCTGTACATGTGGATATGGACTTGAGGCCCACCTCTGTG
464 SLC25A37 NM_016612.2 1217-1316 TCCAGCCCCTTGCCCTCTCCTCACACGTAGATCATTTTTTTTTTGCAGGGTGCTGCCTATGGGCCCTCTGCTCCCCAATGCCTTAGAGAGAGGAGGGGAC
465 SLC45A3 NM_033102.2 2455-2554 AGTTTCTAGGATGAAACACTCCTCCATGGGATTTGAACATATGAAAGTTATTTGTAGGGGAAGAGTCCTGAGGGGCAACACACAAGAACCAGGTCCCCTC
466 SLC6A12 NM_003044.4 3220-3319 GATATTGCTAACTGATCACAGATTCTTTCCCACCTCACAATCCTTCCGAATGTGCTCCAGGCAGCACCATTTGCCATCCTGCTTCTAACGCAAACCCCTG
467 SLC6A6 NM_003043.5 4438-4537 ATTCTAGACCAAAGACACAGGCAGACCAAGTCCCCAGGCCCCGCCTGGAAGGAAGTCGTTCCTCAACTCTCCCCAAGGCACCTGTCTCCAATCAGAGCCC
468 SLC9A3R1 NM_004252.3 1811-1910 ATTAACATGATTTTCCTGGTTGTTACATCCAGGGCATGGCAGTGGCCTCAGCCTTAAACTTTTGTTCCTACTCCCACCCTCAGCGAACTGGGCAGCACGG
469 C14orf156 NM_031210.5 46-145 CGGCCTCAGCAGCGAGAGGTGCTGCGGCGCTGCGTAGAAGTATCAATCAGCCGGTTGCTTTTGTGAGAAGAATTCCTTGGACTGCGGCGTCGAGTCAGCT
470 SMARCC1 NM_003074.3 5281-5380 CAATGGCCAGGGTTTTACCTACTTCCTGCCAGTCTTTCCCAAAGGAAACTCATTCCAAATACTTCTTTTTTCCCCTGGAGTCCGAGAAGGAAAATGGAAT
471 SNORA56 NR_002984.1 30-129 CTCGTGGGACTCTAGAGGGAGTCAGTCTGCAACAGTAAGTGGTGAGTTCTTCTGTCCAGCGTCAGTATTTTGATGGTGGCTTTAGACTTGCCAGATAACA
472 SNX11 NM_152244.1 2261-2360 CCCTCCCTGTCGCCCACTCCTCCCTCCTCTGGCTATCCTACCCTGTCTGTGGGCTCTTTTACTACCAGCCTATGCTGTGGGACTGTCATGGCATTTAGTT
473 SOCS1 NM_003745.1 1026-1125 TTAACTGTATCTGGAGCCAGGACCTGAACTCGCACCTCCTACCTCTTCATGTTTACATATACCCAGTATCTTTGCACAAACCAGGGGTTGGGGGAGGGTC
474 SP2 NM_003110.5 2701-2800 GGGGGCAATGATGAGCATATGAATTTTTTCTCACTCTAGCAATTCCCTTTTCTAAATGACACAGCATTTAAACTCAAATCTGGATTCAGATAACAGCACC
475 SPA17 NM_017425.3 176-275 CAAGGATTTGGGAATCTTCTTGAAGGGCTGACACGCGAGATTCTGAGAGAGCAACCGGACAATATACCAGCTTTTGCAGCAGCCTATTTTGAGAGCCTTC
476 SPEN NM_015001.2 11995-12094 GTATTGCCCACTCATTTGTATAAGTGCGCTTCGGTACAGCACGGGTCCTGCTCCCGCGATGTGGAAGTGTCACACGGCACCTGTACAAAAAGACTGGCTA
477 SPINK5 NM_006846.3 2596-2695 GAGCAATGACAAAGAGGATCTGTGTCGTGAATTTCGAAGCATGCAGAGAAATGGAAAGCTTATCTGCACCAGAGAAAATAACCCTGTTCGAGGCCCATAT
478 SPN NM_003123.3 2346-2445 AGTGCCTGCGTGTGTCCACTCGTGGGTGTGGTTTGTGTGCAAGAGCTGAGGATTTGGCGATGCTTGGGAGGGGTAGTTGTGGGTACAGACGGTGTGGGGG
479 SREBF1 NM_001005291.2 3985-4084 CCCCTCCTTGCTCTGCAGGCACCTTAGTGGCTTTTTTCCTCCTGTGTACAGGGAAGAGAGGGGTACATTTCCCTGTGCTGACGGAAGCCAACTTGGCTTT
480 SFRS4 NM_005626.4 2080-2179 TACTCATGGCCCACAGTAGAATATCCAAAACGCCTTGGCTTTCAGGCCTGGCCTTTCCTACAGGGAGCTCAGTAACCTGGACGGCTCTAAGGCTGGAATG
481 ST6GAL1 NM_003032.2 3783-3882 CTGATTTTAATCTTCGAATCATGACACTGAGTGCAGAGGAGGTGGCATTCCGACAGCAGGACATACATGTTGGTGTGAAGACTGGGACGACACTGGGTAG
482 STAG3 NM_012447.3 3424-3523 AAGTGCCTGCAGCATGTCTCCCAGGCACCTGGCCATCCCTGGGGCCCAGTCACCACCTACTGCCACTCCCTCAGCCCTGTGGAGAACACAGCAGAGACCA
483 STAMBP NM_006463.4 1926-2025 TTTCCTGTGGTTTATGGCAATATGAATGGAGCTTATTACTGGGGTGAGGGACAGCTTACTCCATTTGACCAGATTGTTTGGCTAACACATCCCGAAGAAT
484 STAT6 NM_003153.4 3725-3824 ACTGTGCCCAAGTGGGTCCAAGTGGCTGTGACATCTACGTATGGCTCCACACCTCCAATGCTGCCTGGGAGCCAGGGTGAGAGTCTGGGTCCAGGCCTGG
485 STIP1 NM_006819.2 1906-2005 CCCGGGGAAGACACAGAGACTCGTACCTGCGCTGTTTGTGCCGCCGCTGCCTCTGGGCCCTCCCAGCACACGCATGGTCTCTTCACCGCTGCCCTCGAGT
486 STK16 NM_003691.2 1420-1519 GGGGTAGCGGGGTCAGGACAATCATCTCAGTCCTGCATCTTTTCTTCTGCTTTCTTCCCTCCAAGAGCAAAACCTGGGCAAGGGGACTTACTGAGTGGGG
487 STK38 NM_007271.3 3269-3368 TTGTCAGTGAAACTACTTTGGATTTTAACCTCTTAGAGGAAGAAAAAAGGTTAGGGAAGTGTCAACTCTGGATGAAGGTGATGTGTTTGCCTCTCAGTCT
488 STOM NM_004099.5 2953-3052 TTCTGCCTTGTGAATTCGTAGTCCAATCAGCTGAAATTAAATCACTTGGGAGGGACGCATAGAAGGAGCTCTAGGAACACAGTGCCAGTGCAGAAGTTTC
489 SYNJ1 NM_003895.3 4746-4845 CCCTCTGCTCCCGCCCGGCACCAGCCCTCCAGTAGATCCTTTCACGACCTTGGCCTCTAAGGCTTCACCCACACTGGACTTTACAGAAAGATAACGCCAT
490 TAPBP NM_003190.4 3397-3496 CTTGCCCTCCCTGGGTCGCAGACGAGGTCGGCCTCGTCATTCCCCGCAGACCGCCGCGCGTCCCTCTTGTGCGGTTCACCACAGTTGTATTTAAGTGATC
491 TAX1BP1 NM_001079864.2 2081-2180 CAGCCAGCCTGCTCGAAACTTTAGTCGGCCTGATGGCTTAGAGGACTCTGAGGATAGCAAAGAAGATGAGAATGTGCCTACTGCTCCTGATCCTCCAAGT
492 TBC1D12 NM_015188.1 5451-5550 TTCCAAGGAATGCACTAAGCCTTCAGTCTTTTTAGACTGACAGTACTGGCAGCTAAAATATTGTACTGTATCTTCTCTTGAGCCCAGTATGTAGGAAATA
493 TBCE NM_001079515.2 1541-1640 TATGCTGAAAAACCAGCTACTAACACTGAAGATAAAATACCCTCATCAACTTGATCAGAAAGTCCTGGAGAAACAACTGCCGGGCTCCATGACAATTCAA
494 TBK1 NM_013254.2 1611-1710 ACCAGTCTTCAGGATATCGACAGCAGATTATCTCCAGGTGGATCACTGGCAGACGCATGGGCACATCAAGAAGGCACTCATCCGAAAGACAGAAATGTAG
495 TBP NM_003194.4 1441-1540 TGTAAGTGCCCACCGCGGGATGCCGGGAAGGGGCATTATTTGTGCACTGAGAACACCGCGCAGCGTGACTGTGAGTTGCTCATACCGTGCTGCTATCTGG
496 TCF20 NM_181492.2 6765-6864 CCAGGCCTGTGTTGCCAGAGCTGGCAGTGTGAGCTGTAGGCAGGGACGGGGAGGGACTGTCGCTGTGATCAGAGTGGGTTAAGCTGACCAGGAACACCCA
497 TCF7L2 NM_030756.4 2067-2166 GGCCCACCTGTCCATGATGCCTCCGCCACCCGCCCTCCTGCTCGCTGAGGCCACCCACAAGGCCTCCGCCCTCTGTCCCAACGGGGCCCTGGACCTGCCC
498 TCP1 NM_030752.2 254-353 GTGTTCGGTGACCGCAGCACTGGGGAAACGATCCGCTCCCAAAACGTTATGGCTGCAGCTTCGATTGCCAATATTGTAAAAAGTTCTCTTGGTCCAGTTG
499 TFCP2 NM_005653.4 2271-2370 CCTCTGAAAACGGCCCTCTTGAAGGGGGATATGAATGGAGATTTGAAGGTCTGCAAGAACCTGACTCGTCTGACTGTGTGTGGAGGAGTCCAGGCCATGG
500 TGIF1 NM_003244.2 1041-1140 ACCTCAACCAGGACTTCAGTGGATTTCAGCTTCTAGTGGATGTTGCACTCAAACGGGCTGCAGAGATGGAGCTTCAGGCAAAACTTACAGCTTAACCCAT
501 TGIF1 b NM_173208.1 691-790 CCCCGGGATCAGTTTTGGCTCGTCCATCAGTGATCTGCCATACCACTGTGACTGCATTGAAAGATGTCCCTTTCTCTCTCTGCCAGTCGGTCGGTGTGGG
502 TIAM1 NM_003253.2 5293-5392 CCTAACTCTGCCCACCCTCCTGTACCGTCGACAAGAATGTCCCCTTAGGTCGCGCTCTTGCACACACGGTTTTGGCAGCTGACTTGGTTCTGAAGCCATG
503 TIMM8B ENST00000504148.1 339-438 GAATGACAGAAGCAAAGGACTTGTTACTAAGCAGATTTAAGGGTCAGTGGGGGAAGGCTATCAACCCATTGTCAGATCAGCATCAGGCTGTTATCAAGTC
504 TM2D2 NM_078473.2 2970-3069 ACCCATCATCCATCTGCCCACAAACCTGGCCAAATGTGATACAACCTGAAAACCTGATGGACTAAAGGAGTACTATTTAACAATTGATTGCCTTTGCACT
505 TM9SF1 NM_006405.6 1996-2095 CGCTGGTGGTGGCGATCTGTGCTGAGTGTTGGCTCCACCGGCCTCTTCATCTTCCTCTACTCAGTTTTCTATTATGCCCGGCGCTCCAACATGTCTGGGG
506 CCDC72 NM_015933.4 124-223 GAGGAGCAGAAGAAACTCGAGGAGCTAAAAGCGAAGGCCGCGGGGAAGGGGCCCTTGGCCACAGGTGGAATTAAGAAATCTGGCAAAAAGTAAGCTGTTC
507 TMBIM6 NM_003217.2 2282-2381 CTCTCCCTATTCACAACCAGTGCACAGTTTGACACAGTGGCCTCAGGTTCACAGTGCACCATGTCACTGTGCTATCCTACGAAATCATTTGTTTCTAAGT
508 TMC8 NM_152468.4 2238-2337 AGGCCAATGCCAGGGCCATCCACAGGCTCCGGAAGCAGCTGGTGTGGCAGGTTCAGGAGAAGTGGCACCTGGTGGAGGACCTGTCGCGACTGCTGCCGGA
509 TMCO1 NM_019026.3 992-1091 TCATTTACATAAGTATTTTCTGTGGGACCGACTCTCAAGGCACTGTGTATGCCCTGCAAGTTGGCTGTCTATGAGCATTTAGAGATTTAGAAGAAAAATT
510 TMEM170B NM_001100829.2 7652-7751 AGGAGAATAAATGTTGGAGGGGTAATACACAAAAACAAAGGCATATTTGATGAAGTACCCTGTGTTATGTGAACACAATTTCCCCTTCTGTTAAGACTAT
511 TMEM218 NM_001080546.2 1313-1412 GCTCTGTGAAGGCAATGAGTGTCACTTCCCTCTGCTCTAATAAAGCAATAAATAATAGCTAAAGGGCTGACTTTCACTTCGAACTCTTGGCCACGGCTTT
512 TMEM70 NM_017866.5 1952-2051 GGTGGTTAGCTATACGGGAAATGGTAAGTAGTGTTGTCTTCAGTATCTTAATTTGTTTCTGCAACTGTGCACTCCTCCCTTGGTGGCACCCTATGGGTGT
513 TMSB4X NM_021109.3 286-385 TTAACTTTGTAAGATGCAAAGAGGTTGGATCAAGTTTAAATGACTGTGCTGCCCCTTTCACATCAAAGAACTACTGACAACGAAGGCCGCGCCTGCCTTT
514 TNFRSF9 NM_001561.5 1848-1947 GCCTGGAGGAAGTTTTGGAAAGAGTTCAAGTGTCTGTATATCCTATGGTCTTCTCCATCCTCACACCTTCTGCCTTTGTCCTGCTCCCTTTTAAGCCAGG
515 TNFSF13 NM_003808.3 811-910 AGTCAGAGAGCCGGCACTCTCAGTTGCCCTCTGGTTGAGTTGGGGGGCAGCTCTGGGGGCCGTGGCTTGTGCCATGGCTCTGCTGACCCAACAAACAGAG
516 TNFSF8 NM_001244.3 519-618 CCCTCAAAGGAGGAAATTGCTCAGAAGACCTCTTATGTATCCTGAAAAGGGCTCCATTCAAGAAGTCATGGGCCTACCTCCAAGTGGCAAAGCATCTAAA
517 TOMM7 NM_019059.2 251-350 TCTGGCTCGGATAAGAGATGGGACATCATTCAGTCACTAGTTGGATGGCACAAGGCTCTTCACAGACGCATCTGTAGCAGAGTGGATCTTGTACTAACTT
518 TP53BP1 NM_005657.2 5591-5690 TACTTCCTGTGCCTTGCCAGTGGGATTCCTTGTGTGTCTCATGTCTGGGTCCATGATAGTTGCCATGCCAACCAGCTCCAGAACTACCGTAATTATCTGT
519 TPR NM_003292.2 7194-7293 TCTCCCCTCCACCAGCCAGGATCCTCCTTCTAGCTCATCTGTAGATACTAGTAGTAGTCAACCAAAGCCTTTCAGACGAGTAAGACTTCAGACAACATTG
520 TPT1 NM_003295.3 18-117 GCCTGCGTCGCTTCCGGAGGCGCAGCGGGCGATGACGTAGAGGGACGTGCCCTCTATATGAGGTTGGGGAGCGGCTGAGTCGGCCTTTTCCGCCCGCTCC
521 TRAF3IP2 NM_147686.3 2449-2548 GCCAGTGTCCCATATGTTCCTCCTGACAGTTTGATGTGTCCATTCTGGGCCTCTCAGTGCTTAGCAAGTAGATAATGTAAGGGATGTGGCAGCAAATGGA
522 TRAF6 NM_145803.1 1840-1939 CACCCGCTTTGACATGGGTAGCCTTCGGAGGGAGGGTTTTCAGCCACGAAGTACTGATGCAGGGGTATAGCTTGCCCTCACTTGCTCAAAAACAACTACC
523 LBA1 NM_014831.2 10132-10231 CTGGGAAACCTTCATGCCTCTCTGATGGTTACTGCCCACCCTTACCCCACCCCTCAGCTCAGCCTGGTATGGAAAGCAAGGTGCACGTTGGTCTTTGATT
524 TRIM21 NM_003141.3 1637-1736 TCTGCAGAGGCATCCGGATCCCAGCAAGCGAGCTTTAGCAGGGAAGTCACTTCACCATCAACATTCCTGCCCCAGATGGCTTTGTGATTCCCTCCAGTGA
525 TRIM32 NM_012210.3 2681-2780 GTGCTACCAAAGGGGATACACAAGCCCTTTAGGAAGCAGTACCTCTCGCCTGGAGGATCTGTGCCATCTTGGATTGAGAATTGCAGATGTGACAGAATGG
526 TRIM39 NM_021253.3 3141-3240 CTGCTATTCGGGTAATCTTCACAGAAATGACTGAGAGAAGAATCTGCAGTTTACTGAGGGCATTTCAGTTCCTCCTACCACCTCAACAGGACTTTGTCCA
527 TRIM39 b NM_172016.2 2841-2940 CTCTATACCAATAAGTCAGTCACCTTGCTCCTCTCCAGAGGCAAAGTGGAAGAGATCCTGCAAGACACATCTATCCTTTCACAGTGTTCCCAAGGGAACT
528 TRRAP NM_003496.3 12169-12268 AGTTGATGAACCCATCATGCTGGTTTTTCTCTGAGCACAAAGTTTTAGGCTGTACACAGCCAGCCTTGGGAATCTCGTTGAGCGTTCGGCGTGGATCCAC
529 TSC1 NM_000368.4 8068-8167 CCCCAGACCAACCCTTCCCTCCCTTTCCCCACCTCTTACAGTGTTTGGACAGGAGGGTATGGTGCTGCTCTGTGTAGCAAGTACTTTGGCTTATGAAAGA
530 TTC9 NM_015351.1 4050-4149 TACTAATCAGGCATCTGACCTGCACTGTCATCCCCTGCCTGGACTTTTGCGATGGACTCTTTGGGGGAAAAACTAACGCTTTTTAATTATTGTGAAAGCA
531 TTN NM_133378.4 850-949 TCGACTGCTCAGATCTCAGAATCAAGACAAACCCGAATTGAAAAGAAGATTGAAGCCCACTTTGATGCCAGATCAATTGCAACAGTTGAGATGGTCATAG
532 TUBB NM_178014.2 2223-2322 CAAAAAAGAATGAACACCCCTGACTCTGGAGTGGTGTATACTGCCACATCAGTGTTTGAGTCAGTCCCCAGAGGAGAGGGGAACCCTCCTCCATCTTTTT
533 TUG1 NR_002323.2 7082-7181 TAAGCTAGAGGTCATGGTCACTGAAATTACTTTCCAAAGTGGAAGACAAAATGAAACAGGAACTGAGGGAATATTTAAGATCCCACAGAAGCGTAAAAAT
534 TXN NM_003329.3 152-251 TTGGATCCATTTCCATCGGTCCTTACAGCCGCTCGTCAGACTCCAGCAGCCAAGATGGTGAAGCAGATCGAGAGCAAGACTGCTTTTCAGGAAGCCTTGG
535 TXNDC17 NM_032731.3 378-477 TCATCTACTGCCAAGTAGGAGAAAAGCCTTATTGGAAAGATCCAAATAATGACTTCAGAAAAAACTTGAAAGTAACAGCAGTGCCTACACTACTTAAGTA
536 TXNRD1 NM_001093771.2 3348-3447 CTCAGTTGCAGCACTGAGTGGTCAAAATACATTTCTGGGCCACCTCAGGGAACCCATGCATCTGCCTGGCATTTAGGCAGCAGAGCCCCTGACCGTCCCC
537 TXNRD1 b NM_182743.2 2438-2537 TGTTGCATGGAAGGGATAGTTTGGCTCCCTTGGAGGCTATGTAGGCTTGTCCCGGGAAAGAGAACTGTCCTGCAGCTGAAATGGACTGTTCTTTACTGAC
538 U2AF2 NM_007279.2 2871-2970 TTTATGGCCAAACTATTTTGAATTTTGTTGTCCGGCCCTCAGTGCCCTGCCCTCTCCCTTACCAGGACCACAGCTCTGTTCCTTCGGCCTCTGGTCCTCT
539 UBA1 NM_003334.3 3307-3406 CCGCCACGTGCGGGCGCTGGTGCTTGAGCTGTGCTGTAACGACGAGAGCGGCGAGGATGTCGAGGTTCCCTATGTCCGATACACCATCCGCTGACCCCGT
540 UBC NM_021009.3 1876-1975 TGCAGATCTTCGTGAAGACCCTGACTGGTAAGACCATCACTCTCGAAGTGGAGCCGAGTGACACCATTGAGAATGTCAAGGCAAAGATCCAAGACAAGGA
541 UBE2G1 NM_003342.4 685-784 ACGCTGGCTCCCTATCCACACTGTGGAAACCATCATGATTAGTGTCATTTCTATGCTGGCAGACCCTAATGGAGACTCACCTGCTAATGTTGATGCTGCG
542 UBE2I NM_194259.2 288-387 CTGCTCTGCTGACTGGGGAAGTCATCGTGCCACCCAGAACCTGAGTGCGGGCCTCTCAGAGCTCCTTCGTCCGTGGGTCTGCCGGGGACTGGGCCTTGTC
543 UBTF NM_001076683.1 2724-2823 GGGGGTCCCAAAGAGTTTGATGAGGCCCTCCACACCTGCGGCCCAATCCAAGGTGGGGTGGAAGCTTGGGGAAGACCCATTCCTTCCCAGAGGGGCCTGC
544 UQCRQ NM_014402.4 97-196 TGACGCGGATGCGGCATGTGATCAGCTACAGCTTGTCACCGTTCGAGCAGCGCGCCTATCCGCACGTCTTCACTAAAGGAATCCCCAATGTTCTGCGCCG
545 USP16 NM_001032410.1 2487-2586 TCTATTCCTTATATGGAGTTGTTGAACACAGTGGTACTATGAGGTCGGGGCATTACACTGCCTATGCCAAGGCAAGAACCGCAAATAGTCATCTCTCTAA
546 USP21 NM_012475.4 1499-1598 CCTTTTCACTAAGGAAGAAGAGCTAGAGTCGGAGAATGCCCCAGTGTGTGACCGATGTCGGCAGAAAACTCGAAGTACCAAAAAGTTGACAGTACAAAGA
547 USP34 NM_014709.3 10104-10203 AGGAGCACACTGTAGACAGCTGCATCAGTGACATGAAAACAGAAACCAGGGAGGTCCTGACCCCAACGAGCACTTCTGACAATGAGACCAGAGACTCCTC
548 USP5 NM_003481.2 2720-2819 AGAGCAGAGGGGCAGCGATAGACTCTGGGGATGGAGCAGGACGGGGACGGGAGGGGCCGGCCACCTGTCTGTAAGGAGACTTTGTTGCTTCCCCTGCCCC
549 USP9Y NM_004654.3 86-185 GGTGTGGAAAGACTTTTCTGGGCTCAGAGGTGAAACTGACCCTTGTGTATCAGCAGCATTTCTGACTGACTGAGAGAGTGTAGTGATTAACAGAGTTGTG
550 VPS37C NM_017966.4 2579-2678 TTATAAAGAGAAATCACTAATGGACTCTACTGGTTTGAGTGCTTCTGAGCTGGATGACCGACCGCCTGTATGTTTGTGTAATTAATTGCCATAATAAACT
551 WDR1 NM_005112.4 2325-2424 AACTGTTGCCTGTCAGTGTTTACAAACTAGTGCGTTGACGGCACCGTGTCCAAGTTTTTAGAACCCTTGTTAGCCAGACCGAGGTGTCCTGGTCACCGTT
552 WDR91 NM_014149.3 2777-2876 CAGGCTCTCCTGTTGCTTTGCCATGGAGCCAGGTCAGCTCTCTGTCTGTTCTGCTGGGTAACAAGGTTTGGCAGTTCCTGTTTCTCTGGGCTTAAGTCAA
553 XCL2 NM_003175.3 378-477 GTAGTCTCTGGCACCCTGTCCGTCTCCAGCCAGCCAGCTCATTTCACTTTACACCCTCATGGACTGAGATTATACTCACCTTTTATGAAAGCACTGCATG
554 XPC NR_027299.1 3168-3267 CTGGATGGTGGTGCATCCGTGAATGCGCTGATCGTTTCTTCCAGTTAGAGTCTTCATCTGTCCGACAAGTTCACTCGCCTCGGTTGCGGACCTAGGACCA
555 YPEL1 NM_013313.4 3672-3771 GCTCATTTTTAAACCAAATGAACAGACCATGAGCTGGCTTCAGGGGAAGTGCTATTCACAGGACCATATCCACCACCCTCTTAAATTCCTAAACAATATC
556 ZMIZ1 NM_020338.3 7171-7270 ATGATCACAGGTGATTCACACGTACACACATAAACACACCCACCAGTGCAGCCTGAAGTAACTCCCACAGAAACCATCATCGTCTTTGTACATCGTATGT
557 ZNF143 NM_003442.5 2292-2391 TATCAGATCACAAACTCCTAGAGTCTACATGCAAGACTAGTAAAGTCTTATGGAGTCTTATGATGGATTTTTAACTTCCCGTGGAAAAAAAAATAAAGGC
558 ZNF239 NM_001099283.1 1496-1595 AGAGCTCCAACCTTCACATCCACCAGCGGGTTCACAAGAAAGATCCTCGCTAACTGACATTAGCCCATTCAGGTCTTCACAGCGCTCATACTGTAAAAAC
559 ZNF341 NM_032819.4 3247-3346 CAGACGGTTCCCCACAGCATCCTCAGACAGCTCTGTGATGTAGCTTTTAGGAGGCACTCAGGTGTCACGGCTAGACTGCAGCTATGAGACAGATCTGGCT



C. Polymerase Chain Reaction (PCR) Techniques


Another suitable quantitative method is RT-PCR, which can be used to compare mRNA levels in different sample populations (for example, in normal and tumor tissues), to characterize patterns of gene expression, to discriminate between closely related mRNAs, and to analyze RNA structure. The first step is the isolation of mRNA from a target sample (typically, total RNA isolated from human PBMC). mRNA can also be extracted, for example, from frozen or archived paraffin-embedded and fixed (e.g., formalin-fixed) tissue samples.


General methods for mRNA extraction are well known in the art and are described in standard textbooks of molecular biology. In particular, RNA isolation can be performed using a purification kit, buffer set, and protease from commercial manufacturers, according to the manufacturer's instructions. Exemplary commercial products include TRI-REAGENT, Qiagen RNeasy mini-columns, MASTERPURE Complete DNA and RNA Purification Kit (EPICENTRE®, Madison, Wis.), Paraffin Block RNA Isolation Kit (Ambion, Inc.) and RNA Stat-60 (Tel-Test). Conventional techniques such as cesium chloride density gradient centrifugation may also be employed.


The first step in gene expression profiling by RT-PCR is the reverse transcription of the RNA template into cDNA, followed by its exponential amplification in a PCR reaction. The two most commonly used reverse transcriptases are avian myeloblastosis virus reverse transcriptase (AMV-RT) and Moloney murine leukemia virus reverse transcriptase (MMLV-RT). The reverse transcription step is typically primed using specific primers, random hexamers, or oligo-dT primers, depending on the circumstances and the goal of expression profiling. See, e.g., the manufacturer's instructions accompanying the GENEAMP RNA PCR kit (Perkin Elmer, Calif., USA). The derived cDNA can then be used as a template in the subsequent PCR reaction.
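The "exponential amplification" mentioned above can be written with the idealized textbook model, assuming a constant per-cycle amplification efficiency E (0 ≤ E ≤ 1, where E = 1 means perfect doubling). The symbols are introduced here for illustration only: N_0 is the starting number of cDNA template copies, N_c the number of copies after c cycles, and T a fixed detection threshold in the same units. Solving N_c = T for c gives the threshold cycle C_t:

```latex
N_c = N_0\,(1+E)^{c}, \qquad
C_t = \frac{\ln T - \ln N_0}{\ln(1+E)}
```

Under this model, with E = 1, halving N_0 delays C_t by exactly one cycle, which is why the threshold cycle, rather than end-point fluorescence, serves as the quantitative read-out.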


The PCR step generally uses a thermostable DNA-dependent DNA polymerase, such as Taq DNA polymerase, which has a 5′-3′ nuclease activity but lacks a 3′-5′ proofreading exonuclease activity. Thus, TAQMAN® PCR typically utilizes the 5′-nuclease activity of Taq or Tth polymerase to hydrolyze a hybridization probe bound to its target amplicon, but any enzyme with equivalent 5′ nuclease activity can be used. Two oligonucleotide primers are used to generate an amplicon, as in a typical PCR reaction. In one embodiment, the target sequence is shown in Table III. A third oligonucleotide, or probe, is designed to detect a nucleotide sequence located between the two PCR primers. The probe is non-extendible by Taq DNA polymerase and is labeled with a reporter fluorescent dye and a quencher fluorescent dye. Any laser-induced emission from the reporter dye is quenched by the quenching dye when the two dyes are located close together, as they are on the intact probe. During the amplification reaction, Taq DNA polymerase cleaves the probe in a template-dependent manner. The resulting probe fragments dissociate in solution, and signal from the released reporter dye is freed from the quenching effect of the second fluorophore. One molecule of reporter dye is liberated for each new molecule synthesized, and detection of the unquenched reporter dye provides the basis for quantitative interpretation of the data.
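The primer and probe arrangement described above (two primers flanking the amplicon, with the probe lying between them) can be checked with a short script. This Python sketch is illustrative only: the helper names are hypothetical, and the primers and probe are simply slices of the RFWD2 target sequence listed in Table III, not validated assay reagents. Real assay design would also weigh melting temperature, GC content and secondary structure.

```python
# Hypothetical helpers for checking a TaqMan-style layout on a sense-strand target.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def check_amplicon(target: str, fwd: str, rev: str, probe: str) -> dict:
    """Map primers and probe onto the sense strand of `target`.

    `rev` is written 5'->3' as synthesized, so its binding site on the
    sense strand is its reverse complement.
    """
    rev_site = reverse_complement(rev)
    f_start = target.find(fwd)
    r_start = target.find(rev_site)
    p_start = target.find(probe)
    if min(f_start, r_start, p_start) < 0:
        raise ValueError("primer or probe not found in target")
    f_end = f_start + len(fwd)
    p_end = p_start + len(probe)
    amplicon_end = r_start + len(rev_site)
    return {
        "amplicon_length": amplicon_end - f_start,
        # the probe must not overlap either primer binding site
        "probe_between_primers": f_end <= p_start and p_end <= r_start,
    }


# RFWD2 target (Table III, entry 415); primers/probe are illustrative slices only.
RFWD2_TARGET = ("TTTTCTTTTCCCTCCTTTATGAC" "CTTTGGGACATTGGGAATACCCA"
                "GCCAACTCTCCACCATCAATGTA" "ACTCCATGGACATTGCTGCTCTT"
                "GGTGGTGT")
fwd = RFWD2_TARGET[:20]
rev = reverse_complement(RFWD2_TARGET[-20:])
probe = RFWD2_TARGET[40:60]
print(check_amplicon(RFWD2_TARGET, fwd, rev, probe))
```

Writing the reverse primer 5′ to 3′ as it would be ordered, and converting it back inside the function, keeps the check close to how primer sequences are usually recorded.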


TaqMan® RT-PCR can be performed using commercially available equipment. In a preferred embodiment, the 5′ nuclease procedure is run on a real-time quantitative PCR device such as the ABI PRISM 7900® Sequence Detection System®. The system amplifies samples in a 96-well format on a thermocycler. During amplification, laser-induced fluorescent signal is collected in real-time through fiber optic cables for all 96 wells, and detected at the CCD. The system includes software for running the instrument and for analyzing the data. 5′-Nuclease assay data are initially expressed as Ct, or the threshold cycle. As discussed above, fluorescence values are recorded during every cycle and represent the amount of product amplified to that point in the amplification reaction. The point when the fluorescent signal is first recorded as statistically significant is the threshold cycle (Ct).
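The threshold-cycle definition above can be illustrated with a small Python sketch. As a stand-in for the instrument software, it assumes a threshold equal to the baseline mean plus 10 standard deviations of the fluorescence over cycles 3 to 15 (a commonly used default) unless an explicit threshold is supplied, and it interpolates linearly between the two cycles that bracket the crossing. The function name and parameters are hypothetical.

```python
import statistics


def threshold_cycle(fluorescence, threshold=None,
                    baseline_cycles=range(3, 16), n_sd=10.0):
    """Estimate Ct from per-cycle fluorescence readings.

    fluorescence[0] is cycle 1. If no threshold is given, it is set to
    mean + n_sd * SD of the baseline cycles (an assumed convention).
    Returns a fractional cycle number, or None if never crossed.
    """
    if threshold is None:
        baseline = [fluorescence[c - 1] for c in baseline_cycles]
        threshold = statistics.mean(baseline) + n_sd * statistics.stdev(baseline)
    for i in range(1, len(fluorescence)):
        lo, hi = fluorescence[i - 1], fluorescence[i]  # cycles i and i + 1
        if lo < threshold <= hi:
            # linear interpolation between cycle i and cycle i + 1
            return i + (threshold - lo) / (hi - lo)
    return None  # undetermined: no crossing within the run


print(threshold_cycle([1, 1, 1, 1, 2, 4, 8, 16, 32], threshold=6))
```

Interpolating between cycles gives a fractional Ct, which is what allows differences smaller than one cycle to be compared between samples.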


To minimize errors and the effect of sample-to-sample variation, RT-PCR is usually performed using an internal standard. The ideal internal standard is expressed at a constant level among different tissues and is unaffected by the experimental treatment. The RNAs most frequently used to normalize patterns of gene expression are the mRNAs for the housekeeping genes glyceraldehyde-3-phosphate dehydrogenase (GAPDH) and β-actin.
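Normalization against a housekeeping gene such as GAPDH is commonly carried out with the comparative ΔΔCt (2^−ΔΔCt) method. The Python sketch below assumes that method, which is not specified in this text, and assumes that the target and reference assays amplify with the same efficiency; the function name and parameters are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control, efficiency=1.0):
    """Fold change of a target gene, normalized to a reference (housekeeping) gene.

    Comparative ddCt method; assumes both assays share the per-cycle
    efficiency `efficiency` (1.0 = perfect doubling each cycle).
    """
    d_sample = ct_target_sample - ct_ref_sample      # dCt in the test sample
    d_control = ct_target_control - ct_ref_control   # dCt in the calibrator sample
    dd_ct = d_sample - d_control
    return (1.0 + efficiency) ** (-dd_ct)


# Target reaches threshold 2 cycles earlier (relative to GAPDH) in the test sample.
print(relative_expression(24.0, 18.0, 26.0, 18.0))
```

Subtracting the reference Ct within each sample cancels differences in input RNA, so only the change in the target relative to the housekeeping gene remains.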


Real-time PCR is compatible both with quantitative competitive PCR, in which an internal competitor for each target sequence is used for normalization, and with quantitative comparative PCR using a normalization gene contained within the sample, or a housekeeping gene for RT-PCR.


In another PCR method, the MassARRAY-based gene expression profiling method (Sequenom, Inc., San Diego, Calif.), the cDNA obtained after RNA isolation and reverse transcription is spiked with a synthetic DNA molecule (competitor) that matches the targeted cDNA region in all positions except a single base and serves as an internal standard. The cDNA/competitor mixture is PCR amplified and then subjected to post-PCR shrimp alkaline phosphatase (SAP) treatment, which dephosphorylates the remaining nucleotides. After inactivation of the alkaline phosphatase, the PCR products from the competitor and the cDNA are subjected to primer extension, which generates distinct mass signals for the competitor- and cDNA-derived PCR products. After purification, these products are dispensed on a chip array pre-loaded with the components needed for matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS) analysis. The cDNA present in the reaction is then quantified from the ratios of the peak areas in the resulting mass spectrum.
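Quantification from peak-area ratios reduces to a simple proportion if one assumes that the competitor and the native cDNA amplify and extend with equal efficiency, which is the rationale for a competitor that differs by only a single base. The function below is a hypothetical sketch of that arithmetic, not Sequenom's software; `competitor_copies` stands for whatever known amount of competitor was spiked in.

```python
def cdna_copies_from_peaks(cdna_peak_area, competitor_peak_area, competitor_copies):
    """Estimate starting cDNA copies from a competitive MALDI-TOF readout.

    Assumes equal amplification/extension efficiency for cDNA and competitor,
    so the peak-area ratio equals the ratio of their starting copy numbers.
    """
    if competitor_peak_area <= 0:
        raise ValueError("competitor peak area must be positive")
    return competitor_copies * cdna_peak_area / competitor_peak_area


# cDNA peak twice the competitor peak, with 1000 competitor copies spiked in.
print(cdna_copies_from_peaks(3.0, 1.5, 1000))
```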


Still other PCR-based techniques known in the art that may be used for gene expression profiling include, e.g., differential display; amplified fragment length polymorphism (iAFLP); BeadArray™ technology (Illumina, San Diego, Calif.); a rapid gene expression assay using the commercially available Luminex100 LabMAP system and multiple color-coded microspheres (Luminex Corp., Austin, Tex.); and high coverage expression profiling (HiCEP) analysis.


D. Microarrays


Differential gene expression can also be identified or confirmed using the microarray technique. Thus, the expression profile of lung cancer-associated genes can be measured in either fresh or paraffin-embedded tissue using microarray technology. In this method, polynucleotide sequences of interest (including cDNAs and oligonucleotides) are plated, or arrayed, on a microchip substrate. The arrayed sequences are then hybridized with specific DNA probes from cells or tissues of interest. As in the other methods and compositions herein, the source of mRNA is total RNA isolated from whole blood of control and patient subjects.


In one embodiment of the microarray technique, PCR-amplified inserts of cDNA clones are applied to a substrate in a dense array. In one embodiment, all 559 nucleotide sequences from Table III are applied to the substrate. The microarrayed genes, immobilized on the microchip, are suitable for hybridization under stringent conditions. Fluorescently labeled cDNA probes may be generated through incorporation of fluorescent nucleotides by reverse transcription of RNA extracted from tissues of interest. Labeled cDNA probes applied to the chip hybridize with specificity to each spot of DNA on the array. After stringent washing to remove non-specifically bound probes, the chip is scanned by confocal laser microscopy or by another detection method, such as a CCD camera. Quantitation of hybridization of each arrayed element allows for assessment of corresponding mRNA abundance. With dual color fluorescence, separately labeled cDNA probes generated from two sources of RNA are hybridized pairwise to the array. The relative abundance of the transcripts from the two sources corresponding to each specified gene is thus determined simultaneously. The miniaturized scale of the hybridization affords a convenient and rapid evaluation of the expression pattern for large numbers of genes. Such methods have been shown to have the sensitivity required to detect rare transcripts, which are expressed at a few copies per cell, and to reproducibly detect at least approximately two-fold differences in expression levels. Microarray analysis can be performed using commercially available equipment, following the manufacturer's protocols.
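For the dual-color experiment described above, relative abundance is conventionally reported as the log2 ratio of the two channel intensities, so that a two-fold difference corresponds to an absolute log2 ratio of at least 1. The sketch below assumes simple background subtraction with a floor to avoid taking the logarithm of non-positive values, and it omits the dye-bias and intensity-dependent normalization that real analysis pipelines apply. The function names are hypothetical.

```python
import math


def log2_ratios(channel_a, channel_b, background_a=0.0, background_b=0.0, floor=1.0):
    """Per-spot log2(A/B) ratios for a two-colour array (no dye-bias correction)."""
    ratios = []
    for a, b in zip(channel_a, channel_b):
        a_corr = max(a - background_a, floor)  # floor avoids log of values <= 0
        b_corr = max(b - background_b, floor)
        ratios.append(math.log2(a_corr / b_corr))
    return ratios


def at_least_twofold(ratios):
    """Indices of spots whose |log2 ratio| >= 1, i.e. at least a two-fold change."""
    return [i for i, r in enumerate(ratios) if abs(r) >= 1.0]


r = log2_ratios([400, 100, 150], [100, 100, 100])
print(r, at_least_twofold(r))
```

Working on the log scale makes up- and down-regulation symmetric: a two-fold increase and a two-fold decrease give +1 and −1 respectively.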


Other useful methods, summarized in U.S. Pat. No. 7,081,340 and incorporated by reference herein, include Serial Analysis of Gene Expression (SAGE) and Massively Parallel Signature Sequencing (MPSS). Briefly, SAGE is a method that allows the simultaneous and quantitative analysis of a large number of gene transcripts without the need to provide an individual hybridization probe for each transcript. First, a short sequence tag (about 10 to 14 bp) is generated that contains sufficient information to uniquely identify a transcript, provided that the tag is obtained from a unique position within each transcript. Then, many tags are linked together to form long serial molecules that can be sequenced, revealing the identity of the multiple tags simultaneously. The expression pattern of any population of transcripts can be quantitatively evaluated by determining the abundance of individual tags and identifying the gene corresponding to each tag. For more details see, e.g., Velculescu et al., Science 270:484-487 (1995); and Velculescu et al., Cell 88:243-51 (1997), both of which are incorporated herein by reference.
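The tag-counting step of SAGE can be sketched in Python. This assumes the classic protocol details, which the text above does not state: NlaIII (site CATG) as the anchoring enzyme, 10-bp tags, and ditags of roughly 20 to 26 bp between consecutive CATG sites in the sequenced concatemer. The helper names are hypothetical, and the sketch does not remove duplicate ditags, which real SAGE pipelines usually discard as PCR artifacts.

```python
from collections import Counter

# Assumptions: classic SAGE, NlaIII anchoring enzyme (CATG), 10-bp tags,
# ditags of roughly 20-26 bp between consecutive CATG sites.
ANCHOR = "CATG"
TAG_LEN = 10
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def count_tags(concatemer: str, tag_len: int = TAG_LEN,
               min_ditag: int = 20, max_ditag: int = 26) -> Counter:
    """Count SAGE tags in a sequenced concatemer of CATG-separated ditags."""
    counts = Counter()
    for ditag in concatemer.split(ANCHOR):
        if not (min_ditag <= len(ditag) <= max_ditag):
            continue  # skip empty ends, linker remnants and malformed pieces
        counts[ditag[:tag_len]] += 1                       # tag read off the top strand
        counts[reverse_complement(ditag[-tag_len:])] += 1  # tag read off the bottom strand
    return counts


example = ("CATG" + "AAAAACCCCC" + "GTTTAAACCC"
           + "CATG" + "AAAAACCCCC" + "GGGGGTTTTT" + "CATG")
print(count_tags(example))
```

Each ditag holds two tags in opposite orientations, which is why the second tag is read as the reverse complement of the ditag's last bases.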


Gene Expression Analysis by Massively Parallel Signature Sequencing (MPSS), described by Brenner et al., Nature Biotechnology 18:630-634 (2000) (which is incorporated herein by reference), is a sequencing approach that combines non-gel-based signature sequencing with in vitro cloning of millions of templates on separate 5 μm diameter microbeads. First, a microbead library of DNA templates is constructed by in vitro cloning. This is followed by the assembly of a planar array of the template-containing microbeads in a flow cell at a high density (typically greater than 3×10⁶ microbeads/cm²). The free ends of the cloned templates on each microbead are analyzed simultaneously, using a fluorescence-based signature sequencing method that does not require DNA fragment separation. This method has been shown to simultaneously and accurately provide, in a single operation, hundreds of thousands of gene signature sequences from a yeast cDNA library.
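To put the stated bead density in perspective, the short calculation below (an illustrative estimate, not taken from Brenner et al.) compares it with the geometric upper limit for a single layer of 5 μm beads, assuming ideal hexagonal close packing of equal spheres, in which each bead occupies an area of (√3/2)d².

```python
import math

def hex_close_packed_per_cm2(diameter_um):
    """Upper bound on beads per cm^2 for a monolayer of equal spheres in
    hexagonal close packing: 2 / (sqrt(3) * d^2)."""
    d_cm = diameter_um * 1e-4  # 1 um = 1e-4 cm
    return 2.0 / (math.sqrt(3.0) * d_cm ** 2)

max_density = hex_close_packed_per_cm2(5.0)  # roughly 4.6e6 beads/cm^2
fraction_of_max = 3e6 / max_density          # roughly 0.65
```

Under that assumption, a density above 3×10⁶ beads/cm² corresponds to roughly two-thirds of a fully packed monolayer.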


E. Immunohistochemistry


Immunohistochemistry methods are also suitable for detecting the expression levels of the gene expression products of the informative genes described for use in the methods and compositions herein. Antibodies or antisera, preferably polyclonal antisera, and most preferably monoclonal antibodies, or other protein-binding ligands specific for each marker are used to detect expression. The antibodies can be detected by direct labeling of the antibodies themselves, for example, with radioactive labels, fluorescent labels, hapten labels such as biotin, or an enzyme such as horseradish peroxidase or alkaline phosphatase. Alternatively, an unlabeled primary antibody is used in conjunction with a labeled secondary antibody, comprising antisera, polyclonal antisera or a monoclonal antibody specific for the primary antibody. Protocols and kits for immunohistochemical analyses are well known in the art and are commercially available.


III. COMPOSITIONS OF THE INVENTION

The methods for diagnosing lung cancer described herein, which utilize defined gene expression profiles, permit the development of simplified diagnostic tools for diagnosing lung cancer, e.g., distinguishing NSCLC from a non-cancerous nodule. Thus, a composition for diagnosing lung cancer in a mammalian subject as described herein can be a kit or a reagent. For example, one embodiment of a composition includes a substrate upon which said polynucleotides, oligonucleotides or ligands are immobilized. In another embodiment, the composition is a kit containing the relevant 5 or more polynucleotides, oligonucleotides or ligands, optional detectable labels for same, immobilization substrates, optional substrates for enzymatic labels, as well as other laboratory items. In still another embodiment, at least one polynucleotide, oligonucleotide or ligand is associated with a detectable label.


In one embodiment, a composition for diagnosing lung cancer in a mammalian subject includes 5 or more PCR primer-probe sets. Each primer-probe set amplifies a different polynucleotide sequence from a gene expression product of 5 or more informative genes found in the blood of the subject. These informative genes are selected to form a gene expression profile or signature which is distinguishable between a subject having lung cancer and a subject having a non-cancerous nodule. Changes in expression in the genes in the gene expression profile from that of a reference gene expression profile are correlated with a lung cancer, such as non-small cell lung cancer (NSCLC).


In one embodiment of this composition, the informative genes are selected from among the genes identified in Table I. In another embodiment of this composition, the informative genes are selected from among the genes identified in Table II. These are the genes for which the gene product expression is altered (i.e., increased or decreased) versus the same gene product expression in the blood of a reference control (i.e., a patient having a non-cancerous nodule). In one embodiment, polynucleotides, oligonucleotides or ligands, i.e., probes, are generated to 5 or more informative genes from Table I or Table II for use in the composition (the CodeSet). An example of such a composition contains probes to a targeted portion of the 559 genes of Table I. In another embodiment, probes are generated to all 559 genes from Table I for use in the composition. In another embodiment, probes are generated to the first 539 genes from Table I for use in the composition. In another embodiment, probes are generated to the first 3 genes from Table I or Table II for use in the composition. In another embodiment, probes are generated to the first 5 genes from Table I or Table II for use in the composition. In another embodiment, probes are generated to the first 10 genes from Table I or Table II for use in the composition. In another embodiment, probes are generated to the first 15 genes from Table I or Table II for use in the composition. In another embodiment, probes are generated to the first 20 genes from Table I or Table II for use in the composition. In another embodiment, probes are generated to the first 25 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 30 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 35 genes from Table I or Table II for use in the composition.
In yet another embodiment, probes are generated to the first 40 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 45 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 50 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 60 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 65 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 70 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 75 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 80 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 85 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 90 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 95 genes from Table I or Table II for use in the composition. In another embodiment, probes are generated to the first 100 genes from Table I or Table II for use in the composition. In another embodiment, probes are generated to the first 200 genes from Table I for use in the composition. In yet another embodiment, probes are generated to 300 genes from Table I for use in the composition. Still other embodiments employ probes to a targeted portion of other combinations of the genes in Table I or Table II. 
The selected genes from the Table need not be in rank order; rather, any combination that clearly shows a difference in expression between the reference control and the diseased patient is useful in such a composition.


In one embodiment of the compositions described above, the reference control is a non-healthy control (NHC) as described above. In other embodiments, the reference control may be any class of controls as described above in “Definitions”.


The compositions based on the genes selected from Table I or Table II described herein, optionally associated with detectable labels, can be presented in the format of a microfluidics card, a chip or chamber, or a kit adapted for use with the Nanostring, PCR, RT-PCR or qPCR techniques described above. In one aspect, such a format is a diagnostic assay using TAQMAN® Quantitative PCR low density arrays. In another aspect, such a format is a diagnostic assay using the Nanostring nCounter platform.


For use in the above-noted compositions, the PCR primers and probes are preferably designed based upon intron sequences present in the gene(s) to be amplified, selected from the gene expression profile. Exemplary target sequences are shown in Table III. The design of the primer and probe sequences is within the skill of the art once the particular gene target is selected. The particular methods selected for the primer and probe design and the particular primer and probe sequences are not limiting features of these compositions. A ready explanation of primer and probe design techniques available to those of skill in the art is summarized in U.S. Pat. No. 7,081,340, with reference to publicly available tools such as DNA BLAST software, the Repeat Masker program (Baylor College of Medicine), Primer Express (Applied Biosystems), MGB assay-by-design (Applied Biosystems), and Primer3 (Steve Rozen and Helen J. Skaletsky (2000) Primer3 on the WWW for general users and for biologist programmers).


In general, optimal PCR primers and probes used in the compositions described herein are 17-30 bases in length and contain about 20-80%, for example about 50-60%, G+C bases. Melting temperatures of between 50 and 80° C., e.g., about 50 to 70° C., are typically preferred.
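The length, G+C and melting-temperature ranges above can be checked for a candidate oligonucleotide with a few lines of Python. This is a rough screening sketch, not a design tool: it assumes the basic GC-content formula Tm = 64.9 + 41 × (nGC − 16.4) / N, which is a common approximation for oligos longer than about 13 nt and ignores salt and nearest-neighbor effects. The candidate sequence is hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in an oligonucleotide."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)

def basic_tm(seq: str) -> float:
    """Approximate Tm (deg C) from the basic GC-content formula.

    Tm = 64.9 + 41 * (nGC - 16.4) / N; an approximation only.
    """
    s = seq.upper()
    n_gc = s.count("G") + s.count("C")
    return 64.9 + 41.0 * (n_gc - 16.4) / len(s)

def within_guidelines(seq, length=(17, 30), gc=(0.20, 0.80), tm=(50.0, 80.0)) -> bool:
    """Check an oligo against the length, G+C and Tm ranges stated above."""
    return (length[0] <= len(seq) <= length[1]
            and gc[0] <= gc_fraction(seq) <= gc[1]
            and tm[0] <= basic_tm(seq) <= tm[1])

candidate = "ACGTACGTACGTACGTACGT"  # hypothetical 20-mer, 50% G+C
ok = within_guidelines(candidate)
```

The narrower preferred ranges (about 50-60% G+C, about 50-70° C.) can be applied by passing tighter `gc` and `tm` tuples.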


In another aspect, a composition for diagnosing lung cancer in a mammalian subject contains a plurality of polynucleotides (genomic probes) immobilized on a substrate, wherein the genomic probes hybridize to 100 or more gene expression products of 100 or more informative genes selected from a gene expression profile in the blood of the subject, the gene expression profile comprising genes selected from Table I. In another embodiment, a composition for diagnosing lung cancer in a mammalian subject contains a plurality of polynucleotides (genomic probes) immobilized on a substrate, wherein the genomic probes hybridize to 10 or more gene expression products of 10 or more informative genes selected from a gene expression profile in the blood of the subject, the gene expression profile comprising genes selected from Table I or Table II. This type of composition relies on recognition of the same gene profiles as described above for the Nanostring compositions but employs the techniques of a cDNA array. Hybridization of the immobilized polynucleotides in the composition to the gene expression products present in the blood of the subject is employed to quantitate the expression of the informative genes selected from among the genes identified in Table I or Table II, generating a gene expression profile for the subject, which is then compared to that of a reference sample. As described above, depending upon the identification of the profile (i.e., that of genes of Table I or subsets thereof, or that of genes of Table II or subsets thereof), this composition enables the diagnosis and prognosis of NSCLC. Again, the selection of the polynucleotide sequences, their length and the labels used in the composition are routine determinations made by one of skill in the art in view of the teachings of which genes can form the gene expression profiles suitable for the diagnosis and prognosis of lung cancers.


In yet another aspect, a composition or kit useful in the methods described herein contains a plurality of ligands that bind to 100 or more gene expression products of 100 or more informative genes selected from a gene expression profile in the blood of the subject. In another embodiment, a composition or kit useful in the methods described herein contains a plurality of ligands that bind to 10 or more gene expression products of 10 or more informative genes selected from a gene expression profile in the blood of the subject. The gene expression profile contains the genes of Table I or Table II, as described above for the other compositions. This composition enables detection of the proteins expressed by the genes in the indicated Tables. Although the ligands are preferably antibodies to the proteins encoded by the genes in the profile, one of skill in the art will appreciate that various forms of antibody, e.g., polyclonal, monoclonal, recombinant and chimeric antibodies, as well as antibody fragments and components (e.g., CDRs, single chain variable regions, etc.), may be used as the ligands. Such ligands may be immobilized on suitable substrates for contact with the subject's blood and analyzed in a conventional fashion. In certain embodiments, the ligands are associated with detectable labels. These compositions also enable detection of changes in proteins encoded by the genes in the gene expression profile from those of a reference gene expression profile. Such changes correlate with lung cancer in a manner similar to that for the PCR and polynucleotide-containing compositions described above.


For all of the above forms of diagnostic/prognostic compositions, the gene expression profile can, in one embodiment, include at least the first 25 of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 10 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 15 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 20 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 30 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 40 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 50 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 60 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 70 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 80 or more of the informative genes of Table I or Table II. 
In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 90 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include all 100 of the informative genes of Table II. In one embodiment, for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include at least the first 100 of the informative genes of Table I. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 200 or more of the informative genes of Table I. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 300 or more of the informative genes of Table I. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 400 or more of the informative genes of Table I. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 500 or more of the informative genes of Table I. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 539 or more of the informative genes of Table I. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include all 559 of the informative genes of Table I.


These compositions may be used to diagnose lung cancers, such as stage I or stage II NSCLC. Further, these compositions are useful to provide a supplemental or original diagnosis in a subject having lung nodules of unknown etiology.


IV. DIAGNOSTIC METHODS OF THE INVENTION

All of the above-described compositions provide a variety of diagnostic tools that permit a blood-based, non-invasive assessment of disease status in a subject. Use of these compositions in diagnostic tests, which may be coupled with other screening tests, such as a chest X-ray or CT scan, increases diagnostic accuracy and/or directs additional testing.


Thus, in one aspect, a method is provided for diagnosing lung cancer in a mammalian subject. This method involves identifying a gene expression profile in the blood of a mammalian, preferably human, subject. In one embodiment, the gene expression profile includes 100 or more gene expression products of 100 or more informative genes having increased or decreased expression in lung cancer. The gene expression profiles are formed by selection of 100 or more informative genes from the genes of Table I. In another embodiment, the gene expression profile includes 10 or more gene expression products of 10 or more informative genes having increased or decreased expression in lung cancer. The gene expression profiles are formed by selection of 10 or more informative genes from the genes of Table I. In another embodiment, the gene expression profiles are formed by selection of 10 or more informative genes from the genes of Table II. In another embodiment, the gene expression profile includes 10 or more gene expression products of 5 or more informative genes having increased or decreased expression in lung cancer. The gene expression profiles are formed by selection of 5 or more informative genes from the genes of Table I. In another embodiment, the gene expression profiles are formed by selection of 5 or more informative genes from the genes of Table II. Comparison of a subject's gene expression profile with a reference gene expression profile permits identification of changes in expression of the informative genes that correlate with a lung cancer (e.g., NSCLC). This method may be performed using any of the compositions described above. In one embodiment, the method enables a cancerous tumor to be distinguished from a benign nodule.
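The comparison of a subject's profile with a reference profile can be illustrated with a deliberately simple fold-change sketch. It is not the 559-gene or 100-gene classifier evaluated in the Examples; the choice of reference samples (e.g., subjects with benign nodules), the 2-fold cutoff, the gene names and the expression values are all hypothetical.

```python
import math
from statistics import mean

def log2_fold_changes(subject, reference_samples):
    """log2(subject / mean of reference) for each gene in the subject's profile.

    subject: {gene: normalized expression level}
    reference_samples: list of {gene: normalized expression level}
    """
    return {
        gene: math.log2(value / mean(sample[gene] for sample in reference_samples))
        for gene, value in subject.items()
    }

def changed_genes(fold_changes, min_abs_log2=1.0):
    """Label genes changed by at least 2**min_abs_log2-fold as 'up' or 'down'."""
    return {
        gene: ("up" if lfc > 0 else "down")
        for gene, lfc in fold_changes.items()
        if abs(lfc) >= min_abs_log2
    }

# Hypothetical normalized counts for three informative genes.
subject = {"GENE_A": 400.0, "GENE_B": 100.0, "GENE_C": 25.0}
references = [
    {"GENE_A": 100.0, "GENE_B": 90.0, "GENE_C": 100.0},
    {"GENE_A": 100.0, "GENE_B": 110.0, "GENE_C": 100.0},
]
fc = log2_fold_changes(subject, references)
calls = changed_genes(fc)
```

In practice, a multi-gene classifier combines all informative genes into a single score rather than calling genes one at a time; this sketch only shows what "change in expression from the reference" means for an individual gene.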


In another aspect, use of any of the compositions described herein is provided for diagnosing lung cancer in a subject.


The diagnostic compositions and methods described herein provide a variety of advantages over current diagnostic methods, including the following. As exemplified herein, subjects with cancerous tumors are distinguished from those with benign nodules. These methods and compositions address the practical diagnostic problem of whether a patient who presents at a lung clinic with a small nodule has malignant disease. Patients with an intermediate-risk nodule would clearly benefit from a non-invasive test that would move the patient into either a very low-likelihood or a very high-likelihood category of disease risk. An accurate estimate of malignancy based on a genomic profile (e.g., estimating that a given patient has a 90% probability of having cancer versus estimating that the patient has only a 5% chance of having cancer) would result in fewer surgeries for benign disease, more early stage tumors removed at a curable stage, fewer follow-up CT scans, and a reduction in the significant psychological cost of worrying about a nodule. The economic impact would also likely be significant, for example by reducing the current estimated cost of additional health care associated with CT screening for lung cancer, i.e., $116,000 per quality-adjusted life-year gained. A non-invasive blood genomics test with sufficient sensitivity and specificity would significantly alter the post-test probability of malignancy and, thus, the subsequent clinical care.
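How a test result shifts the probability of malignancy follows from Bayes' theorem. The sketch below plugs in the sensitivity (about 90%) and specificity (about 62%) reported in this section for the 100-gene classifier; the 30% pre-test probability for an intermediate-risk nodule is a hypothetical input chosen only for illustration.

```python
def post_test_probability(pretest, sensitivity, specificity, positive):
    """Probability of malignancy after a test result, by Bayes' theorem."""
    if positive:
        true_pos = sensitivity * pretest
        false_pos = (1.0 - specificity) * (1.0 - pretest)
        return true_pos / (true_pos + false_pos)
    false_neg = (1.0 - sensitivity) * pretest
    true_neg = specificity * (1.0 - pretest)
    return false_neg / (false_neg + true_neg)

# Hypothetical 30% pre-test probability; sensitivity/specificity as reported.
p_pos = post_test_probability(0.30, 0.90, 0.62, positive=True)   # about 0.50
p_neg = post_test_probability(0.30, 0.90, 0.62, positive=False)  # about 0.065
```

In this hypothetical case a negative result moves the nodule from 30% to roughly 6.5%, toward the low-likelihood category described above; the actual shift depends entirely on the pre-test probability assumed for a given patient.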


A desirable advantage of these methods over existing methods is that they are able to characterize the disease state from a minimally invasive procedure, i.e., by taking a blood sample. In contrast, current practice for classification of cancer tumors from gene expression profiles depends on a tissue sample, usually a sample from a tumor. In the case of very small tumors, a biopsy is problematic, and if no tumor is known or visible, obtaining a sample from it is clearly impossible. Unlike the analysis of tumor samples, no purification of tumor tissue is required. A recently published method depends on brushing epithelial cells from the lung during bronchoscopy, a method that is also considerably more invasive than taking a blood sample. Blood samples have the additional advantage that the material is easily prepared and stabilized for later analysis, which is important when messenger RNA is to be analyzed.


The 559-gene classifier described herein showed a ROC-AUC of 0.81 over all tested samples. In one embodiment, when the sensitivity is about 90%, the specificity is about 46%. When nodule classification accuracy is assessed by nodule size without applying a specific sensitivity threshold, the number of benign nodules classified as cancer increases as nodule size, itself a cancer risk factor, increases. In one embodiment, the accuracy of the gene classifier is about 89% for nodules ≤8 mm. In another embodiment, the accuracy of the gene classifier is about 75% for nodules >8 to about ≤12 mm. In yet another embodiment, the accuracy of the gene classifier is about 68% for nodules >12 to about ≤16 mm. In another embodiment, the accuracy of the gene classifier is about 53% for nodules >16 mm. See the Examples below.
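Size-stratified accuracy of this kind is obtained by binning nodules into the size ranges above and computing the fraction of correct calls within each bin. The sketch below assumes each record holds a nodule size, the true diagnosis and the classifier's call; the records are toy values, not study data.

```python
def size_bin(size_mm):
    """Assign a nodule to the size ranges used above."""
    if size_mm <= 8:
        return "<=8 mm"
    if size_mm <= 12:
        return ">8-12 mm"
    if size_mm <= 16:
        return ">12-16 mm"
    return ">16 mm"

def accuracy_by_size(records):
    """records: iterable of (size_mm, true_label, predicted_label) tuples."""
    totals, correct = {}, {}
    for size_mm, truth, pred in records:
        b = size_bin(size_mm)
        totals[b] = totals.get(b, 0) + 1
        correct[b] = correct.get(b, 0) + (truth == pred)
    return {b: correct[b] / totals[b] for b in totals}

# Toy records (hypothetical): benign nodules in larger bins called "cancer".
toy = [(6, "benign", "benign"), (7, "cancer", "cancer"), (10, "benign", "cancer"),
       (11, "cancer", "cancer"), (20, "benign", "cancer"), (18, "cancer", "cancer")]
acc = accuracy_by_size(toy)
```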


In one embodiment, for nodules smaller than about 10 mm, the specificity is about 54% at about 90% sensitivity, and the ROC-AUC is about 0.85. In another embodiment, for larger nodules (greater than about 10 mm), the specificity is about 24% at about 90% sensitivity, and the ROC-AUC is about 0.71.


The 100-gene classifier described herein showed a ROC-AUC of 0.82 over all tested samples. In one embodiment, when the sensitivity is about 90%, the specificity is about 62%. In another embodiment, when the sensitivity is about 79%, the specificity is about 68%. In yet another embodiment, when the sensitivity is about 71%, the specificity is about 75%. See the Examples below.
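The two kinds of figures reported above measure different things: the ROC-AUC summarizes how well classifier scores rank cancers above benign nodules across all thresholds, whereas "specificity at about 90% sensitivity" is read off at a single operating point. The sketch below computes both from classifier scores; the scores are hypothetical, and the classifier that would produce them is not shown.

```python
def roc_auc(scores_cancer, scores_benign):
    """AUC as the probability that a random cancer case scores higher than a
    random benign case, ties counting one half (Mann-Whitney formulation)."""
    wins = 0.0
    for c in scores_cancer:
        for b in scores_benign:
            wins += 1.0 if c > b else 0.5 if c == b else 0.0
    return wins / (len(scores_cancer) * len(scores_benign))

def specificity_at_sensitivity(scores_cancer, scores_benign, target=0.90):
    """Call 'cancer' when score >= threshold; pick the highest threshold whose
    sensitivity reaches the target and report (threshold, sens, spec)."""
    for threshold in sorted(set(scores_cancer), reverse=True):
        sens = sum(s >= threshold for s in scores_cancer) / len(scores_cancer)
        if sens >= target:
            spec = sum(s < threshold for s in scores_benign) / len(scores_benign)
            return threshold, sens, spec
    raise ValueError("target sensitivity not reachable")

# Hypothetical classifier scores.
cancer_scores = [0.9, 0.8, 0.7, 0.4]
benign_scores = [0.6, 0.3, 0.2, 0.1]
auc = roc_auc(cancer_scores, benign_scores)
operating_point = specificity_at_sensitivity(cancer_scores, benign_scores)
```

Choosing the highest threshold that still meets the sensitivity target maximizes specificity at that target, which is how the paired sensitivity/specificity figures above are typically obtained.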


These compositions and methods allow for more accurate diagnosis and treatment of lung cancer. Thus, in one embodiment, the methods described include treatment of the lung cancer. Treatment may include removal of the neoplastic growth, chemotherapy and/or any other treatment known in the art or described herein.


In one embodiment, a method for diagnosing the existence of, or evaluating, a lung cancer in a mammalian subject is provided, which includes identifying changes in the expression of 5, 10, 15 or more genes in a sample of said subject, said genes selected from the genes of Table I or the genes of Table II. The subject's gene expression levels are compared with the levels of the same genes in a reference or control, wherein changes in expression of the subject's genes from those of the reference correlate with a diagnosis or evaluation of a lung cancer.


In one embodiment, the diagnosis or evaluation comprises one or more of a diagnosis of a lung cancer, a diagnosis of a benign nodule, a diagnosis of a stage of lung cancer, a diagnosis of a type or classification of a lung cancer, a diagnosis or detection of a recurrence of a lung cancer, a diagnosis or detection of a regression of a lung cancer, a prognosis of a lung cancer, or an evaluation of the response of a lung cancer to a surgical or non-surgical therapy. In another embodiment, the changes comprise an upregulation of one or more selected genes in comparison to said reference or control or a downregulation of one or more selected genes in comparison to said reference or control.


In one embodiment, the method takes into account the size of a lung nodule in the subject. The specificity and sensitivity may vary with the size of the nodule. In one embodiment, the specificity is about 46% at about 90% sensitivity. In another embodiment, the specificity is about 54% at about 90% sensitivity for nodules <10 mm. In yet another embodiment, the accuracy is about 88% for nodules ≤8 mm, about 75% for nodules >8 mm and ≤12 mm, about 68% for nodules >12 mm and ≤16 mm, and about 53% for nodules >16 mm.


In another embodiment, the reference or control comprises the expression levels of three or more genes of Table I in a sample of at least one reference subject. The reference subject may be selected from the group consisting of: (a) a smoker with malignant disease, (b) a smoker with non-malignant disease, (c) a former smoker with non-malignant disease, (d) a healthy non-smoker with no disease, (e) a non-smoker who has chronic obstructive pulmonary disease (COPD), (f) a former smoker with COPD, (g) a subject with a solid lung tumor prior to surgery for removal of same; (h) a subject with a solid lung tumor following surgical removal of said tumor; (i) a subject with a solid lung tumor prior to therapy for same; and (j) a subject with a solid lung tumor during or following therapy for same. In one embodiment, the reference or control subject (a)-(j) is the same test subject at a temporally earlier timepoint.


The sample is selected from those described herein. In one embodiment, the sample is peripheral blood. The nucleic acids in the sample are, in some embodiments, stabilized prior to identifying changes in the gene expression levels. Such stabilization may be accomplished, e.g., using the PAXgene system described herein.


In one embodiment, the method of detecting lung cancer in a patient includes


a. obtaining a sample from the patient; and


b. detecting a change in expression in at least 10 genes selected from Table I or Table II in the patient sample as compared to a control by contacting the sample with a composition comprising oligonucleotides, polynucleotides or ligands specific for each different gene transcript or expression product of the at least 10 genes of Table I or Table II and detecting binding between the oligonucleotide, polynucleotide or ligand and the gene product or expression product.


In another embodiment, the method of diagnosing lung cancer in a subject includes


a. obtaining a blood sample from a subject;


b. detecting a change in expression in at least 10 genes selected from Table I or Table II in the patient sample as compared to a control by contacting the sample with a composition comprising oligonucleotides, polynucleotides or ligands specific for each different gene transcript or expression product of the at least 10 genes of Table I or Table II and detecting binding between the oligonucleotide, polynucleotide or ligand and the gene product or expression product; and


c. diagnosing the subject with cancer when changes in expression of the subject's genes from those of the reference are detected.


In yet another embodiment, the method includes


a. obtaining a blood sample from a subject;


b. detecting a change in expression in at least 10 genes selected from Table I or Table II in the patient sample as compared to a control by contacting the sample with a composition comprising oligonucleotides, polynucleotides or ligands specific for each different gene transcript or expression product of the at least 10 genes of Table I or Table II and detecting binding between the oligonucleotide, polynucleotide or ligand and the gene product or expression product;


c. diagnosing the subject with cancer when changes in expression of the subject's genes from those of the reference are detected; and


d. removing the neoplastic growth.


V. EXAMPLES

The invention is now described with reference to the following examples. These examples are provided for the purpose of illustration only and the invention should in no way be construed as being limited to these examples but rather should be construed to encompass any and all variations that become evident as a result of the teaching provided herein.


Example 1: Patient Population—Analysis A

For development of the gene classifier described herein, blood samples and clinical information were collected from 150 subjects, 73 having a diagnosis of lung cancer and 77 having a diagnosis of benign nodule. Patient characteristics are shown in FIG. 1.


Patients with lung cancer included newly diagnosed male and female patients with early stage lung cancer. They were in moderately good health (ambulatory), although they could have other medical illnesses. They were excluded if they had had previous cancers, chemotherapy, radiation, or cancer surgery. They must have had a lung cancer diagnosis within the preceding 6 months, histologic confirmation, and no systemic therapy, such as chemotherapy, radiation therapy or cancer surgery, because biomarker levels may change with therapy. Thus, the majority of the cancer patients were early stage (i.e., Stage I and Stage II).


The “control” cohort was derived from patients with benign lung nodules (e.g., ground-glass opacities, single nodules, granulomas or hamartomas). These patients were evaluated at pulmonary clinics or underwent thoracic surgery for a lung nodule. All samples were collected prior to surgery.


Example 2: Patient Population—Analysis B

Further blood samples and clinical information were collected from 120 subjects, 60 having a diagnosis of lung cancer and 60 having a diagnosis of benign nodule. Patients with lung cancer included newly diagnosed male and female patients with early stage lung cancer. They were in moderately good health (ambulatory), although they could have other medical illnesses. They were excluded if they had had previous cancers, chemotherapy, radiation, or cancer surgery. They must have had a lung cancer diagnosis within the preceding 6 months, histologic confirmation, and no systemic therapy, such as chemotherapy, radiation therapy or cancer surgery, because biomarker levels may change with therapy. Thus, the majority of the cancer patients were early stage (i.e., Stage I and Stage II).


The “control” cohort was derived from patients with benign lung nodules (e.g., granulomas or hamartomas). These patients were evaluated at pulmonary clinics or underwent thoracic surgery for a lung nodule. All samples were collected prior to surgery.


Example 3: Sample Collection Protocols and Processing

Blood samples were collected in the clinic by the tissue acquisition technician. Blood samples were drawn directly into PAXgene Blood RNA Tubes via standard phlebotomy technique. These tubes contain a proprietary reagent that immediately stabilizes intracellular RNA, minimizing the ex vivo degradation or up-regulation of RNA transcripts. The ability to avoid freezing, to batch samples, and to minimize the urgency of processing samples after collection greatly enhances lab efficiency and reduces costs.


Example 4—RNA Purification and Quality Assessment

PAXgene RNA is prepared using a standard, commercially available kit from Qiagen™ that allows purification of mRNA. The resulting RNA is used for mRNA profiling. RNA quality is determined using a Bioanalyzer, and only samples with an RNA Integrity Number (RIN) >3 were used.


Briefly, RNA is isolated as follows. Before beginning, turn the shaker-incubator on and set it to 55° C. Unless otherwise noted, all steps in this protocol, including centrifugation steps, should be carried out at room temperature (15-25° C.). This protocol assumes samples are stored at −80° C. Unfrozen samples that have been left at room temperature for a minimum of 2 hours, per the Qiagen protocol, should be processed in the same way.


Thaw PAXgene tubes upright in a plastic rack. Invert the tubes at least 10 times to mix before starting the isolation. Prepare all necessary tubes. For each sample, the following are needed: 2 numbered 1.5 ml Eppendorf tubes; 1 Eppendorf tube labeled with the sample information (this is the final tube); 1 lilac PAXgene Shredder spin column; 1 red PAXgene RNA spin column; and 5 processing tubes.


Centrifuge the PAXgene Blood RNA Tube for 10 minutes at 5000×g using a swing-out rotor in the Qiagen centrifuge (Sigma 4-15° C. centrifuge; rotor: Sigma Nr. 11140, 7/01, 5500/min; holder: Sigma 13115, 286 g 14/D; inside tube holder: 18010, 125 g). Note: After thawing, ensure that the blood sample has been incubated in the PAXgene Blood RNA Tube for a minimum of 2 hours at room temperature (15-25° C.) to achieve complete lysis of blood cells.
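The protocol specifies centrifugation in units of g, whereas the rotor is rated by speed (5500/min). The sketch below applies the standard conversion RCF = 1.118 × 10⁻⁵ × r × rpm², with r the rotor radius in cm. The 15 cm radius is a hypothetical placeholder, not a value for the Sigma rotor listed above; the actual effective radius should be taken from the rotor documentation.

```python
import math

# Standard relation between relative centrifugal force (RCF, in x g) and
# rotor speed: RCF = 1.118e-5 * r_cm * rpm**2, with r_cm the radius in cm.
RCF_CONSTANT = 1.118e-5


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Return the relative centrifugal force (x g) at a given speed and radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Return the rotor speed (rpm) needed to reach a target RCF."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


if __name__ == "__main__":
    radius = 15.0  # hypothetical effective radius (cm); check the rotor manual
    rpm = rpm_from_rcf(5000, radius)
    print(f"5000 x g at r = {radius} cm needs about {rpm:.0f} rpm")
```

Because RCF scales with the radius, the same speed gives a different g-force on a different rotor, which is why the protocol states forces rather than speeds.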


Under the hood, remove the supernatant by decanting it into bleach, taking care not to disturb the pellet, and dry the rim of the tube with a clean paper towel. Dispose of the decanted supernatant by placing the clotted blood in a bag and then into the infectious waste; discard the fluid portion down the sink and flush it with plenty of water. Add 4 ml RNase-free water to the pellet, and close the tube using a fresh secondary Hemogard closure.


Vortex until the pellet is visibly dissolved. Weigh the tubes in the centrifuge holder again to ensure they are balanced, and centrifuge for 10 minutes at 5000×g using a swing-out rotor in the Qiagen centrifuge. Small debris remaining in the supernatant after vortexing but before centrifugation will not affect the procedure.


Remove and discard the entire supernatant. Leave tube upside-down for 1 min to drain off all supernatant. Incomplete removal of the supernatant will inhibit lysis and dilute the lysate, and therefore affect the conditions for binding RNA to the PAXgene membrane.


Add 350 μl Buffer BM1 and pipet up and down to resuspend and lyse the pellet.


Pipet the resuspended sample into a labeled 1.5 ml microcentrifuge tube. Add 300 μl Buffer BM2, then 40 μl proteinase K. Mix by vortexing for 5 seconds, and incubate for 10 minutes at 55° C. in a shaker-incubator at the highest possible speed (800 rpm on an Eppendorf Thermomixer). If a shaking water bath is used instead of a thermomixer, briefly vortex the samples every 2-3 minutes during the incubation; keep the vortexer next to the incubator.


Pipet the lysate directly into a PAXgene Shredder spin column (lilac) placed in a 2 ml processing tube, and centrifuge for 3 minutes at 18,500×g and 24° C. in the TOMY Microtwin centrifuge. Pipet the lysate carefully and visually check that it has been completely transferred to the spin column. To prevent damage to columns and tubes, do not exceed 20,000×g.


Carefully transfer the entire supernatant of the flow-through fraction to a fresh 1.5 ml microcentrifuge tube without disturbing the pellet in the processing tube. Discard the pellet in the processing tube.


Add 700 μl isopropanol (100%) to the supernatant. Mix by vortexing.


Pipet 690 μl sample into the PAXgene RNA spin column (red) placed in a 2 ml processing tube, and centrifuge for 1 minute at 10,000×g. Place the spin column in a new 2 ml processing tube, and discard the old processing tube containing flow-through.


Pipet the remaining sample into the PAXgene RNA spin column (red), and centrifuge for 1 minute at 18,500×g. Place the spin column in a new 2 ml processing tube, and discard the old processing tube containing flow-through. Carefully pipet the sample into the spin column and visually check that the sample is completely transferred to the spin column.


Pipet 350 μl Buffer BM3 into the PAXgene RNA spin column. Centrifuge for 15 sec at 10,000×g. Place the spin column in a new 2 ml processing tube, and discard the old processing tube containing flow-through.


Prepare the DNase I incubation mix for the on-column DNase digestion that follows. Add 10 μl DNase I stock solution to 70 μl Buffer RDD in a 1.5 ml microcentrifuge tube. Mix by gently flicking the tube, and centrifuge briefly to collect residual liquid from the sides of the tube.
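When processing a batch, the per-column DNase I mix (10 μl DNase I stock plus 70 μl Buffer RDD) can be scaled up as a master mix. A minimal sketch; the one extra reaction added by default to cover pipetting loss is an assumption, not part of the protocol, and the function name is hypothetical.

```python
DNASE_I_UL = 10     # DNase I stock per column (from the protocol)
BUFFER_RDD_UL = 70  # Buffer RDD per column (from the protocol)


def dnase_master_mix(n_samples: int, extra_reactions: int = 1) -> dict[str, int]:
    """Volumes (ul) of a DNase I master mix for n_samples columns.

    extra_reactions is a hypothetical safety margin for pipetting loss.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    reactions = n_samples + extra_reactions
    return {
        "reactions": reactions,
        "dnase_i_ul": reactions * DNASE_I_UL,
        "buffer_rdd_ul": reactions * BUFFER_RDD_UL,
        "per_column_ul": DNASE_I_UL + BUFFER_RDD_UL,  # 80 ul onto each membrane
    }


if __name__ == "__main__":
    print(dnase_master_mix(12))
```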


Pipet the DNase I incubation mix (80 μl) directly onto the PAXgene RNA spin column membrane, and place on the benchtop (20-30° C.) for 15 minutes. Ensure that the DNase I incubation mix is placed directly onto the membrane. DNase digestion will be incomplete if part of the mix is applied to and remains on the walls or the O-ring of the spin column.


Pipet 350 μl Buffer BM3 into the PAXgene RNA spin column, and centrifuge for 15 sec at 18,500×g. Place the spin column in a new 2 ml processing tube, and discard the old processing tube containing flow-through.


Pipet 500 μl Buffer BM4 into the PAXgene RNA spin column, and centrifuge for 15 sec at 10,000×g. Place the spin column in a new 2 ml processing tube, and discard the old processing tube containing flow-through.


Add another 500 μl Buffer BM4 to the PAXgene RNA spin column. Centrifuge for 2 minutes at 18,500×g.


Discard the tube containing the flow-through, and place the PAXgene RNA spin column in a new 2 ml processing tube. Centrifuge for 1 minute at 18,500×g.


Discard the tube containing the flow-through. Place the PAXgene RNA spin column in a labeled 1.5 ml microcentrifuge tube (final tube), and pipet 40 μl Buffer BR5 directly onto the PAXgene RNA spin column membrane. Centrifuge for 1 minute at 10,000×g to elute the RNA. It is important to wet the entire membrane with Buffer BR5 in order to achieve maximum elution efficiency.


Repeat the elution step as described, using 40 μl Buffer BR5 and the same microcentrifuge tube. Centrifuge for 1 minute at 20,000×g to elute the RNA.


Incubate the eluate for 5 minutes at 65° C. in the shaker-incubator without shaking. After incubation, chill immediately on ice. This incubation at 65° C. denatures the RNA for downstream applications. Do not exceed the incubation time or temperature.


If the RNA samples will not be used immediately, store at −20° C. or −70° C. Since the RNA remains denatured after repeated freezing and thawing, it is not necessary to repeat the incubation at 65° C.


Example 5: Measurement of RNA Levels

To provide a biomarker signature that can be used in clinical practice to diagnose lung cancer, a gene expression profile with the smallest number of genes that maintains satisfactory accuracy is provided by the use of 100 or more of the genes identified in Table I, as well as by the use of 10 or more of the genes identified in Table II. These gene profiles, or signatures, permit simpler and more practical tests that are easy to use in a standard clinical laboratory. Because the number of discriminating genes is small enough, assays based on these gene expression profiles were developed on the NanoString nCounter® platform.


A. Nanostring nCounter® Platform Gene Expression Assay Protocol


Total RNA was isolated from whole blood using the PAXgene Blood miRNA Kit, as described above, and samples were checked for RNA quality. Samples were analyzed with the Agilent 2100 Bioanalyzer on an RNA Nano chip, using the RIN score and electropherogram as indicators of sample integrity. Samples were also quantitated on the NanoDrop (ND-1000 Spectrophotometer), where 260/280 and 260/230 readings were recorded and evaluated for NanoString compatibility. Based on the NanoDrop concentrations, total RNA samples were normalized to 100 ng in 5 μL, using nuclease-free water as diluent, in NanoString-provided tube strips.


An 8 μL aliquot of a mixture of the NanoString nCounter Reporter CodeSet and Hybridization Buffer (70 μL Hybridization Buffer and 42 μL Reporter CodeSet per 12 assays) and 2 μL of Capture ProbeSet were added to each 5 μL RNA sample. Samples were hybridized for 19 hours at 65° C. in a thermocycler (Eppendorf). During hybridization, Reporter Probes, which carry fluorescent barcodes specific to each mRNA of interest, and biotinylated Capture Probes bound to their target mRNAs to create target-probe complexes.


After hybridization was complete, samples were transferred to the nCounter Prep Station for processing using the Standard Protocol setting (run time: 2 hr 35 min). During the Standard Protocol, the Prep Station robot washed the samples to remove excess Reporter and Capture Probes and moved them to a streptavidin-coated cartridge, where the purified target-probe complexes were immobilized in preparation for imaging by the nCounter Digital Analyzer. Upon completion, the cartridge was sealed and placed in the Digital Analyzer using a Field of View (FOV) setting of 555. A fluorescence microscope tabulated the raw counts for each unique barcode associated with a target mRNA. The collected data were stored in .csv files and then transferred to the Bioinformatics Facility for analysis according to the manufacturer's instructions.
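Normalizing each RNA sample to 100 ng in 5 μL is a simple dilution calculation from the NanoDrop concentration. A minimal sketch, assuming concentrations in ng/μL; the function name is hypothetical, and samples too dilute to supply 100 ng in 5 μL are flagged rather than adjusted, since the text does not say how such samples were handled.

```python
TARGET_NG = 100.0  # total RNA per hybridization (from the text)
TARGET_UL = 5.0    # final sample volume (from the text)


def dilution_volumes(conc_ng_per_ul: float) -> tuple[float, float]:
    """Return (RNA stock volume, nuclease-free water volume) in ul.

    Raises ValueError if the stock cannot deliver TARGET_NG within TARGET_UL.
    """
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    rna_ul = TARGET_NG / conc_ng_per_ul
    if rna_ul > TARGET_UL:
        raise ValueError(f"stock too dilute: needs {rna_ul:.2f} ul > {TARGET_UL} ul")
    return rna_ul, TARGET_UL - rna_ul


if __name__ == "__main__":
    for conc in (50.0, 20.0, 125.0):
        rna, water = dilution_volumes(conc)
        print(f"{conc} ng/ul: {rna:.2f} ul RNA + {water:.2f} ul water")
```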


Example 6: Biomarker Selection

Support vector machines (SVMs) can be applied to gene expression datasets for gene function discovery and classification. SVMs have been found to be most efficient at distinguishing the more closely related cases and controls that reside in the margins. SVM-RFE (48, 54) was primarily used to develop gene expression classifiers that distinguish clinically defined classes of patients from clinically defined classes of controls (smokers, non-smokers, COPD, granuloma, etc.). SVM-RFE is an SVM-based model used in the art that recursively removes genes based on their contribution to the discrimination between the two classes being analyzed. The lowest-scoring genes by coefficient weight were removed, the remaining genes were scored again, and the procedure was repeated until only a few genes remained. This method has been used in several studies to perform classification and gene selection tasks. However, the choice of algorithm parameters (penalty parameter, kernel function, etc.) can often influence performance.
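The recursive elimination loop of SVM-RFE can be sketched as follows: train a linear SVM, rank the remaining genes by their squared weight, drop the lowest-ranked, and retrain. This is an illustration under stated assumptions, not the implementation used in the study. To stay dependency-free it swaps in a simple Pegasos-style subgradient solver for a linear SVM without a bias term (which assumes centered features), and the toy data generator, regularization strength, and epoch count are hypothetical.

```python
import random


def train_linear_svm(X, y, features, lam=0.01, epochs=50, seed=0):
    """Train a linear SVM (no bias) by Pegasos-style subgradient descent.

    X: list of samples (lists of floats); y: labels in {-1, +1};
    features: column indices currently in the model.
    Returns one weight per entry of `features`.
    """
    rng = random.Random(seed)
    w = [0.0] * len(features)
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):  # one shuffled pass
            t += 1
            eta = 1.0 / (lam * t)
            xi = [X[i][f] for f in features]
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # L2 shrinkage
            if margin < 1.0:  # hinge-loss violation: step toward y_i * x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, xi)]
    return w


def svm_rfe(X, y, n_keep, step=1, **svm_kwargs):
    """Recursively drop the `step` features with the smallest squared weight."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        w = train_linear_svm(X, y, remaining, **svm_kwargs)
        ranked = sorted(zip((wj * wj for wj in w), remaining))
        n_drop = min(step, len(remaining) - n_keep)
        dropped = {f for _, f in ranked[:n_drop]}
        remaining = [f for f in remaining if f not in dropped]
    return remaining


def make_toy_data(n=40, n_noise=4, seed=1):
    """Hypothetical data: feature 0 tracks the label, the rest are noise."""
    rng = random.Random(seed)
    y = [1 if k % 2 else -1 for k in range(n)]
    X = [[yi + rng.uniform(-0.3, 0.3)] + [rng.uniform(-1, 1) for _ in range(n_noise)]
         for yi in y]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print("surviving feature(s):", svm_rfe(X, y, n_keep=1))
```

Setting `step` above 1 removes several genes per round, trading ranking resolution for speed on panels with hundreds of probes.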


SVM-RCE is a related SVM-based model in that, like SVM-RFE, it assesses the relative contributions of genes to the classifier. However, SVM-RCE assesses the contributions of groups of correlated genes instead of individual genes. Additionally, although both methods remove the least important genes at each step, SVM-RCE scores and removes clusters of genes, while SVM-RFE scores and removes a single gene or a small number of genes in each round of the algorithm.


The SVM-RCE method is briefly described here. Low-expressing genes (average expression less than 2× background) were removed, quantile normalization was performed, and then “outlier” arrays whose median expression values differed by more than 3 sigma from the median of the dataset were removed. The remaining samples were subjected to SVM-RCE using ten repetitions of 10-fold cross-validation of the algorithm. The genes were reduced by t-test (applied to the training set) to an experimentally determined optimal number that produced the highest accuracy in the final result. These starting genes were clustered by K-means into clusters of correlated genes with an average size of 3-5 genes. SVM classification scoring was carried out on each cluster using 3-fold resampling repeated 5 times, and the worst-scoring clusters were eliminated. Accuracy was determined on the surviving pool of genes using the left-out 10% of samples (testing set), and the top-scoring 100 genes were recorded. The procedure was repeated from the clustering step to an end point of 2 clusters. The optimal gene panel was taken to be the minimal number of genes giving the maximal accuracy, starting with the most frequently selected gene. The identity of the individual genes in this panel is not fixed, since the order reflects the number of times a given gene was selected among the top 100 informative genes, and this order is subject to some variation.
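The first two preprocessing steps, removal of low-expressing genes and quantile normalization, can be sketched in plain Python. This is a minimal sketch, assuming expression values are held as a dictionary mapping each gene to its per-array values and that a single background level applies to all arrays. Ties are broken by input order rather than averaged, and the outlier-array and K-means clustering steps are omitted; the function names are hypothetical.

```python
import statistics


def filter_low_expression(data, background, fold=2.0):
    """Keep genes whose mean expression is at least `fold` x background.

    data: dict mapping gene -> list of values, one per array (same order).
    """
    return {g: v for g, v in data.items() if statistics.fmean(v) >= fold * background}


def quantile_normalize(data):
    """Give every array the same value distribution (mean of sorted columns)."""
    genes = list(data)
    n_arrays = len(data[genes[0]])
    columns = [[data[g][a] for g in genes] for a in range(n_arrays)]
    # Mean value at each rank, taken across all arrays.
    rank_means = [statistics.fmean(vals) for vals in zip(*(sorted(c) for c in columns))]
    out = {g: [0.0] * n_arrays for g in genes}
    for a, col in enumerate(columns):
        order = sorted(range(len(genes)), key=col.__getitem__)  # stable for ties
        for rank, k in enumerate(order):
            out[genes[k]][a] = rank_means[rank]
    return out


if __name__ == "__main__":
    raw = {"a": [5.0, 2.0], "b": [4.0, 1.0], "c": [3.0, 4.0], "low": [0.5, 0.4]}
    kept = filter_low_expression(raw, background=1.0)
    print(quantile_normalize(kept))
```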


A. Biomarker Selection.


Genes that scored highest (by SVM) in discriminating cancerous tumors from benign nodules were examined for their utility in clinical tests. Factors considered included large differences in expression levels between classes and low variability within classes. When selecting biomarkers for validation, an effort was made to select genes with distinct expression profiles, to avoid selecting correlated genes, and to identify genes whose differential expression was robust by alternative techniques, including PCR and/or immunohistochemistry.
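One way to put the "avoid correlated genes" criterion into practice is a greedy pass down the SVM ranking that skips any gene strongly correlated with one already chosen. The sketch below is an assumption about how such a filter could work, not the selection procedure actually used; the |r| ≤ 0.8 cut-off, the toy expression values, and the function names are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0 or sb == 0:
        return 0.0
    return cov / (sa * sb)


def pick_uncorrelated(ranked_genes, expr, max_abs_r=0.8, n_pick=10):
    """Walk the ranking best-first, keeping genes that are not highly
    correlated with any gene already kept."""
    chosen = []
    for g in ranked_genes:
        if all(abs(pearson(expr[g], expr[c])) <= max_abs_r for c in chosen):
            chosen.append(g)
            if len(chosen) == n_pick:
                break
    return chosen


if __name__ == "__main__":
    expr = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [4, 1, 3, 2]}
    print(pick_uncorrelated(["g1", "g2", "g3"], expr))  # g2 duplicates g1's profile
```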


B. Validation.


Three methods of validation were considered.


Cross-Validation: To minimize over-fitting within a dataset, K-fold cross-validation (K usually equal to 10) was used, in which the dataset is randomly split into K parts, K−1 parts are used for training, and 1 part is used for testing. Thus, for K=10 the algorithm was trained on a random selection of 90% of the patients and 90% of the controls and then tested on the remaining 10%. This was repeated until all of the samples had been employed as test subjects, so that the cumulative classifier makes use of all of the samples, but no sample is tested using a training set of which it is a part. To reduce the impact of randomization, K-fold separation was performed M times, producing different combinations of patients and controls in each of the K folds each time. Therefore, for each dataset, M×K rounds of permuted selection of training and testing sets were used for each set of genes.
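The M repetitions of K-fold splitting described above can be generated as follows. This minimal sketch stratifies by class so that each training set contains roughly (K−1)/K of the patients and of the controls, as in the 90%/10% example; the seed handling and function name are assumptions.

```python
import random


def repeated_stratified_kfold(labels, k=10, m=10, seed=0):
    """Yield (train_idx, test_idx) for m repetitions of stratified k-fold CV.

    Within each repetition every sample is tested exactly once, and each
    class is spread as evenly as possible over the k folds.
    """
    rng = random.Random(seed)
    classes = sorted(set(labels))
    for _ in range(m):
        folds = [[] for _ in range(k)]
        for cls in classes:
            idx = [i for i, lab in enumerate(labels) if lab == cls]
            rng.shuffle(idx)  # fresh random assignment in each repetition
            for j, i in enumerate(idx):
                folds[j % k].append(i)
        for f in range(k):
            test = sorted(folds[f])
            held_out = set(test)
            train = [i for i in range(len(labels)) if i not in held_out]
            yield train, test


if __name__ == "__main__":
    labels = ["cancer"] * 20 + ["nodule"] * 30
    splits = list(repeated_stratified_kfold(labels, k=10, m=3))
    print(len(splits), "train/test rounds")  # M x K rounds
```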


Independent Validation: To estimate the reproducibility of the data and the generality of the classifier, a classifier built using one dataset must be tested on another dataset. Accordingly, validation on the second set was performed using the classifier developed with the original dataset.


Resampling (permutation): To demonstrate the dependence of the classifier on the disease state, patients and controls from the dataset were assigned at random (permuted) and the classification was repeated. The accuracy of classification using randomized samples was compared with the accuracy of the developed classifier to determine the p value for the classifier, i.e., the probability that the classifier might have been chosen by chance. To test the generality of a classifier developed in this manner, it was used to classify independent sets of samples that were not used in developing the classifier. The cross-validation accuracies of the permuted and original classifiers were compared on independent test sets to confirm the classifier's validity in classifying new samples.
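The permutation test can be sketched generically: score the classifier on the true labels, then on many label shuffles, and report how often a shuffled labeling does at least as well. In this minimal sketch, `score_fn` stands in for whatever cross-validated accuracy procedure is used; the toy threshold scorer is hypothetical, and the (b + 1)/(n + 1) estimator is a common convention assumed here, not one specified in the text.

```python
import random


def permutation_p_value(score_fn, X, y, n_perm=1000, seed=0):
    """Return (observed score, permutation p value).

    score_fn(X, y) -> accuracy; the p value is the fraction of label
    shuffles scoring at least as well, with +1 in numerator and
    denominator so it is never exactly zero.
    """
    rng = random.Random(seed)
    observed = score_fn(X, y)
    hits = 0
    for _ in range(n_perm):
        y_perm = list(y)
        rng.shuffle(y_perm)
        if score_fn(X, y_perm) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


def threshold_accuracy(X, y, cut=0.0):
    """Toy scorer: call 'cancer' (label 1) when the single feature exceeds `cut`."""
    return sum((x > cut) == (lab == 1) for x, lab in zip(X, y)) / len(y)


if __name__ == "__main__":
    X = list(range(-10, 10))
    y = [1 if x > 0 else 0 for x in X]
    print(permutation_p_value(threshold_accuracy, X, y, n_perm=200))
```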


C. Classifier Performance


The performance of each classifier was estimated by different methods, and several performance measurements were used to compare classifiers with each other. These measurements include accuracy, area under the ROC curve, sensitivity (true positive rate), and specificity (true negative rate). Depending on the required properties of the classification of interest, different performance measurements can be used to pick the optimal classifier. For example, a classifier used to screen the whole population would require better specificity to compensate for the low (~1%) prevalence of the disease and thereby avoid a large number of false positives, while a classifier used to diagnose hospital patients should be more sensitive.
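Why specificity dominates in population screening can be made concrete with the positive predictive value (PPV), the fraction of positive calls that are true cancers. A minimal sketch using the standard Bayes' rule formula at the ~1% prevalence mentioned above; the sensitivity and specificity pairs in the example are illustrative, and the function names are hypothetical.

```python
def sens_spec(tp, fn, tn, fp):
    """Sensitivity (true positive rate) and specificity (true negative rate)."""
    return tp / (tp + fn), tn / (tn + fp)


def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(disease | positive call) via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    for spec in (0.46, 0.90, 0.99):
        ppv = positive_predictive_value(0.90, spec, prevalence=0.01)
        print(f"sensitivity 0.90, specificity {spec:.2f}: PPV = {ppv:.3f}")
```

At 1% prevalence even 99% specificity leaves roughly half of positive calls false, whereas in an already high-risk clinic population the prior is far higher and sensitivity becomes the priority.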


For distinguishing cancerous tumors from benign nodules, higher sensitivity is more desirable than higher specificity, as the patients are already at high risk.


Example 7: Testing of the Classifiers

Peripheral blood samples were all collected in PAXgene RNA stabilization tubes, and RNA was extracted according to the manufacturer's instructions. Samples were tested on a NanoString nCounter™ (as described above) against a custom panel of 559 probes (Table III). In addition, they were tested against a 100-probe subset of the 559-marker panel.


For the 559 classifier, 432 probes were selected based on previous microarray data, 107 probes were selected from NanoString studies, and 20 were housekeeping genes. We analyzed 610 PAXgene RNA samples (278 cancers, 332 controls) derived from 5 collection sites. For QC, a Universal RNA standard (Agilent) was included in each batch of 36 samples tested. Probe expression values were normalized using the 20 housekeeping genes as well as the spike-in positive and negative controls supplied by NanoString (included in the classifier). Z-scores were calculated for probe count values and served as the input to a Support Vector Machine (SVM) classifier using a polynomial kernel. Classification performance was evaluated by 10-fold cross-validation of the samples.
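The housekeeping normalization and per-probe Z-scores can be sketched as below. This is an illustration resting on several assumptions, not the normalization actually applied: it scales each sample so that the geometric mean of its housekeeping counts equals the average across samples (a common convention for nCounter data), ignores the positive and negative spike-in controls, and computes Z-scores on the normalized counts without a log transform, since the text does not say whether one was used. The probe names and function names are hypothetical.

```python
import statistics


def housekeeping_normalize(counts, housekeeping):
    """counts: dict sample -> dict probe -> raw count (housekeeping counts > 0).

    Scale each sample so its housekeeping geometric mean equals the mean of
    those geometric means across all samples.
    """
    gm = {s: statistics.geometric_mean([c[h] for h in housekeeping])
          for s, c in counts.items()}
    target = statistics.fmean(gm.values())
    return {s: {p: v * target / gm[s] for p, v in c.items()}
            for s, c in counts.items()}


def zscore_by_probe(norm):
    """Standardize each probe across samples (population SD; 0.0 if constant)."""
    samples = list(norm)
    probes = list(norm[samples[0]])
    z = {s: {} for s in samples}
    for p in probes:
        vals = [norm[s][p] for s in samples]
        mu, sd = statistics.fmean(vals), statistics.pstdev(vals)
        for s in samples:
            z[s][p] = 0.0 if sd == 0 else (norm[s][p] - mu) / sd
    return z


if __name__ == "__main__":
    raw = {"s1": {"hk1": 100, "hk2": 100, "geneA": 50},
           "s2": {"hk1": 200, "hk2": 200, "geneA": 100}}
    print(housekeeping_normalize(raw, ["hk1", "hk2"]))
```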


A. 559 Classifier


As shown in FIGS. 2A and 2B, the 559 classifier developed on all the samples showed a ROC-AUC of 0.81 (FIG. 2A). With the sensitivity set at 90%, the specificity is 46%. Similar performance was obtained on a balanced set of 556 samples (278 cancers, 278 nodules) (FIG. 2B). For both sets, UHR controls, post samples, and patients with other cancers were excluded.
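The two summary figures reported here, ROC-AUC and specificity at a fixed 90% sensitivity, can be computed directly from classifier scores. A minimal sketch, assuming higher scores mean "more likely cancer"; the AUC uses the Mann-Whitney pairwise formulation, the threshold is chosen as the highest score that still calls at least the target fraction of cancers positive, and the example scores are hypothetical.

```python
import math


def roc_auc(cancer_scores, nodule_scores):
    """Probability that a random cancer outscores a random benign nodule (ties count 1/2)."""
    wins = 0.0
    for c in cancer_scores:
        for n in nodule_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cancer_scores) * len(nodule_scores))


def specificity_at_sensitivity(cancer_scores, nodule_scores, target=0.90):
    """Specificity at the highest threshold whose sensitivity is >= target."""
    # Number of cancers that must be called positive; the epsilon guards
    # against floating-point error in target * n.
    k = max(1, math.ceil(target * len(cancer_scores) - 1e-9))
    threshold = sorted(cancer_scores, reverse=True)[k - 1]
    # Scores >= threshold are called "cancer"; nodules below it are true negatives.
    return sum(1 for n in nodule_scores if n < threshold) / len(nodule_scores)


if __name__ == "__main__":
    cancers = [0.9, 0.8, 0.7, 0.6]
    nodules = [0.5, 0.4, 0.3, 0.65]
    print(roc_auc(cancers, nodules), specificity_at_sensitivity(cancers, nodules, 1.0))
```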


When nodule classification accuracy is assessed by size without a specific sensitivity threshold, the number of benign nodules classified as cancer increases as nodule size and the associated cancer risk increase (FIG. 3). In this analysis, nodules ≤8 mm were correctly classified 88.9% of the time; for nodules >8 mm and ≤12 mm, accuracy was 75%; for nodules >12 mm and ≤16 mm, accuracy was 68%; and for nodules >16 mm, accuracy was 53.6%. See Table IV below.













TABLE IV

Nodule size      Correct   Incorrect   Total   Specificity
≤5 mm            108       19          127     85.0%
>5, ≤8 mm        88        11          99      88.9%
>8, ≤12 mm       40        13          53      75.5%
>12, ≤16 mm      17        8           25      68.0%
>16 mm           15        13          28      53.6%
Total            268       64          332     80.7%
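The Specificity column of Table IV is the fraction of benign nodules in each size bin that were correctly classified. The short sketch below recomputes it from the Correct and Incorrect counts as a consistency check; the data structure and function name are illustrative.

```python
# (size bin, correctly classified, misclassified) benign nodules, from Table IV
TABLE_IV = [
    ("<=5 mm", 108, 19),
    (">5, <=8 mm", 88, 11),
    (">8, <=12 mm", 40, 13),
    (">12, <=16 mm", 17, 8),
    (">16 mm", 15, 13),
]


def specificity_by_bin(rows):
    """Return (bin, total, specificity %) per row plus an overall 'Total' row."""
    out = []
    for label, correct, incorrect in rows:
        total = correct + incorrect
        out.append((label, total, round(100 * correct / total, 1)))
    all_correct = sum(c for _, c, _ in rows)
    all_total = sum(c + i for _, c, i in rows)
    out.append(("Total", all_total, round(100 * all_correct / all_total, 1)))
    return out


if __name__ == "__main__":
    for label, total, spec in specificity_by_bin(TABLE_IV):
        print(f"{label:<14} n={total:<4} specificity={spec}%")
```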









A second set of nodules was tested, and the accuracy of the classifier for each size group was determined by sample group (cancer vs. benign nodule). Similarly, as nodule size and the associated cancer risk increase, the number of benign nodules classified as cancer increases (FIGS. 4A to 4C). For cancers >5 mm, r=0.95; for nodules of all sizes, r=0.97. The chart shows the sensitivity and specificity of the classification of cancers and nodules based on lesion size. These numbers are shown in bar graph form below.


Since classification accuracy was found to be negatively correlated with benign nodule size, we reanalyzed the data using only nodules <10 mm (n=244) with sensitivity fixed at 90% (FIG. 5A). In this case, the specificity rises to 54% and the ROC-AUC to 0.85. For larger nodules, >10 mm (n=88), the specificity drops to 24% and the ROC-AUC drops to 0.71 (FIG. 5B). See Table V below.













TABLE V

                                  Small       Large       All nodules
                                  (≤10 mm)    (>10 mm)
N (nodules)                       244         88          332
Min. size (mm)                    1           10.4        1
Max. size (mm)                    10          90          90
Mean size (mm)                    6.07        17.8        8.7
Median size (mm)                  6           15          6
Std. dev. of size (mm)            1.73        10.6        7.13
ROC area                          0.85        0.71        0.81
Specificity at 90% sensitivity    54%         42%         46%










B. 100 Marker Classifier


We then reanalyzed the data from the 633 samples analyzed with W559 on the NanoString platform in order to identify the minimal number of probes required to maintain the performance attained with the whole panel. We used SVM-RFE for probe selection as previously described. We used 75% of the data as the training set for SVM-RFE and then tested the performance of the top 100 probes (Table II) selected by this process on an independent testing set composed of the remaining 25% of the samples. Samples were randomly assigned to the training and testing sets (Table VI below). The accuracy obtained on the testing set is shown in FIG. 6. In this analysis, at a sensitivity of 90%, specificity was 62%; at a sensitivity of 79%, specificity was 68%; and at a sensitivity of 71%, specificity was 75% (FIG. 6). In summary, the ROC-AUC is 0.82, and at a sensitivity of 0.90 we achieve a specificity of 0.62.












TABLE VI

Benign nodules               Cancers
Size (mm)       n            Size (mm)       n
>0, ≤5          130          >0, ≤14         86
>5, ≤8          109          >14, ≤22        75
>8, ≤12.5       65           >22, ≤33        64
>12.5           57           >33             47










Each and every patent, patent application, and publication, including the priority application, U.S. Provisional Patent Application No. 62/352,865, filed Jun. 21, 2016, and publicly available gene sequence cited throughout the disclosure is expressly incorporated herein by reference in its entirety. While this invention has been disclosed with reference to specific embodiments, it is apparent that other embodiments and variations of this invention may be devised by others skilled in the art without departing from the true spirit and scope of the invention. The appended claims include such embodiments and equivalent variations.

Claims
  • 1. A composition for diagnosing the existence or evaluating the progression of a lung cancer in a mammalian subject, said composition comprising at least 10 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II.
  • 2. The composition of claim 1, wherein at least one polynucleotide or oligonucleotide or ligand is attached to a detectable label.
  • 3. The composition of claim 2, wherein each polynucleotide or oligonucleotide or ligand is attached to a different detectable label.
  • 4. The composition of claim 1, further comprising a capture oligonucleotide, which hybridizes to at least one polynucleotide or oligonucleotide.
  • 5. The composition of claim 4, wherein the capture oligonucleotide is capable of hybridizing to each polynucleotide or oligonucleotide.
  • 6. The composition of claim 4, wherein the capture oligonucleotide binds to a substrate.
  • 7. The composition of claim 6, further comprising a substrate to which the capture oligonucleotide binds.
  • 8. The composition of claim 1, comprising at least 15 polynucleotides or oligonucleotides.
  • 9. The composition of claim 1, comprising at least 25 polynucleotides or oligonucleotides.
  • 10. The composition of claim 1, comprising at least 50 polynucleotides or oligonucleotides.
  • 11. (canceled)
  • 12. The composition of claim 1, comprising at least 100 polynucleotides or oligonucleotides.
  • 13. (canceled)
  • 14. The composition of claim 1, comprising polynucleotides or oligonucleotides capable of hybridizing to each different gene, gene fragment, gene transcript or expression product listed in Table I.
  • 15-22. (canceled)
  • 23. A kit comprising the composition of claim 1 and an apparatus for sample collection.
  • 24-25. (canceled)
  • 26. A method for diagnosing the existence or evaluating a lung cancer in a mammalian subject comprising identifying changes in the expression of 10 or more genes in a sample of said subject, said genes selected from the genes of Table I or the genes of Table II; and comparing said subject's gene expression levels with the levels of the same genes in a reference or control, wherein changes in expression of the subject's genes from those of the reference correlate with a diagnosis or evaluation of a lung cancer.
  • 27. The method according to claim 26, wherein said diagnosis or evaluation comprises one or more of a diagnosis of a lung cancer, a diagnosis of a benign nodule, a diagnosis of a stage of lung cancer, a diagnosis of a type or classification of a lung cancer, a diagnosis or detection of a recurrence of a lung cancer, a diagnosis or detection of a regression of a lung cancer, a prognosis of a lung cancer, or an evaluation of the response of a lung cancer to a surgical or non-surgical therapy.
  • 28. The method according to claim 26, wherein said changes comprise an upregulation of one or more selected genes in comparison to said reference or control or a downregulation of one or more selected genes in comparison to said reference or control.
  • 28. The method according to claim 26, further comprising identifying the size of a lung nodule in the subject.
  • 29. The method according to claim 26, wherein the specificity is about 46% at about 90% sensitivity.
  • 30. The method according to claim 26, wherein the specificity is about 54% at about 90% sensitivity for nodules <10 mm.
  • 31. The method according to claim 26, wherein the accuracy is about 88% for nodules ≤8 mm, about 75% for nodules >8 mm and ≤12 mm, about 68% for nodules >12 mm and ≤16 mm, and about 53% for nodules >16 mm.
  • 28-40. (canceled)
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

This invention was made with government support under Grant No. CA010815 awarded by the National Institutes of Health. The government has certain rights in the invention.

PCT Information
Filing Document: PCT/US2017/038571; Filing Date: 6/21/2017; Country: WO; Kind: 00
Provisional Applications (1)
Number: 62352865; Date: Jun 2016; Country: US