Compositions and methods for diagnosing lung cancers using gene expression profiles

BACKGROUND OF THE INVENTION

Lung cancer is the most common worldwide cause of cancer mortality. In the United States, lung cancer is the second most prevalent cancer in both men and women and will account for more than 174,000 new cases per year and more than 162,000 cancer deaths. In fact, lung cancer accounts for more deaths each year than breast, prostate and colorectal cancers combined.

The high mortality (80-85% within five years), which has shown little or no improvement in the past 30 years, underscores the need for new and effective tools that facilitate early diagnosis, before metastasis to regional nodes or beyond the lung.

High-risk populations include smokers, former smokers, and individuals with markers associated with genetic predispositions. Because surgical removal of early stage tumors remains the most effective treatment for lung cancer, there has been great interest in screening high-risk patients with low dose spiral CT (LDCT). This strategy identifies non-calcified pulmonary nodules in approximately 30-70% of high-risk individuals, but only a small proportion of detected nodules (0.4 to 2.7%) are ultimately diagnosed as lung cancers. Currently, the only way to differentiate subjects with lung nodules of benign etiology from subjects with malignant nodules is an invasive biopsy, surgery, or prolonged observation with repeated scanning. Even using the best clinical algorithms, 20-55% of patients selected to undergo surgical lung biopsy for indeterminate lung nodules are found to have benign disease, and those who do not undergo immediate biopsy or resection require sequential imaging studies. The use of serial CT in this group of patients risks delaying potentially curative therapy, in addition to the cost of repeat scans, the not-insignificant radiation doses, and patient anxiety.

Ideally, a diagnostic test would be easily accessible and inexpensive, demonstrate high sensitivity and specificity, and improve patient outcomes (medically and financially). Others have shown that classifiers which utilize bronchial epithelial cells have high accuracy. However, harvesting these cells requires an invasive bronchoscopy. See, Silvestri et al, N Engl J Med. 2015 Jul. 16; 373(3): 243-251, which is incorporated herein by reference.

Efforts are in progress to develop non-invasive diagnostics using sputum, blood or serum, analyzing for products of tumor cells, methylated tumor DNA, single nucleotide polymorphisms (SNPs), expressed messenger RNA or proteins. This broad array of molecular tests with potential utility for early diagnosis of lung cancer has been discussed in the literature. Although each of these approaches has its own merits, none has yet passed the exploratory stage in the effort to detect patients with early stage lung cancer, even in high-risk groups or in patients who have a preliminary diagnosis based on radiological and other clinical factors. A simple blood test, drawn as a routine part of regular clinical office visits, would be an ideal diagnostic test.

SUMMARY OF THE INVENTION

In one aspect, a composition or kit for diagnosing or evaluating a lung cancer in a mammalian subject includes ten (10) or more polynucleotides or oligonucleotides, wherein each polynucleotide or oligonucleotide hybridizes to a different gene, gene fragment, gene transcript or expression product in a patient sample. Each gene, gene fragment, gene transcript or expression product is selected from the genes of Table I or Table II. In one embodiment, at least one polynucleotide or oligonucleotide is attached to a detectable label. In one embodiment, the composition or kit includes polynucleotides or oligonucleotides which detect the gene, gene fragment, gene transcript or expression product of each of the 559 genes in Table I. In another embodiment, the composition or kit includes polynucleotides or oligonucleotides which detect the gene, gene fragment, gene transcript or expression product of each of the 100 genes in Table II.

In another aspect, a composition or kit for diagnosing or evaluating a lung cancer in a mammalian subject includes ten (10) or more ligands, wherein each ligand binds to a different gene expression product in a patient sample. Each gene expression product is selected from the genes of Table I or Table II. In one embodiment, at least one ligand is attached to a detectable label. In one embodiment, the composition or kit includes ligands which detect the expression products of each of the 559 genes in Table I. In another embodiment, the composition or kit includes ligands which detect the expression products of each of the 100 genes in Table II.

The compositions described herein enable detection of changes in the expression of genes in the subject's gene expression profile relative to a reference gene expression profile. The various reference gene expression profiles are described below. In one embodiment, the composition provides the ability to distinguish a cancerous tumor from a non-cancerous nodule.

In another aspect, a method for diagnosing or evaluating a lung cancer in a mammalian subject involves identifying changes in the expression of three or more genes in a sample of the subject, said genes selected from the genes of Table I or Table II, and comparing the subject's gene expression levels with the levels of the same genes in a reference or control, wherein changes in the expression of said genes correlate with a diagnosis or evaluation of a lung cancer. In one embodiment, the changes in the expression of said genes provide the ability to distinguish a cancerous tumor from a non-cancerous nodule.

In another aspect, a method for diagnosing or evaluating a lung cancer in a mammalian subject involves identifying a gene expression profile in the blood of a subject, the gene expression profile comprising 10 or more gene expression products of 10 or more informative genes as described herein. The 10 or more informative genes are selected from the genes of Table I or Table II. In one embodiment, the gene expression profile contains all 559 genes of Table I. In another embodiment, the gene expression profile contains all 100 genes of Table II. The subject's gene expression profile is compared with a reference gene expression profile from a variety of sources described below. Changes in expression of the informative genes correlate with a diagnosis or evaluation of a lung cancer. In one embodiment, the changes in the expression of said genes provide the ability to distinguish a cancerous tumor from a non-cancerous nodule.

In another aspect, a method of detecting lung cancer in a patient is provided. The method includes obtaining a sample from the patient; and detecting a change in expression in at least 10 genes selected from Table I or Table II in the patient sample as compared to a control by contacting the sample with a composition comprising oligonucleotides, polynucleotides or ligands specific for each different gene transcript or expression product of the at least 10 genes of Table I or Table II and detecting binding between the oligonucleotide, polynucleotide or ligand and the gene product or expression product.

In yet another aspect, a method of diagnosing lung cancer in a subject is provided. The method includes obtaining a blood sample from a subject; detecting a change in expression in at least 10 genes selected from Table I or Table II in the patient sample as compared to a control by contacting the sample with a composition comprising oligonucleotides, polynucleotides or ligands specific for each different gene transcript or expression product of the at least 10 genes of Table I or Table II and detecting binding between the oligonucleotide, polynucleotide or ligand and the gene product or expression product; and diagnosing the subject with cancer when changes in expression of the subject's genes from those of the reference are detected.

In another aspect, a method of diagnosing and treating lung cancer in a subject having a neoplastic growth is provided. The method includes obtaining a blood sample from a subject; detecting a change in expression in at least 10 genes selected from Table I or Table II in the patient sample as compared to a control by contacting the sample with a composition comprising oligonucleotides, polynucleotides or ligands specific for each different gene transcript or expression product of the at least 10 genes of Table I or Table II and detecting binding between the oligonucleotide, polynucleotide or ligand and the gene product or expression product; diagnosing the subject with cancer when changes in expression of the subject's genes from those of the reference are detected; and removing the neoplastic growth. Other appropriate treatments may also be provided.

Other aspects and advantages of these compositions and methods are described further in the following detailed description of the preferred embodiments thereof.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a table showing patient characteristics for the samples used in Example 1.

FIGS. 2A and 2B are graphs showing the cross validated support vector machine classifier (CV SVM) of all 610 samples (FIG. 2A, Accuracy=0.75, ROC Area=0.81. According to the curve, when the sensitivity is 0.91, the specificity is 0.46; when the sensitivity is 0.72, the specificity is 0.77) and a balanced set of 556 samples (FIG. 2B, Accuracy=0.76, ROC Area=0.81. According to the curve, when the sensitivity is 0.90, the specificity is 0.48; when the sensitivity is 0.76, the specificity is 0.77), using the 559 Classifier. The full and balanced sets show similar performance.
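The ROC area and the paired sensitivity/specificity values above summarize how well a classifier's continuous output separates cancer samples from nodule samples. The following is a minimal pure-Python sketch, not the software used to generate the figures: the function names are hypothetical, label 1 is assumed to mean cancer and 0 benign nodule, and a higher score is assumed to mean more likely cancer. The area is computed as the probability that a randomly chosen cancer sample outscores a randomly chosen nodule sample, and each operating point comes from sweeping a threshold over the observed scores.

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float, float]]:
    """(threshold, sensitivity, specificity) for every distinct score used as a cut-off.

    A sample is called positive (cancer, label 1) when its score >= threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        points.append((t, tp / n_pos, tn / n_neg))
    return points


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """ROC area as the Mann-Whitney probability P(score_cancer > score_nodule), ties counting 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def specificity_at_sensitivity(scores: List[float], labels: List[int], min_sens: float) -> float:
    """Best specificity among cut-offs whose sensitivity is at least min_sens."""
    return max(spec for _, sens, spec in roc_points(scores, labels) if sens >= min_sens)


# Toy scores for 4 cancers (label 1) and 4 nodules (label 0); not data from the Examples.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(roc_auc(scores, labels))                           # 0.8125
print(specificity_at_sensitivity(scores, labels, 0.75))  # 0.75
```

Read this way, each quoted pair in the figure descriptions (for example, sensitivity 0.91 with specificity 0.46 in FIG. 2A) corresponds to one threshold on the classifier's output.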

FIG. 3 is a bar graph showing sensitivity of the classifier by nodule size group (x-axis). The data show that larger nodules are more likely to be misclassified (p=1.54×10⁻⁴).

FIGS. 4A to 4C show the classification of sample groups (cancer, FIG. 4B, n=204; and nodule, FIG. 4C, n=331) stratified by lesion size. For cancers >5 mm, r=0.95. For nodules of all sizes, r=0.97. The chart (FIG. 4A) shows the sensitivity and specificity of the classification of cancers and nodules based on lesion size. These numbers are shown in bar graph form below.
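Stratifying performance by lesion size, as in FIG. 3 and FIGS. 4A to 4C, amounts to computing the fraction of correctly classified samples separately within each size group: within cancers this fraction is the sensitivity, and within nodules it is the specificity. A minimal sketch follows; the function name and the bin edges are hypothetical placeholders, not the size groups used in the figures.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def correct_rate_by_size(
    sizes_mm: List[float],
    labels: List[int],
    predictions: List[int],
    edges: Sequence[float] = (5.0, 10.0, 20.0),  # hypothetical bin edges in mm
) -> Dict[Tuple[int, int], float]:
    """Fraction classified correctly per (label, size bin).

    Bin 0 holds sizes below edges[0], bin i holds sizes in [edges[i-1], edges[i]),
    and the last bin holds sizes >= edges[-1]. For label 1 (cancer) the value is
    sensitivity; for label 0 (nodule) it is specificity.
    """
    tally = defaultdict(lambda: [0, 0])  # (label, bin) -> [correct, total]
    for size, y, pred in zip(sizes_mm, labels, predictions):
        key = (y, bisect_right(edges, size))
        tally[key][1] += 1
        if pred == y:
            tally[key][0] += 1
    return {key: correct / total for key, (correct, total) in tally.items()}


rates = correct_rate_by_size(
    sizes_mm=[3, 7, 12, 25, 4, 8],
    labels=[1, 1, 1, 1, 0, 0],
    predictions=[1, 1, 0, 0, 0, 1],
)
print(rates[(1, 2)])  # 0.0: the 12 mm cancer was missed
```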

FIGS. 5A and 5B are graphs showing the cross validated support vector machine classifier (CV SVM) of all cancer samples (n=278) vs. small nodules (<10 mm) (n=244) (FIG. 5A, Accuracy=0.79, ROC Area=0.85. According to the curve, when the sensitivity is 0.90, the specificity is 0.54; when the sensitivity is 0.77, the specificity is 0.82) and 10-fold CV SVM using all cancer samples (n=278) vs. large nodules (≥10 mm) (n=88) (FIG. 5B, Accuracy=0.76, ROC Area=0.71. According to the curve, when the sensitivity is 0.90, the specificity is 0.24; when the sensitivity is 0.87, the specificity is 0.42).
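The 10-fold cross-validation referenced above scores every sample with a model that never saw that sample during training. The sketch below shows one common way to do this, stratified k-fold splitting, in pure Python. It is an illustration under stated assumptions: the support vector machine itself is not reproduced, and `fit` and `score` are hypothetical caller-supplied callables into which any classifier could be plugged. The toy usage at the end swaps in a trivial one-feature scorer, not an SVM.

```python
import random
from typing import Callable, Iterator, List, Sequence, Tuple


def stratified_kfold(labels: Sequence[int], k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) for k folds with roughly equal class proportions.

    Each class's indices are shuffled and dealt round-robin across the folds; the
    counter carries over between classes so leftover samples spread evenly.
    """
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    counter = 0
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for i in idx:
            folds[counter % k].append(i)
            counter += 1
    for test in folds:
        held_out = set(test)
        yield [i for i in range(len(labels)) if i not in held_out], sorted(test)


def cross_validated_scores(
    X: Sequence[Sequence[float]],
    labels: Sequence[int],
    fit: Callable,    # hypothetical: fit(X_train, y_train) -> model
    score: Callable,  # hypothetical: score(model, x) -> float, higher = more likely cancer
    k: int = 10,
    seed: int = 0,
) -> List[float]:
    """Out-of-fold score for every sample; these can then be summarized by an ROC curve."""
    out: List[float] = [0.0] * len(labels)
    for train, test in stratified_kfold(labels, k, seed):
        model = fit([X[i] for i in train], [labels[i] for i in train])
        for i in test:
            out[i] = score(model, X[i])
    return out


# Toy usage: 20 nodules (0) and 30 cancers (1) with one made-up feature each.
labels = [0] * 20 + [1] * 30
X = [[float(i)] for i in range(50)]
train_mean = lambda X_tr, y_tr: sum(x[0] for x in X_tr) / len(X_tr)
oof = cross_validated_scores(X, labels, fit=train_mean, score=lambda m, x: x[0] - m)
```

Stratifying the folds keeps the cancer-to-nodule ratio of each test fold close to that of the whole data set, which matters when the two classes are of unequal size, as in the comparisons of FIGS. 5A and 5B.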

FIG. 6 is a graph showing the cross validated support vector machine classifier (CV SVM) of 25% of the data set used for the 559 Classifier, used as a testing set for the 100 Classifier. ROC Area=0.82. According to the curve, when the sensitivity is 0.90, the specificity is 0.62; when the sensitivity is 0.79, the specificity is 0.68; and when the sensitivity is 0.71, the specificity is 0.75.

DETAILED DESCRIPTION OF THE INVENTION

The methods and compositions described herein apply gene expression technology to blood screening for the detection and diagnosis of lung cancer. The compositions and methods described herein provide the ability to distinguish a cancerous tumor from a non-cancerous nodule by determining a characteristic RNA expression profile of genes in the blood of a mammalian, preferably human, subject. The profile is compared with the profile of one or more subjects of the same class (e.g., patients having lung cancer or a non-cancerous nodule) or a control to provide a useful diagnosis.
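One simple way to picture comparing a subject's profile with the profiles of subjects of a known class is to average each reference class gene by gene and ask which class average the subject most resembles. The sketch below does this with Pearson correlation. It is a hypothetical nearest-centroid illustration chosen for brevity, not the support vector machine classifier described in the Examples, and the class names and values are placeholders.

```python
from statistics import mean
from typing import Dict, List


def centroid(profiles: List[List[float]]) -> List[float]:
    """Gene-wise mean of several expression profiles (same gene order in each)."""
    return [mean(values) for values in zip(*profiles)]


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError for a perfectly flat profile."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def nearest_reference(subject: List[float], references: Dict[str, List[List[float]]]) -> str:
    """Name of the reference class whose centroid correlates best with the subject."""
    centroids = {name: centroid(profs) for name, profs in references.items()}
    return max(centroids, key=lambda name: pearson(subject, centroids[name]))


references = {  # placeholder 4-gene profiles
    "cancer": [[5.0, 1.0, 4.0, 2.0], [6.0, 1.0, 5.0, 2.0]],
    "benign nodule": [[1.0, 5.0, 2.0, 4.0], [1.0, 6.0, 2.0, 5.0]],
}
print(nearest_reference([5.0, 2.0, 4.0, 1.0], references))  # cancer
```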

These methods of lung cancer screening employ compositions suitable for conducting a simple, cost-effective and non-invasive blood test using gene expression profiling that could alert the patient and physician to obtain further studies, such as a chest radiograph or CT scan, in much the same way that the prostate-specific antigen (PSA) test is used to help diagnose and follow the progress of prostate cancer. The application of these profiles provides overlapping and confirmatory diagnoses of the type of lung disease, beginning with the initial test for malignant vs. non-malignant disease.

“Patient” or “subject” as used herein means a mammalian animal, including a human, a veterinary or farm animal, a domestic animal or pet, and animals normally used for clinical research. In one embodiment, the subject of these methods and compositions is a human.

“Control” or “Control subject” as used herein refers to the source of the reference gene expression profiles as well as the particular panel of control subjects described herein. In one embodiment, the control or reference level is from a single subject. In another embodiment, the control or reference level is from a population of individuals sharing a specific characteristic. In yet another embodiment, the control or reference level is an assigned value which correlates with the level of a specific control individual or population, although not necessarily measured at the time of assaying the test subject's sample. In one embodiment, the control subject or reference is from a patient (or population) having a non-cancerous nodule. In another embodiment, the control subject or reference is from a patient (or population) having a cancerous tumor. In other embodiments, the control subject can be a subject or population with lung cancer, such as a subject who is a current or former smoker with malignant disease; a subject with a solid lung tumor prior to surgery for removal of same; a subject with a solid lung tumor following surgical removal of said tumor; a subject with a solid lung tumor prior to therapy for same; and a subject with a solid lung tumor during or following therapy for same. In other embodiments, the controls for purposes of the compositions and methods described herein include any of the following classes of reference human subjects with no lung cancer. Such non-healthy controls (NHC) include smokers with non-malignant disease, former smokers with non-malignant disease (including patients with lung nodules), non-smokers who have chronic obstructive pulmonary disease (COPD), and former smokers with COPD. In still other embodiments, the control subject is a healthy non-smoker with no disease or a healthy smoker with no disease.

“Sample” as used herein means any biological fluid or tissue that contains immune cells and/or cancer cells. The most suitable sample for use in this invention is whole blood. Other useful biological samples include, without limitation, peripheral blood mononuclear cells, plasma, saliva, urine, synovial fluid, bone marrow, cerebrospinal fluid, vaginal mucus, cervical mucus, nasal secretions, sputum, semen, amniotic fluid, bronchoscopy sample, bronchoalveolar lavage fluid, and other cellular exudates from a patient having cancer. Such samples may further be diluted with saline, buffer or a physiologically acceptable diluent. Alternatively, such samples may be concentrated by conventional means.

As used herein, the term “cancer” refers to or describes the physiological condition in mammals that is typically characterized by unregulated cell growth. More specifically, as used herein, the term “cancer” means any lung cancer. In one embodiment, the lung cancer is non-small cell lung cancer (NSCLC). In a more specific embodiment, the lung cancer is lung adenocarcinoma (AC or LAC). In another more specific embodiment, the lung cancer is lung squamous cell carcinoma (SCC or LSCC). In another embodiment, the lung cancer is a stage I or stage II NSCLC. In still another embodiment, the lung cancer is a mixture of early and late stages and types of NSCLC.

The term “tumor,” as used herein, refers to all neoplastic cell growth and proliferation, whether malignant or benign, and all pre-cancerous and cancerous cells and tissues. The term “nodule” refers to an abnormal buildup of tissue which is benign. The term “cancerous tumor” refers to a malignant tumor.

By “diagnosis” or “evaluation” it is meant a diagnosis of a lung cancer, a diagnosis of a stage of lung cancer, a diagnosis of a type or classification of a lung cancer, a diagnosis or detection of a recurrence of a lung cancer, a diagnosis or detection of a regression of a lung cancer, a prognosis of a lung cancer, or an evaluation of the response of a lung cancer to a surgical or non-surgical therapy. In one embodiment, “diagnosis” or “evaluation” refers to distinguishing between a cancerous tumor and a benign pulmonary nodule.

As used herein, “sensitivity” (also called the true positive rate) measures the proportion of positives that are correctly identified as such (e.g., the percentage of sick people who are correctly identified as having the condition).

As used herein, “specificity” (also called the true negative rate) measures the proportion of negatives that are correctly identified as such (e.g., the percentage of healthy people who are correctly identified as not having the condition).
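Both definitions reduce to ratios of confusion-matrix counts: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). A minimal sketch, assuming label 1 denotes lung cancer and 0 a benign nodule (the function name is illustrative, not from the specification):

```python
from typing import Sequence, Tuple


def sensitivity_specificity(labels: Sequence[int], predictions: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels/predictions (1 = cancer, 0 = benign)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # cancers called cancer
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # cancers missed
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)  # benign nodules called benign
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # benign nodules called cancer
    return tp / (tp + fn), tn / (tn + fp)


sens, spec = sensitivity_specificity([1, 1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 1, 1])
print(sens, spec)  # 0.75 0.6
```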

By “change in expression” is meant an upregulation of one or more selected genes in comparison to the reference or control; a downregulation of one or more selected genes in comparison to the reference or control; or a combination of certain upregulated genes and downregulated genes.

By “therapeutic reagent” or “regimen” is meant any type of treatment employed in the treatment of cancers with or without solid tumors, including, without limitation, chemotherapeutic pharmaceuticals, biological response modifiers, radiation, diet, vitamin therapy, hormone therapies, gene therapy, surgical resection, etc.

By “informative genes” as used herein is meant those genes the expression of which changes (either in an up-regulated or down-regulated manner) characteristically in the presence of lung cancer. A statistically significant number of such informative genes thus form suitable gene expression profiles for use in the methods and compositions. Such genes are shown in Table I and Table II below. Such genes make up the “expression profile”.

The term “statistically significant number of genes” in the context of this invention differs depending on the degree of change in gene expression observed. The degree of change in gene expression varies with the type of cancer and with the size or spread of the cancer or solid tumor. The degree of change also varies with the immune response of the individual and is subject to variation with each individual. For example, in one embodiment of this invention, a large change, e.g., a 2-3 fold increase or decrease in a small number of genes, e.g., in about 10 to 20 genes, is statistically significant. In another embodiment, a smaller relative change in about 15 or more genes is statistically significant.
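Fold changes such as the 2-3 fold range above are usually compared on a log2 scale, where a 2-fold increase is +1 and a 2-fold decrease is -1, so up- and down-regulation are treated symmetrically. The sketch below labels each gene as up-regulated, down-regulated or unchanged relative to a reference level. The 2-fold threshold is an assumed default taken from the low end of that range, the gene names are hypothetical, and expression levels are assumed to be positive linear-scale values.

```python
import math
from typing import Dict


def classify_change(subject: Dict[str, float], reference: Dict[str, float], min_fold: float = 2.0) -> Dict[str, str]:
    """Label each gene 'up', 'down' or 'unchanged' versus the reference level."""
    threshold = math.log2(min_fold)
    calls = {}
    for gene, level in subject.items():
        lfc = math.log2(level / reference[gene])  # log2 fold change vs. reference
        if lfc >= threshold:
            calls[gene] = "up"
        elif lfc <= -threshold:
            calls[gene] = "down"
        else:
            calls[gene] = "unchanged"
    return calls


print(classify_change({"GENE_A": 300.0, "GENE_B": 25.0, "GENE_C": 110.0},
                      {"GENE_A": 100.0, "GENE_B": 100.0, "GENE_C": 100.0}))
# {'GENE_A': 'up', 'GENE_B': 'down', 'GENE_C': 'unchanged'}
```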

Thus, the methods and compositions described herein contemplate examination of the expression profile of a “statistically significant number of genes” ranging from 5 to about 559 genes in a single profile. In one embodiment, the genes are selected from Table I. In another embodiment, the genes are selected from Table II. In one embodiment, the gene profile is formed by a statistically significant number of 5 or more genes. In one embodiment, the gene profile is formed by a statistically significant number of 10 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 15 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 20 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 25 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 30 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 35 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 40 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 45 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 50 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 60 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 65 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 70 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 75 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 80 or more genes. 
In another embodiment, the gene profile is formed by a statistically significant number of 85 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 90 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 95 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 100 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 200 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 300 or more genes. In another embodiment, the gene profile is formed by a statistically significant number of 350 or more genes. In still another embodiment, the gene profile is formed by 400 or more genes. In still another embodiment, the gene profile is formed by 539 genes. In still another embodiment, the gene profile is formed by 559 genes. In still other embodiments, the gene profiles examined as part of these methods contain, as statistically significant numbers of genes, from 10 to 559 genes, and any numbers therebetween. 
In another embodiment, the gene profile is formed by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 
412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, or all 559 genes of Table I. In another embodiment, the gene profile is formed by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or all 100 genes of Table II.

Table I and Table II below list known genes useful in discriminating between a subject having a lung cancer, e.g., NSCLC, and subjects having benign (non-malignant) lung nodules. The sequences of all genes identified in Table I and Table II are publicly available from conventional sources, such as GenBank, and the GenBank accession number for each gene is provided. One skilled in the art may therefore readily reproduce the compositions and methods described herein by use of these sequences.

The term “microarray” refers to an ordered arrangement of hybridizable array elements, preferably polynucleotide or oligonucleotide probes, on a substrate.

The term “polynucleotide,” when used in singular or plural form, generally refers to any polyribonucleotide or polydeoxyribonucleotide, which may be unmodified RNA or DNA or modified RNA or DNA. Thus, for instance, polynucleotides as defined herein include, without limitation, single- and double-stranded DNA, DNA including single- and double-stranded regions, single- and double-stranded RNA, and RNA including single- and double-stranded regions, hybrid molecules comprising DNA and RNA that may be single-stranded or, more typically, double-stranded or include single- and double-stranded regions. In addition, the term “polynucleotide” as used herein refers to triple-stranded regions comprising RNA or DNA or both RNA and DNA. The strands in such regions may be from the same molecule or from different molecules. The regions may include all of one or more of the molecules, but more typically involve only a region of some of the molecules. One of the molecules of a triple-helical region often is an oligonucleotide. The term “polynucleotide” specifically includes cDNAs. The term includes DNAs (including cDNAs) and RNAs that contain one or more modified bases. Thus, DNAs or RNAs with backbones modified for stability or for other reasons are “polynucleotides” as that term is intended herein. Moreover, DNAs or RNAs comprising unusual bases, such as inosine, or modified bases, such as tritiated bases, are included within the term “polynucleotides” as defined herein. In general, the term “polynucleotide” embraces all chemically, enzymatically and/or metabolically modified forms of unmodified polynucleotides, as well as the chemical forms of DNA and RNA characteristic of viruses and cells, including simple and complex cells.

The term “oligonucleotide” refers to a relatively short polynucleotide, including, without limitation, single-stranded deoxyribonucleotides, single- or double-stranded ribonucleotides, RNA:DNA hybrids and double-stranded DNAs. Oligonucleotides, such as single-stranded DNA probe oligonucleotides, are often synthesized by chemical methods, for example using automated oligonucleotide synthesizers that are commercially available. However, oligonucleotides can be made by a variety of other methods, including in vitro recombinant DNA-mediated techniques and by expression of DNAs in cells and organisms.

The terms “differentially expressed gene”, “differential gene expression” and their synonyms, which are used interchangeably, refer to a gene whose expression is activated to a higher or lower level in a subject suffering from a disease, specifically cancer, such as lung cancer, relative to its expression in a control subject, such as a subject having a benign nodule. The terms also include genes whose expression is activated to a higher or lower level at different stages of the same disease. It is also understood that a differentially expressed gene may be either activated or inhibited at the nucleic acid level or protein level, or may be subject to alternative splicing to result in a different polypeptide product. Such differences may be evidenced by a change in mRNA levels, surface expression, secretion or other partitioning of a polypeptide, for example. Differential gene expression may include a comparison of expression between two or more genes or their gene products, or a comparison of the ratios of the expression between two or more genes or their gene products, or even a comparison of two differently processed products of the same gene, which differ between normal subjects, non-healthy controls and subjects suffering from a disease, specifically cancer, or between various stages of the same disease. Differential expression includes both quantitative, as well as qualitative, differences in the temporal or cellular expression pattern in a gene or its expression products among, for example, normal and diseased cells, or among cells which have undergone different disease events or disease stages. For the purpose of this invention, “differential gene expression” is considered to be present when there is a statistically significant (p<0.05) difference in gene expression between the subject and control samples.
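The specification fixes the significance level (p<0.05) but does not name the statistical test. As an assumption for illustration only, the sketch below uses a two-sided permutation test on the difference in mean expression between subject and control samples; a t-test would be an equally common choice. Function names, gene names and values are hypothetical.

```python
import random
from statistics import mean
from typing import Dict, List, Sequence


def permutation_p_value(group_a: Sequence[float], group_b: Sequence[float], n_perm: int = 2000, seed: int = 0) -> float:
    """Two-sided permutation p-value for the difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign samples to the two groups
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)  # +1 counts the observed labelling itself


def differentially_expressed(cases: Dict[str, List[float]], controls: Dict[str, List[float]], alpha: float = 0.05) -> List[str]:
    """Genes whose case-versus-control p-value is below alpha."""
    return sorted(g for g in cases if permutation_p_value(cases[g], controls[g]) < alpha)


cases = {"GENE_A": [10, 11, 12, 13, 14], "GENE_B": [5, 6, 5, 6, 5]}
controls = {"GENE_A": [1, 2, 3, 4, 5], "GENE_B": [6, 5, 6, 5, 6]}
print(differentially_expressed(cases, controls))  # ['GENE_A']
```

When hundreds of genes are tested at once, some will pass p<0.05 by chance alone, which is why a multiple-testing correction is often applied in practice; the criterion stated above does not specify one.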

The term “over-expression” with regard to an RNA transcript is used to refer to the level of the transcript determined by normalization to the level of reference mRNAs, which might be all measured transcripts in the specimen or a particular reference set of mRNAs.
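Normalization to reference mRNAs is commonly done on a log scale: each transcript's log2 level minus the mean log2 level of the reference transcripts, which is the same as dividing by their geometric mean. A minimal sketch under that assumption (gene names hypothetical); passing every measured transcript as the reference set corresponds to normalizing to all measured transcripts in the specimen.

```python
import math
from typing import Dict, Iterable


def normalize_to_reference(levels: Dict[str, float], reference_genes: Iterable[str]) -> Dict[str, float]:
    """log2 level of each transcript relative to the geometric mean of the reference transcripts."""
    ref = list(reference_genes)
    log_ref = sum(math.log2(levels[g]) for g in ref) / len(ref)
    return {gene: math.log2(level) - log_ref for gene, level in levels.items()}


specimen = {"REF_1": 4.0, "REF_2": 16.0, "GENE_X": 64.0}
print(normalize_to_reference(specimen, ["REF_1", "REF_2"]))
# {'REF_1': -1.0, 'REF_2': 1.0, 'GENE_X': 3.0}
```

A transcript can then be called over-expressed relative to a control when its normalized value exceeds the control's normalized value for the same transcript.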

The phrase “gene amplification” refers to a process by which multiple copies of a gene or gene fragment are formed in a particular cell or cell line. The duplicated region (a stretch of amplified DNA) is often referred to as an “amplicon.” Usually, the amount of messenger RNA (mRNA) produced, i.e., the level of gene expression, also increases in proportion to the number of copies made of the particular gene.

In the context of the compositions and methods described herein, reference to “10 or more”, “at least 10”, etc. of the genes listed in Table I or Table II means any one or any and all combinations of the genes listed. For example, suitable gene expression profiles include profiles containing any number of genes from 5 through 559 of Table I. In another example, suitable gene expression profiles include profiles containing any number of genes from 5 through 100 of Table II. In one embodiment, gene profiles formed by genes selected from a table are used in rank order, e.g., genes ranked at the top of the list demonstrated more significant discriminatory results in the tests, and thus may be more significant in a profile than lower ranked genes. However, in other embodiments the genes forming a useful gene profile do not have to be in rank order and may be any gene from the table. As used herein, the term “100 Classifier” or “100 Biomarker Classifier” refers to the 100 genes of Table II. As used herein, the term “559 Classifier” or “559 Biomarker Classifier” refers to the 559 genes of Table I. However, subsets of the genes of Table I or Table II, as described herein, are also useful, and, in another embodiment, the terms may refer to those subsets as well.
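When genes are used in rank order, a profile of size k is simply the first k entries of the ranked table. A minimal sketch, assuming the table is supplied as a list ordered from most to least discriminatory; the gene identifiers are hypothetical placeholders rather than entries of Table I or Table II, and the lower bound of 5 mirrors the smallest profile size mentioned above.

```python
from typing import List


def select_profile(ranked_genes: List[str], k: int, min_genes: int = 5) -> List[str]:
    """Return the k top-ranked genes from a ranked gene list."""
    if len(set(ranked_genes)) != len(ranked_genes):
        raise ValueError("ranked gene list contains duplicates")
    if not min_genes <= k <= len(ranked_genes):
        raise ValueError(f"k must be between {min_genes} and {len(ranked_genes)}")
    return ranked_genes[:k]


ranked = [f"GENE_{i}" for i in range(1, 101)]  # stand-in for a 100-gene ranked table
print(select_profile(ranked, 5))  # ['GENE_1', 'GENE_2', 'GENE_3', 'GENE_4', 'GENE_5']
```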

As used herein, “labels” or “reporter molecules” are chemical or biochemical moieties useful for labeling a nucleic acid (including a single nucleotide), polynucleotide, oligonucleotide, or protein ligand, e.g., an amino acid or antibody. “Labels” and “reporter molecules” include fluorescent agents, chemiluminescent agents, chromogenic agents, quenching agents, radionuclides, enzymes, substrates, cofactors, inhibitors, magnetic particles, and other moieties known in the art. “Labels” or “reporter molecules” are capable of generating a measurable signal and may be covalently or noncovalently joined or bound to an oligonucleotide or nucleotide (e.g., a non-natural nucleotide) or ligand.

Unless defined otherwise in this specification, technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs and by reference to published texts, which provide one skilled in the art with a general guide to many of the terms used in the present application.

I. GENE EXPRESSION PROFILES

The inventors have shown that the gene expression profiles of the whole blood of lung cancer patients differ significantly from those seen in patients having non-cancerous lung nodules. For example, changes in the gene expression products of the genes of Table I and/or Table II can be observed and detected by the methods of this invention in the normal circulating blood of patients with early stage solid lung tumors.

The gene expression profiles described herein provide new diagnostic markers for the early detection of lung cancer and could spare patients with a benign nodule from unnecessary surgery or biopsy. Because the profiles are measured in blood, the risks of testing are very low and the benefit-to-risk ratio is correspondingly high. In one embodiment, the methods and compositions described herein may be used in conjunction with clinical risk factors to help physicians make more accurate decisions about how to manage patients with lung nodules. Another advantage of this invention is that diagnosis may occur early, because it does not depend on detecting circulating tumor cells, which are present in only vanishingly small numbers in early-stage lung cancers.

In one aspect, a composition is provided for classifying a nodule as cancerous or benign in a mammalian subject. In one embodiment, the composition includes at least 10 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In another embodiment, the composition includes at least 100 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. In one embodiment, the polynucleotide or oligonucleotide or ligand hybridizes to an mRNA.

TABLE I

Rank
Gene
Sequence ID#
Class Name

1
PLEKHG4
NM_015432.3
Endogenous

2
SLC25A20
NM_000387.5
Endogenous

3
LETM2
NM_144652.3
Endogenous

4
GLIS3
NM_001042413.1
Endogenous

5
LOC100132797
XR_036994.1
Endogenous

6
ARHGEF5
NM_005435.3
Endogenous

7
TCF7L2
NM_030756.4
Endogenous

8
SFRS2IP
NM_004719.2
Endogenous

9
CFD
NM_001928.2
Endogenous

10
AZI2
NM_022461.4
Endogenous

11
STOM
NM_004099.5
Endogenous

12
CD1A
NM_001763.2
Endogenous

13
PANK2
NM_153640.2
Endogenous

14
CNIH4
NM_014184.3
Endogenous

15
EVI2A
NM_014210.3
Endogenous

16
BATF
NM_006399.3
Endogenous

17
TCP1
NM_030752.2
Endogenous

18
BX108566
BX108566.1
Endogenous

19
ANXA1
NM_000700.2
Endogenous

20
PSMA3
NM_152132.2
Endogenous

21
IRF4
NM_002460.1
Endogenous

22
STAG3
NM_012447.3
Endogenous

23
NDUFS4
NM_002495.2
Endogenous

24
HAT1
NM_003642.3
Endogenous

25
ANXA1 b
NM_000700.1
Endogenous

26
LOC148137
NM_144692.1
Endogenous

27
LDHA
NM_001165416.1
Endogenous

28
PSME3
NM_005789.3
Endogenous

29
REPS1
NM_001128617.2
Endogenous

30
CDH5
NM_001795.3
Endogenous

31
NAT5
NM_181528.3
Endogenous

32
PLAC8
NM_001130715.1
Endogenous

33
GSTO1
NM_004832.2
Endogenous

34
DGUOK
NM_080916.2
Endogenous

35
OLR1
NM_002543.3
Endogenous

36
MYST4
NM_012330.3
Endogenous

37
TIMM8B
ENST00000504148.1
Endogenous

38
LY96
NM_015364.4
Endogenous

39
CCDC72
NM_015933.4
Endogenous

40
ATP5I
NM_007100.2
Endogenous

41
WDR91
NM_014149.3
Endogenous

42
MAGEA3
NM_005362.3
Endogenous

43
AK093878
AK093878.1
Endogenous

44
EYA3
NM_001990.3
Endogenous

45
ACAA2
NM_006111.2
Endogenous

46
ETFDH
NM_004453.3
Endogenous

47
CCT6A
NM_001762.3
Endogenous

48
HSCB
NM_172002.3
Endogenous

49
EMR4
NM_001080498.2
Endogenous

50
USP5
NM_003481.2
Endogenous

51
SIK1
NM_173354.3
Endogenous

52
SYNJ1
NM_003895.3
Endogenous

53
KLRB1
NM_002258.2
Endogenous

54
CLK2
XM_941392.1
Endogenous

55
SNORA56
NR_002984.1
Endogenous

56
TP53BP1
NM_005657.2
Endogenous

57
RBX1
NM_014248.3
Endogenous

58
CNPY2
NM_014255.5
Endogenous

59
RELA
NM_021975.2
Endogenous

60
LOC732371
XM_001133019.1
Endogenous

61
TMEM218
NM_001080546.2
Endogenous

62
LOC91431
NM_001099776.1
Endogenous

63
GZMB
NM_004131.3
Endogenous

64
CAMP
NM_004345.4
Endogenous

65
RBM16
NM_014892.4
Endogenous

66
MID1IP1
NM_021242.5
Endogenous

67
LOC399942
XM_934471.1
Endogenous

68
COMMD6
NM_203497.3
Endogenous

69
PPP6C
NM_002721.4
Endogenous

70
BCOR
NM_017745.5
Endogenous

71
PDCD10
NM_145859.1
Endogenous

72
HLA-DMB
NM_002118.3
Endogenous

73
DNAJB1
NM_006145.2
Endogenous

74
KYNU
NM_001032998.1
Endogenous

75
TM2D2
NM_078473.2
Endogenous

76
FAM179A
NM_199280.2
Endogenous

77
FAM43A
NM_153690.4
Endogenous

78
QTRTD1
NM_024638.3
Endogenous

79
MARCKSL1
NM_023009.5
Endogenous

80
FAM193A
NM_003704.3
Endogenous

81
AK026725
AK026725.1
Endogenous

82
SERPINB10
NM_005024.1
Endogenous

83
OSBP
ILMN_1706376.1
Endogenous

84
ST6GAL1
NM_003032.2
Endogenous

85
NDUFAF2
NM_174889.4
Endogenous

86
UBE2I
NM_194259.2
Endogenous

87
CTAG1B
NM_001327.2
Endogenous

88
TRAF6
NM_145803.1
Endogenous

89
REPIN1
NM_014374.3
Endogenous

90
LAMA5
NM_005560.4
Endogenous

91
TBC1D12
NM_015188.1
Endogenous

92
TGIF1 b
NM_173208.1
Endogenous

93
LOC728533
XR_015610.3
Endogenous

94
CLN8
NM_018941.3
Endogenous

95
COX7B
NM_001866.2
Endogenous

96
DYNC2LI1
NM_016008.3
Endogenous

97
ANP32B
NM_006401.2
Endogenous

98
PTGDR2
NM_004778.1
Endogenous

99
MRPS16
NM_016065.3
Endogenous

100
NIPBL
NM_133433.3
Endogenous

101
PPP2R5C
NM_178588.1
Endogenous

102
DPF2
NM_006268.4
Endogenous

103
RAB10
NM_016131.4
Endogenous

104
MYADM
NM_001020820.1
Endogenous

105
CCND3
NM_001760.2
Endogenous

106
CC2D1B
NM_032449.2
Endogenous

107
HLA-G
NM_002127.4
Endogenous

108
CKS2
NM_001827.1
Endogenous

109
HPSE
NM_006665.5
Endogenous

110
UBE2G1
NM_003342.4
Endogenous

111
MED16
NM_005481.2
Endogenous

112
LOC339674
XM_934917.1
Endogenous

113
RNF114
NM_018683.3
Endogenous

114
KIR2DS3
NM_012313.1
Endogenous

115
AMD1
NM_001634.4
Endogenous

116
S100A8
NM_002964.4
Endogenous

117
NFATC4
NM_001136022.2
Endogenous

118
RPL39L
NM_052969.1
Endogenous

119
LOC399753
XM_930634.1
Endogenous

120
FKBP1A
NM_054014.3
Endogenous

121
CHMP5
NM_016410.5
Endogenous

122
CABC1
NM_020247.4
Endogenous

123
HLA-B
NM_005514.6
Endogenous

124
TRIM39
NM_021253.3
Endogenous

125
LOC645914
XM_928884.1
Endogenous

126
CD79A
NM_021601.3
Endogenous

127
GLRX
ILMN_1737308.1
Endogenous

128
RPL26L1
NM_016093.2
Endogenous

129
USP21
NM_012475.4
Endogenous

130
CD70
NM_001252.2
Endogenous

131
SPINK5
NM_006846.3
Endogenous

132
HUWE1
NM_031407.6
Endogenous

133
STK38
NM_007271.3
Endogenous

134
SEMG1
NM_003007.2
Endogenous

135
NDUFA4
NM_002489.3
Endogenous

136
MYADM b
NM_001020820.1
Endogenous

137
SGK1 b
NM_005627.3
Endogenous

138
SLAMF8
NM_020125.2
Endogenous

139
LOC653773
XM_938755.1
Endogenous

140
RPS24
NM_001026.4
Endogenous

141
LOC338799
NR_002809.2
Endogenous

142
MAP3K7
NM_145333.1
Endogenous

143
KLRD1
NM_002262.3
Endogenous

144
LOC732111
XM_001134275.1
Endogenous

145
CD69
NM_001781.2
Endogenous

146
DDIT4
NM_019058.2
Endogenous

147
C1orf222
NM_001003808.1
Endogenous

148
PFAS
NM_012393.2
Endogenous

149
USP9Y
NM_004654.3
Endogenous

150
COLEC12
NM_130386.2
Endogenous

151
VPS37C
NM_017966.4
Endogenous

152
SAP130
NM_024545.3
Endogenous

153
CDC42EP2
NM_006779.3
Endogenous

154
LOC643319
XM_927980.1
Endogenous

155
ASF1B
NM_018154.2
Endogenous

156
AK094576
AK094576.1
Endogenous

157
BANP
NM_079837.2
Endogenous

158
TBK1
NM_013254.2
Endogenous

159
GNS
NM_002076.3
Endogenous

160
IL1R2
NM_173343.1
Endogenous

161
CLEC4C
NM_203503.1
Endogenous

162
TM9SF1
NM_006405.6
Endogenous

163
PTGDR
NM_000953.2
Endogenous

164
GOLGA3
NM_005895.3
Endogenous

165
CLEC4A
NM_194448.2
Endogenous

166
TSC1
NM_000368.4
Endogenous

167
SFMBT1
NM_001005158.2
Endogenous

168
GLT25D1
NM_024656.2
Endogenous

169
LOC100130229
XM_001717158.1
Endogenous

170
PHF8
NM_015107.2
Endogenous

171
PUM1
NM_001020658.1
Endogenous

172
SMARCC1
NM_003074.3
Endogenous

173
AK126342
AK126342.1
Endogenous

174
ACSL5
NM_203379.1
Endogenous

175
TGIF1
NM_003244.2
Endogenous

176
BF375676
BF375676.1
Endogenous

177
SPA17
NM_017425.3
Endogenous

178
FLNB
NM_001457.3
Endogenous

179
FAM105B
NM_138348.4
Endogenous

180
CPPED1
NM_018340.2
Endogenous

181
TRIM32
NM_012210.3
Endogenous

182
RNF34
NM_025126.3
Endogenous

183
SLC45A3
NM_033102.2
Endogenous

184
P2RY10
NM_198333.1
Endogenous

185
AKR1C3
NM_003739.4
Endogenous

186
NME1-NME2
NM_001018136.2
Endogenous

187
AMPD3
NM_000480.2
Endogenous

188
HSP90AB1
NM_007355.3
Endogenous

189
RBM4B
NM_031492.3
Endogenous

190
DMBT1
NM_007329.2
Endogenous

191
TMCO1
NM_019026.3
Endogenous

192
CASP2
NM_032983.3
Endogenous

193
C1orf103
NM_018372.3
Endogenous

194
ARHGAP17
NM_018054.5
Endogenous

195
IFNA17
NM_021268.2
Endogenous

196
CTSZ
NM_001336.3
Endogenous

197
DBI
NM_001079862.1
Endogenous

198
TXNRD1 b
NM_182743.2
Endogenous

199
KIAA0460
NM_015203.4
Endogenous

200
PDGFD
NM_033135.3
Endogenous

201
ATG5
NM_004849.2
Endogenous

202
ITFG2
NM_018463.3
Endogenous

203
HERC1
NM_003922.3
Endogenous

204
MEN1
NM_130799.2
Endogenous

205
IFI27L2
NM_032036.2
Endogenous

206
LOC729887
XR_040891.2
Endogenous

207
PI4K2A
NM_018425.3
Endogenous

208
RAG1
NM_000448.2
Endogenous

209
CREB5
NM_182898.3
Endogenous

210
SLC6A12
NM_003044.4
Endogenous

211
CDKN1A
NM_000389.2
Endogenous

212
AW173314
AW173314.1
Endogenous

213
SAP130 b
NM_024545.3
Endogenous

214
ABCA5
NM_018672.4
Endogenous

215
SLC25A37
NM_016612.2
Endogenous

216
MYLIP
NM_013262.3
Endogenous

217
GATA2
NM_001145662.1
Endogenous

218
ATP5L
NM_006476.4
Endogenous

219
RPS27L
NM_015920.3
Endogenous

220
DB338252
DB338252.1
Endogenous

221
FRAT2
NM_012083.2
Endogenous

222
CCL4
NM_002984.2
Endogenous

223
CD79B
NM_000626.2
Endogenous

224
MBD1
NM_015844.2
Endogenous

225
TIAM1
NM_003253.2
Endogenous

226
HSD11B1
NM_181755.1
Endogenous

227
TPR
NM_003292.2
Endogenous

228
EID2B
NM_152361.2
Endogenous

229
PDSS1
NM_014317.3
Endogenous

230
C9orf164
NM_182635.1
Endogenous

231
ARHGEF18
NM_015318.3
Endogenous

232
TXNRD1
NM_001093771.2
Endogenous

233
HNRNPAB
NM_004499.3
Endogenous

234
TTN
NM_133378.4
Endogenous

235
EP300
NM_001429.2
Endogenous

236
CCDC97
NM_052848.1
Endogenous

237
HK3
NM_002115.2
Endogenous

238
CRKL
NM_005207.3
Endogenous

239
NCOA5
NM_020967.2
Endogenous

240
AK124143
AK124143.1
Endogenous

241
LBA1
NM_014831.2
Endogenous

242
SLC9A3R1
NM_004252.3
Endogenous

243
CRY2
NM_021117.3
Endogenous

244
ATG4B
NM_178326.2
Endogenous

245
CD97
NM_078481.3
Endogenous

246
TTC9
NM_015351.1
Endogenous

247
BMPR2
NM_001204.6
Endogenous

248
LPIN2
NM_014646.2
Endogenous

249
UBA1
NM_003334.3
Endogenous

250
SETD1B
XM_037523.11
Endogenous

251
PRPF8
NM_006445.3
Endogenous

252
RNASE2
NM_002934.2
Endogenous

253
KIAA0101
NM_014736.4
Endogenous

254
ARG1
NM_000045.3
Endogenous

255
UBTF
NM_001076683.1
Endogenous

256
MFSD1
NM_022736.2
Endogenous

257
IDO1
NM_002164.3
Endogenous

258
MS4A6A
NM_022349.3
Endogenous

259
C22orf30
NM_173566.2
Endogenous

260
HNRNPK
NM_031263.2
Endogenous

261
ARL8B
NM_018184.2
Endogenous

262
SETD2
NM_014159.6
Endogenous

263
NCAPG
NM_022346.4
Endogenous

264
EEF1B2
NM_001037663.1
Endogenous

265
TRIM39 b
NM_172016.2
Endogenous

266
EHD4
NM_139265.3
Endogenous

267
IRF1
NM_002198.1
Endogenous

268
LOC100129022
XM_001716591.1
Endogenous

269
TRAF3IP2
NM_147686.3
Endogenous

270
PSMA6
NM_002791.2
Endogenous

271
RHOG
NM_001665.3
Endogenous

272
CN312986
CN312986.1
Endogenous

273
PSMB8
NM_004159.4
Endogenous

274
ZNF239
NM_001099283.1
Endogenous

275
CLPTM1
NM_001294.3
Endogenous

276
NADK
NM_023018.4
Endogenous

277
C8orf76
NM_032847.2
Endogenous

278
LIF
NM_002309.3
Endogenous

279
EGR1
NM_001964.2
Endogenous

280
ARG1 b
NM_000045.2
Endogenous

281
MERTK
NM_006343.2
Endogenous

282
RHOU
NM_021205.5
Endogenous

283
PFDN5 b
NM_145897.2
Endogenous

284
MAGEA1
NM_004988.4
Endogenous

285
SEC24C
NM_198597.2
Endogenous

286
SLC11A1
NM_000578.3
Endogenous

287
TCF20
NM_181492.2
Endogenous

288
AHCYL1
NM_001242676.1
Endogenous

289
TPT1
NM_003295.3
Endogenous

290
KIR2DL5A
XM_001126354.1
Endogenous

291
IRAK2
NM_001570.3
Endogenous

292
C17orf51
XM_944416.1
Endogenous

293
C14orf156
NM_031210.5
Endogenous

294
ATP2C1
NM_014382.3
Endogenous

295
SOCS1
NM_003745.1
Endogenous

296
JAK1
NM_002227.1
Endogenous

297
RSL24D1
NM_016304.2
Endogenous

298
AP2S1
NM_021575.3
Endogenous

299
PHRF1
NM_020901.3
Endogenous

300
GPI
NM_000175.2
Endogenous

301
NCR1
NM_004829.5
Endogenous

302
AKAP4
NM_139289.1
Endogenous

303
CD160
NM_007053.3
Endogenous

304
DDX23
NM_004818.2
Endogenous

305
GNL3
NM_014366.4
Endogenous

306
NFKB2
NM_002502.2
Endogenous

307
CSK
NM_004383.2
Endogenous

308
PELP1
NM_014389.2
Endogenous

309
KLRF1 b
NM_016523.2
Endogenous

310
CS
NM_004077.2
Endogenous

311
PHCA
NM_018367.6
Endogenous

312
LOC644315
XR_017529.2
Endogenous

313
NUDT18
NM_024815.3
Endogenous

314
XCL2
NM_003175.3
Endogenous

315
KLRC1
NM_002259.3
Endogenous

316
ARHGAP18
NM_033515.2
Endogenous

317
CTDSP2
NM_005730.3
Endogenous

318
P2RY5
NM_005767.5
Endogenous

319
CREB1
NM_004379.3
Endogenous

320
RHOB
NM_004040.3
Endogenous

321
DCAF7
NM_005828.4
Endogenous

322
NUP153
NM_005124.3
Endogenous

323
AFTPH
NM_017657.4
Endogenous

324
EWSR1
NM_005243.3
Endogenous

325
LYN
NM_002350.1
Endogenous

326
CYBB
NM_000397.3
Endogenous

327
TMEM70
NM_017866.5
Endogenous

328
PPP1R3E
XM_927029.1
Endogenous

329
PSMB1
NM_002793.3
Endogenous

330
RERE b
NM_012102.3
Endogenous

331
RXRA
NM_002957.5
Endogenous

332
GZMA
NM_006144.3
Endogenous

333
ERLIN1
NM_006459.3
Endogenous

334
KRTAP10-3
NM_198696.2
Endogenous

335
SAMSN1
NM_022136.3
Endogenous

336
LRRC47
NM_020710.2
Endogenous

337
MARCKS
NM_002356.6
Endogenous

338
HOPX
NM_139211.4
Endogenous

339
KLRF1
NM_016523.1
Endogenous

340
NFAT5
NM_138713.3
Endogenous

341
SLC15A2
NM_021082.3
Endogenous

342
STK16
NM_003691.2
Endogenous

343
KIR_Activating_Subgroup_2
NM_014512.1
Endogenous

344
TBCE
NM_001079515.2
Endogenous

345
BAG3
NM_004281.3
Endogenous

346
SFRS4
NM_005626.4
Endogenous

347
AW270402
AW270402.1
Endogenous

348
CCL3L1
NM_021006.4
Endogenous

349
HERC3
NM_014606.2
Endogenous

350
RPL34
NM_000995.3
Endogenous

351
ALAS1
NM_000688.4
Endogenous

352
CCR9
NM_031200.1
Endogenous

353
CORO1C
ILMN_1745954.1
Endogenous

354
FAIM3
NM_005449.4
Endogenous

355
SFPQ
NM_005066.2
Endogenous

356
HOOK3
NM_032410.3
Endogenous

357
CD36
NM_000072.3
Endogenous

358
IL7
NM_000880.2
Endogenous

359
CBLL1
NM_024814.3
Endogenous

360
HVCN1
NM_032369.3
Endogenous

361
HMGB1
NM_002128.4
Endogenous

362
SIN3A
NM_015477.2
Endogenous

363
CASP3
NM_032991.2
Endogenous

364
BQ189294
BQ189294.1
Endogenous

365
NDRG2
NM_016250.2
Endogenous

366
BX400436
BX400436.2
Endogenous

367
IFNAR2
NM_000874.3
Endogenous

368
MS4A6A b
NM_152851.2
Endogenous

369
KLRC2
NM_002260.3
Endogenous

370
S100A12 b
NM_005621.1
Endogenous

371
ATM
NM_000051.3
Endogenous

372
NLRP3
NM_001079821.2
Endogenous

373
HAVCR2
NM_032782.3
Endogenous

374
C4B
NM_001002029.3
Endogenous

375
CTSW
NM_001335.3
Endogenous

376
TMEM170B
NM_001100829.2
Endogenous

377
EIF4ENIF1
NM_019843.2
Endogenous

378
CCL3
NM_002983.2
Endogenous

379
CHCHD3
NM_017812.2
Endogenous

380
CST7
NM_003650.3
Endogenous

381
SFRS15
NM_020706.2
Endogenous

382
STIP1
NM_006819.2
Endogenous

383
MPDU1
NM_004870.3
Endogenous

384
DHX16 b
NM_001164239.1
Endogenous

385
INTS4
NM_033547.3
Endogenous

386
USP16
NM_001032410.1
Endogenous

387
IFNAR1
NM_000629.2
Endogenous

388
ITCH
NM_001257138.1
Endogenous

389
FOXK2
NM_004514.3
Endogenous

390
LOC642812
XR_036892.1
Endogenous

391
KIAA1967
NM_021174.5
Endogenous

392
LOC440928
XM_942885.1
Endogenous

393
NDUFV2
NM_021074.4
Endogenous

394
IL4
NM_000589.2
Endogenous

395
CIAPIN1
NM_020313.3
Endogenous

396
CXCL2
NM_002089.3
Endogenous

397
TXN
NM_003329.3
Endogenous

398
PRG2
NM_002728.4
Endogenous

399
MS4A2
NM_000139.3
Endogenous

400
YPEL1
NM_013313.4
Endogenous

401
POLR2A
NM_000937.4
Endogenous

402
C19orf10
NM_019107.3
Endogenous

403
IGFBP7
NM_001553.2
Endogenous

404
ITGAE
NM_002208.4
Endogenous

405
CXCR5 b
NM_001716.3
Endogenous

406
BID
NM_001196.2
Endogenous

407
LOC100133273
XR_039238.1
Endogenous

408
FNBP1
NM_015033.2
Endogenous

409
IFNGR1
NM_000416.1
Endogenous

410
STAT6
NM_003153.4
Endogenous

411
CR2
NM_001006658.2
Endogenous

412
CCL3L3
NM_001001437.3
Endogenous

413
RFWD2
NM_022457.6
Endogenous

414
SP2
NM_003110.5
Endogenous

415
BAT2D1
NM_015172.3
Endogenous

416
CX3CL1
NM_002996.3
Endogenous

417
GPATCH3
NM_022078.2
Endogenous

418
CASP1
NM_033294.3
Endogenous

419
NAGK
NM_017567.4
Endogenous

420
IER5
NM_016545.4
Endogenous

421
PHLPP2
NM_015020.3
Endogenous

422
RPL31
NM_000993.4
Endogenous

423
SPEN
NM_015001.2
Endogenous

424
TMSB4X
NM_021109.3
Endogenous

425
IL8RB
NM_001557.3
Endogenous

426
XPC
NR_027299.1
Endogenous

427
SNX11
NM_152244.1
Endogenous

428
SPN
NM_003123.3
Endogenous

429
ANKHD1
NM_017747.2
Endogenous

430
CCR6
NM_031409.2
Endogenous

431
DZIP3
NM_014648.3
Endogenous

432
MRPL27
NM_148571.1
Endogenous

433
SREBF1
NM_001005291.2
Endogenous

434
CD14
NM_000591.2
Endogenous

435
TNFSF8
NM_001244.3
Endogenous

436
C3
NM_000064.2
Endogenous

437
FAM50B
NM_012135.1
Endogenous

438
RASSF5
NM_182664.2
Endogenous

439
BU743228
BU743228.1
Endogenous

440
NFATC1
NM_172389.1
Endogenous

441
DOCK5
NM_024940.6
Endogenous

442
PACS1
NM_018026.3
Endogenous

443
CYP1B1
NM_000104.3
Endogenous

444
CLIC3
ILMN_1796423.1
Endogenous

445
PSMA4
NM_002789.3
Endogenous

446
ZNF341
NM_032819.4
Endogenous

447
PRPF3
NM_004698.2
Endogenous

448
PSMA6 b
NM_002791.2
Endogenous

449
LOC648927
XR_038906.2
Endogenous

450
KCTD12
NM_138444.3
Endogenous

451
LOC440389
XM_498648.3
Endogenous

452
U2AF2
NM_007279.2
Endogenous

453
CLEC5A
NM_013252.2
Endogenous

454
PRRG4
NM_024081.5
Endogenous

455
TNFRSF9
NM_001561.5
Endogenous

456
NDUFB3
NM_002491.2
Endogenous

457
BCL6
NM_001130845.1
Endogenous

458
SGK1
NM_005627.3
Endogenous

459
CIP29
NM_033082.3
Endogenous

460
CD160 b
NM_007053.2
Endogenous

461
ARCN1
NM_001655.4
Endogenous

462
LOC151162
NR_024275.1
Endogenous

463
GPR65
NM_003608.3
Endogenous

464
CCR1
NM_001295.2
Endogenous

465
TFCP2
NM_005653.4
Endogenous

466
SGK
NM_005627.3
Endogenous

467
RNF214
NM_207343.3
Endogenous

468
TMC8
NM_152468.4
Endogenous

469
RBM14
NM_006328.3
Endogenous

470
USP34
NM_014709.3
Endogenous

471
BACH2
NM_021813.3
Endogenous

472
LILRA5
NM_021250.3
Endogenous

473
C5orf21
NM_032042.5
Endogenous

474
LOC441073
XR_018937.2
Endogenous

475
TAX1BP1
NM_001079864.2
Endogenous

476
TNFSF13
NM_003808.3
Endogenous

477
PIM2
NM_006875.3
Endogenous

478
RNF19B
NM_153341.3
Endogenous

479
EPHX2
NM_001979.5
Endogenous

480
LILRA5 b
NM_181879.2
Endogenous

481
ABCF1
NM_001025091.1
Endogenous

482
C4orf27
NM_017867.2
Endogenous

483
PSMB7
NM_002799.2
Endogenous

484
LPCAT4
NM_153613.2
Endogenous

485
TRIM21
NM_003141.3
Endogenous

486
LOC728835
XM_001133190.1
Endogenous

487
NFKB1
NM_003998.3
Endogenous

488
CR2 b
NM_001006658.1
Endogenous

489
HMGB2
NM_002129.3
Endogenous

490
IL1B
NM_000576.2
Endogenous

491
C20orf52
NM_080748.2
Endogenous

492
DNAJB6
NM_058246.3
Endogenous

493
PFDN5
NM_145897.2
Endogenous

494
RPS6
NM_001010.2
Endogenous

495
LEF1
NM_016269.4
Endogenous

496
DKFZp761P0423
XM_291277.4
Endogenous

497
LOC647340
XR_018104.1
Endogenous

498
FTHL16
XR_041433.1
Endogenous

499
COX6C
NM_004374.2
Endogenous

500
BCL10
NM_003921.2
Endogenous

501
CD48
NM_001778.2
Endogenous

502
ZMIZ1
NM_020338.3
Endogenous

503
GZMH
NM_033423.4
Endogenous

504
TRRAP
NM_003496.3
Endogenous

505
SH2D3C
NM_170600.2
Endogenous

506
UBC
NM_021009.3
Endogenous

507
TXNDC17
NM_032731.3
Endogenous

508
ATP5J2
NM_004889.3
Endogenous

509
KIAA1267
NM_015443.3
Endogenous

510
RFX1
NM_002918.4
Endogenous

511
WDR1
NM_005112.4
Endogenous

512
LOC100129697
XM_001732822.2
Endogenous

513
TOMM7
NM_019059.2
Endogenous

514
ARHGAP26
NM_015071.4
Endogenous

515
HSPA6
NM_002155.4
Endogenous

516
FLJ10357
NM_018071.4
Endogenous

517
ITGAL
NM_002209.2
Endogenous

518
BX089765
BX089765.1
Endogenous

519
RERE
NM_001042682.1
Endogenous

520
C15orf39
NM_015492.4
Endogenous

521
BX436458
BX436458.2
Endogenous

522
RWDD1
NM_001007464.2
Endogenous

523
TMBIM6
NM_003217.2
Endogenous

524
SLC6A6
NM_003043.5
Endogenous

525
KIAA0174
NM_014761.3
Endogenous

526
IL16
NM_004513.4
Endogenous

527
EGLN1
NM_022051.1
Endogenous

528
LOC391126
XR_017684.2
Endogenous

529
TAPBP
NM_003190.4
Endogenous

530
NUMB
NM_001005744.1
Endogenous

531
CENTD2
NM_001040118.2
Endogenous

532
CLSTN1
NM_001009566.2
Endogenous

533
PSMA4 b
NM_002789.4
Endogenous

534
LOC648000
XM_371757.4
Endogenous

535
COX7C
NM_001867.2
Endogenous

536
PIK3CD
NM_005026.3
Endogenous

537
UQCRQ
NM_014402.4
Endogenous

538
IDS
NM_006123.4
Endogenous

539
C19orf59
NM_174918.2
Endogenous

540
MYL12A
NM_006471.3
Housekeeping

541
EIF2B4
NM_015636.3
Housekeeping

542
DGUOK b
NM_080916.2
Housekeeping

543
PSMC1
NM_002802.2
Housekeeping

544
CHFR
NM_018223.2
Housekeeping

545
ARPC2
NM_005731.2
Housekeeping

546
ATP5B
NM_001686.3
Housekeeping

547
RPL3
NM_001033853.1
Housekeeping

548
ZNF143
NM_003442.5
Housekeeping

549
PSMD7
NM_002811.4
Housekeeping

550
TBP
NM_003194.4
Housekeeping

551
DHX16
NM_003587.4
Housekeeping

552
TUG1
NR_002323.2
Housekeeping

553
GUSB
NM_000181.3
Housekeeping

554
HDAC3
NM_003883.3
Housekeeping

555
SDHA
NM_004168.3
Housekeeping

556
PGK1
NM_000291.3
Housekeeping

557
STAMBP
NM_006463.4
Housekeeping

558
MTCH1
NM_014341.2
Housekeeping

559
TUBB
NM_178014.2
Housekeeping

TABLE II

Rank
Gene
Sequence ID#
Class Name

1
TPR
NM_003292.2
Endogenous

2
DNAJB1
NM_006145.2
Endogenous

3
PDCD10
NM_145859.1
Endogenous

4
PSMB7
NM_002799.2
Endogenous

5
MERTK
NM_006343.2
Endogenous

6
AFTPH
NM_017657.4
Endogenous

7
BCOR
NM_017745.5
Endogenous

8
RASSF5
NM_182664.2
Endogenous

9
SNX11
NM_152244.1
Endogenous

10
ANP32B
NM_006401.2
Endogenous

11
C4B
NM_001002029.3
Endogenous

12
NME1-NME2
NM_001018136.2
Endogenous

13
DGUOK
NM_080916.2
Endogenous

14
CYP1B1
NM_000104.3
Endogenous

15
MPDU1
NM_004870.3
Endogenous

16
MED16
NM_005481.2
Endogenous

17
FAM179A
NM_199280.2
Endogenous

18
CPPED1
NM_018340.2
Endogenous

19
LOC648927
XR_038906.2
Endogenous

20
ANKHD1
NM_017747.2
Endogenous

21
CN312986
CN312986.1
Endogenous

22
PHCA
NM_018367.6
Endogenous

23
CD1A
NM_001763.2
Endogenous

24
NCOA5
NM_020967.2
Endogenous

25
SLC6A12
NM_003044.4
Endogenous

26
LOC728533
XR_015610.3
Endogenous

27
TRAF3IP2
NM_147686.3
Endogenous

28
TBCE
NM_001079515.2
Endogenous

29
CCT6A
NM_001762.3
Endogenous

30
P2RY5
NM_005767.5
Endogenous

31
RNASE2
NM_002934.2
Endogenous

32
CLN8
NM_018941.3
Endogenous

33
REPS1
NM_001128617.2
Endogenous

34
TPT1
NM_003295.3
Endogenous

35
LOC100129022
XM_001716591.1
Endogenous

36
KLRC1
NM_002259.3
Endogenous

37
AZI2
NM_022461.4
Endogenous

38
FAM193A
NM_003704.3
Endogenous

39
PLAC8
NM_001130715.1
Endogenous

40
LDHA
NM_001165416.1
Endogenous

41
GPATCH3
NM_022078.2
Endogenous

42
RBM14
NM_006328.3
Endogenous

43
KYNU
NM_001032998.1
Endogenous

44
PPP2R5C
NM_178588.1
Endogenous

45
S100A12 b
NM_005621.1
Endogenous

46
SFMBT1
NM_001005158.2
Endogenous

47
CCR6
NM_031409.2
Endogenous

48
TRIM39
NM_021253.3
Endogenous

49
AK126342
AK126342.1
Endogenous

50
SLC45A3
NM_033102.2
Endogenous

51
IL4
NM_000589.2
Endogenous

52
UBE2I
NM_194259.2
Endogenous

53
PRPF3
NM_004698.2
Endogenous

54
NDUFB3
NM_002491.2
Endogenous

55
CRKL
NM_005207.3
Endogenous

56
IDO1
NM_002164.3
Endogenous

57
PUM1
NM_001020658.1
Endogenous

58
BCL10
NM_003921.2
Endogenous

59
TMBIM6
NM_003217.2
Endogenous

60
C17orf51
XM_944416.1
Endogenous

61
BANP
NM_079837.2
Endogenous

62
HAVCR2
NM_032782.3
Endogenous

63
BAG3
NM_004281.3
Endogenous

64
DBI
NM_001079862.1
Endogenous

65
C4orf27
NM_017867.2
Endogenous

66
TSC1
NM_000368.4
Endogenous

67
LPCAT4
NM_153613.2
Endogenous

68
SAMSN1
NM_022136.3
Endogenous

69
SNORA56
NR_002984.1
Endogenous

70
ARG1
NM_000045.3
Endogenous

71
IL1R2
NM_173343.1
Endogenous

72
CCND3
NM_001760.2
Endogenous

73
USP9Y
NM_004654.3
Endogenous

74
ATP2C1
NM_014382.3
Endogenous

75
PSMB1
NM_002793.3
Endogenous

76
NDUFAF2
NM_174889.4
Endogenous

77
VPS37C
NM_017966.4
Endogenous

78
HAT1
NM_003642.3
Endogenous

79
LOC732371
XM_001133019.1
Endogenous

80
LOC148137
NM_144692.1
Endogenous

81
CCR1
NM_001295.2
Endogenous

82
CCDC97
NM_052848.1
Endogenous

83
PPP6C
NM_002721.4
Endogenous

84
GPI
NM_000175.2
Endogenous

85
PIM2
NM_006875.3
Endogenous

86
STAT6
NM_003153.4
Endogenous

87
BATF
NM_006399.3
Endogenous

88
EIF4ENIF1
NM_019843.2
Endogenous

89
HSP90AB1
NM_007355.3
Endogenous

90
U2AF2
NM_007279.2
Endogenous

91
CYBB
NM_000397.3
Endogenous

92
WDR1
NM_005112.4
Endogenous

93
PSMB8
NM_004159.4
Endogenous

94
TBC1D12
NM_015188.1
Endogenous

95
LOC648000
XM_371757.4
Endogenous

96
XCL2
NM_003175.3
Endogenous

97
PTGDR
NM_000953.2
Endogenous

98
ACSL5
NM_203379.1
Endogenous

99
CASP1
NM_033294.3
Endogenous

100
UBTF
NM_001076683.1
Endogenous

In one embodiment, a novel gene expression profile or signature can identify and distinguish patients having cancerous tumors from patients having benign nodules. See, for example, the genes identified in Table I and Table II, which may form a suitable gene expression profile. In another embodiment, a portion of the genes of Table I forms a suitable profile. In yet another embodiment, a portion of the genes of Table II forms a suitable profile. As discussed herein, these profiles are used to distinguish between cancerous and non-cancerous (benign) nodules by generating a discriminant score based on differences in gene expression profiles, as exemplified below. The validity of these signatures was established in a cohort of patients with undiagnosed lung nodules, using samples collected at different locations by different groups. See Example 7, FIGS. 2A-2B and FIG. 6. The lung cancer signatures or gene expression profiles identified herein (i.e., Table I or Table II) may be further optimized to reduce the number of gene expression products needed and to increase diagnostic accuracy.
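This section does not specify how the discriminant score is computed. Purely as an illustration of turning an expression profile into a single score, the sketch below uses a nearest-centroid linear discriminant, a swapped-in stand-in rather than the classifier of this specification. A positive score means the profile lies closer to the mean cancer profile than to the mean benign profile. The three genes (the top three of Table II), the training values, and all function names are hypothetical.

```python
from math import fsum


def centroid(profiles):
    """Per-gene mean of a list of equal-length expression profiles."""
    n = len(profiles)
    return [fsum(column) / n for column in zip(*profiles)]


def fit_nearest_centroid(cancer_profiles, benign_profiles):
    """Return (weights, bias) of the nearest-centroid linear discriminant.

    score(x) = w . x + b equals half of
    ||x - mu_benign||^2 - ||x - mu_cancer||^2,
    so it is positive exactly when x is closer to the cancer centroid.
    """
    mu_c = centroid(cancer_profiles)
    mu_b = centroid(benign_profiles)
    weights = [c - b for c, b in zip(mu_c, mu_b)]
    midpoint = [(c + b) / 2 for c, b in zip(mu_c, mu_b)]
    bias = -fsum(w * m for w, m in zip(weights, midpoint))
    return weights, bias


def discriminant_score(profile, weights, bias):
    """Linear discriminant score w . x + b for one expression profile."""
    return fsum(w * x for w, x in zip(weights, profile)) + bias


# Hypothetical normalized log2 values for TPR, DNAJB1 and PDCD10.
cancer = [[2.0, 1.0, 0.5], [2.2, 0.8, 0.7]]
benign = [[0.5, 1.5, 0.2], [0.7, 1.7, 0.0]]
weights, bias = fit_nearest_centroid(cancer, benign)

new_sample = [1.9, 1.1, 0.6]
score = discriminant_score(new_sample, weights, bias)
print("cancer-like" if score > 0 else "benign-like", round(score, 3))
```

In practice the score would be computed on normalized expression values for the chosen profile genes, and the decision threshold (0 here) could be shifted to trade sensitivity against specificity.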

In one embodiment, the composition includes 10 to 559 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. In another embodiment, the composition includes 10 to 100 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table II.
In another embodiment, the composition includes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 
413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, or 559 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. In another embodiment, the composition includes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table II. In one embodiment, the composition includes at least 3 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. 
In one embodiment, the composition includes at least 5 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 10 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 15 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 20 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 25 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 30 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. 
In one embodiment, the composition includes at least 35 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 40 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 45 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 50 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 55 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 60 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. 
In one embodiment, the composition includes at least 65 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 70 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 75 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 80 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 85 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 90 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. 
In one embodiment, the composition includes at least 95 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 100 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I or Table II. In one embodiment, the composition includes at least 150 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. In one embodiment, the composition includes at least 200 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. In one embodiment, the composition includes at least 250 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. In one embodiment, the composition includes at least 300 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. 
In one embodiment, the composition includes at least 350 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. In one embodiment, the composition includes at least 400 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. In one embodiment, the composition includes at least 450 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. In one embodiment, the composition includes at least 500 polynucleotides or oligonucleotides or ligands, wherein each polynucleotide or oligonucleotide or ligand hybridizes to a different gene, gene fragment, gene transcript or expression product in a sample selected from the genes of Table I. In one embodiment, the composition includes polynucleotides or oligonucleotides or ligands capable of hybridizing to each different gene, gene fragment, gene transcript or expression product listed in Table I. In another embodiment, the composition includes polynucleotides or oligonucleotides or ligands capable of hybridizing to each different gene, gene fragment, gene transcript or expression product listed in Table II.
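The "at least N" embodiments above reduce to a counting rule: the number of different Table I or Table II genes targeted by the composition must meet a threshold. The Python sketch below shows one way to check this. The function names, the probe identifiers, and the representation of a composition as a probe-to-gene mapping are illustrative assumptions, not part of the described compositions.

```python
from typing import Iterable, Mapping, Set

def genes_covered(probe_targets: Mapping[str, str], table_genes: Iterable[str]) -> Set[str]:
    """Distinct table genes hit by at least one probe.

    Probes whose target is not in the table (e.g., controls) are ignored, and
    two probes against the same gene count only once, because each counted
    probe must hybridize to a *different* gene.
    """
    allowed = set(table_genes)
    return {gene for gene in probe_targets.values() if gene in allowed}

def meets_minimum(probe_targets: Mapping[str, str], table_genes: Iterable[str], minimum: int) -> bool:
    """True if the composition targets at least `minimum` different table genes."""
    return len(genes_covered(probe_targets, table_genes)) >= minimum

# Hypothetical probe set: two probes share ABCF1 and one is a control probe.
probes = {"probe_1": "ABCA5", "probe_2": "ABCF1", "probe_3": "ABCF1", "ctrl_1": "POS_CTRL"}
table_subset = ["ABCA5", "ABCF1", "ACAA2"]
print(meets_minimum(probes, table_subset, 3))  # False: only 2 different table genes
```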

In yet another embodiment, the expression profile is formed by the first 3 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 5 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 10 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 15 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 20 genes in rank order of Table I or Table II. In another embodiment, the expression profile is formed by the first 25 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 30 genes in rank order of Table I or Table II. In another embodiment, the expression profile is formed by the first 35 genes in rank order of Table I or Table II. In another embodiment, the expression profile is formed by the first 40 genes in rank order of Table I or Table II. In another embodiment, the expression profile is formed by the first 45 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 50 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 55 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 60 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 65 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 70 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 75 genes in rank order of Table I or Table II. 
In yet another embodiment, the expression profile is formed by the first 80 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 85 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 90 genes in rank order of Table I or Table II. In yet another embodiment, the expression profile is formed by the first 95 genes in rank order of Table I or Table II. In another embodiment, the expression profile is formed by the first 100 genes in rank order of Table I or Table II. In another embodiment, the expression profile is formed by the first 150 genes in rank order of Table I. In another embodiment, the expression profile is formed by the first 200 genes in rank order of Table I. In another embodiment, the expression profile is formed by the first 250 genes in rank order of Table I. In another embodiment, the expression profile is formed by the first 300 genes in rank order of Table I. In another embodiment, the expression profile is formed by the first 350 genes in rank order of Table I. In another embodiment, the expression profile is formed by the first 400 genes in rank order of Table I. In yet another embodiment, the expression profile is formed by the first 539 genes in rank order of Table I.
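Forming a profile from "the first N genes in rank order" amounts to taking a prefix of the ranked gene list and reading out the corresponding measurements. In this sketch the ranked list is a hypothetical placeholder (the actual rank order is the one given in Table I or Table II), and `expression` is an assumed mapping from gene symbol to a measured expression value.

```python
from typing import List, Mapping, Sequence

def first_n_in_rank_order(ranked_genes: Sequence[str], n: int) -> List[str]:
    """Return the first n genes of a ranked list, in rank order."""
    if not 1 <= n <= len(ranked_genes):
        raise ValueError(f"n must be between 1 and {len(ranked_genes)}")
    return list(ranked_genes[:n])

def expression_profile(ranked_genes: Sequence[str], n: int,
                       expression: Mapping[str, float]) -> List[float]:
    """Measured values for the first n ranked genes, kept in rank order."""
    return [expression[gene] for gene in first_n_in_rank_order(ranked_genes, n)]

ranked = ["GENE_1", "GENE_2", "GENE_3", "GENE_4"]  # hypothetical rank order
expr = {"GENE_1": 5.0, "GENE_2": 1.5, "GENE_3": 2.0, "GENE_4": 9.9}
print(expression_profile(ranked, 3, expr))  # [5.0, 1.5, 2.0]
```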

As discussed below, the compositions described herein can be used with gene expression profiling methods known in the art. Thus, the compositions can be adapted to suit the method with which they are intended to be used. In one embodiment, at least one polynucleotide or oligonucleotide or ligand is attached to a detectable label. In certain embodiments, each polynucleotide or oligonucleotide is attached to a different detectable label, each capable of being detected independently. Such reagents are useful in assays such as the nCounter® assay described below.

In another embodiment, the composition comprises a capture oligonucleotide or ligand, which hybridizes to at least one polynucleotide or oligonucleotide or ligand. In one embodiment, such capture oligonucleotide or ligand may include a nucleic acid sequence which is specific for a portion of the oligonucleotide or polynucleotide or ligand which is specific for the gene of interest. The capture ligand may be a peptide or polypeptide which is specific for the ligand to the gene of interest. In one embodiment, the capture ligand is an antibody, as in a sandwich ELISA.

The capture oligonucleotide also includes a moiety which allows for binding with a substrate. Such substrates include, without limitation, a plate, bead, slide, well, chip or chamber. In one embodiment, the composition includes a capture oligonucleotide for each different polynucleotide or oligonucleotide which is specific to a gene of interest. Each capture oligonucleotide may contain the same moiety, which allows for binding with the same substrate. In one embodiment, the binding moiety is biotin.

Thus, a composition for such diagnosis or evaluation in a mammalian subject as described herein can be a kit or a reagent. For example, one embodiment of a composition includes a substrate upon which the ligands used to detect and quantitate mRNA are immobilized. The reagent, in one embodiment, is an amplification nucleic acid primer (such as an RNA primer) or primer pair that amplifies and detects a nucleic acid sequence of the mRNA. In another embodiment, the reagent is a polynucleotide probe that hybridizes to the target sequence. In another embodiment, the target sequences are illustrated in Table III. In another embodiment, the reagent is an antibody or fragment of an antibody. The reagent can include multiple such primers, probes or antibodies, each specific for at least one gene, gene fragment or expression product of Table I or Table II. Optionally, the reagent can be associated with a conventional detectable label.

In another embodiment, the composition is a kit containing the relevant multiple polynucleotides or oligonucleotide probes or ligands, optional detectable labels for same, immobilization substrates, optional substrates for enzymatic labels, as well as other laboratory items. In still another embodiment, at least one polynucleotide or oligonucleotide or ligand is associated with a detectable label. In certain embodiments, the reagent is immobilized on a substrate. Exemplary substrates include a microarray, chip, microfluidics card, or chamber.

In one embodiment, the composition is a kit designed for use with the NanoString nCounter system, as further discussed below.

II. GENE EXPRESSION PROFILING METHODS

Methods of gene expression profiling that were used in generating the profiles useful in the compositions and methods described herein, or in performing the diagnostic steps using the compositions described herein, are known and well summarized in U.S. Pat. No. 7,081,340. Such methods of gene expression profiling include methods based on hybridization analysis of polynucleotides, methods based on sequencing of polynucleotides, and proteomics-based methods. The most commonly used methods known in the art for the quantification of mRNA expression in a sample include Northern blotting and in situ hybridization; RNase protection assays; nCounter® Analysis; and PCR-based methods, such as RT-PCR. Alternatively, antibodies may be employed that can recognize specific duplexes, including DNA duplexes, RNA duplexes, and DNA-RNA hybrid duplexes or DNA-protein duplexes. Representative methods for sequencing-based gene expression analysis include Serial Analysis of Gene Expression (SAGE) and gene expression analysis by massively parallel signature sequencing (MPSS).

In certain embodiments, the compositions described herein are adapted for use in the methods of gene expression profiling described herein, and those known in the art.

A. Patient Sample

The “sample” or “biological sample” as used herein means any biological fluid or tissue that contains immune cells and/or cancer cells. In one embodiment, a suitable sample is whole blood. In another embodiment, the sample may be venous blood. In another embodiment, the sample may be arterial blood. In another embodiment, a suitable sample for use in the methods described herein includes peripheral blood, more specifically peripheral blood mononuclear cells. Other useful biological samples include, without limitation, plasma or serum. In still other embodiments, the sample is saliva, urine, synovial fluid, bone marrow, cerebrospinal fluid, vaginal mucus, cervical mucus, nasal secretions, sputum, semen, amniotic fluid, bronchoalveolar lavage fluid, or other cellular exudates from a subject suspected of having a lung disease. Such samples may further be diluted with saline, buffer or a physiologically acceptable diluent. Alternatively, such samples may be concentrated by conventional means. It should be understood that the use of, or reference to, any one biological sample throughout this specification is exemplary only. For example, where the specification refers to the sample as whole blood, it is understood that other samples, e.g., serum, plasma, etc., may also be employed in another embodiment.

In one embodiment, the biological sample is whole blood, and the method employs the PAXgene Blood RNA Workflow system (Qiagen). That system involves blood collection (e.g., single blood draws) and RNA stabilization, followed by transport and storage, followed by purification of total RNA and molecular RNA testing. This system provides immediate RNA stabilization and consistent blood draw volumes. The blood can be drawn at a physician's office or clinic, and the specimen transported and stored in the same tube. Short-term RNA stability is 3 days at 18-25° C. or 5 days at 2-8° C. Long-term RNA stability is 4 years at −20 to −70° C. This sample collection system enables the user to reliably obtain data on gene expression in whole blood. While the PAXgene system has more noise than the use of PBMC as a biological sample source, the benefits of PAXgene sample collection outweigh the problems. Noise can be subtracted bioinformatically by one of skill in the art.
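The storage windows stated above can be expressed as a simple lookup. This sketch encodes only the temperature ranges given in the text, approximates 4 years as 4 × 365 days, and treats any temperature outside the stated ranges as unsupported. It is an illustration under those assumptions, not a substitute for the manufacturer's instructions.

```python
from typing import Optional

def stated_stability_days(temp_c: float) -> Optional[int]:
    """Maximum storage time in days stated in the text for a storage temperature."""
    if 18 <= temp_c <= 25:      # short-term, room temperature: 3 days
        return 3
    if 2 <= temp_c <= 8:        # short-term, refrigerated: 5 days
        return 5
    if -70 <= temp_c <= -20:    # long-term, frozen: 4 years (approximated as 4 * 365 days)
        return 4 * 365
    return None                 # temperature not covered by the stated ranges

def within_stated_stability(temp_c: float, days_stored: float) -> bool:
    """True only if the temperature is covered and the storage time is within its window."""
    limit = stated_stability_days(temp_c)
    return limit is not None and days_stored <= limit

print(within_stated_stability(20, 2))  # True
print(within_stated_stability(4, 6))   # False
```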

In one embodiment, the biological samples may be collected using the proprietary PAXgene Blood RNA System (PreAnalytiX, a Qiagen/BD company). The PAXgene Blood RNA System comprises two integrated components: the PAXgene Blood RNA Tube and the PAXgene Blood RNA Kit. Blood samples are drawn directly into PAXgene Blood RNA Tubes via standard phlebotomy technique. These tubes contain a proprietary reagent that immediately stabilizes intracellular RNA, minimizing the ex vivo degradation or up-regulation of RNA transcripts. The ability to eliminate freezing, to batch samples, and to minimize the urgency of processing samples after collection greatly enhances lab efficiency and reduces costs. Thereafter, the RNA is detected and/or measured using a variety of assays.

B. Nanostring Analysis

A sensitive and flexible quantitative method that is suitable for use with the compositions and methods described herein is the nCounter® Analysis system (NanoString Technologies, Inc., Seattle, Wash.). The nCounter Analysis System utilizes a digital color-coded barcode technology that is based on direct multiplexed measurement of gene expression and offers high levels of precision and sensitivity (<1 copy per cell). The technology uses molecular “barcodes” and single-molecule imaging to detect and count hundreds of unique transcripts in a single reaction. Each color-coded barcode is attached to a single target-specific probe (i.e., polynucleotide, oligonucleotide or ligand) corresponding to a gene of interest, e.g., a gene of Table I or Table II. Mixed together with controls, they form a multiplexed CodeSet. In one embodiment, the CodeSet includes all 559 genes of Table I. In another embodiment, the CodeSet includes all 100 genes of Table II. In another embodiment, the CodeSet includes at least 3 genes of Table I or Table II. In another embodiment, the CodeSet includes at least 5 genes of Table I or Table II. In another embodiment, the CodeSet includes at least 10 genes of Table I or Table II. In another embodiment, the CodeSet includes at least 15 genes of Table I or Table II. In another embodiment, the CodeSet includes at least 20 genes of Table I or Table II. In another embodiment, the CodeSet includes at least 25 genes of Table I or Table II. In another embodiment, the CodeSet includes at least 30 genes of Table I or Table II. In yet another embodiment, the CodeSet includes at least 40 genes of Table I or Table II. In yet another embodiment, the CodeSet includes at least 50 genes of Table I or Table II. In another embodiment, the CodeSet includes at least 60 genes of Table I or Table II. In another embodiment, the CodeSet includes at least 70 genes of Table I or Table II. In yet another embodiment, the CodeSet includes at least 80 genes of Table I or Table II. 
In yet another embodiment, the CodeSet includes at least 90 genes of Table I or Table II. In another embodiment, the CodeSet includes at least 100 genes of Table I. In another embodiment, the CodeSet includes at least 200 genes of Table I. In another embodiment, the CodeSet includes at least 300 genes of Table I. In yet another embodiment, the CodeSet includes at least 400 genes of Table I. In yet another embodiment, the CodeSet includes at least 500 genes of Table I. In yet another embodiment, the CodeSet is formed by the first 539 genes in rank order of Table I. In yet another embodiment, the CodeSet includes any subset of genes of Table I, as described herein. In another embodiment, the CodeSet includes any subset of genes of Table II, as described herein.

The NanoString platform employs two ~50-base probes per mRNA that hybridize in solution. The Reporter Probe carries the signal; the Capture Probe allows the complex to be immobilized for data collection. The probes are mixed with the patient sample. After hybridization, the excess probes are removed, and the probe/target complexes are aligned and immobilized on a substrate, e.g., in the nCounter Cartridge.
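Because each Table III target region is 100 bases long, one way to picture the two ~50-base probes is as antisense sequences against the two adjacent halves of the target. The split point, which half would go to the Reporter Probe versus the Capture Probe, and the omission of barcode and capture tags are simplifying assumptions made for illustration; the actual NanoString probe design is not described here.

```python
from typing import Tuple

# Base-pairing complement for DNA letters
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Antisense (reverse-complement) sequence of a DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def probe_pair(target: str, probe_len: int = 50) -> Tuple[str, str]:
    """Antisense sequences for two adjacent regions of a sense-strand target."""
    target = target.upper()
    if len(target) < 2 * probe_len:
        raise ValueError("target is too short for two adjacent probes")
    first = target[:probe_len]
    second = target[probe_len:2 * probe_len]
    return reverse_complement(first), reverse_complement(second)

# Table III, Sequence ID# 1 (ABCA5), split into two string literals for readability
abca5 = ("AAGGAAGACTGTGTGTAGAATCTTACGTAATAGTCTGATTCTTTGA"
         "CTCTGTGGCTAGAATGACAGTTATCTATGGAGGTGGTAGAATTAAGCCATACCT")
probe_a, probe_b = probe_pair(abca5)
print(len(probe_a), len(probe_b))  # 50 50
```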

The target sequences utilized in the Examples below for each of the genes of Table I and Table II are shown in Table III below, and are reproduced in the sequence listing. These sequences are portions of the published sequences of these genes. Suitable alternatives may be readily designed by one of skill in the art.
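A quick consistency check on a Table III entry is that the length of the target sequence equals the width of its position range. The sketch below assumes positions are 1-based and inclusive, which matches, e.g., the 100-base ABCA5 target at positions 6839-6938. It also assumes that sequences contain only A, C, G and T. The function name is hypothetical.

```python
import re

_RANGE = re.compile(r"(\d+)-(\d+)")

def target_matches_position(position: str, sequence: str) -> bool:
    """True if the sequence length equals the inclusive width of the position range."""
    m = _RANGE.fullmatch(position.strip())
    if m is None:
        raise ValueError(f"unrecognized position range: {position!r}")
    start, end = int(m.group(1)), int(m.group(2))
    seq = sequence.upper()
    span = end - start + 1  # 1-based, inclusive coordinates (assumption)
    return span == len(seq) and set(seq) <= set("ACGT")

# Table III, Sequence ID# 1 (ABCA5), split into fragments for readability
fragments = ["AAGGAAGACTGTGTGTAGAATCT", "TACGTAATAGTCTGATTCTTTGA",
             "CTCTGTGGCTAGAATGACAGTTA", "TCTATGGAGGTGGTAGAATTAAG", "CCATACCT"]
print(target_matches_position("6839-6938", "".join(fragments)))  # True
```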

Sample Cartridges are placed in the Digital Analyzer for data collection. Color codes on the surface of the cartridge are counted and tabulated for each target molecule.
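The counting step can be pictured as a tally of decoded barcodes per target. In this sketch each barcode is represented by a hypothetical color string, and `codeset` is an assumed mapping from barcode to gene. Barcodes not in the CodeSet are ignored, and CodeSet targets with no observations are reported as zero. No normalization against the controls is shown.

```python
from collections import Counter
from typing import Dict, Iterable

def tabulate_counts(observed: Iterable[str], codeset: Dict[str, str]) -> Dict[str, int]:
    """Count observed barcodes per target gene of the CodeSet."""
    counts = Counter(codeset[b] for b in observed if b in codeset)  # skip unknown barcodes
    return {gene: counts[gene] for gene in codeset.values()}        # zero for unseen targets

codeset = {"RGBY": "ABCA5", "GGRB": "ABCF1", "YYRB": "ACAA2"}  # hypothetical barcodes
observed = ["RGBY", "RGBY", "GGRB", "BBBB"]
print(tabulate_counts(observed, codeset))  # {'ABCA5': 2, 'ABCF1': 1, 'ACAA2': 0}
```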

A benefit of the use of the NanoString nCounter system is that no amplification of mRNA is necessary in order to perform the detection and quantification. However, in alternate embodiments, other suitable quantitative methods are used. See, e.g., Geiss et al, Direct multiplexed measurement of gene expression with color-coded probe pairs, Nat Biotechnol. 2008 March; 26(3):317-25. doi: 10.1038/nbt1385. Epub 2008 Feb. 17, which is incorporated herein by reference in its entirety.

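TABLE III

Sequence ID# | Gene | Accession | Position | Target Sequence
1 | ABCA5 | NM_018672.4 | 6839-6938 | AAGGAAGACTGTGTGTAGAATCTTACGTAATAGTCTGATTCTTTGACTCTGTGGCTAGAATGACAGTTATCTATGGAGGTGGTAGAATTAAGCCATACCT
2 | ABCF1 | NM_001025091.1 | 2875-2974 | CCTAAACAAACAAGAGGTGACCACCTTATTGTGAGGTTCCATCCAGCCAAGTTTATGTGGCCTATTGTCTCAGGACTCTCATCACTCAGAAGCCTGCCTC
3 | ACAA2 | NM_006111.2 | 1605-1704 | CTCACTGTGACCCATCCTTACTCTACTTGGCCAGGCCACAGTAAAACAAGTGACCTTCAGAGCAGCTGCCACAACTGGCCATGCCCTGCCATTGAAACAG
4 | PHCA | NM_018367.6 | 3324-3423 | AGCCAATAGTGATTTGTTTGCATATCACCTAATGTGAAAAGTGCTCATCTGTGAACTCTACAGCAAATTATATTTTAGAAAATACTTTGTGAGGCCGGGC
5 | ACSL5 | NM_203379.1 | 2701-2800 | CTATCACTCATGTCAATCATATCTATGAGACAAATGTCTCCGATGCTCTTCTGCGTAAATTAAATTGTGTACTGAAGGGAAAAGTTTGATCATACCAAAC
6 | CABC1 | NM_020247.4 | 2536-2635 | TTCTAGAGTGAGATTTGTGTTTTCTGCCCTTTTCCTCTCCAGCCGATGGGCTGGAGCTGGGAGAGGTGCTGAGCTAACAGTGCCAACAAGTGCTCCTTAA
7 | CD97 | NM_078481.3 | 3186-3285 | GCCAGTACTCGGGACAGACTAAGGGCGCTTGTCCCATCCTGGACTTTTCCTCTCATGTCTTTGCTGCAGAACTGAAGAGACTAGGCGCTGGGGCTCAGCT
8 | AFTPH | NM_017657.4 | 2741-2840 | CTACCACCCGTCCAGTTTGACTGGAGTAGCAGTGGCCTTACTAACCCTTTAGATGGTGTGGATCCGGAGTTGTATGAGTTAACAACTTCTAAGCTGGAAA
9 | AHCYL1 | NM_001242676.1 | 2401-2500 | CTACCCGGCAGGTAGGTTAGATGTGGGTGGTGCATGTTAATTTCCCTTAGAAGTTCCAAGCCCTGTTTCCTGCGTAAAGGTGGTATGTCCAGTTCAGAGA
10 | AK026725 | AK026725.1 | 1869-1968 | AATGAAATTACTGTAGAGTCAGCAAAGAAGTAGAGAAGAAAAAACACCAAGAATGAGGAGAACCTAGCAAGGGCAGGCTTTTGGAAGCAAGAGGTAGATA
11 | AK093878 | AK093878.1 | 1554-1653 | AGAATTTCTTGGTAGCTTTACACCGAAAAATGCGTGTAACTAAATACCAGACATCTTGACCATTCAGCTAGAACCCTGGCAGCAACAGAGCTATTTAATT
12 | AK094576 | AK094576.1 | 1765-1864 | CCCCTCCAGCCAGCCCTGCGTGGTTGTGGCCCCACTGCAGAAACGCCTCCGCTTAACACTCCAGCCTCTCTTCTATTCGGTCAGGCCACAGCTGCTGACT
13 | AK124143 | AK124143.1 | 2252-2351 | GTACCTGGTAGAAATTGTGTCTTGGAATGACCCTTTCGAGTTATTGACATGGCTCTGATGAATAGAACATGAGCCCCAAAACTAAATCCAAAAGGAATTT
14 | AK126342 | AK126342.1 | 2906-3005 | CTTATTGATTAGTGAATGTAGCTTAAGCCTTTGTATGTGTCCTCAGGGGGCAGACCGACTTTAAGAGGGACCAGATAACGTTTGAATGGAGGGATTATAT
15 | AKAP4 | NM_139289.1 | 417-516 | CTGTAAGTGTCCTCAACTGGCTTCTCAGTGATCTCCAGAAGTATGCCTTGGGTTTCCAACATGCACTGAGCCCCTCAACCTCTACCTGTAAACATAAAGT
16 | AKR1C3 | NM_003739.4 | 1097-1196 | GAGGACGTCTCTATGCCGGTGACTGGACATATCACCTCTACTTAAATCCGTCCTGTTTAGCGACTTCAGTCAACTACAGCTGAGTCCATAGGCCAGAAAG
17 | ALAS1 | NM_000688.4 | 1616-1715 | GGGGATCGGGATGGAGTCATGCCAAAAATGGACATCATTTCTGGAACACTTGGCAAAGCCTTTGGTTGTGTTGGAGGGTACATCGCCAGCACGAGTTCTC
18 | AMD1 | NM_001634.4 | 572-671 | ACCACCCTCTTGCTGAAAGCACTGGTTCCCCTGTTGAAGCTTGCTAGGGATTACAGTGGGTTTGACTCAATTCAAAGCTTCTTTTATTCTCGTAAGAATT
19 | AMPD3 | NM_000480.2 | 3389-3488 | GTGATGCTCAGGGGCTGTCAAAGTGACTGCGTTCATCAGTTTTACACTGGGGCTGCTACATAATATTTTCATTTGAACGAAGAACTTCAAAAAGCACAGG
20 | ANKHD1 | NM_017747.2 | 7665-7764 | CTTGGAACCCTATGATAAAAGTTATCCAAAATTCAACTGAATGCACTGATGCCCAGCAGATTTGGCCTGGCACGTGGGCACCTCATATTGGAAACATGCA
21 | ANP32B | NM_006401.2 | 661-760 | CACCTTGGAACCTTTGAAAAAGTTAGAATGTCTGAAAAGCCTGGACCTCTTTAACTGTGAGGTTACCAACCTGAATGACTACCGAGAGAGTGTCTTCAAG
22 | ANXA1 b | NM_000700.1 | 516-615 | GAAATCAGAGACATTAACAGGGTCTACAGAGAGGAACTGAAGAGAGATCTGGCCAAAGACATAACCTCAGACACATCTGGAGATTTTCGGAACGCTTTGC
23 | ANXA1 | NM_000700.2 | 1191-1290 | TGGATGAAACCAAAGGAGATTATGAGAAAATCCTGGTGGCTCTTTGTGGAGGAAACTAAACATTCCCTTGATGGTCTCAAGCTATGATCAGAAGACTTTA
24 | AP2S1 | NM_021575.3 | 746-845 | CGAGTAACCGTGCCGTTGTCGTGTGATGCCATAAGCGTCTGTGCGTGGAGTCCCCAATAAACCTGTGGTCCTGCCTGGCCTTGCCGTCAAAAAAAAAAAA
25 | CENTD2 | NM_001040118.2 | 4923-5022 | AAACTCCAGAACAGCAGAAAGCGGGTGCTGTAGAGGAGCACTCAGCTCACGGGGAGGGAGCTCTTGGCTGAGCTTCTACAGGGCTGAGAGCTGCGCTTTG
26 | ARCN1 | NM_001655.4 | 3437-3536 | CACTTTTAGCTGGTTGAAAAGTACCACTCCCACTCTGAACATCTGGCCGTCCCTGCAAAGAGTGTACTGTGCTTGAAGCAGAGCACTCACACATAAATGG
27 | ARG1 b | NM_000045.2 | 506-605 | AAGGAACTAAAAGGAAAGATTCCCGATGTGCCAGGATTCTCCTGGGTGACTCCCTGTATATCTGCCAAGGATATTGTGTATATTGGCTTGAGAGACGTGG
28 | ARG1 | NM_000045.3 | 989-1088 | TTCGGACTTGCTCGGGAGGGTAATCACAAGCCTATTGACTACCTTAACCCACCTAAGTAAATGTGGAAACATCCGATATAAATCTCATAGTTAATGGCAT
29 | ARHGAP17 | NM_018054.5 | 3027-3126 | CATGTATGGTCTGTGTCTCCCCAGTCCCCTCAGAACCATGCCCATGGATGGTGACTGCTGGCTCTGTCACCTCATCAAACTGGATGTGACCCATGCCGCC
30 | ARHGAP18 | NM_033515.2 | 2499-2598 | TTTTTGACCAAAAAGATAACAAATACCAGGTATGGCAAGTTGTGAAGACAGCACATTAAAACATACCTAATTTCACAGTATTCCTGTCACGACAGAATGT
31 | ARHGAP26 | NM_015071.4 | 6088-6187 | TCCCTGAGCTTTCCCAGTAGCCTCCAGTTTCCTTTGTAAGACCCAGGGATCACTTAGCCATAGCCTGAATCTTTTAGGGGTATTAAGGTCAGCCTCTCAC
32 | ARHGEF18 | NM_015318.3 | 5128-5227 | GATTACAACATTTCCTCACTGCGGGATATTTCTGACCCGCTTTAGAACTTAAGACCTGATTCTAGCAATAAACGTGTCCGAGATGAGCGGTGAAAAAAAA
33 | FLJ10357 | NM_018071.4 | 5402-5501 | GAATGTGTCTCCTCCACAGTGGCTCCCAGAGGTTCCACACACTCTCTGAAGCTCCTTCTCCCACACTGCACCTACTCCTTGAGGCTGAACTGGTCACAGA
34 | ARHGEF5 | NM_005435.3 | 5151-5250 | GGGGGACCATTGGGGCCTGAGCCAAGGAACTTTCCTTCTACTGCCTTATAGTGCTTAAACATTCTCCGCCTCCAGGGTGCAGATTCAGAGCTGGCCAGAG
35 | ARL8B | NM_018184.2 | 2491-2590 | ACCATTACAAAGAATGTGGCAACTTGCTTGTGCCTAAAAGGAGGAATTGGAACTAGAATGTGTGACTCTGTGGGGACTGCATAGGTTTGTTAATTGACCT
36 | ARPC2 | NM_005731.2 | 951-1050 | ACGGGGAAGACGTTTTCATCCCGCTAATCTTGGGAATAAGAGGAGGAAGCGGCTGGCAACTGAAGGCTGGAACACTTGCTACTGGATAATCGTAGCTTTT
37 | ASF1B | NM_018154.2 | 1476-1575 | CTGTCTCCGGGCCAGGGTCAGGGACCCTCTGCCTCTGGCAGCCTTAACCTGTCCTCTGCTAGGACCAGGGTGATTTCAAGCCAGGGAAGCAACTGGGACC
38 | ATG4B | NM_178326.2 | 106-205 | GGACGCAGCTACTCTGACCTACGACACTCTCCGGTTTGCTGAGTTTGAAGATTTTCCTGAGACCTCAGAGCCCGTTTGGATACTGGGTAGAAAATACAGC
39 | ATG5 | NM_004849.2 | 1105-1204 | TGCAGTGGCTGAGTGAACATCTGAGCTACCCGGATAATTTTCTTCATATTAGTATCATCCCACAGCCAACAGATTGAAGGATCAACTATTTGCCTGAACA
40 | ATM | NM_000051.3 | 31-130 | ACGCTAAGTCGCTGGCCATTGGTGGACATGGCGCAGGCGCGTTTGCTCCGACGGGCCGAATGTTTTGGGGCAGTGTTTTGAGCGCGGAGACCGCGTGATA
41 | ATP2C1 | NM_014382.3 | 4070-4169 | TAAAAAGTCCCCAAACCCAAACAAATGGTTTATGAACCAGAGTATATGTGGAAGATTCTTTGCTGGTCTTGCTCTGTGTGCATCTGAAGCTTCTTTGGCC
42 | ATP5B | NM_001686.3 | 1626-1725 | CTATATGGTGGGACCCATTGAAGAAGCTGTGGCAAAAGCTGATAAGCTGGCTGAAGAGCATTCATCGTGAGGGGTCTTTGTCCTCTGTACTGTCTCTCTC
43 | ATP5I | NM_007100.2 | 256-355 | TTGCCAGAGAATTGGCAGAAGATGACAGCATATTAAAGTGAGTGACCCTGCGACCCACTCTTTGGACCAGCAGCGGATGAATAAAGCTTCCTGTGTTGTG
44 | ATP5J2 | NM_004889.3 | 267-366 | GCTGGCATGCTACGTGCTCTTTAGCTACTCCTTTTCCTACAAGCATCTCAAGCACGAGCGGCTCCGCAAATACCACTGAAGAGGACACACTCTGCACCCC
45 | ATP5L | NM_006476.4 | 196-295 | GGGACGGGGTCCTGCAGCGGGTCCTTCCGGCGGGTGACATTCAGCCGGCGGTTCGGGGCGACGGACTCTCCATTCCAGAACCATGGCCCAATTTGTCCGT
46 | AW173314 | AW173314.1 | 419-518 | AGCAGAAGGCAGGGGAGTCCACACAGGGCAAGCAGCAACCAGGCTTCTGAGGACAGGAAAGGAGGGAGCATCTGGTGGGAAGCTGGCGAGGAGGGGCTGG
47 | AW270402 | AW270402.1 | 203-302 | GATATCTCACACACGGAATAATCATTAAGAAACAACCACTGTTGAGCAAAGTTGATAGGCAGTAAGGAAATAAAGTGGACATAAACACAGCAGTACTAAT
48 | AZI2 | NM_022461.4 | 3031-3130 | GAATTGGTGTCAGATGCTGGAATTTATTCTGACCAATGAACACAGCTGACTCAGGGGAGTACAATCTCCTGCCAAGTAATAGAACCAAACCCAATATGCA
49 | BACH2 | NM_021813.3 | 8696-8795 | TCCAGAACCAGTCTGATGCAAGTGCACCTCTAATATATGCCTTACAAACTCCAGAGGCCATATTCAAAACAGGGTCTTCTCAGTGTATGCAAGGGGCTGC
50 | BAG3 | NM_004281.3 | 2304-2403 | CCCCACCACCTGTTAGCTGTGGTTGTGCACTGTCTTTTGTAGCTCTGGACTGGAGGGGTAGATGGGGAGTCAATTACCCATCACATAAATATGAAACATT
51 | BANP | NM_079837.2 | 2125-2224 | GGAGCCCTTTGCTGTGTGCTCTGTCCAGTGTCATGAGGCAGGTGTTTGCAAAGCCAGCTCTCGGTTCCGATGGGGTATTGCTGACCTACTTTTCTAGGGG
52 | BATF | NM_006399.3 | 294-393 | CCTGGCAAACAGGACTCATCTGATGATGTGAGAAGAGTTCAGAGGAGGGAGAAAAATCGTATTGCCGCCCAGAAGAGCCGACAGAGGCAGACACAGAAGG
53 | BCL10 | NM_003921.2 | 1251-1350 | TGAAAATACCATCTTCTCTTCAACTACACTTCCCAGACCTGGGGACCCAGGGGCTCCTCCTTTGCCACCAGATCTACAGTTAGAAGAAGAAGGAACTTGT
54 | BCL6 | NM_001130845.1 | 3401-3500 | CCTCACGGTGCCTTTTTTCACGGAAGTTTTCAATGATGGGCGAGCGTGCACCATCCCTTTTTGAAGTGTAGGCAGACACAGGGACTTGAAGTTGTTACTA
55 | BCOR | NM_017745.5 | 5794-5893 | ATACAAAGCTCTGATGACAGGCCATGACTGTAGAGTGGTCAGAACTGTGTGGTTGGTTTGAGGGAGCGAATTCGGGGAAGGCACTTGGTGATATAACTTT
56 | BF375676 | BF375676.1 | 141-240 | TGTATTTCTGTGCAATGAGAGAGGCTCTTTATGGTGGTGCTACAAACAAGCTCATCTTTGGAACTGGCACTCTGCTTGCTGTCCAGCCAAATATCCAGAA
57 | BID | NM_001196.2 | 1876-1975 | AAGCACGACAGTGGATGCTGGGTCCATATCACACACATTGCTGTGAACAGGAAACTCCTGTGACCACAACATGAGGCCACTGGAGACGCATATGAGTAAG
58 | BMPR2 | NM_001204.6 | 1164-1263 | CAGCGGCCCTGGCGGGTGCCCTGGCTACCATGGACCATCCTGCTGGTCAGCACTGCGGCTGCTTCGCAGAATCAAGAACGGCTATGTGCGTTTAAAGATC
59 | BQ189294 | BQ189294.1 | 416-515 | GCTGGAGTGATTGGCCCTGATGACCATGGAGAAAAGAGAGTAGGGAGAACAGTATAACCAGAAGTCAGGGGGGTCTCCTGGAATCCCTCCTCACAATACC
60 | BU743228 | BU743228.1 | 154-253 | CCCTGTGGGCCTTGCAGGCCAGTCCAGGCAGGTCTTTCACACTGTTGTCCCACATAACAGAAAAAGCTGAGCAGACAGGGTAGGAAACACACTTGCATCT
61 | BX089765 | BX089765.1 | 106-205 | TTAAGCAACTTGCTCCAGTGACGCAGCTGGTAAGCAGCAGAGCTGGGATTAAAACCCAGGCATTCTGATTCCACCACCTACACACTTAGCCATTCCGCCC
62 | BX108566 | BX108566.1 | 365-464 | ATTTAGGGTGAGAGCTTCACAGCTGAAAATCTCCTTTAAAGAAAACGCGGCCCAAATGTGCTGGGAGGAGAAGCCAGTGGATCTAGGAGGGGGCCCGGCG
63 | BX400436 | BX400436.2 | 1-100 | ATATTTTGGAGAGGGAAGTTGGCTCACTGTTGTAGAGGACCTGAACAAGGTGTTCCCACCCGAGGTCGCTGTGTTTGAGCCATCAGAAGCAGAGATCTCC
64 | BX436458 | BX436458.2 | 518-617 | ATGCAGACAATTTGCCTGTGAGATGAGGAAAATTCTCTGGAAGATTTAGGCCCTGAGAGCTGAAAAGGGACCCTAAACATTACCTGGTGACAACTGCCCT
65 | C15orf39 | NM_015492.4 | 3535-3634 | CCTGAGCTTTTAACGTGAGGGTCTTTATTGGATAGGACTACTCCCTATTTCTTGCCTAGAGAACACACATGGGCTTTGGAGCCCGACAGACCTGGGCTTG
66 | C17orf51 | XM_944416.1 | 4909-5008 | AAGGATGGGGGTGGATTGACCAAGCTGGGCCAGAGGTGCGAGGAGCTGATCTGCGAGCCCTGTGTGCCTGTGAGTCCTGGCGGAGTGGCCGTGCGTGGTG
67 | C3 | NM_000064.2 | 4397-4496 | CATCTACCTGGACAAGGTCTCACACTCTGAGGATGACTGTCTAGCTTTCAAAGTTCACCAATACTTTAATGTAGAGCTTATCCAGCCTGGAGCAGTCAAG
68 | C4B | NM_001002029.3 | 4438-4537 | GAGTCCAGGGTGCACTACACCGTGTGCATCTGGCGGAACGGCAAGGTGGGGCTGTCTGGCATGGCCATCGCGGACGTCACCCTCCTGAGTGGATTCCACG
69 | C4orf27 | NM_017867.2 | 682-781 | GAACCGTGAAGATGAAACAGAGAGATAAGAAAGTTGTGACAAAGACCTTTCATGGTGCAGGCTTGGTTGTTCCAGTAGATAAAAATGATGTTGGGTACCG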
GCTCCTTAA

7
CD97
NM_078481.3
3186-
GCCAGTACTCGGGACAGACTAA

3285
GGGCGCTTGTCCCATCCTGGACT

TTTCCTCTCATGTCTTTGCTGCA

GAACTGAAGAGACTAGGCGCTGG

GGCTCAGCT

8
AFTPH
NM_017657.4
2741-
CTACCACCCGTCCAGTTTGACTG

2840
GAGTAGCAGTGGCCTTACTAACC

CTTTAGATGGTGTGGATCCGGAG

TTGTATGAGTTAACAACTTCTAA

GCTGGAAA

9
AHCYL1
NM_00124267
2401-
CTACCCGGCAGGTAGGTTAGATG

6.1
2500
TGGGTGGTGCATGTTAATTTCCC

TTAGAAGTTCCAAGCCCTGTTTC

CTGCGTAAAGGTGGTATGTCCAG

TTCAGAGA

10
AK
AK026725.1
1869-
AATGAAATTACTGTAGAGTCAGC

026725

1968
AAAGAAGTAGAGAAGAAAAAAC

ACCAAGAATGAGGAGAACCTAG

CAAGGGCAGGCTTTTGGAAGCA

AGAGGTAGATA

11
AK
AK093878.1
1554-
AGAATTTCTTGGTAGCTTTACAC

093878

1653
CGAAAAATGCGTGTAACTAAAT

ACCAGACATCTTGACCATTCAGC

TAGAACCCTGGCAGCAACAGAG

CTATTTAATT

12
AK
AK094576.1
1765-
CCCCTCCAGCCAGCCCTGCGTGG

094576

1864
TTGTGGCCCCACTGCAGAAACGC

CTCCGCTTAACACTCCAGCCTCT

CTTCTATTCGGTCAGGCCACAGC

TGCTGACT

13
AK
AK124143.1
2252-
GTACCTGGTAGAAATTGTGTCTT

124143

2351
GGAATGACCCTTTCGAGTTATTG

ACATGGCTCTGATGAATAGAACA

TGAGCCCCAAAACTAAATCCAA

AAGGAATTT

14
AK
AK126342.1
2906-
CTTATTGATTAGTGAATGTAGCT

126342

3005
TAAGCCTTTGTATGTGTCCTCAG

GGGGCAGACCGACTTTAAGAGG

GACCAGATAACGTTTGAATGGA

GGGATTATAT

15
AKAP4
NM_139289.1
417-
CTGTAAGTGTCCTCAACTGGCTT

516
CTCAGTGATCTCCAGAAGTATGC

CTTGGGTTTCCAACATGCACTGA

GCCCCTCAACCTCTACCTGTAAA

CATAAAGT

16
AKR1C3
NM_003739.4
1097-
GAGGACGTCTCTATGCCGGTGAC

1196
TGGACATATCACCTCTACTTAAA

TCCGTCCTGTTTAGCGACTTCAG

TCAACTACAGCTGAGTCCATAGG

CCAGAAAG

17
ALAS1
NM_000688.4
1616-
GGGGATCGGGATGGAGTCATGC

1715
CAAAAATGGACATCATTTCTGGA

ACACTTGGCAAAGCCTTTGGTTG

TGTTGGAGGGTACATCGCCAGCA

CGAGTTCTC

18
AMD1
NM_001634.4
572-
ACCACCCTCTTGCTGAAAGCACT

671
GGTTCCCCTGTTGAAGCTTGCTA

GGGATTACAGTGGGTTTGACTCA

ATTCAAAGCTTCTTTTATTCTCG

TAAGAATT

19
AMPD3
NM_000480.2
3389-
GTGATGCTCAGGGGCTGTCAAAG

3488
TGACTGCGTTCATCAGTTTTACA

CTGGGGCTGCTACATAATATTTT

CATTTGAACGAAGAACTTCAAAA

AGCACAGG

20
ANKHD1
NM_017747.2
7665-
CTTGGAACCCTATGATAAAAGTT

7764
ATCCAAAATTCAACTGAATGCAC

TGATGCCCAGCAGATTTGGCCTG

GCACGTGGGCACCTCATATTGGA

AACATGCA

21
ANP32B
NM_006401.2
661-
CACCTTGGAACCTTTGAAAAAGT

760
TAGAATGTCTGAAAAGCCTGGAC

CTCTTTAACTGTGAGGTTACCAA

CCTGAATGACTACCGAGAGAGT

GTCTTCAAG

22
ANXA1
NM_000700.1
516-
GAAATCAGAGACATTAACAGGG

b

615
TCTACAGAGAGGAACTGAAGAG

AGATCTGGCCAAAGACATAACCT

CAGACACATCTGGAGATTTTCGG

AACGCTTTGC

23
ANXA1
NM_000700.2
1191-
TGGATGAAACCAAAGGAGATTA

1290
TGAGAAAATCCTGGTGGCTCTTT

GTGGAGGAAACTAAACATTCCCT

TGATGGTCTCAAGCTATGATCAG

AAGACTTTA

24
AP2S1
NM_021575.3
746-
CGAGTAACCGTGCCGTTGTCGTG

845
TGATGCCATAAGCGTCTGTGCGT

GGAGTCCCCAATAAACCTGTGGT

CCTGCCTGGCCTTGCCGTCAAAA

AAAAAAAA

25
CENTD2
NM_00104011
4923-
AAACTCCAGAACAGCAGAAAGC

8.2
5022
GGGTGCTGTAGAGGAGCACTCA

GCTCACGGGGAGGGAGCTCTTG

GCTGAGCTTCTACAGGGCTGAGA

GCTGCGCTTTG

26
ARCN1
NM_001655.4
3437-
CACTTTTAGCTGGTTGAAAAGTA

3536
CCACTCCCACTCTGAACATCTGG

CCGTCCCTGCAAAGAGTGTACTG

TGCTTGAAGCAGAGCACTCACAC

ATAAATGG

27
ARG1
NM_000045.2
506-
AAGGAACTAAAAGGAAAGATTC

b

605
CCGATGTGCCAGGATTCTCCTGG

GTGACTCCCTGTATATCTGCCAA

GGATATTGTGTATATTGGCTTGA

GAGACGTGG

28
ARG1
NM_000045.3
989-
TTCGGACTTGCTCGGGAGGGTAA

1088
TCACAAGCCTATTGACTACCTTA

ACCCACCTAAGTAAATGTGGAA

ACATCCGATATAAATCTCATAGT

TAATGGCAT

29
ARHGAP
NM_018054.5
3027-
CATGTATGGTCTGTGTCTCCCCA

17

3126
GTCCCCTCAGAACCATGCCCATG

GATGGTGACTGCTGGCTCTGTCA

CCTCATCAAACTGGATGTGACCC

ATGCCGCC

30
ARHGAP
NM_033515.2
2499-
TTTTTGACCAAAAAGATAACAAA

18

2598
TACCAGGTATGGCAAGTTGTGAA

GACAGCACATTAAAACATACCTA

ATTTCACAGTATTCCTGTCACGA

CAGAATGT

31
ARHGAP
NM_015071.4
6088-
TCCCTGAGCTTTCCCAGTAGCCT

26

6187
CCAGTTTCCTTTGTAAGACCCAG

GGATCACTTAGCCATAGCCTGAA

TCTTTTAGGGGTATTAAGGTCAG

CCTCTCAC

32
ARHGEF
NM_015318.3
5128-
GATTACAACATTTCCTCACTGCG

18

5227
GGATATTTCTGACCCGCTTTAGA

ACTTAAGACCTGATTCTAGCAAT

AAACGTGTCCGAGATGAGCGGT

GAAAAAAAA

33
FLJ
NM_018071.4
5402-
GAATGTGTCTCCTCCACAGTGGC

10357

5501
TCCCAGAGGTTCCACACACTCTC

TGAAGCTCCTTCTCCCACACTGC

ACCTACTCCTTGAGGCTGAACTG

GTCACAGA

34
ARHGEF
NM_005435.3
5151-
GGGGGACCATTGGGGCCTGAGC

5

5250
CAAGGAACTTTCCTTCTACTGCC

TTATAGTGCTTAAACATTCTCCG

CCTCCAGGGTGCAGATTCAGAGC

TGGCCAGAG

35
ARL8B
NM_018184.2
2491-
ACCATTACAAAGAATGTGGCAA

2590
CTTGCTTGTGCCTAAAAGGAGGA

ATTGGAACTAGAATGTGTGACTC

TGTGGGGACTGCATAGGTTTGTT

AATTGACCT

36
ARPC2
NM_005731.2
951-
ACGGGGAAGACGTTTTCATCCCG

1050
CTAATCTTGGGAATAAGAGGAG

GAAGCGGCTGGCAACTGAAGGC

TGGAACACTTGCTACTGGATAAT

CGTAGCTTTT

37
ASF1B
NM_018154.2
1476-
CTGTCTCCGGGCCAGGGTCAGGG

1575
ACCCTCTGCCTCTGGCAGCCTTA

ACCTGTCCTCTGCTAGGACCAGG

GTGATTTCAAGCCAGGGAAGCA

ACTGGGACC

38
ATG4B
NM_178326.2
106-
GGACGCAGCTACTCTGACCTACG

205
ACACTCTCCGGTTTGCTGAGTTT

GAAGATTTTCCTGAGACCTCAGA

GCCCGTTTGGATACTGGGTAGAA

AATACAGC

39
ATG5
NM_004849.2
1105-
TGCAGTGGCTGAGTGAACATCTG

1204
AGCTACCCGGATAATTTTCTTCA

TATTAGTATCATCCCACAGCCAA

CAGATTGAAGGATCAACTATTTG

CCTGAACA

40
ATM
NM_000051.3
31-
ACGCTAAGTCGCTGGCCATTGGT

130
GGACATGGCGCAGGCGCGTTTGC

TCCGACGGGCCGAATGTTTTGGG

GCAGTGTTTTGAGCGCGGAGACC

GCGTGATA

41
ATP2C1
NM_014382.3
4070-
TAAAAAGTCCCCAAACCCAAAC

4169
AAATGGTTTATGAACCAGAGTAT

ATGTGGAAGATTCTTTGCTGGTC

TTGCTCTGTGTGCATCTGAAGCT

TCTTTGGCC

42
ATP5B
NM_001686.3
1626-
CTATATGGTGGGACCCATTGAAG

1725
AAGCTGTGGCAAAAGCTGATAA

GCTGGCTGAAGAGCATTCATCGT

GAGGGGTCTTTGTCCTCTGTACT

GTCTCTCTC

43
ATP5I
NM_007100.2
256-
TTGCCAGAGAATTGGCAGAAGA

355
TGACAGCATATTAAAGTGAGTGA

CCCTGCGACCCACTCTTTGGACC

AGCAGCGGATGAATAAAGCTTC

CTGTGTTGTG

44
ATP5J2
NM_004889.3
267-
GCTGGCATGCTACGTGCTCTTTA

366
GCTACTCCTTTTCCTACAAGCAT

CTCAAGCACGAGCGGCTCCGCA

AATACCACTGAAGAGGACACAC

TCTGCACCCC

45
ATP5L
NM_006476.4
196-
GGGACGGGGTCCTGCAGCGGGT

295
CCTTCCGGCGGGTGACATTCAGC

CGGCGGTTCGGGGCGACGGACT

CTCCATTCCAGAACCATGGCCCA

ATTTGTCCGT

46
AW
AW173314.1
419-
AGCAGAAGGCAGGGGAGTCCAC

173314

518
ACAGGGCAAGCAGCAACCAGGC

TTCTGAGGACAGGAAAGGAGGG

AGCATCTGGTGGGAAGCTGGCG

AGGAGGGGCTGG

47
AW
AW270402.1
203-
GATATCTCACACACGGAATAATC

270402

302
ATTAAGAAACAACCACTGTTGAG

CAAAGTTGATAGGCAGTAAGGA

AATAAAGTGGACATAAACACAG

CAGTACTAAT

48
AZI2
NM_022461.4
3031-
GAATTGGTGTCAGATGCTGGAAT

3130
TTATTCTGACCAATGAACACAGC

TGACTCAGGGGAGTACAATCTCC

TGCCAAGTAATAGAACCAAACC

CAATATGCA

49
BACH2
NM_021813.3
8696-
TCCAGAACCAGTCTGATGCAAGT

8795
GCACCTCTAATATATGCCTTACA

AACTCCAGAGGCCATATTCAAAA

CAGGGTCTTCTCAGTGTATGCAA

GGGGCTGC

50
BAG3
NM_004281.3
2304-
CCCCACCACCTGTTAGCTGTGGT

2403
TGTGCACTGTCTTTTGTAGCTCT

GGACTGGAGGGGTAGATGGGGAG

TCAATTACCCATCACATAAATAT

GAAACATT

51
BANP
NM_079837.2
2125-
GGAGCCCTTTGCTGTGTGCTCTG

2224
TCCAGTGTCATGAGGCAGGTGTT

TGCAAAGCCAGCTCTCGGTTCCG

ATGGGGTATTGCTGACCTACTTT

TCTAGGGG

52
BATF
NM_006399.3
294-
CCTGGCAAACAGGACTCATCTGA

393
TGATGTGAGAAGAGTTCAGAGG

AGGGAGAAAAATCGTATTGCCG

CCCAGAAGAGCCGACAGAGGCA

GACACAGAAGG

53
BCL10
NM_003921.2
1251-
TGAAAATACCATCTTCTCTTCAA

1350
CTACACTTCCCAGACCTGGGGAC

CCAGGGGCTCCTCCTTTGCCACC

AGATCTACAGTTAGAAGAAGAA

GGAACTTGT

54
BCL6
NM_00113084
3401-
CCTCACGGTGCCTTTTTTCACGG

5.1
3500
AAGTTTTCAATGATGGGCGAGCG

TGCACCATCCCTTTTTGAAGTGT

AGGCAGACACAGGGACTTGAAG

TTGTTACTA

55
BCOR
NM_017745.5
5794-
ATACAAAGCTCTGATGACAGGCC

5893
ATGACTGTAGAGTGGTCAGAACT

GTGTGGTTGGTTTGAGGGAGCGA

ATTCGGGGAAGGCACTTGGTGAT

ATAACTTT

56
BF
BF375676.1
141-
TGTATTTCTGTGCAATGAGAGAG

375676

240
GCTCTTTATGGTGGTGCTACAAA

CAAGCTCATCTTTGGAACTGGCA

CTCTGCTTGCTGTCCAGCCAAAT

ATCCAGAA

57
BID
NM_001196.2
1876-
AAGCACGACAGTGGATGCTGGG

1975
TCCATATCACACACATTGCTGTG

AACAGGAAACTCCTGTGACCAC

AACATGAGGCCACTGGAGACGC

ATATGAGTAAG

58
BMPR2
NM_001204.6
1164-
CAGCGGCCCTGGCGGGTGCCCTG

1263
GCTACCATGGACCATCCTGCTGG

TCAGCACTGCGGCTGCTTCGCAG

AATCAAGAACGGCTATGTGCGTT

TAAAGATC

59
BQ189294
BQ189294.1
416-
GCTGGAGTGATTGGCCCTGATGA

515
CCATGGAGAAAAGAGAGTAGGG

AGAACAGTATAACCAGAAGTCA

GGGGGGTCTCCTGGAATCCCTCC

TCACAATACC

60
BU743228
BU743228.1
154-
CCCTGTGGGCCTTGCAGGCCAGT

253
CCAGGCAGGTCTTTCACACTGTT

GTCCCACATAACAGAAAAAGCT

GAGCAGACAGGGTAGGAAACAC

ACTTGCATCT

61
BX089765
BX089765.1
106-
TTAAGCAACTTGCTCCAGTGACG

205
CAGCTGGTAAGCAGCAGAGCTG

GGATTAAAACCCAGGCATTCTGA

TTCCACCACCTACACACTTAGCC

ATTCCGCCC

62
BX108566
BX108566.1
365-
ATTTAGGGTGAGAGCTTCACAGC

464
TGAAAATCTCCTTTAAAGAAAAC

GCGGCCCAAATGTGCTGGGAGG

AGAAGCCAGTGGATCTAGGAGG

GGGCCCGGCG

63
BX400436
BX400436.2
1-
ATATTTTGGAGAGGGAAGTTGGC

100
TCACTGTTGTAGAGGACCTGAAC

AAGGTGTTCCCACCCGAGGTCGC

TGTGTTTGAGCCATCAGAAGCAG

AGATCTCC

64
BX436458
BX436458.2
518-
ATGCAGACAATTTGCCTGTGAGA

617
TGAGGAAAATTCTCTGGAAGATT

TAGGCCCTGAGAGCTGAAAAGG

GACCCTAAACATTACCTGGTGAC

AACTGCCCT

65
C15orf39
NM_015492.4
3535-
CCTGAGCTTTTAACGTGAGGGTC

3634
TTTATTGGATAGGACTACTCCCT

ATTTCTTGCCTAGAGAACACACA

TGGGCTTTGGAGCCCGACAGACC

TGGGCTTG

66
C17orf51
XM_944416.1
4909-
AAGGATGGGGGTGGATTGACCA

5008
AGCTGGGCCAGAGGTGCGAGGA

GCTGATCTGCGAGCCCTGTGTGC

CTGTGAGTCCTGGCGGAGTGGCC

GTGCGTGGTG

67
C3
NM_000064.2
4397-
CATCTACCTGGACAAGGTCTCAC

4496
ACTCTGAGGATGACTGTCTAGCT

TTCAAAGTTCACCAATACTTTAA

TGTAGAGCTTATCCAGCCTGGAG

CAGTCAAG

68
C4B
NM_001002029.3
4438-
GAGTCCAGGGTGCACTACACCGT

4537
GTGCATCTGGCGGAACGGCAAG

GTGGGGCTGTCTGGCATGGCCAT

CGCGGACGTCACCCTCCTGAGTG

GATTCCACG

69
C4orf27
NM_017867.2
682-
GAACCGTGAAGATGAAACAGAG

781
AGATAAGAAAGTTGTGACAAAG

ACCTTTCATGGTGCAGGCTTGGT

TGTTCCAGTAGATAAAAATGATG

TTGGGTACCG

70
C8orf76
NM_032847.2
1029-
TAAAAGATGAAGTTCACCCAGA

1128
GGTGAAGTGTGTTGGCTCCGTAG

CCCTGACTGCCTTGGTGACTGTA

TCCTCAGAAGAATTTGAAGACAA

GTGGTTCAG

71
C9orf164
NM_182635.1
529-
CGCTGGCCATGGGGAAGCCACCT

628
CCAGGGCAGTCCCAGGGACTGA

ATTGGAAGTTGTCCCAAGTCACT

TCAGGTCCAACTGGGACAGCAG

AGGTAACCCC

72
CAMP
NM_004345.4
623-
TTGTCCAGAGAATCAAGGATTTT

722
TTGCGGAATCTTGTACCCAGGAC

AGAGTCCTAGTGTGTGCCCTACC

CTGGCTCAGGCTTCTGGGCTCTG

AGAAATAA

73
CASP1
NM_033294.3
219-
ATTTATCCAATAATGGACAAGTC

318
AAGCCGCACACGTCTTGCTCTCA

TTATCTGCAATGAAGAATTTGAC

AGTATTCCTAGAAGAACTGGAGC

TGAGGTTG

74
CASP2
NM_032983.3
3347-
CCCACCACTCTTGACTCAGGTGG

3446
TGTCCTTCTTCCTCAAGTCTTGA

CAATTCCCGGGCCCTTCAGTCCC

TGAGCAGTCTACTTCTGTGTCT

GTCACCACA

75
CASP3
NM_032991.2
686-
ACTCCACAGCACCTGGTTATTAT

785
TCTTGGCGAAATTCAAAGGATGG

CTCCTGGTTCATCCAGTCGCTTT

GTGCCATGCTGAAACAGTATGCC

GACAAGCT

76
CBLL1
NM_024814.3
1967-
ATGAGGGGGAAAAAAACTTATG

2066
TGTAGTCAATCTTTTAAGCTTTG

ACTGTTTTGGGAAGGAAGAGTAC

CTCTTATCGAGGTAGTATAAAAC

ACATAGGGT

77
CC2D1B
NM_032449.2
4183-
TTGCATAAGCACAGCTCAAGAAC

4282
TGAGCTTTGTATGTGTCCTTTTG

GGGGATAACAGGGCTGGACCATG

CTTCCCTGCCCTTAAACGCAGAG

CTTTTAGT

78
KIAA1967
NM_021174.5
201-
GGGAGAGGGCCCACACAGTCTC

300
CTCGCCGGCACCGGCCTCCTCCA

TTTTTCCGGGCCTTGCGTGGAGG

GTTTTGGCGGATGTTTTTGAACG

AAGGAATGT

79
CCDC97
NM_052848.1
2867-
ATCCAGAGTGAGACAGCATTGG

2966
AGGGACAAGTGTGCATGCAGAT

GTCCTCAGACGGGAAGGTTTGAG

AAGGGTCAGATGGTAGGCGGGC

CTAACAAGGGC

80
CCL3
NM_002983.2
160-
CAGTTCTCTGCATCACTTGCTGC

259
TGACACGCCGACCGCCTGCTGCT

TCAGCTACACCTCCCGGCAGATT

CCACAGAATTTCATAGCTGACTA

CTTTGAGA

81
CCL3L1
NM_021006.4
422-
GGAGCCTGAGCCTTGGGAACAT

521
GCGTGTGACCTCTACAGCTACCT

CTTCTATGGACTGGTTATTGCCA

AACAGCCACACTGTGGGACTCTT

CTTAACTTA

82
CCL3L3
NM_001001437.3
402-
GGGGAGGAGCAGGAGCCTGAGC

501
CTTGGGAACATGCGTGTGACCTC

CACAGCTACCTCTTCTATGGACT

GGTTATTGCCAAACAGCCACACT

GTGGGACTC

83
CCL4
NM_002984.2
36-
TTCTGCAGCCTCACCTCTGAGAA

135
AACCTCTTTGCCACCAATACCAT

GAAGCTCTGCGTGACTGTCCTGT

CTCTCCTCATGCTAGTAGCTGCC

TTCTGCTC

84
CCND3
NM_001760.2
1216-
GGCCAGCCATGTCTGCATTTCGG

1315
TGGCTAGTCAAGCTCCTCCTCCC

TGCATCTGACCAGCAGCGCCTTT

CCCAACTCTAGCTGGGGGTGGGC

CAGGCTGA

85
CCR1
NM_001295.2
536-
CATCATTTGGGCCCTGGCCATCT

635
TGGCTTCCATGCCAGGCTTATAC

TTTTCCAAGACCCAATGGGAATT

CACTCACCACACCTGCAGCCTTC

ACTTTCCT

86
CCR6
NM_031409.2
936-
CTTTAACTGCGGGATGCTGCTCC

1035
TGACTTGCATTAGCATGGACCGG

TACATCGCCATTGTACAGGCGAC

TAAGTCATTCCGGCTCCGATCCA

GAACACTA

87
CCR9
NM_031200.1
1096-
CCCTGTTCTCTATGTTTTTGTGG

1195
GTGAGAGATTCCGCCGGGATCTC

GTGAAAACCCTGAAGAACTTGGG

TTGCATCAGCCAGGCCCAGTGGG

TTTCATTT

88
CCT6A
NM_001762.3
281-
GCCCAAGGGCACCATGAAGATG

380
CTCGTTTCTGGCGCTGGAGACAT

CAAACTTACTAAAGACGGCAAT

GTGCTGCTTCACGAAATGCAAAT

TCAACACCCA

89
CD14
NM_000591.2
886-
GCCCAAGCACACTCGCCTGCCTT

985
TTCCTGCGAACAGGTTCGCGCCT

TCCCGGCCCTTACCAGCCTAGAC

CTGTCTGACAATCCTGGACTGGG

CGAACGCG

90
CD160 b
NM_007053.2
501-
TTGATGTTCACCATAAGCCAAGT

600
CACACCGTTGCACAGTGGGACCT

ACCAGTGTTGTGCCAGAAGCCAG

AAGTCAGGTATCCGCCTTCAGGG

CCATTTTT

91
CD160
NM_007053.3
1286-
AAAGGAAGACAGCCAGATCCAG

1385
TGATTGACTTGGCATGAAAATGA

GAAAATGCAGACAGACCTCAAC

ATTCAACAACATCCATACAGCAC

TGCTGGAGGA

92
CD1A
NM_001763.2
1816-
CCTGTTTTAGATATCCCTTACTC

1915
CAGAGGGCCTTCCCTGACTTACA

AGTGGGAAGCAGTCTCTTCCTGG

TCTGAACTCCCGCCACATTTTAG

CCGTACTT

93
CD36
NM_000072.3
1619-
TAAAGAATCTGAAGAGGAACTA

1718
TATTGTGCCTATTCTTTGGCTTA

ATGAGACTGGGACCATTGGTGAT

GAGAAGGCAAACATGTTCAGAAG

TCAAGTAAC

94
CD48
NM_001778.2
271-
AATTTAAAGGCAGGGTCAGACTT

370
GATCCTCAGAGTGGCGCACTGTA

CATCTCTAAGGTCCAGAAAGAG

GACAACAGCACCTACATCATGA

GGGTGTTGAA

95
CD69
NM_001781.2
1360-
TATACAGTGTCTTACAGAGAAAA

1459
GACATAAGCAAAGACTATGAGG

AATATTTGCAAGACATAGAATAG

TGTTGGAAAATGTGCAATATGTG

ATGTGGCAA

96
CD70
NM_001252.2
191-
CCTATGGGTGCGTCCTGCGGGCT

290
GCTTTGGTCCCATTGGTCGCGGG

CTTGGTGATCTGCCTCGTGGTGT

GCATCCAGCGCTTCGCACAGGCT

CAGCAGCA

97
CD79A
NM_021601.3
617-
TGAAGATGAAAACCTTTATGAAG

716
GCCTGAACCTGGACGACTGCTCC

ATGTATGAGGACATCTCCCGGGG

CCTCCAGGGCACCTACCAGGATG

TGGGCAGC

98
CD79B
NM_000626.2
350-
GAAGCTGGAAAAGGGCCGCATG

449
GAAGAGTCCCAGAACGAATCTCT

CGCCACCCTCACCATCCAAGGCA

TCCGGTTTGAGGACAATGGCATC

TACTTCTGT

99
CDC42EP2
NM_006779.3
1779-
AGGGCTTTGTGGAGGACAGGCCT

1878
TGCCCTCAAGAACGTCGTACCTG

ACGCTGAGCCTGTCATGAGAATG

CAACAGGAGCAAACCAAGTGTT

GCTGTGACA

100
CDH5
NM_001795.3
3406-
TCTCCCCTTCTCTGCCTCACCTG

3505
GTCGCCAATCCATGCTCTCTTTC

TTTTCTCTGTCTACTCCTTATCC

CTTGGTTTAGAGGAACCCAAGAT

GTGGCCTT

101
CDKN1A
NM_000389.2
1976-
CATGTGTCCTGGTTCCCGTTTCT

2075
CCACCTAGACTGTAAACCTCTCG

AGGGCAGGGACCACACCCTGTAC

TGTTCTGTGTCTTTCACAGCTCC

TCCCACAA

102
CFD
NM_001928.2
860-
CTGGTTGGTCTTTATTGAGCACC

959
TACTATATGCAGAAGGGGAGGC

CGAGGTGGGAGGATCATTGGAT

CTCAGGAGTTCGAGATCAGCATG

GGCCACGTAG

103
CHCHD3
NM_017812.2
1173-
TCCACCCTAACAAAGTAGGATGG

1272
GGTTGGGGGCTAAATTAATTGGA

GTGGGGCGAGGAGAGAGCCAGA

AAACATAGATCCGAGGGCAGCA

GTGCTGGGTG

104
CHFR
NM_018223.2
2836-
CGCCGCTCCCTCATGCTGCCCGG

2935
GCCCTTCCTCCAAGACCCTACAG

AGCCTGAGGGGCACCTTGGCTTC

CGCCTGTGCTAGCTTTGCCATGT

CATCTGGA

105
CHMP5
NM_016410.5
1148-
ACTAAGGAAATGGAATCTTAAA

1247
AGTCTATGACAGTGTAACTCTAC

AGTCTCAAAATGACCTGATAAAT

TGATAAGACAAAGATGAGATTA

TTGGGGCTGT

106
CIAPIN1
NM_020313.3
1816-
GCATGTCTTGTAAAGAGAGGGG

1915
ATGTGCATTTGTGTGTGATGTTG

GATAGTCATCCACGCTCAGTTTG

GACCATTGGAGGAACTTAGTGTC

ACGCACAAA

107
CKS2
NM_001827.1
228-
AGACTTGGTGTCCAACAGAGTCT

327
AGGCTGGGTTCATTACATGATTC

ATGAGCCAGAACCACATATTCTT

CTCTTTAGACGACCTCTTCCAAA

AGATCAAC

108
CLEC4A
NM_194448.2
389-
ATTTCTACTGAATCAGCATCTTG

488
GCAAGACAGTGAGAAGGACTGT

GCTAGAATGGAGGCTCACCTGCT

GGTGATAAACACTCAAGAAGAG

CAGGATTTCA

109
CLEC4C
NM_203503.1
571-
TACGAGAGTATCAACAGTATCAT

670
CCAAGCCTGACCTGCGTCATGGA

AGGAAAGGACATAGAAGATTGG

AGCTGCTGCCCAACCCCTTGGAC

TTCATTTCA

110
CLEC5A
NM_013252.2
3251-
CCCCATCCAACCCTTAGACTCAC

3350
GAACAAATCCACCTGAGATCAG

CAGAGCCACCCTAGATCAGCTGA

AACTCTAAGCACAAAAATAAAA

ACTTATCACT

111
CLIC3
ILMN_1796423.1
99-
CGTACGCCGCTACCTGGACAGCG

198
CGATGCAGGAGAAAGAGTTCAA

ATACACGTGTCCGCACAGCGCCG

AGATCCTGGCGGCCTACCGGCCC

GCCGTGCAC

112
CLK2
XM_941392.1
552-
GATTATAGCCGGGATCGGGGAG

651
ATGCCTACTATGACACAGACTAT

CGGCATTCCTATGAATATCAGCG

GGAGAACAGCAGTTACCGCAGC

CAGCGCAGCA

113
CLN8
NM_018941.3
4486-
GGCGCCAGAGCTGGGCTCTTCAA

4585
CACGGCATTTAGCGCAGAAAGTC

GTGGTTCAGGCAGTATGGGCCGC

TGTGACAAAACACCTAAGACTG

GGTAGTTTA

114
CLPTM1
NM_001294.3
2389-
TCTGTGTTTCCAGCCATCTCGCC

2488
CTGCCAGCCCAGCACCACTGGGA

ATCATGGTGAAGCTGATGCAGCG

TTGCCGAGGGGGTGGGTTGGGC

GGGGGTGGG

115
CLSTN1
NM_001009566.2
4990-
TTGAATACTGTTCTGTGACCCTG

5089
ACTGCTAGTTCTGAGGACACTGG

TGGCTGTGCTATGTGTGGCCATC

CTCCATGTCCCGTCCCTGTAGCT

GCTCTGTT

116
CN312986
CN312986.1
491-
AGGAAACTAAGACATGGAAAGG

590
TTAGGTAACTTGCCCAAGGTCGC

ACAGCTAGTAAGTGGCAGACAT

CCAGAGTCTCTGCTCTGCTCTTA

ACTCTCACCA

117
CNIH4
NM_014184.3
526-
AATGACTGAAGCTGGAGAAGCC

625
GTGGTTGAAGTCAGCCTACACTA

CAGTGCACAGTTGAGGAGCCAG

AGACTTCTTAAATCATCCTTAGA

ACCGTGACCA

118
CNPY2
NM_014255.5
1038-
TTGCAGTAAGCGAACAGATCTTT

1137
GTGACCATGCCCTGCACATATCG

CATGATGAGCTATGAACCACTGG

AGCAGCCCACACTGGCTTGATGG

ATCACCCC

119
COLEC12
NM_130386.2
901-
ACACAAGCCAGGCTATCCAGCG

1000
AATCAAGAACGACTTTCAAAATC

TGCAGCAGGTTTTTCTTCAAGCC

AAGAAGGACACGGATTGGCTGA

AGGAGAAAGT

120
GLT25D1
NM_024656.2
3067-
CTGTGTGCCAGGCCTCACAGACT

3166
CCCAGTTGGGTTGAAGAATGGTT

GACTGAGTTTGATTCTTCCTGTA

CCCTCGGTCGTCTGAGCTGTGTG

CGGACAAC

121
COMMD6
NM_203497.3
32-
CTCTCGAGTCCGGGCCGCAAGTC

131
CCAGACGCTGCCCATGGAGGCGT

CCAGCGAGCCGCCGCTGGATGCT

AAGTCCGATGTCACCAACCAGCT

TGTAGATT

122
CORO1C
ILMN_1745954.1
98-
AAGTAAAGTTGTTGATGGTGGTG

197
AAACACCGTAGGGCATGTGGTTC

AAAGAGAAGCAGGAGGGCAAGG

GAAAGTTACCCTGATCTTAGTTT

GTAGCTTAT

123
COX6C
NM_004374.2
70-
GAAGTTTTGCCAAAACCTCGGAT

169
GCGTGGCCTTCTGGCCAGGCGTC

TGCGAAATCATATGGCTGTAGCA

TTCGTGCTATCCCTGGGGGTTGC

AGCTTTGT

124
COX7B
NM_001866.2
160-
CAGAGCCACCAGAAACGTACAC

259
CTGATTTTCATGACAAATACGGT

AATGCTGTATTAGCTAGTGGAGC

CACTTTCTGTATTGTTACATGGA

CATATGTAG

125
COX7C
NM_001867.2
1-
CAAGGTCGTGAAAAAAAAGGTC

100
TTGGTGAGGTGCCGCCATTTCAT

CTGTCCTCATTCTCTGCGCCTTT

CGCAGAGCTTCCAGCAGCGGTAT

GTTGGGCCA

126
CPPED1
NM_018340.2
2494-
TGTATTTGTTTCTTTACAACAGG

2593
TGTAGGTATAGGAGGTCAAGAAA

AGGAGTTCGGTAAAGGGCATAG

CTAATAACAACCACACATTGGGC

CAGGCACAG

127
CR2 b
NM_001006658.1
486-
GGTGTCAAGCAAATAATATGTGG

585
GGGCCGACACGACTACCAACCT

GTGTAAGTGTTTTCCCTCTCGAG

TGTCCAGCACTTCCTATGATCCA

CAATGGACA

128
CR2
NM_001006658.2
3581-
AGCCCAGTTTCACTGCCATATAC

3680
TCTTCAAGGACTTTCTGAAGCCT

CACTTATGAGATGCCTGAAGCCA

GGCCATGGCTATAAACAATTACA

TGGCTCTA

129
CREB1
NM_004379.3
4856-
TTTGATGGTAGGTCAGCAGCAGT

4955
GCTAGTCTCTGAAAGCACAATAC

CAGTCAGGCAGCCTATCCCATCA

GATGTCATCTGGCTGAAGTTTAT

CTCTGTCT

130
CREB5
NM_182898.3
7898-
ACCTACTCACCTTTTTCCCTTCT

7997
AAGTTCTGCTAAATCACATCTGC

CTCATAGAGAAAGGAATGTTGCC

TTTGAGAACTGTCTTGGAGAACA

GATAAGCT

131
CRKL
NM_005207.3
4901-
TTCTAAAGGAGCAGAAGGACAG

5000
GTCTCTGAGACAGGATCGTTGTC

CCTACAGGAGGAACAGTGGCCTT

GCTTCTTAGACGGTCTTCACTGT

GTGTTTTAA

132
CRY2
NM_021117.3
4013-
CAGCTCAGGTGGCCCTGAGGGCT

4112
CCCTCGGAACAGTGCCTCAAATC

CTGACCCAAGGGCCAGCATGGG

GAAGAGATGGTTGCAGGCAAAA

TGCACTTTAT

133
CS
NM_004077.2
2080-
CCTCCTAGCAAGACCTGTTGGTT

2179
AGCTGGACATGCTTTGGCAATTT

TTTTATACTACCAAGTGACCATA

AAGGCATGGCATTTGTTGTGACT

GGCACCCA

134
CSK
NM_004383.2
2501-
TCTAGGGACCCCTCGCCCCAGCC

2600
TCATTCCCCATTCTGTGTCCCAT

GTCCCGTGTCTCCTCGGTCGCCC

CGTGTTTGCGCTTGACCATGTTG

CACTGTTT

135
CST7
NM_003650.3
618-
CAACCACACCTTGAAGCAGACTC

717
TGAGCTGCTACTCTGAAGTCTGG

GTCGTGCCCTGGCTCCAGCACTT

CGAGGTGCCTGTTCTCCGTTGTC

ACTGACCC

136
CTAG1B
NM_001327.2
286-
GCGGGGCCAGGGGGCCGGAGAG

385
CCGCCTGCTTGAGTTCTACCTCG

CCATGCCTTTCGCGACACCCATG

GAAGCAGAGCTGGCCCGCAGGA

GCCTGGCCCA

137
CTDSP2
NM_005730.3
4685-
GAGGTCGGGCCAGCTGCCCCATT

4784
CTTTTAACGTTGTAGGGCCTGCC

CATGGAGCGGACCCTCCTCTTTG

GGCCTCGTGAGCTTTTTTGCTTA

TCATGTTC

138
CTSW
NM_001335.3
1076-
TGCACCGAGGGAGCAATACCTGT

1175
GGCATCACCAAGTTCCCGCTCAC

TGCCCGTGTGCAGAAACCGGATA

TGAAGCCCCGAGTCTCCTGCCCT

CCCTGAAC

139
CTSZ
NM_001336.3
1174-
CACTGGCTGCGAGTGTTCCTGAG

1273
AGTTGAAAGTGGGATGACTTATG

ACACTTGCACAGCATGGCTCTGC

CTCACAATGATGCAGTCAGCCAC

CTGGTGAA

140
CX3CL1
NM_002996.3
141-
AGCACCACGGTGTGACGAAATG

240
CAACATCACGTGCAGCAAGATG

ACATCAAAGATACCTGTAGCTTT

GCTCATCCACTATCAACAGAACC

AGGCATCATG

141
CXCL2
NM_002089.3
855-
ATCACATGTCAGCCACTGTGATA

954
GAGGCTGAGGAATCCAAGAAAA

TGGCCAGTGAGATCAATGTGACG

GCAGGGAAATGTATGTGTGTCTA

TTTTGTAAC

142
IL8RB
NM_001557.3
410-
ACCTCAAAAATGGAAGATTTTAA

509
CATGGAGAGTGACAGCTTTGAA

GATTTCTGGAAAGGTGAAGATCT

TAGTAATTACAGTTACAGCTCTA

CCCTGCCCC

143
CXCR5 b
NM_001716.3
2619-
ACGTCCCTTTTTTCTCTGAGTAT

2718
CTCCTCGCAAGCTGGGTAATCGA

TGGGGGAGTCTGAAGCAGATGCA

AAGAGGCAAGAGGCTGGATTTT

GAATTTTCT

144
CYBB
NM_000397.3
3787-
ACTGGAGAGGGTACCTCAGTTAT

3886
AAGGAGTCTGAGAATATTGGCCC

TTTCTAACCTATGTGCATAATTA

AAACCAGCTTCATTTGTTGCTCC

GAGAGTGT

145
CYP1B1
NM_000104.3
2361-
CTTACACCAAACTACTGAATGAA

2460
GCAGTATTTTGGTAACCAGGCCA

TTTTTGGTGGGAATCCAAGATTG

GTCTCCCATATGCAGAAATAGAC

AAAAAGTA

146
DB338252
DB338252.1
436-
GTTCTTGGTCTGTATGTGTAGGT

535
GGAGGGAGGCAAAGTTGTGGTA

ATAAAGTGGGAAGGCCCGGGAA

GAACAGCTAACTGTATAGGGGT

GAAATGACGCT

147
DBI
NM_001079862.1
241-
CATAAATACAGAACGGCCCGGG

340
ATGTTGGACTTCACGGGCAAGGC

CAAGTGGGATGCCTGGAATGAG

CTGAAAGGGACTTCCAAGGAAG

ATGCCATGAAA

148
DCAF7
NM_005828.4
6155-
TTAACACTGTGCTGTGAAACAAC

6254
TATGGGGAATCTCCATTGAAGGC

TACTTCATGGGCACCTGAAAGTG

GAGTGTTATAGCTATGACTTTCT

ATTTCTTG

149
DDIT4
NM_019058.2
1414-
GACCTGTTGTAGGCAGCTATCTT

1513
ACAGACGCATGAATGTAAGAGT

AGGAAGGGGTGGGTGTCAGGGA

TCACTTGGGATCTTTGACACTTG

AAAAATTACA

150
DDX23
NM_004818.2
2811-
ATTGCACTGGGCCATCAGCTCAT

2910
GCCAGGCTATGGGGGCAGCCAG

TTGGCATTGCTCCCCAGACTGAA

CAGAAACCTGGCCGCCGGATGG

GACCTCCTTT

151
DGUOK
NM_080916.2
573-
ACATCGAGTGGCATATCTATCAG

672
GACTGGCATTCTTTTCTCCTGTG

GGAGTTTGCCAGCCGGATCACAT

TACATGGCTTCATCTACCTCCAG

GCTTCTCC

152
DGUOK b
NM_080916.2
903-
TTGTAAAGAATCTGTAACCAATA

1002
CCATGAAGTTCAGGCTGTGATCT

GGGCTCCCTGACTTTCTGAAGCT

AGAAAAATGTTGTGTCTCCCAAC

CACCTTTC

153
DHX16 b
NM_001164239.1
2491-
CCCGTGTCAACTTCTTTCTCCCT

2590
GGCGGTGACCACCTGGTTCTGCT

AAATGTTTACACACAGTGGGCTG

AGAGTGGTTACTCTTCCCAGTGG

TGCTATGA

154
DHX16
NM_003587.4
3189-
ACCAAAGAGTTCATGAGACAGG

3288
TACTGGAGATTGAGAGCAGTTGG

CTTCTGGAGGTGGCTCCCCATTA

TTATAAGGCCAAGGAGCTAGAA

GATCCCCATG

155
DKFZp761P0423
XM_291277.4
4192-
CTCCTGCAGCTTCTGTGAGCCAA

4291
GCCCCAGCCTGCACCGTCGCTGC

CCCTTCCCTGCCTAACCCTTTCC

TGTCTCGCCTTGGAAGCACCCAT

GTCTCCCT

156
DMBT1
NM_007329.2
3713-
CACAATGGCTGGCTCTCCCACAA

3812
CTGTGGCCATCATGAAGACGCTG

GTGTCATCTGCTCAGCTTCCCAG

TCCCAGCCGACACCCAGCCCAGA

CACTTGGC

157
DNAJB1
NM_006145.2
1904-
GACCTCTGGCTCCAGTGAAGCTG

2003
AATGTCCTCACTTTGTGGGTCAC

ACTCTTTACATTTCTGTAAGGCA

ATCTTGGCACACGTGGGGCTTAC

CAGTGGCC

158
DNAJB6
NM_058246.3
2087-
CTTCCCTGCATGCTCCCTCCCAG

2186
TGACTTTCCTTCCCTTTCACATG

AGGATCTGCCGTTCATGTTGCTT

TCTCCTTTGTCCTCTTGGACTTG

AGGGCATT

159
DOCK5
NM_024940.6
7201-
AAAGAGATTTCCATTTCTGCTGC

7300
CAGAGCTGGTATTTGCCTGCCTG

ATTCTCTGTGTTTCCTGTTTCAC

CGCCACCCTTTCAGGAGAGAACT

ACACCAGT

160
DPF2
NM_006268.4
2249-
TCTCAGCTCATGGGGAAGCCACA

2348
TAGACATCCCTTTCTTCCCTTGC

ACGCTCGCTAGCAGCTGGTAAGG

TCTTCACACCCTGATTCCTCAAG

TTTTCTGC

161
DYNC2LI1
NM_016008.3
351-
TTTGGGAACTCGGTGGAGGAACC

450
TCTTTATTGGACTTAATCAGCAT

ACCCATCACAGGTGACACCTTAC

GGACGTTTTCTCTTGTTCTCGTT

CTGGATCT

162
DZIP3
NM_014648.3
4323-
CCCAGTGTCTTGCCCAGTAGATA

4422
CAAGATAAATATTGCCAGAATCA

GATATCAGGAAGTAGTAAGAAA

AGGAGTTAATATGCAAACTAAAT

CACTCGCTC

163
EEF1B2
NM_001037663.1
699-
GGATACGGAATTAAGAAACTTC

798
AAATACAGTGTGTAGTTGAAGAT

GATAAAGTTGGAACAGATATGCT

GGAGGAGCAGATCACTGCTTTTG

AGGACTATG

164
EGLN1
NM_022051.1
3976-
AGCAGCATGGACGACCTGATAC

4075
GCCACTGTAACGGGAAGCTGGG

CAGCTACAAAATCAATGGCCGG

ACGAAAGCCATGGT

165
EGR1
NM_001964.2
1506-
GAGGCATACCAAGATCCACTTGC

1605
GGCAGAAGGACAAGAAAGCAGA

CAAAAGTGTTGTGGCCTCTTCGG

CCACCTCCTCTCTCTCTTCCTAC

CCGTCCCCG

166
EHD4
NM_139265.3
2605-
TCAAACATTAAATATCCCGAGGT

2704
CTCCTTGGTGGGTGGCAGGATTT

AAATTCAATCAAATCCTGTCCTA

GTGTGTGCAGTGTCTTCGGCCCT

GTGGACAC

167
EID2B
NM_152361.2
628-
GCCAGTTTAGTTAACTCAGTCAT

727
TAGGGGGAATGCAAACTGGAAG

GGAATACGGCAATGTGCAATTG

AAGGAGGAAGCACACTCCGAAA

TGGAAACAGAC

168
EIF2B4
NM_015636.3
1497-
GTCTCTAATGAGCTAGATGACCC

1596
TGATGATCTGCAATGTAAGCGGG

GAGAACATGTTGCGCTGGCTAAC

TGGCAGAACCACGCATCCCTACG

GTTGTTGA

169
EIF4ENIF1
NM_019843.2
3051-
CACACTGGGCAGGACCCTGCTTC

3150
ATCTCGGGTTGGTTTATGGGCTT

TTACTTTGGAGCACTCTGTGTGA

AGCTGTTTGGTGGAACCCATGCA

TCTGGTGT

170
EMR4
NM_001080498.2
1719-
GGGAAGACGATTGGATCAATCA

1818
TTGCATACTCATTCACCATCATC

AACACCCTTCAGGGAGTGTTGCT

CTTTGTGGTACACTGTCTCCTTA

ATCGCCAGG

171
EP300
NM_001429.2
716-
CCAGCCAGGCCCAACAGAGCAG

815
TCCTGGATTAGGTTTGATAAATA

GCATGGTCAAAAGCCCAATGAC

ACAGGCAGGCTTGACTTCTCCCA

ACATGGGGAT

172
EPHX2
NM_001979.5
1909-
CATCCTTCCACCTGCTGGGGCAC

2008
CATTCTTAGTATACAGAGGTGGC

CTTACACACATCTTGCATGGATG

GCAGCATTGTTCTGAAGGGGTTT

GCAGAAAA

173
ERLIN1
NM_006459.3
3197-
TGATGGCCCTGGAGGCGGGGCT

3296
GAGGAACAGGGAAATGCCGCTG

TGAAGTCTTAAAGCACTTCTGCT

TAAACTCCCATGTGTGAGGAGTG

TGCCTCCCTG

174
ETFDH
NM_004453.3
1904-
TGACCTCTTGTCATCTGTGGCTC

2003
TGAGTGGTACTAATCATGAACAT

GACCAGCCGGCACACTTAACCTT

AAGGGATGACAGTATACCTGTAA

ATAGAAAT

175
EVI2A
NM_014210.3
1410-
GAGAGAGCTAAACTGTGTAATTT

1509
AATGGTATCTTCCTTGCTGGATG

TGGCAGAATCCACACCAGCTTAT

CAACCAACACAGCTAATTTTAGA

ATAGATCC

176
EWSR1
NM_005243.3
2248-
AAAAATGGATAAAGGCGAGCAC

2347
CGTCAGGAGCGCAGAGATCGGC

CCTACTAGATGCAGAGACCCCGC

AGAGCTGCATTGACTACCAGATT

TATTTTTTAA

177
EYA3
NM_001990.3
1551-
GATTCCTGGTTAGGAACTGCATT

1650
AAAGTCCTTACTTCTCATCCAGT

CCAGAAAGAATTGTGTGAATGTT

CTGATCACTACCACCCAGCTGGT

TCCAGCCC

178
C5orf21
NM_032042.5
4058-
TTAGAACAAGTAGAATGGGAAA

4157
GGAGTGACTGATAAATCTAAGAT

TCAAAATAGTCCCGTCGAAACTT

AAAGGCCAGATTATTGCTTTGGA

GCTTTCTAT

179
FAM179A
NM_199280.2
3306-
ACTCTTAGACTCAGAGTCCTTGG

3405
GAGGCAGCCGCAAGGCCACTGA

CAGAGGGGTGGCCCCTGACAGC

AAGACAACTGGCAGCTCATACCC

TTTTCAGCTG

180
FAM193A
NM_003704.3
4523-
CCCTGACTTGTAGCCAGCTTGTG

4622
TAAGATCCCTTGCAGAACGAGA

AAGTTAAAAACAAGCCCACCCA

GTACTCACACCATCAAGTCTGTT

ATAGAGTGTA

181
FAM43A
NM_153690.4
2741-
AGACCCCTGAAATGTTGCCAAAT

2840
TCTTCAAATAACTGTTTGGGGGG

TGGGGGGAGATGAAAGAGAGTC

GCGTTTTGTTTACAGTTAAAGAC

ATCCAATAT

182
FAM50B
NM_012135.1
1273-
TTCTGAGTATTTTAGTGTTGCCA

1372
CCTGGATTTGCTGCATTGCTCTG

CTGAGCTGTATTGAAACCATGAC

TGGGCCCACTGTCAGACAGAAAT

TAGAATAG

183
FAIM3
NM_005449.4
1689-
CAGGCTCTAGATCACATGGCATC

1788
AGGCTGGGGCAGAGGCATAGCT

ATTGTCTCGGGCATCCTTCCCAG

GGTTGGGTCTTACACAAATAGAA

GGCTCTTGC

184
FKBP1A
NM_054014.3
301-
AGAAACAAGCCCTTTAAGTTTAT

400
GCTAGGCAAGCAGGAGGTGATC

CGAGGCTGGGAAGAAGGGGTTG

CCCAGATGAGTGTGGGTCAGAG

AGCCAAACTGA

185
FLNB
NM_001457.3
9148-
CAGACCTGAGCTGGCTTTGGAAT

9247
GAGGTTAAAGTGTCAGGGACGTT

GCCTGAGCCCAAATGTGTAGTGT

GGTCTGGGCAGGCAGACCTTTAG

GTTTTGCT

186
FNBP1
NM_015033.2
5237-
TGTGTGTTGCACTAATTCTAAAC

5336
TTTGGAGGCATTTTGCTGTGTGA

GGCCGATCGCCACTGTAAAGGTC

CTAGAGTTGCCTGTTTGTCTCTG

GAGATGGA

187
FOXK2
NM_004514.3
4387-
TTTTTTGCCGTAGGCACCATTCT

4486
GCATCTTGAACCCAGACTGAAGT

GTGCCTCTCACAGATGGAAGGTG

CACACGCTCCTGTCTCCTCCTCA

CTCTGCCA

188
FRAT2
NM_012083.2
1769-
CTTGTCCTCCCAGCTGAGCTTTC

1868
TTATTCCACCCTTTCTGGTGTCT

ATAGGAATGCATGAGAGACCCTG

GACGTTTTTCTGCTCTCTTCTGG

CCCTCCAT

189
FTHL16
XR_041433.1
255-
GGACTCAGAGGCCGCCATCAAC

354
CGCCAGATCAACCTAGAGCTCTG

TGCCTCCTACGTTTACCTGTCCA

TGTCTTACTGCTTTGACCGTGAT

GATGTGGCT

190
GATA2
NM_001145662.1
2573-
GTCCAGTTGATTGTACGTAGCCA

2672
CAGGAGCCCTGCTATGAAAGGA

ATAAAACCTACACACAAGGTTG

GAGCTTTGCAATTCTTTTTGGAA

AAGAGCTGGG

191
GLIS3
NM_001042413.1
548-
ACTCGCGCTGGCCGGCCGGGGG

647
AAGGGACCCGCACGCCGGGCTTT

GTTGTGGAAATCCCGGTTACCTG

GCTTATAACCCACACCATGGATA

ACTTATTGG

192
GLRX
ILMN_1737308.1
119-
AAAGCATAGTTGGTCTTGGTGTC

218
ATATGGATCAGAGGCACAAGTG

CAGAGGCTGTGGTCATGCGGAA

CACTCTGTTATTTAAGATGGCTA

TCCAGATAAT

193
GNL3
NM_014366.4
1733-
TACAGCAGGTGAACAGTCTACA

1832
AGGTCTTTTATCTTGGATAAAAT

CATTGAAGAGGATGATGCTTATG

ACTTCAGTACAGATTATGTGTAA

CAGAACAAT

194
GNS
NM_002076.3
4988-
CCTGTGTTTGCATCCTCTGTTCC

5087
TATTCTGCCCTTGCTCTGTGTCA

TCTCAGTCATTTGACTTAGAAAG

TGCCCTTCAAAAGGACCCTGTTC

ACTGCTGC

195
GOLGA3
NM_005895.3
8961-
CTCACTGACCGGAAGGTCCAGGT

9060
GAATCTCGTCATAAGTGATCTCA

GGCTCTCACAGGATCCGGAGGG

AAATGTGTTAGAGGGTCTGGAA

AATTCAGTGC

196
GPATCH3
NM_022078.2
1686-
AGTCTGGGAGCAGCAGTCTTCGT

1785
GGCTGGTTCAGGGTGTTTTGTTC

CGAGCCTGCCTGCCTGCCGGTTC

TATACCTCAGGGGCATTTTTACA

AAAAGCCC

197
GPI
NM_000175.2
1696-
CAGTGCTCAAGTGACCTCTCACG

1795
ACGCTTCTACCAATGGGCTCATC

AACTTCATCAAGCAGCAGCGCG

AGGCCAGAGTCCAATAAACTCGT

GCTCATCTG

198
GPR65
NM_003608.3
1899-
TATGATTTTTCTCACTCTTTCTT

1998
TGGACTCCAGGGTGTCAGCCATC

AGGTCTCCTAATTTTGTGTACCG

GTCTCCAACAACCCCAGCTACTG

AATACTGC

199
GSTO1
NM_004832.2
897-
AGAGCTCTACTTACAGAACAGCC

996
CTGAGGCCTGTGACTATGGGCTC

TGAAGGGGGCAGGAGTCAGCAA

TAAAGCTATGTCTGATATTTTCC

TTCACTAAT

200
GUSB
NM_000181.3
2032-
GGTATCCCCACTCAGTAGCCAAG

2131
TCACAATGTTTGGAAAACAGCCT

GTTTACTTGAGCAAGACTGATAC

CACCTGCGTGTCCCTTCCTCCCC

GAGTCAGG

201
GZMA
NM_006144.3
636-
GCCTCCGAGGTGGAAGAGACTC

735
GTGCAATGGAGATTCTGGAAGCC

CTTTGTTGTGCGAGGGTGTTTTC

CGAGGGGTCACTTCCTTTGGCCT

TGAAAATAA

202
GZMB
NM_004131.3
541-
ACACTACAAGAGGTGAAGATGA

640
CAGTGCAGGAAGATCGAAAGTG

CGAATCTGACTTACGCCATTATT

ACGACAGTACCATTGAGTTGTGC

GTGGGGGACC

203
GZMH
NM_033423.4
718-
GGCCCCTCGTGTGTAAGGACGTA

817
GCCCAAGGTATTCTCTCCTATGG

AAACAAAAAAGGGACACCTCCA

GGAGTCTACATCAAGGTCTCACA

CTTCCTGCC

204
HAT1
NM_003642.3
1235-
AACCAAATAGAAATAAGCATGC

1334
AACATGAACAGCTGGAAGAGAG

TTTTCAGGAACTAGTGGAAGATT

ACCGGCGTGTTATTGAACGACTT

GCTCAAGAGT

205
HAVCR2
NM_032782.3
956-
TATATGAAGTGGAGGAGCCCAA

1055
TGAGTATTATTGCTATGTCAGCA

GCAGGCAGCAACCCTCACAACCT

TTGGGTTGTCGCTTTGCAATGCC

ATAGATCCA

206
HDAC3
NM_003883.3
1765-
AAGATGAAGAGAGAGAGATTTG

1864
GAAGGGGCTCTGGCTCCCTAACA

CCTGAATCCCAGATGATGGGAA

GTATGTTTTCAAGTGTGGGGAGG

ATATGAAAAT

207
HERC1
NM_003922.3
14664-
CAATCGACATGGACAACTACATG

14763
CTCTCGAGAAACGTGGACAACG

CCGAGGGCTCCGACACTGACTAC

TGACCGTGCGGGTGCTCTCACCC

TCCCTTCTC

208
HERC3
NM_014606.2
3796-
TAAGAATGATTTAGACTGACCTG

3895
TCCTTTTTTATCTGCGCATGCGA

GAACATCACCTTCCTCTGTACAC

TTGGAAATGCCTCTGGCTTGTTG

CAGCCCTC

209
HK3
NM_002115.2
2785-
AGTCAGAGGATGGGTCCGGCAA

2884
AGGTGCGGCCCTGGTCACCGCTG

TTGCCTGCCGCCTTGCGCAGTTG

ACTCGTGTCTGAGGAAACCTCCA

GGCTGAGGA

210
HLA-B
NM_005514.6
938-
CCCTGAGATGGGAGCCGTCTTCC

1037
CAGTCCACCGTCCCCATCGTGGG

CATTGTTGCTGGCCTGGCTGTCC

TAGCAGTTGTGGTCATCGGAGCT

GTGGTCGC

211
HLA-DMB
NM_002118.3
21-
CCCGTGAGCTGGAAGGAACAGA

120
TTTAATATCTAGGGGCTGGGTAT

CCCCACATCACTCATTTGGGGGG

TCAAGGGACCCGGGCAATATAG

TATTCTGCTC

212
HLA-G
NM_002127.4
1181-
AAGAGCTCAGATTGAAAAGGAG

1280
GGAGCTACTCTCAGGCTGCAATG

TGAAACAGCTGCCCTGTGTGGGA

CTGAGTGGCAAGTCCCTTTGTGA

CTTCAAGAA

213
HMGB1
NM_002128.4
209-
TATGCATTTTTTGTGCAAACTTG

308
TCGGGAGGAGCATAAGAAGAAGC

ACCCAGATGCTTCAGTCAACTTC

TCAGAGTTTTCTAAGAAGTGCTC

AGAGAGGT

214
HMGB2
NM_002129.3
670-
TGCTGCATATCGTGCCAAGGGCA

769
AAAGTGAAGCAGGAAAGAAGGG

CCCTGGCAGGCCAACAGGCTCA

AAGAAGAAGAACGAACCAGAAG

ATGAGGAGGAG

215
HNRNPAB
NM_004499.3
1246-
CCCCATGGAAATCACTCTCCTGT

1345
TGACTATTTCCAGAGCTCTAGGT

GTTTAGGCAGCGTGTGGTGTCTG

AGAGGCCATAGCGCCATCATGG

GCTGATTTT

216
HNRNPK
NM_031263.2
538-
TCCCTACCTTGGAAGAGGGCCTG

637
CAGTTGCCATCACCCACTGCAAC

CAGCCAGCTCCCGCTCGAATCTG

ATGCTGTGGAATGCTTAAATTAC

CAACACTA

217
HOOK3
NM_032410.3
2391-
GCAAGGTAGAGAAGTTGTGCCG

2490
CTCAATCACAGACACCTGCACCC

ACAACATACTTCTGTTACACACA

AGAACATTTCAGGAAACTCAGCC

AGCTTATTT

218
HOPX
NM_139211.4
590-
AACAATAGGAAGCTATGTGTATC

689
TTCTGTGTAAAGCAGTGGCTTCA

CTGGAAAAATGGTGTGGCTAGC

ATTTCCCTTTGAGTCATGATGAC

AGATGGTGT

219
HPSE
NM_006665.5
3920-
GAGGTTCCTATAATTGTCTCTGA

4019
GTAACCCTTTGGAATGGAGAGG

GTGTTGGTCAGTCTACAAACTGA

ACACTGCAGTTCTGCGCTTTTTA

CCAGTGAAA

220
HSCB
NM_172002.3
343-
TCCACCCAGATTTCTTCAGCCAG

442
AGGTCTCAGACTGAAAAGGACTT

CTCAGAGAAGCATTCGACCCTGG

TGAATGATGCCTATAAGACCCTC

CTGGCCCC

221
HSD11B1
NM_181755.1
156-
GCCTACTACTACTATTCTGCAAA

255
CGAGGAATTCAGACCAGAGATG

CTCCAAGGAAAGAAAGTGATTG

TCACAGGGGCCAGCAAAGGGAT

CGGAAGAGAGA

222
HSP90AB1
NM_007355.3
1531-
GGCATTCTCTAAAAATCTCAAGC

1630
TTGGAATCCACGAAGACTCCACT

AACCGCCGCCGCCTGTCTGAGCT

GCTGCGCTATCATACCTCCCAGT

CTGGAGAT

223
HSPA6
NM_002155.4
1990-
GTGGCACTCAAGCCCGCCAGGG

2089
GGACCCCAGCACCGGCCCCATCA

TTGAGGAGGTTGATTGAATGGCC

CTTCGTGATAAGTCAGCTGTGAC

TGTCAGGGC

224
HUWE1
NM_031407.6
13637-
CCACCAACTCACCGTGTGTGTCC

13736
CAGCTGCCCCATCTTCCCCAGCG

CATACCTGTTCCTCTTCTCATTC

TCTCCCCGCCGCCTGTTTCCTCA

CCTTCTCT

225
HVCN1
NM_032369.3
747-
TGTTCCAGGAGCACCAGTTTGAG

846
GCTCTGGGCCTGCTGATTCTGCT

CCGGCTGTGGCGGGTGGCCCGG

ATCATCAATGGGATTATCATCTC

AGTTAAGAC

226
IDO1
NM_002164.3
51-
CTATTATAAGATGCTCTGAAAAC

150
TCTTCAGACACTGAGGGGCACCA

GAGGAGCAGACTACAAGAATGG

CACACGCTATGGAAAACTCCTGG

ACAATCAGT

227
IDS
NM_006123.4
1016-
TGGATGGACATCAGGCAACGGG

1115
AAGACGTCCAAGCCTTAAACATC

AGTGTGCCGTATGGTCCAATTCC

TGTGGACTTTCAGCGGAAAATCC

GCCAGAGCT

228
IER5
NM_016545.4
1802-
ACTTTACACCTACCCCTCACCGG

1901
AAAGCTAGACCCGCTTCAGGGCC

AGGAGTGGCGTTTCCGCACAGG

ATTTCCTAAGACGAGAGGGATTT

AGCCAAGAG

229
IFI27L2
NM_032036.2
305-
GTCAGTGTTGGGGGCCTGCTTGG

404
GGAATTCACCTTCTTCTTCTCTC

CCAGCTGAACCCGAGGCTAAAGA

AGATGAGGCAAGAGAAAATGTA

CCCCAAGGT

230
IFNA17
NM_021268.2
292-
TGAGATGATCCAGCAGACCTTCA

391
ATCTCTTCAGCACAGAGGACTCA

TCTGCTGCTTGGGAACAGAGCCT

CCTAGAAAAATTTTCCACTGAAC

TTTACCAG

231
IFNAR1
NM_000629.2
3124-
CTAATCAGCTCTCAGTGATCAAC

3223
CCACTCTTGTTATGGGTGGTCTC

TGTCACTTTGAATGCCAGGCTGG

CTTCTCGTCTAGCAGTATTCAGA

TACCCCTT

232
IFNAR2
NM_000874.3
632-
AAATACCACAAGATCATTTTGTG

731
ACCTCACAGATGAGTGGAGAAG

CACACACGAGGCCTATGTCACCG

TCCTAGAAGGATTCAGCGGGAA

CACAACGTTG

233
IFNGR1
NM_000416.1
1141-
CCCGGGCAGCCATCTGACTCCAA

1240
TAGAGAGAGAGAGTTCTTCACCT

TTAAGTAGTAACCAGTCTGAACC

TGGCAGCATCGCTTTAAACTCGT

ATCACTCC

234
IGFBP7
NM_001553.2
584-
ATCGGAATCCCGACACCTGTCCT

683
CATCTGGAACAAGGTAAAAAGG

GGTCACTATGGAGTTCAAAGGAC

AGAACTCCTGCCTGGTGACCGGG

ACAACCTGG

235
IL16
NM_004513.4
1263-
GGCATCTCCAACATCATCATCCA

1362
ACGAAGACTCAGCTGCAAATGG

TTCTGCTGAAACATCTGCCTTGG

ACACAGGGTTCTCGCTCAACCTT

TCAGAGCTG

236
IL1B
NM_000576.2
841-
GGGACCAAAGGCGGCCAGGATA

940
TAACTGACTTCACCATGCAATTT

GTGTCTTCCTAAAGAGAGCTGTA

CCCAGAGAGTCCTGTGCTGAATG

TGGACTCAA

237
IL1R2
NM_173343.1
114-
TGCTTCTGCCACGTGCTGCTGGG

213
TCTCAGTCCTCCACTTCCCGTGT

CCTCTGGAAGTTGTCAGGAGCAA

TGTTGCGCTTGTACGTGTTGGTA

ATGGGAGT

238
IL4
NM_000589.2
626-
GACACTCGCTGCCTGGGTGCGAC

725
TGCACAGCAGTTCCACAGGCACA

AGCAGCTGATCCGATTCCTGAAA

CGGCTCGACAGGAACCTCTGGG

GCCTGGCGG

239
IL7
NM_000880.2
39-
AATAACCCAGCTTGCGTCCTGCA

138
CACTTGTGGCTTCCGTGCACACA

TTAACAACTCATGGTTCTAGCTC

CCAGTCGCCAAGCGTTGCCAAGG

CGTTGAGA

240
INTS4
NM_033547.3
652-
CCCACGTGTCAGAACAGCAGCTA

751
TAAAAGCCATGTTGCAGCTCCAT

GAAAGAGGACTGAAATTACACC

AAACAATTTATAATCAGGCCTGT

AAATTACTC

241
IRAK2
NM_001570.3
1286-
GTGTTGGCCGAGGTCCTCACGGG

1385
CATCCCTGCAATGGATAACAACC

GAAGCCCGGTTTACCTGAAGGAC

TTACTCCTCAGTGATATTCCAAG

CAGCACCG

242
IRF1
NM_002198.1
511-
CTGTGCGAGTGTACCGGATGCTT

610
CCACCTCTCACCAAGAACCAGAG

AAAAGAAAGAAAGTCGAAGTCC

AGCCGAGATGCTAAGAGCAAGG

CCAAGAGGAA

243
IRF4
NM_002460.1
326-
GGGCACTGTTTAAAGGAAAGTTC

425
CGAGAAGGCATCGACAAGCCGG

ACCCTCCCACCTGGAAGACGCGC

CTGCGGTGCGCTTTGAACAAGAG

CAATGACTT

244
KIAA0174
NM_014761.3
2187-
ATGGATGGGACTCTTATGTCATA

2286
ACTTCTGTTACTCCTTTGGCCCA

TAGCTAAGGTCATCCTTCCCCAC

AGGGGTGGCTTTGGGATTGGATG

ATACAGCT

245
ITCH
NM_001257138.1
439-
GAGGTGACAAAGAGCCAACAGA

538
GACAATAGGAGACTTGTCAATTT

GTCTTGATGGGCTACAGTTAGAG

TCTGAAGTTGTTACCAATGGTGA

AACTACATG

246
ITFG2
NM_018463.3
1985-
GTCTGGTCTTACCCATGTTCCTA

2084
GCAACCCTGAGATGATTTTCTTC

CATTTACCAAAGCAGCCGGGTCA

GTGCTTTCTCACGTTGCCGTATT

CTTCAGGT

247
ITGAE
NM_002208.4
3406-
CTGAATGCAGAGAACCACAGAA

3505
CTAAGATCACTGTCGTCTTCCTG

AAAGATGAGAAGTACCATTCTTT

GCCTATCATCATTAAAGGCAGCG

TTGGTGGAC

248
ITGAL
NM_002209.2
3906-
GTGAGGGCTTGTCATTACCAGAC

4005
GGTTCACCAGCCTCTCTTGGTTT

CCTTCCTTGGAAGAGAATGTCTG

ATCTAAATGTGGAGAAACTGTAG

TCTCAGGA

249
JAK1
NM_002227.1
286-
GAGAACACCAAGCTCTGGTAGC

385
TCCAAATCGCACCATCACCGTTG

ATGACAAGATGTCCCTCCGGCTC

CACTACCGGATGAGGTTCTATTT

CACCAATT

250
KIAA1267
NM_015443.3
4402-
CCTTCACATCCAGATCCCTGTCG

4501
GTGTTAGTTCCACTCTTGGTCTT

TCACGCTCCCCTTGCCTGTGGAA

CATTGTCTGGTCCTAGCTGTGGT

TCCCATTG

251
MYST4
NM_012330.3
6541-
CCCAGACTGTAGCCATGCAGGGT

6640
CCTGCACGGACTTTAACGATGCA

AAGAGGCATGAACATGAGTGTG

AACCTGATGCCAGCGCCAGCCTA

CAATGTCAA

252
KCTD12
NM_138444.3
4208-
ACAAGTAAAATAACTTGACATG

4307
AGCACCTTTAGATCCCTTCCCCT

CCATGGGCTTTGGGCCACAGAAT

GAACCTTTGAGGCCTGTAAAGTG

GATTGTAAT

253
KIAA0101
NM_014736.4
236-
CGACATCAGTTTCATCGAGGAAA

335
GCTGAAAATAAATATGCAGGAG

GGAACCCCGTTTGCGTGCGCCCA

ACTCCCAAGTGGCAAAAAGGAA

TTGGAGAATT

254
SETD1B
XM_037523.11
7779-
ATCGTGCCCAGTGTTAACCTCGG

7878
CTGGCCTTCACTAAGGGGACTAG

ACCTCCCTCTCCCCAGGAGCCCC

AGCCCCAGAGTGGTTTGCAATAA

TCAAGATA

255
KIR2DL5A
XM_001126354.1
265-
GAGGTGACATATGCACAGTTGG

364
ATCACTGCGTTTTCACACAGACA

AAAATCACTTCCCCTTCTCAGAG

GCCCAAGACACCTCCAACAGAT

ACCACCATGT

256
KIR_Activating_Subgroup_2
NM_014512.1
719-
TCCGAAACCGGTAACCCCAGAC

818
ACCTACATGTTCTGATTGGGACC

TCAGTGGTCAAAATCCCTTTCAC

CATCCTCCTCTTCTTTCTCCTTC

ATCGCTGGT

257
KIR2DS3
NM_012313.1
1-
CCGGCAGCACCATGTCGCTCATG

100
GTCATCAGCATGGCATGTGTTGG

GTTCTTCTGGCTGCAGGGGGCCT

GGCCACATGAGGGATTCCGCAG

AAAACCTTC

258
KLRB1
NM_002258.2
357-
CAGCAACTCCGAGAGAAATGCTT

456
GTTATTTTCTCACACTGTCAACC

CTTGGAATAACAGTCTAGCTGAT

TGTTCCACCAAAGAATCCAGCCT

GCTGCTTA

259
KLRC1
NM_002259.3
336-
ACCTATCACTGCAAAGATTTACC

435
ATCAGCTCCAGAGAAGCTCATTG

TTGGGATCCTGGGAATTATCTGT

CTTATCTTAATGGCCTCTGTGGT

AACGATAG

260
KLRC2
NM_002260.3
943-
TATGTGAGTCAGCTTATAGGAAG

1042
TACCAAGAACAGTCAAACCCAT

GGAGACAGAAAGTAGAATAGTG

GTTGCCAATGTCTCAGGGAGGTT

GAAATAGGAG

261
KLRD1
NM_002262.3
597-
CAATTTTACTGGATTGGACTCTC

696
TTACAGTGAGGAGCACACCGCCT

GGTTGTGGGAGAATGGCTCTGCA

CTCTCCCAGTATCTATTTCCATC

ATTTGAAA

262
KLRF1
NM_016523.1
544-
TATACAGAAAAACCTAAGACAA

643
TTAAACTACGTATGGATTGGGCT

TAACTTTACCTCCTTGAAAATGA

CATGGACTTGGGTGGATGGTTCT

CCAATAGAT

263
KLRF1 b
NM_016523.2
849-
AAGTGCAATTAAATGCCAAAATC

948
TCTTCTCCCTTCTCCCTCCATCA

TCGACACTGGTCTAGCCTCAGAG

TAACCCCTGTTAACAAACTAAAA

TGTACACT

264
KRTAP10-3
NM_198696.2
213-
CTGCTGCCAGGCGGCCTGTGAGC

312
CCAGCCCCTGCCAGTCAGGCTGC

ACCAGCTCCTGCACGCCCTCGTG

CTGCCAGCAGTCTAGCTGCCAGC

CAGCTTGC

265
KYNU
NM_001032998.1
936-
TTGCCTGCTGGTGTTCCTACAAG

1035
TATTTAAATGCAGGAGCAGGAG

GAATTGCTGGTGCCTTCATTCAT

GAAAAGCATGCCCATACGATTA

AACCTGCGAG

266
LAMA5
NM_005560.4
11163-
CCAACCCCGGCCCCTGGTCAGGC

11262
CCCTGCAGCTGCCTCACACCGCC

CCTTGTGCTCGCCTCATAGGTGT

CTATTTGGACTCTAAGCTCTACG

GGTGACAG

267
LDHA
NM_001165416.1
1348-
ATCTTGTGTAGTCTTCAACTGGT

1447
TAGTGTGAAATAGTTCTGCCACC

TCTGACGCACCACTGCCAATGCT

GTACGTACTGCATTTGCCCCTTG

AGCCAGGT

268
LEF1
NM_016269.4
3136-
AACACATAGTGGCTTCTCCGCCC

3235
TTGTAAGGTGTTCAGTAGAGCTA

AATAAATGTAATAGCCAAACCC

ACTCTGTTGGTAGCAATTGGCAG

CCCTATTTC

269
LETM2
NM_144652.3
1331-
AAAGGACCCATCACTTCTTCTGA

1430
AGAACCTACACTCCAGGCCAAAT

CACAAATGACGGCCCAGAACAG

CAAGGCTAGTTCAAAAGGAGCA

TAAAGGACTA

270
LIF
NM_002309.3
1241-
GGGATGGAAGGCTGTCTTCTTTT

1340
GAGGATGATCAGAGAACTTGGG

CATAGGAACAATCTGGCAGAAG

TTTCCAGAAGGAGGTCACTTGGC

ATTCAGGCTC

271
LILRA5
NM_021250.3
1044-
TTGAATGCTGGAGCCTTGGAAGC

1143
GAATCTGATGGTCCTAGGAGGTT

CGGGAAGACCATCTGAGGCCTAT

GCCATCTGGACTGTCTGCTGGCA

ATTTCTTT

272
LILRA5 b
NM_181879.2
546-
CACCCTCTCAGCCCTGCCCAGTC

645
CTGTGGTGACCTCAGGAGAGAA

CGTGACCCTCCAGTGTGGCTCAC

GGCTGAGATTCGACAGGTTCATT

CTGACTGAG

273
LOC338799
NR_002809.2
471-
GCGGCAGCCAATCAGCGCGCGG

570
CTTCTATAGGGCTTGAGTTATTA

GACGCTGATCTCAAAACATCCTT

CATCAGACACGAAGGAGAGGCC

AACAGATGAG

274
LOC100129022
XM_001716591.1
568-
AGGGTCATGCAGCTACTGAGGTC

667
ACAGCCTGGATTCATACACAGGT

CTGACTCCTGAGCACTTAGCCAG

GTGGCTGTAACAGTGTTCCCAGA

AACACAGG

275
LOC100129697
XM_001732822.2
1148-
ACCTGTCTTCCGGGTCTGTTCAC

1247
CCGTCCCCTGGACTGGCACCAGC

ACAGAGGGTCGAGTGTTGGCAC

CTGTCTTCTGGGTCTCCATCCCT

CCCTTTGTT

276
LOC100130229
XM_001717158.1
1469-
GAGAATGTCTGCGCGGAGACAG

1568
CATAGCTCTGTAGAAATGAGTGG

CAGCGTATGTAACCTGGCATTTT

GAACCCAGGAGCACAATTTTATT

AAAGGAAAA

277
LOC100132797
XR_036994.1
15-
GAGTAGTAGGTGGACAGCCGTC

114
CCACACAAGGGTTTGTATCTGGG

CTACACAGATTCCCTTCAGAAAA

GCACCAATGTAAGCAACTCCCTT

ACAGTTGCT

278
LOC100133273
XR_039238.1
342-
GAGATAGCTTCCTGAAATGTGTG

441
AAGGAAAATGATCAGAAAAAGA

AAGAAGCCAAAGAGAAAGGTAC

CTGGGTTCAACTAAAGTGCCAGC

CTGCTCCACC

279
LOC148137
NM_144692.1
3367-
GCTCTGTCCTTTGCCGCTCAGAC

3466
CAAAAACCTTAGAGCTGTCTTTG

ACTTCTGTCTTTCCCTTCCACCC

ACAGTTAACCAGGAAATCCTGCC

ATCTCCGC

280
LOC151162
NR_024275.1
5062-
GGTTACAGCCATTTTGTGTGATT

5161
CACTTCGGGGGTTAAGTAATGCA

GGATTCTGCAAACAAGGTGTCGC

CGTCCAAATGTACTGTCCTGGCA

TAGAGAGC

281 C1orf222 NM_001003808.1 2561-2660 ACATGGCGCCACGGCCACTTCCTGCTGCCCTGGACCCCGCAAGCCCAGGGACATCCAAGAGCACCCCTCCTGAGACCCCAGACTCAGAAGCAGCGAGAAG

282 LOC339674 XM_934917.1 376-475 CCCCTGGTGGACCGCGACCTCCGCAAGACGCTAATGGTGCGCGACAACCTGGCCTTCGGCGGCCCGGAGGTCTGAGCCGACTTGCAAAGGGGATAGGCGG

283 LOC648000 XM_371757.4 210-309 GCAAAGCACTATCACAAGGAATATAGGCAAATGTACAGAACTGAAATTCGAGTGGCGAGGATGGCAAGAAAAGCTGGCAACTTCTATGTACCTGCAGAAC

284 LOC391126 XR_017684.2 82-181 AAGATTATGTCTTCCCCTGTTTCCAAAGAGCTGAGACAGAAGTACAATGTGCAATCCATGCCCATCCGAAAGGATGATGAAGTTCAGGTTGTACGAGGGC

285 LOC399753 XM_930634.1 1448-1547 ATGGGACCCACTCTACTGAGGCTTTATGTAGAACTCATAGAGGAAGCTGGCTTTGAGGAATGAACTACCCTGTGCTTTTCTTAGGACTAAAATCTCAGGA

286 LOC399942 XM_934471.1 21-120 GACGGTAACCGGGACCCAGTGTCTGCTCCTGTCACCTTCGCCTCCTAATCCCTAGCCACTATGCGAGATGACTCCTTCAACACCTTCAGTGAGACGGGTG

287 LOC440389 XM_498648.3 552-651 GAGTTTTCCAAACCCTGGATTTCCTTCGGAGAGAGCTAGATTCTATTCCATTCTTGGAATTCAGCTCCTTGCCCTTCTCTGTGACCCCGGATCGCGAATG

288 LOC440928 XM_942885.1 1533-1632 TGTTGCAAAAGCCAACTACCACTGTCAAACTTAGCCCGTTTACAACATGGGGAAAGGCGTATTTCTTACTAATATCTCAACAACGATAACAATGCTGTAT

289 LOC441073 XR_018937.2 287-386 CGGGTGCAGCGGGAAAAGGCTAATGGCACAACTGTCCACGTAGGCATTCACCCCAGCAAGGTGGTTATCACTAGGCTAAAACTGGACAAAGACTGTGAAA

290 LOC642812 XR_036892.1 591-690 GGTGAAGAATTTGTTCTATTATGAAGATACTGTCTGGGCTAAAAAGCTTACAGTGAGTGGAAGATAGCAACTTGTAGGGTTGGTGGCTGAACAGGCCGAC

291 LOC643319 XM_927980.1 255-354 CTGGCTCAAGGATGGCACGGTGTTATGTGAGCTCAATAATGCACTGTACCCCAAGGGGCAGGTCCCAGTAAAGAAGATCCAGGCCTCCACCATGGCCTTC

292 LOC644315 XR_017529.2 38-137 CAGGCGCTGCAAGTTCTCCCAGGAGAAAGCCATGTTCAGTTCGAGCGCCAAGATCGTGAAGCCCAATGGCGAGAAGCCGGACGAGTTCGAGTCCGGCCAT

293 LOC645914 XM_928884.1 13-112 GAAGCACTGGTAAATGTCTGCTGCATTAACTCACTCAGACCAAACTTTCTCTTATCTAGGTCCAAAAGGAAGCTGCTCGGCTGGAAGGAACCTGGTGAGG

294 LOC647340 XR_018104.1 670-769 AGGTGCTGCAAAATTACCAGGAATACAGTCTGGCCAACAGCATCTACTACTCTCTGAAGGAGTCCACCACTAGTGAGCAGAGTGCCAGGATGACAGCCAT

295 LOC648927 XR_038906.2 1638-1737 TGGAGAGAAGAATGAAGAGGTGGTGGTTCTGGGTTTGATTTGAGTTCACCTGTGGGCAGTGGGCAGTGTCTTGGTGAAAGGGAGCGGATACTACTTTTTG

296 LOC653773 XM_938755.1 38-137 GCCCTTCTGCCATCAACGAGGTGGTGACCCAAGAACATACCATCAACATTCACAAGCGCATCCATGGAGAGGGCTTCAAGAAGCGTGCTCCTCGGGCACT

297 LOC728533 XR_015610.3 1861-1960 GTAGTTGTCCACTGCTTTCCTGGATGGATGGGACTCTTATGTCATAACTTCTATACTCCTTTGGCCCATAGCTAAGGTCATCCTTCCCCACAGGGGTGGC

298 LOC728835 XM_001133190.1 510-609 CCAAACCAAAAGAGGCAAGCAAGTCTGCGCTGACCCCAGTGAGTCCTGGGTCCAGGAGTACGTGTATGACCTGGAACTGAACTGAGCTGCTCAGAGACAG

299 LOC729887 XR_040891.2 625-724 CCCTGGGTGCCCCTTAACCCGGGCGGTAGCTCGTTAAGATGGCGAAGTGTCCGGTCCGGAACACGCGAAACCCCAAATCCCGCCTGCCCGACCTCCTGAC

300 LOC732111 XM_001134275.1 765-864 GCGCGGTTGCGGTTAGCGGGCGCGGTGCCAAAGCTGCCATCCCCAGCTCACAGCTCCTCATATCCACCCTGCCCTCATCTTTATGAATTGCGTGTAGACC

301 LOC732371 XM_001133019.1 182-281 GCCCTTCAGAGCTGCGGGAGATCATTGATGAGTGCCGGGCCCATGATCCCTCTGTGCGGCCCTCTGTGGATGAGCAGAAGCGCAGACTTAATGATGTGTT

302 LOC91431 NM_001099776.1 2666-2765 ATGTTGCATTGACTAGAGGAAAGAGGCATTTGTTGATTGTGGGAAATTTAGCCTGTTTGAGGAAAAATCAACTTTGGGGACGAGTGATCCAACACTGCGA

303 P2RY5 NM_005767.5 2026-2125 AGATTGTTTGCACTGGCGTGTGGTTAACTGTGATCGGAGGAAGTGCACCCGCCGTTTTTGTTCAGTCTACCCACTCTCAGGGTAACAATGCCTCAGAAGC

304 LPCAT4 NM_153613.2 1560-1659 CCCCACACACCTCTCGAGGCACCTCCCAGACACCAAATGCCTCATCCCCAGGCAACCCCACTGCTCTGGCCAATGGGACTGTGCAAGCACCCAAGCAGAA

305 LPIN2 NM_014646.2 5620-5719 AGAAAAAACTTAAAAATGGGATGTCCTAAAATGAAAGCTGCTCAAAGTCACAGAACAACCGAGGGACAAAGGAGATTGGATGACTGGGAAGCGCTGGCCC

306 C1orf103 NM_018372.3 1543-1642 TTCCAATACCCAGCTTGCTTCCATGGCCAATCTAAGGGCAGAGAAGAATAAAGTGGAGAAACCATCTCCTTCTACCACAAATCCACATATGAACCAATCC

307 LRRC47 NM_020710.2 2461-2560 GGGTCAGTGACGGACACTTACCTGACAGCGGATCCACAATATTCTCGTGCAGTGTGTTTGGAATCCTGGTCTGGGCTCTCGTCGTTGGCCTTGTAGATCA

308 LY96 NM_015364.4 439-538 AAGGGAGAGACTGTGAATACAACAATATCATTCTCCTTCAAGGGAATAAAATTTTCTAAGGGAAAATACAAATGTGTTGTTGAAGCTATTTCTGGGAGCC

309 LYN NM_002350.1 1286-1385 TCCTGAAGAGCGATGAAGGTGGCAAAGTGCTGCTTCCAAAGCTCATTGACTTTTCTGCTCAGATTGCAGAGGGAATGGCATACATCGAGCGGAAGAACTA

310 MAGEA1 NM_004988.4 477-576 AGGGGCCAAGCACCTCTTGTATCCTGGAGTCCTTGTTCCGAGCAGTAATCACTAAGAAGGTGGCTGATTTGGTTGGTTTTCTGCTCCTCAAATATCGAGC

311 MAGEA3 NM_005362.3 850-949 ACTGTGCCCCTGAGGAGAAAATCTGGGAGGAGCTGAGTGTGTTAGAGGTGTTTGAGGGGAGGGAAGACAGTATCTTGGGGGATCCCAAGAAGCTGCTCAC

312 MAP3K7 NM_145333.1 671-770 GCCATATTATACTGCTGCCCACGCAATGAGTTGGTGTTTACAGTGTTCCCAAGGAGTGGCTTATCTTCACAGCATGCAACCCAAAGCGCTAATTCACAGG

313 MARCKS NM_002356.6 1800-1899 GTCAAAAAGGGATATCAAATGAAGTGATGGGGTCACAATGGGGAAATTGAAGTGGTGCATAACATTGCCAAAATAGTGTGCCACTAGAAATGGTGTAAAG

314 MARCKSL1 NM_023009.5 1117-1216 TCCAAGTAGGTTTTGTTTACCCTACTCCCCAAATCCCTGAGCCAGAAGTGGGGTGCTTATACTCCCAAACCTTGAGTGTCCAGCCTTCCCCTGTTGTTTT

315 MBD1 NM_015844.2 2380-2479 TGGCTGCAGGCCTGACTACTGCCCACACCAACGAGGTGATCTAGCAGATACATGGCAACGTGTGAACTGCAACAACGCCTGGTGCCCCAGCACCAACCTT

316 C19orf59 NM_174918.2 1062-1161 CATACTAGAGTATACTGCGGCGTGTTTTCTGTCTACCCATGTCATGGTGGGGGAGATTTATCTCCGTACATGTGGGTGTCGCCATGTGTGCCCTGTCACT

317 MED16 NM_005481.2 2152-2251 TCTGAAGCCCAGCTGCCTGCCCGTGTATACGGCCACCTCGGATACCCAGGACAGCATGTCCCTGCTCTTCCGCCTGCTCACCAAGCTCTGGATCTGCTGT

318 MEN1 NM_130799.2 2222-2321 CCCAGCCCCTAGAAACCCAAGCTCCTCCTCGGAACCGCTCACCTAGAGCCAGACCAACGTTACTCAGGGCTCCTCCCAGCTTGTAGGAGCTGAGGTTTCA

319 MERTK NM_006343.2 666-765 GAAGAGATCGTGTCTGATCCCATCTACATCGAAGTACAAGGACTTCCTCACTTTACTAAGCAGCCTGAGAGCATGAATGTCACCAGAAACACAGCCTTCA

320 MFSD1 NM_022736.2 2023-2122 AAGGGCTGCGTTACACAAAATAAACAATGGCATTGTCATAGGCCTTCCTTTTACTAGTAGGGCATAATGCTAGGGAATATGTGAAGATGTTTTTATGAAG

321 MID1IP1 NM_021242.5 3472-3571 AGCTGGCATTTCGCCAGCTTGTACGTAGCTTGCCACTCAGTGAAAATAATAACATTATTATGAGAAAGTGGACTTAACCGAAATGGAACCAACTGACATT

322 MPDU1 NM_004870.3 1226-1325 CATTCAGCCAAGCCTCCTCCTCTAGCAGCAATTTCCAGCTGTGTAACACTATCCTGGGCAAATGTTTTACCCTGTCCTCCAGCCTCCCTGCTTCCCTTCT

323 MRPL27 NM_148571.1 2189-2288 TCAAACTGGTAGCTATGCTTTGATGTCCTGTTGAGGCCATCGGACAGAGACTGGAGCCCAGGTGACAGGAGATGGTGATACCAGAAGTCAAGGGTTGGGG

324 MRPS16 NM_016065.3 1811-1910 ATTCAAATGTGGCTGTGATTTCTGCATATATCATAGATGGGATCCTTCTGAGAATACTGGAATAGGGAATTAGGACACCAAGCCAATTCAGCTGTGAACC

325 MS4A2 NM_000139.3 662-761 TTCTCACCATTCTGGGACTTGGTAGTGCTGTGTCACTCACAATCTGTGGAGCTGGGGAAGAACTCAAAGGAAACAAGGTTCCAGAGGATCGTGTTTATGA

326 MS4A6A NM_022349.3 1290-1389 CTGGGAAGTTAAATGACTGGCCTGGCATTATGCTATGAGTTTGTGCCTTTGCTGAGGACACTAGAACCTGGCTTGCCTCCCTTATAAGCAGAAACAATTT

327 MS4A6A b NM_152851.2 880-979 CTGCGGTGGAAACAGGCTTACTCTGACTTCCCTGGGAGTGTACTTTTCCTGCCTCACAGTTACATTGGTAATTCTGGCATGTCCTCAAAAATGACTCATG

328 MTCH1 NM_014341.2 2081-2180 TCCTCCTCATCTAATGCTCATCTGTTTAATGGTGATGCCTCGCGTACAGGATCTGGTTACCTGTGCAGTTGTGAATACCCAGAGGTTGGGCAGATCAGTG

329 MYADM NM_001020820.1 2656-2755 TCTTTTTCCTGGCCATGAGGACAAAAATTACTGAGTGGCCCTTAAAGAGGGAAGTTTGTTTTCAGCTGTTCTCTTTTGCCCGTAGGTGGGAGGGTGGGGA

330 MYADM b NM_001020820.1 2789-2888 TGAATGTGTAGTGCACACGCACGGGTGTTTCTGTGTGCTAGTTGCTTCTTGCTGCTGCTTCCTGCTTGTCTGGGACTCACATACATAACGTGATATATAT

331 C19orf10 NM_019107.3 649-748 TGTCCCTGAAAGGGCCAGCACATCACTGGTTTTCTAGGAGGGACTCTTAAGTTTTCTACCTGGGCTGACGTTGCCTTGTCCGGAGGGGCTTGCAGGGTGG

332 MYL12A NM_006471.3 305-404 TCTCTGGGTAGCAGGGTGGTGTGATAGCGGCAGCGAGGGGCTCGGAGAGGTGCTCGGATTCTCGTAGCTGTGCCGGGACTTAACCACCACCATGTCGAGC

333 MYLIP NM_013262.3 2701-2800 TTGGGCATTTTGGAAGCTGGTCAGCTAGCAGGTTTTCTGGGATGTCGGGAGACCTAGATGACCTTATCGGGTGCAATACTAGCTAAGGTAAAGCTAGAAA

334 NAT5 NM_181528.3 735-834 AAACATACCACTCTCATGGTTCATAGTATTCACTGTATGTATGCTAGGGAAAAGACTTGCTCCAGTCTCCTCCTCAGTTCTGTGCCTGAGAACCACTGCT

335 NADK NM_023018.4 2449-2548 TCCGGGGCTAGTGATCGTGATCCCTTTTATTTGCAACTGTAATGAGAATTTTTCACACTAACACAGCGAGGGACTCAACACGCTGATTCTCCTCCTGCCT

336 NAGK NM_017567.4 1362-1461 GGGCCAGGCACATCGGGCACCTCCTCCCCATGGACTATAGCGCCAATGCCATTGCCTTCTATTCCTACACCTTTTCCTAGGGGGCTGGTCCCGGCTCCAC

337 NCAPG NM_022346.4 3080-3179 ACCCAAGCATCAAAGTCTACTCAGCTAAAGACTAACAGAGGACAGAGAAAAGTGACAGTTTCAGCTAGGACGAACAGGAGGTGTCAGACTGCTGAAGCCG

338 NCOA5 NM_020967.2 2837-2936 TGGACATGTTCTCGAGATGGGTGGCTGTTCGCGACTTTTGTACCAGAGTGAAATTGTTAGAAGGAGGGTTTCTGGCTGTGGTTCTAAATGGAGCCCCAGG

339 NCR1 NM_004829.5 603-702 CGATGTTTTGGCTCCTATAACAACCATGCCTGGTCTTTCCCCAGTGAGCCAGTGAAGCTCCTGGTCACAGGCGACATTGAGAACACCAGCCTTGCACCTG

340 NDRG2 NM_016250.2 1516-1615 TATGCATCCTCTGTCCTGATCTAGGTGTCTATAGCTGAGGGGTAAGAGGTTGTTGTAGTTGTCCTGGTGCCTCCATCAGACTCTCCCTACTTGTCCCATA

341 NDUFA4 NM_002489.3 262-361 TGGGACAGAAATAACCCAGAGCCCTGGAACAAACTGGGTCCCAATGATCAATACAAGTTCTACTCAGTGAATGTGGATTACAGCAAGCTGAAGAAGGAAC

342 NDUFAF2 NM_174889.4 486-585 TCCTGCCTCCACCAGTTCAAACTCAAATTAAAGGCCATGCCTCTGCTCCATACTTTGGAAAGGAAGAACCCTCAGTGGCTCCCAGCAGCACTGGTAAAAC

343 NDUFB3 NM_002491.2 383-482 ACAATGGAAGATAGAAGGGACACCATTAGAAACTATCCAGAAGAAGCTGGCTGCAAAAGGGCTAAGGGATCCATGGGGCCGCAATGAAGCTTGGAGATAC

344 NDUFS4 NM_002495.2 326-425 GAGTTTGATACCAGAGAGCGATGGGAAAATCCTTTGATGGGTTGGGCATCAACGGCTGATCCCTTATCCAACATGGTTCTAACCTTCAGTACTAAAGAAG

345 NDUFV2 NM_021074.4 687-786 TTACTATGAGGATTTGACAGCTAAGGATATTGAAGAAATTATTGATGAGCTCAAGGCTGGCAAAATCCCAAAACCAGGGCCAAGGAGTGGACGCTTCTCT

346 NFAT5 NM_138713.3 3857-3956 CCCAAGAAGCATTTTTTGCAGCACCGAACTCAATTTCTCCACTTCAGTCAACATCAAACAGTGAACAACAAGCTGCTTTCCAACAGCAAGCTCCAATATC

347 NFATC1 NM_172389.1 1985-2084 CGAATTCTCTGGTGGTTGAGATCCCGCCATTTCGGAATCAGAGGATAACCAGCCCCGTTCACGTCAGTTTCTACGTCTGCAACGGGAAGAGAAAGCGAAG

348 NFATC4 NM_001136022.2 2297-2396 ACAAGAGGGTTTCCCGGCCAGTCCAGGTCTACTTTTATGTCTCCAATGGGCGGAGGAAACGCAGTCCTACCCAGAGTTTCAGGTTTCTGCCTGTGATCTG

349 NFKB1 NM_003998.3 3606-3705 CGGATGCATCTGGGGATGAGGTTGCTTACTAAGCTTTGCCAGCTGCTGCTGGATCACAGCTGCTTTCTGTTGTCATTGCTGTTGTCCCTCTGCTACGTTC

350 NFKB2 NM_002502.2 826-925 ATCTCCGGGGGCATCAAACCTGAAGATTTCTCGAATGGACAAGACAGCAGGCTCTGTGCGGGGTGGAGATGAAGTTTATCTGCTTTGTGACAAGGTGCAG

351 NIPBL NM_133433.3 8755-8854 GCGCCGTGATGGCCGCAAACTGGTGCCTTGGGTAGACACTATTAAAGAGTCAGACATTATTTACAAAAAAATTGCTCTAACGAGTGCTAATAAGCTGACT

352 NLRP3 NM_001079821.2 416-515 AGTGGGGTTCAGATAATGCACGTGTTTCGAATCCCACTGTGATATGCCAGGAAGACAGCATTGAAGAGGAGTGGATGGGTTTACTGGAGTACCTTTCGAG

353 NME1-NME2 NM_001018136.2 484-583 ACCTGGAGCGCACCTTCATCGCCATCAAGCCGGACGGCGTGCAGCGCGGCCTGGTGGGCGAGATCATCAAGCGCTTCGAGCAGAAGGGATTCCGCCTCGT

354 NUDT18 NM_024815.3 1369-1468 CCCCAGTGGCATCTCCTCATCACGTTCTGTGCCGTCCTTGGGAAAGGCCTGCATTCTGATCCTTCCAGGCCCTTCGAGCATGGAGGGGCACTGGGGAAGG

355 NUMB NM_001005744.1 2833-2932 CATAAGATTGATTTATCATTGATGCCTACTGAAATAAAAAGAGGAAAGGCTGGAAGCTGCAGACAGGATCCCTAGCTTGTTTTCTGTCAGTCATTCATTG

356 NUP153 NM_005124.3 5104-5203 TTTATGATCCAGCAGATTATTCACTGATTTGACATAGTCTGGCTGTACCCAGGAATGGAGCCTGCACGGTGAATGGCTTTGTATAGAACCTCTTTGTCTA

357 OLR1 NM_002543.3 1524-1623 ACACATTTTGGGACAAGTGGGGAGCCCAAGAAAGTAATTAGTAAGTGAGTGGTCTTTTCTGTAAGCTAATCCACAACCTGTTACCACTTCCTGAATCAGT

358 OSBP ILMN_1706376.1 130-229 TTCTCTTCCTTCACCATCTGCACTACATTTCTGGCTGATCCCAATCAGATTCCCGCTAATGGAAGAAGTTTAGAATCTTTCAGGTGGAATAAAGTCACAT

359 FAM105B NM_138348.4 2537-2636 TGCAGATGGTGTTCACATGAACCGGAGACATCACTCTTTAGGATTCTACTGGCAGCCCCTGAATTGGCTCAACGTTTGTGGAGGTGGTATTTCCCTGAAG

360 P2RY10 NM_198333.1 972-1071 TTACACCATGGTAAAGGAAACCATCATTAGCAGTTGTCCCGTTGTCCGAATCGCACTGTATTTCCACCCTTTTTGCCTGTGCCTTGCAAGTCTCTGCTGC

361 PACS1 NM_018026.3 3830-3929 CGCTGTCTTCGTGGCTTCCACCCTTGTTAATGATGCTCCTGCCTCTGCCTCCCAGCCCCTCACCCAGCACAGCTCTGCCTGGACTTGGAGAGATGGGAGG

362 PANK2 NM_153640.2 824-923 AGTGGATAAACTAGTACGAGATATTTATGGAGGGGACTATGAGAGGTTTGGACTGCCAGGCTGGGCTGTGGCTTCAAGCTTTGGAAACATGATGAGCAAG

363 PDCD10 NM_145859.1 901-1000 AAGAGATGTACTTCTCAGTGGCAGTATTGAACTGCCTTTATCTGTAAATTTTAAAGTTTGACTGTATAAATTATCAGTCCCTCCTGAAGGGATCTAATCC

364 PDGFD NM_033135.3 3394-3493 CCTGTGAAAACATCAGTTTCCTGTACCAAAGTCAAAATGAACGTTACATCACTCTAACCTGAACAGCTCACAATGTAGCTGTAAATATAAAAAATGAGAG

365 PDSS1 NM_014317.3 1199-1298 CATGAAGCAATAAGAGAGATCAGTAAACTTCGACCATCCCCAGAAAGAGATGCCCTCATTCAGCTTTCAGAAATTGTACTCACAAGAGATAAATGACAAC

366 PELP1 NM_014389.2 1989-2088 TGGCCCCGTCTCCTCGCTGCCCACCTCCTCTTGCCTGTGCCCTGCAAGCCTTCTCCCTCGGCCAGCGAGAAGATAGCCTTGAGGTCTCCTCTTTCTGCTC

367 PFAS NM_012393.2 5109-5208 CATCCCTAGATCCTAACCCTTTAGTATGCTGGAATTCTACTCTTCACTTACTGCATTGACTGTTGTTGATTAGTTATTATTGCAAAGCACTGTCACCGGC

368 PFDN5 NM_145897.2 232-331 ATCGATGTGGGAACTGGGTACTATGTAGAGAAGACAGCTGAGGATGCCAAGGACTTCTTCAAGAGGAAGATAGATTTTCTAACCAAGCAGATGGAGAAAA

369 PFDN5 b NM_145897.2 331-430 ATCCAACCAGCTCTTCAGGAGAAGCACGCCATGAAACAGGCCGTCATGGAAATGATGAGTCAGAAGATTCAGCAGCTCACAGCCCTGGGGGCAGCTCAGG

370 PGK1 NM_000291.3 1122-1221 GTCCTGAAAGCAGCAAGAAGTATGCTGAGGCTGTCACTCGGGCTAAGCAGATTGTGTGGAATGGTCCTGTGGGGGTATTTGAATGGGAAGCTTTTGCCCG

371 PHF8 NM_015107.2 5704-5803 ATCAAGGTTTAGAACACCATGAGATAGTTACCCCTGATCTCCAGTCCCTAGCTGGGGGCTGGACAGGGGGAAGGGAGAGAGGATTTCTATTCACCTTTAA

372 PHLPP2 NM_015020.3 7601-7700 CCAGTTGGGTGTGGCAGATCTACTGAATATCAAATGATGCTCTTCTTCCCATGTAGACCTTCAGCAAAAGCCGGTACTTGGAAGCCACAGGCTCACCTTC

373 PHRF1 NM_020901.3 5239-5338 GGGAAATGGGGGGCATCACCATGCCTGCCGTCGGGTTCCTGCGCTGACACCTGGTCTGTGCACCTGTGTTGCTCACAGTTGAAAACTGGACACTTTTGTA

374 PI4K2A NM_018425.3 3886-3985 TCCATGGAATTGCTGAGACGTGGCTCCTGGGGCTATTTCTCCCTAATAAAGGATGATCCAGGTCCTCATTTCCAAAGTCCCAATGCTCTGAAAACCAAAA

375 PIK3CD NM_005026.3 4799-4898 GAGCCAGAAGTAGCCGCCCGCTCAGCGGCTCAGGTGCCAGCTCTGTTCTGATTCACCAGGGGTCCGTCAGTAGTCATTGCCACCCGCGGGGCACCTCCCT

376 PIM2 NM_006875.3 1947-2046 TTTTTGGGGGATGGGCTAGGGGAAATAAGGCTTGCTGTTTGTTCTCCTGGGGCGCTCCCTCCAACTTTTGCAGATTCTTGCAACCTCCTCCTGAGCCGGG

377 PLAC8 NM_001130715.1 289-388 CTGATATGAATGAATGCTGTCTGTGTGGAACAAGCGTCGCAATGAGGACTCTCTACAGGACCCGATATGGCATCCCTGGATCTATTTGTGATGACTATAT

378 PLEKHG4 NM_015432.3 6365-6464 CCAGTTGTGGGTTAAGAATAGGCTAGAGCAGACATTGGGTGTTTCCATGCTGTAGGCTGGTGGGGGACCATGTGCCTCTAGGCAGTGACTAGGGTGCCCC

379 POLR2A NM_000937.4 6539-6638 CCCCTGCCTGTCCCCAAATTGAAGATCCTTCCTTGCCTGTGGCTTGATGCGGGGCGGGTAAAGGGTATTTTAACTTAGGGGTAGTTCCTGCTGTGAGTGG

380 PPP1R3E XM_927029.1 4342-4441 CAGAACCTCCTCAGTTCCTTCACAGTGCAACCCTGTGTACTTGGCCCGCAACCCAATAGTATTGTGCCTCACTTCACCTTCCATGGGCAACTGCCCTCCC

381 PPP2R5C NM_178588.1 941-1040 ACAGCACCCTCACGGAACCAGTGGTGATGGCACTTCTCAAATACTGGCCAAAGACTCACAGTCCAAAAGAAGTAATGTTCTTAAACGAATTAGAAGAGAT

382 PPP6C NM_002721.4 1536-1635 TTAAGAAATTTCAGCAGCAAAGTTGTTATTCAGTGGGCACGATGGACTCCAAATGCCTCAAGTTATGTATACCTGTCCCAGATGTAAACTTCATTGTCCT

383 PRG2 NM_002728.4 257-356 CTCTGGAAGTGAAGATGCCTCCAAGAAAGATGGGGCTGTTGAGTCTATCTCAGTGCCAGATATGGTGGACAAAAACCTTACGTGTCCTGAGGAAGAGGAC

384 PRPF3 NM_004698.2 2116-2215 CCTACAGAGAACATGGCTCGTGAGCATTTCAAAAAGCATGGGGCTGAACACTACTGGGACCTTGCGCTGAGTGAATCTGTGTTAGAGTCCACTGATTGAG

385 PRPF8 NM_006445.3 7091-7190 ACTCTGCGGATCGGGAGGACCTGTATGCCTGACCGTTTCCCTGCCTCCTGCTTCAGCCTCCCGAGGCCGAAGCCTCAGCCCCTCCAGACAGGCCGCTGAC

386 C22orf30 NM_173566.2 10495-10594 CCCGTTGAGCTGGCCATCTAGTGCAGTGTGCTCTCAGATTCCATGTTTGTTGATTGTGTGTCTTCACAAGCCCCTCTCTGGTGCTGAATTGGATTTGAAT

387 BAT2D1 NM_015172.3 9620-9719 AGAACAGTGAGTACCTAGAACTGTGCCACTAATTAAAGGAAATCCTAAGAAGGTGCATTTCTTTACAGAGCTGTGTCATGCCATCCTTTGGGCCCTCTGC

388 PRRG4 NM_024081.5 761-860 GAAGACCTGAGGAGGCTGCCTTGTCTCCATTGCCGCCTTCTGTGGAGGATGCAGGATTACCTTCTTATGAACAGGCAGTGGCGCTGACCAGAAAACACAG

389 PSMA3 NM_152132.2 422-521 CTTTGGCTACAACATTCCACTAAAACATCTTGCAGACAGAGTGGCCATGTATGTGCATGCATATACACTCTACAGTGCTGTTAGACCTTTTGGCTGCAGT

390 PSMA4 NM_002789.3 541-640 GTACATTGGCTGGGATAAGCACTATGGCTTTCAGCTCTATCAGAGTGACCCTAGTGGAAATTACGGGGGATGGAAGGCCACATGCATTGGAAATAATAGC

391 PSMA4 b NM_002789.4 879-978 GAGGAAGAAGAAGCCAAAGCTGAGCGTGAGAAGAAAGAAAAAGAACAGAAAGAAAAGGATAAATAGAATCAGAGATTTTATTACTCATTTGGGGCACCAT

392 PSMA6 NM_002791.2 218-317 GGTCGGCTCTACCAAGTAGAATATGCTTTTAAGGCTATTAACCAGGGTGGCCTTACATCAGTAGCTGTCAGAGGGAAAGACTGTGCAGTAATTGTCACAC

393 PSMA6 b NM_002791.2 866-965 GATGCTCACCTTGTTGCTCTAGCAGAGAGAGACTAAACATTGTCGTTAGTTTACCAGATCCGTGATGCCACTTACCTGTGTGTTTGGTAACAACAAACCA

394 PSMB1 NM_002793.3 687-786 GCGGCTGGTGAAAGATGTCTTCATTTCTGCGGCTGAGAGAGATGTGTACACTGGGGACGCACTCCGGATCTGCATAGTGACCAAAGAGGGCATCAGGGAG

395 PSMB7 NM_002799.2 421-520 GTTACATTGGTGCAGCCCTAGTTTTAGGGGGAGTAGATGTTACTGGACCTCACCTCTACAGCATCTATCCTCATGGATCAACTGATAAGTTGCCTTATGT

396 PSMB8 NM_004159.4 1216-1315 ACTCACAGAGACAGCTATTCTGGAGGCGTTGTCAATATGTACCACATGAAGGAAGATGGTTGGGTGAAAGTAGAAAGTACAGATGTCAGTGACCTGCTGC

397 PSMC1 NM_002802.2 1487-1586 CATCCTGTGTCTTTTGGAGTACGATGTGTAAGTGCCCATTGGGTGGCCTGTTGGTCACTGTGCAGCAGTCTGCTTCCCAATAAAGCGTGCTCTTTCACAA

398 PSMD7 NM_002811.4 1231-1330 GAGCTCTCTGCCTCCGGTCACTCTTGCTGTGGTGCTACGTGGAAGTGAATGGAGACTGATCTCAAATCTGAACTGCAGCTTTCGCTGCTGTGAGTTGGGG

399 PSME3 NM_005789.3 3203-3302 TCCCGAGTGATACCCATGAACTGCCAGTAGAGGCTGCTATCGTTCCATGTGTAAGGAATGAACTGGTTCAAGGCGCGTCCTACCCAGTCATTTTCTTTAC

400 PTGDR NM_000953.2 2341-2440 TATGATGACTGAAAGGGAAAAGTGGAGGAAACGCAGCTGCAACTGAAGCGGAGACTCTAAACCCAGCTTGCAGGTAAGAGCTTTCACCTTTGGTAAAAGA

401 PTGDR2 NM_004778.1 1836-1935 GCCAATGCTTACTGCGCTAGACGCTTCATCCCACAATCTTAAGGGGCAGCTTCTATTAGCCAGTCTTTACAGCTGAGCACATTCTGGCTCAGGGAGGTTA

402 PUM1 NM_001020658.1 3753-3852 AAATGTTCTAGTGTAGAGTCTGAGACGGGCAAGTGGTTGCTCCAGGATTACTCCCTCCTCCAAAAAAGGAATCAAATCCACGAGTGGAAAAGCCTTTGTA

403 QTRTD1 NM_024638.3 2508-2607 TTAGATTAGAGTCATAGCCTTAATAGCCCTAGTTGTCATCCTGGGAGACAGGCAACAGTAGAGATATTTGAGAGCCTAAAGAGAGGTTTGGCCTGTGGGT

404 RAB10 NM_016131.4 3593-3692 AGGGCTTTGCCCCTTTTCTGTAAGTCTCTTGGGATCCTGTGTAGAAGCTGTTCTCATTAAACACCAAACAGTTAAGTCCATTCTCTGGTACTAGCTACAA

405 RAG1 NM_000448.2 2301-2400 CAGTCTACATTTGTACTCTTTGTGATGCCACCCGTCTGGAAGCCTCTCAAAATCTTGTCTTCCACTCTATAACCAGAAGCCATGCTGAGAACCTGGAACG

406 RASSF5 NM_182664.2 3061-3160 TCGTCCTGCATGTCTCTAACATTAATAGAAGGCATGGCTCCTGCTGCAACCGCTGTGAATGCTGCTGAGAACCTCCCTCTATGGGGATGGCTATTTTATT

407 RBM14 NM_006328.3 2661-2760 TGGTATGTATCCAAGTCCCTGCTGACCACTAATGTTCTAGCTGATGGTGAGCGGCACAGTCCCACTTCCCCATCTCCCCAAGTAGGTGGTGTTAGAAAAC

408 RBM4B NM_031492.3 1557-1656 TAGGAGTTGAATCCTTCTCCCTGCCTACCTGCAGCATCTCCTTTCCCTTTAAAATGACCATGTAGTGGCAAGCAGCCTTTTACTCTTCTGTTAGCTCTGG

409 RBX1 NM_014248.3 158-257 GATATTGTGGTTGATAACTGTGCCATCTGCAGGAACCACATTATGGATCTTTGCATAGAATGTCAAGCTAACCAGGCGTCCGCTACTTCAGAAGAGTGTA

410 RELA NM_021975.2 361-460 GATGGCTTCTATGAGGCTGAGCTCTGCCCGGACCGCTGCATCCACAGTTTCCAGAACCTGGGAATCCAGTGTGTGAAGAAGCGGGACCTGGAGCAGGCTA

411 REPIN1 NM_014374.3 2491-2590 TGTGTCCAGGCTCTTGTCTGAACACCGCAGCCCCTCCTTCGCTCCTTCCAGAGCTCAGCATGTCACGGCAAGGACTGCCGCATTGGTGATGGAGGGCCAG

412 REPS1 NM_001128617.2 1289-1388 CACCAACCAGTACTCTTTTAACCATGCATCCTGCTTCTGTCCAGGACCAGACAACAGTACGAACTGTAGCATCAGCTACAACTGCCATTGAAATTCGTAG

413 RERE NM_001042682.1 5916-6015 AACCCTCGACCCGAAACCCTCACCAGATAAACTACAGTTTGTTTAGGAGGCCCTGACCTTCATGGTGTCTTTGAAGCCCAACCACTCGGTTTCCTTCGGA

414 RERE b NM_012102.3 7734-7833 GCATTCTTGTTAGCTTTGCTTTTCTCCCCATATCCCAAGGCGAAGCGCTGAGATTCTTCCATCTAAAAAACCCTCGACCCGAAACCCTCACCAGATAAAC

415 RFWD2 NM_022457.6 2606-2705 TTTTCTTTTCCCTCCTTTATGACCTTTGGGACATTGGGAATACCCAGCCAACTCTCCACCATCAATGTAACTCCATGGACATTGCTGCTCTTGGTGGTGT

416 RFX1 NM_002918.4 4187-4286 ATAAAAATCACTATTTTGTGTGCTCCGCGTGCTATAGCTTTTGGGGCGGCCCTGCCCAGTCCCCGTGCCCACGGGGCTCCCTCTCCCGGTGGTGAAAGTG

417 RHOB NM_004040.3 1707-1806 GGGAGGAGGGAGGATGCGCTGTGGGGTTGTTTTTGCCATAAGCGAACTTTGTGCCTGTCCTAGAAGTGAAAATTGTTCAGTCCAAGAAACTGATGTTATT

418 RHOG NM_001665.3 1045-1144 CTTTCCACACAGTTGTTGCTGCCTATTGTGGTGCCGCCTCAGGTTAGGGGCTCTCAGCCATCTCTAACCTCTGCCCTCGCTGCTCTTGGAATTGCGCCCC

419 RHOU NM_021205.5 4174-4273 TTGACAGACTCAAGAGAAACTACCCAGGTATTACACAAGCCAAAATGGGAGCAAGGCCTTCTCTCCAGACTATCGTAACCTGGTGCCTTACCAAGTTGTG

420 RNASE2 NM_002934.2 331-430 TGACCTGTCCTAGTAACAAAACTCGCAAAAATTGTCACCACAGTGGAAGCCAGGTGCCTTTAATCCACTGTAACCTCACAACTCCAAGTCCACAGAATAT

421 RNF114 NM_018683.3 2246-2345 AATTCAGATCATCTCAGAAGTCTGGAGGGAAATCTGGCGAAACCTTCGTTTGAGGGACTGATGTGAGTGTATGTCCACCTCACTGGTGGCACCGAGAAAC

422 RNF19B NM_153341.3 2222-2321 CCCCAGAGCCCAAGGTGCACCGAGCCCAAGTGCCCATATGAACCTCTCTGCCCTAGCCGAGGGACAAACTGTCTTGAAGCCAGAAGGTGGAGAAGCCAGA

423 RNF214 NM_207343.3 2068-2167 ACCTGTAAGCTATGTCTAATGTGCCAGAAACTCGTCCAGCCCAGTGAGCTGCATCCAATGGCGTGTACCCATGTATTGCACAAGGAGTGTATCAAATTCT

424 RNF34 NM_025126.3 1619-1718 CTTCTGTCCTCTTTGGATGAGATCAGTGTCCACAAGTGGCCGACATGGAACATGCTGAGCAGTGGCTCCTCTGAATGTTCACTTTATTAGTCATGTATAT

425 C20orf52 NM_080748.2 274-373 CTCAGGATCGGAATGCGGGGTCGAGAGCTGATGGGCGGCATTGGGAAAACCATGATGCAGAGTGGCGGCACCTTTGGCACATTCATGGCCATTGGGATGG

426 RPL26L1 NM_016093.2 4-103 CACTCAGGGTCTGAGGCAGCTAGTAGCCGGAGGGTCACCATGAAGTTCAATCCCTTCGTTACCTCGGACCGCAGTAAAAACCGCAAACGTCACTTCAATG

427 RPL3 NM_001033853.1 1072-1171 AGAAGAAAGCATTCATGGGACCACTGAAGAAAGACCGAATTGCAAAGGAAGAAGGAGCTTAATGCCAGGAACAGATTTTGCAGTTGGTGGGGTCTCAATA

428 RPL31 NM_000993.4 20-119 CTTGCAACTGCGGCTTTCCTTCTCCCACAATCCTTCGCGCTCTTCCTTTCCAACTTGGACGCTGCAGAATGGCTCCCGCAAAGAAGGGTGGCGAGAAGAA

429 RPL34 NM_000995.3 471-570 ACCTCACCTCAGCTTGAGAGAGCCAGTTGTGTGCATCTCTTTCCAGTTTTGCATCCAGTGACGTCTGCTTGGCATCTTGAGATTGTTATGGTGAGAGTAT

430 RPL39L NM_052969.1 139-238 GCGGGTTCGGGTCGGTGACACGCAGACCTGAGGGAGCTGGGCCCGCCTTTTCCGCCCGCGCCCCAGGCCCTTGCAGATCGAGATTTGCGTCCTAGAGTGG

431 KIAA0460 NM_015203.4 4795-4894 CCCCTTGGGTCCCTCACACAGAGACACCATCAGCCGGAGTGGTATAATCTTACGGAGTCCCCGGCCAGACTTTCGGCCTAGGGAACCTTTTCTCAGCAGA

432 RPS24 NM_001026.4 482-581 ATGAAGAAAGTCAGGGGGACTGCAAAGGCCAATGTTGGTGCTGGCAAAAAGCCGAAGGAGTAAAGGTGCTGCAATGATGTTAGCTGTGGCCACTGTGGAT

433 RPS27L NM_015920.3 241-340 TAAAATGTCCAGGTTGCTACAAGATCACCACGGTTTTCAGCCATGCTCAGACAGTGGTTCTTTGTGTAGGTTGTTCAACAGTGTTGTGCCAGCCTACAGG

434 RPS6 NM_001010.2 172-271 GAATGGAAGGGTTATGTGGTCCGAATCAGTGGTGGGAACGACAAACAAGGTTTCCCCATGAAGCAGGGTGTCTTGACCCATGGCCGTGTCCGCCTGCTAC

435 RSL24D1 NM_016304.2 1232-1331 TGGAGTGACACTACACTCTAGAATTTCCACTTTGGAGAATACTCAGTTCCAACTTGTGATTCCTGATAGAACAGACTTTACTTTTCTAGCCCAGCATTGA

436 RWDD1 NM_001007464.2 998-1097 TGGAGGATGATGAAGATGATCCAGACTATAATCCTGCTGACCCAGAGAGTGACTCAGCTGACTAATGGACTGTCCCCATCTGCAGAGAGGCTTGACTGCC

437 RXRA NM_002957.5 5301-5400 AGTAATTTTTAAAGCCTTGCTCTGTTGTGTCCTGTTGCCGGCTCTGGCCTTCCTGTGACTGACTGTGAAGTGGCTTCTCCGTACGATTGTCTCTGAAACA

438 S100A12 b NM_005621.1 261-360 CAAGATGAACAGGTCGACTTTCAAGAATTCATATCCCTGGTAGCCATTGCGCTGAAGGCTGCCCATTACCACACCCACAAAGAGTAGGTAGCTCTCTGAA

439 S100A8 NM_002964.4 366-465 GTTAACTTCCAGGAGTTCCTCATTCTGGTGATAAAGATGGGCGTGGCAGCCCACAAAAAAAGCCATGAAGAAAGCCACAAAGAGTAGCTGAGTTACTGGG

440 SAMSN1 NM_022136.3 1024-1123 ACCTGAGCCCCTATCCTTGAGCTCAGACATCTCCTTAAATAAGTCACAGTTAGATGACTGCCCAAGGGACTCTGGTTGCTATATCTCATCAGGAAATTCA

441 SAP130 b NM_024545.3 3091-3190 GATCTCCACCGAATAAACGAACTGATACAGGGAAATATGCAGAGGTGTAAACTTGTGATGGATCAAATCAGTGAAGCCAGAGACTCCATGCTTAAGGTTT

442 SAP130 NM_024545.3 3720-3819 CGGTTCTTCTGCCTGACCTTCAAATGCCCATGTTGGCCTTTTACAGCAGTGCCACGGCACCAAGCGAGCTGCCACATCTCACACTCTAAAGGGTTTGAAC

443 CIP29 NM_033082.3 622-721 AACTGGAACCACAGAGGATACAGAGGCAAAGAAGAGGAAAAGAGCAGAGCGCTTTGGGATTGCCTGATGAAAAGTTCCTGATACTTTCTGTTCTCCAGTG

444 SFRS2IP NM_004719.2 4203-4302 AGTTCTTCTCATGTAAGTAATAACATGAGTACACCAGTTTTGCCTGCTCCGACAGCAGCCCCAGGAAATACGGGAATGGTTCAGGGACCAAGTTCTGGTA

445 SFRS15 NM_020706.2 3635-3734 GAGAGAAGGAAGAAGCCCGAGGAAAGGAAAAGCCTGAGGTGACAGACAGGGCAGGTGGTAACAAAACCGTTGAACCTCCCATTAGCCAAGTGGGAAATGT

446 RBM16 NM_014892.4 4111-4210 TGATTATTTTGAAGGGGCCACTTCTCAACGAAAAGGTGATAATGTGCCTCAGGTTAATGGTGAAAATACAGAGAGACATGCTCAGCCACCACCTATACCA

447 SDHA NM_004168.3 2042-2141 GTCACTCTGGAATATAGACCCGTGATCGACAAAACTTTGAACGAGGCTGACTGTGCCACCGTCCCGCCAGCCATTCGCTCCTACTGATGAGACAAGATGT

448 SEC24C NM_198597.2 4194-4293 AGGCAGAGGCAGCTGGAGCGCCGTTCTCTCCTGCTGGGACACCGCTTGGGCTTTGGTATTGACTGAGTGGCTGACAGTTATCTTCCAACCCCAACTGGCT

449 SEMG1 NM_003007.2 1291-1390 GGCAGACACCAACATGGATCTCATGGGGGATTGGATATTGTAATTATAGAGCAGGAAGATGACAGTGATCGTCATTTGGCACAACATCTTAACAACGACC

450 SERPINB10 NM_005024.1 891-990 AGACAGTTATGATCTCAAGTCAACCCTGAGCAGTATGGGGATGAGTGATGCCTTCAGCCAAAGCAAAGCTGATTTCTCAGGAATGTCTTCAGCAAGAAAC

451 SETD2 NM_014159.6 7956-8055 TGGTTAGAAGCCATCAGAGGTGCAAGGGCTTAGAAAAGACCCTGGCCAGACCTGACTCCACTCTTAAACCTGGGTCTTCTCCTTGGCGGTGCTGTCAGCG

452 SFMBT1 NM_001005158.2 2844-2943 AAGGATCGAAGTTGCTGAAAGGCTTCACCTGGACAGTAACCCCTTGAAGTGGAGTGTGGCAGACGTTGTGCGGTTCATCAGATCCACTGACTGTGCTCCA

453 SFPQ NM_005066.2 2800-2899 GGTTATGTAAGCAAAGCTGAACTGTAAATCTTCAGGAATATGTATTAAGATTGTGGAATGGGTGTAAGACAATTGGTAGGGGGTGAAAGTGGGTTTGATT

454 SGK1 NM_005627.3 1622-1721 ACGAGCGTTAGAGTGCCGCCTTAGACGGAGGCAGGAGTTTCGTTAGAAAGCGGACGCTGTTCTAAAAAAGGTCTCCTGCAGATCTGTCTGGGCTGTGATG

455 SGK NM_005627.3 173-272 GAAGCAGAGGAGGATGGGTCTGAACGACTTTATTCAGAAGATTGCCAATAACTCCTATGCATGCAAACACCCTGAAGTTCAGTCCATCTTGAAGATCTCC

456 SGK1 b NM_005627.3 1814-1913 GGATATGCTGTGTGAACCGTCGTGTGAGTGTGGTATGCCTGATCACAGATGGATTTTGTTATAAGCATCAATGTGACACTTGCAGGACACTACAACGTGG

457 SH2D3C NM_170600.2 2795-2894 AGCACCCCAAGGACACTGTGATCAACCCGAGAATGTTCTGGGTTCAACTCAAGCATCTCCCTTGCACCTCCAGGGTCCTGCGTGGACTCTGGGTTCCATC

458 SIK1 NM_173354.3 4185-4284 TCGCTCATAAAGAAGTTTTTGGGATGGGAGAGAATCCAGACCATCTTGGGGCAGCCAGGCCCTTGCCTTCATTTTTACAGAGGTAGCACAACTGATTCCA

459 SIN3A NM_015477.2 4666-4765 TTTATTCCTGACGATTCCCTTGCTGCCTACCCTTTTCTCTCCTCTGGTTCTCAACCTCAACGAGTTCAAATCAGTTGTCCTTTTTAGCTCCCGTGGAACT

460 SLAMF8 NM_020125.2 3173-3272 AACAAATATTGATTGAGGGCGCTGCATGTGCTGGGTACATTTCTTGGCACTTGGGAATCAGTAGTCAAGCGAAACCCTTGCCTTTGAGAGTTTATGGTCT

461 SLC11A1 NM_000578.3 2072-2171 GCAGGATAGAGTGGGACAGTTCCTGAGACCAGCCAACCTGGGGGCTTTAGGGACCTGCTGTTTCCTAGCGCAGCCATGTGATTACCCTCTGGGTCTCAGT

462 SLC15A2 NM_021082.3 2548-2647 AACTCATTAAAACTTGTGCAGTGTTGCTGGAGCTGGCCTGGTGTCTCCAAATGACCATGAAAATACACACGTATAATGGAGATCATTCTCTGTGGGTATG

463 SLC25A20 NM_000387.5 1511-1610 ATCTTCTTCAGTCCCTAGCCAGGAATACCCATTTGATTTCCAGGGTGCCATCTAATCCTGGGCTGTACATGTGGATATGGACTTGAGGCCCACCTCTGTG

464 SLC25A37 NM_016612.2 1217-1316 TCCAGCCCCTTGCCCTCTCCTCACACGTAGATCATTTTTTTTTTGCAGGGTGCTGCCTATGGGCCCTCTGCTCCCCAATGCCTTAGAGAGAGGAGGGGAC

465 SLC45A3 NM_033102.2 2455-2554 AGTTTCTAGGATGAAACACTCCTCCATGGGATTTGAACATATGAAAGTTATTTGTAGGGGAAGAGTCCTGAGGGGCAACACACAAGAACCAGGTCCCCTC

466 SLC6A12 NM_003044.4 3220-3319 GATATTGCTAACTGATCACAGATTCTTTCCCACCTCACAATCCTTCCGAATGTGCTCCAGGCAGCACCATTTGCCATCCTGCTTCTAACGCAAACCCCTG

467 SLC6A6 NM_003043.5 4438-4537 ATTCTAGACCAAAGACACAGGCAGACCAAGTCCCCAGGCCCCGCCTGGAAGGAAGTCGTTCCTCAACTCTCCCCAAGGCACCTGTCTCCAATCAGAGCCC

468 SLC9A3R1 NM_004252.3 1811-1910 ATTAACATGATTTTCCTGGTTGTTACATCCAGGGCATGGCAGTGGCCTCAGCCTTAAACTTTTGTTCCTACTCCCACCCTCAGCGAACTGGGCAGCACGG

469 C14orf156 NM_031210.5 46-145 CGGCCTCAGCAGCGAGAGGTGCTGCGGCGCTGCGTAGAAGTATCAATCAGCCGGTTGCTTTTGTGAGAAGAATTCCTTGGACTGCGGCGTCGAGTCAGCT

470 SMARCC1 NM_003074.3 5281-5380 CAATGGCCAGGGTTTTACCTACTTCCTGCCAGTCTTTCCCAAAGGAAACTCATTCCAAATACTTCTTTTTTCCCCTGGAGTCCGAGAAGGAAAATGGAAT

471 SNORA56 NR_002984.1 30-129 CTCGTGGGACTCTAGAGGGAGTCAGTCTGCAACAGTAAGTGGTGAGTTCTTCTGTCCAGCGTCAGTATTTTGATGGTGGCTTTAGACTTGCCAGATAACA

472 SNX11 NM_152244.1 2261-2360 CCCTCCCTGTCGCCCACTCCTCCCTCCTCTGGCTATCCTACCCTGTCTGTGGGCTCTTTTACTACCAGCCTATGCTGTGGGACTGTCATGGCATTTAGTT

473 SOCS1 NM_003745.1 1026-1125 TTAACTGTATCTGGAGCCAGGACCTGAACTCGCACCTCCTACCTCTTCATGTTTACATATACCCAGTATCTTTGCACAAACCAGGGGTTGGGGGAGGGTC

474 SP2 NM_003110.5 2701-2800 GGGGGCAATGATGAGCATATGAATTTTTTCTCACTCTAGCAATTCCCTTTTCTAAATGACACAGCATTTAAACTCAAATCTGGATTCAGATAACAGCACC

475 SPA17 NM_017425.3 176-275 CAAGGATTTGGGAATCTTCTTGAAGGGCTGACACGCGAGATTCTGAGAGAGCAACCGGACAATATACCAGCTTTTGCAGCAGCCTATTTTGAGAGCCTTC

476 SPEN NM_015001.2 11995-12094 GTATTGCCCACTCATTTGTATAAGTGCGCTTCGGTACAGCACGGGTCCTGCTCCCGCGATGTGGAAGTGTCACACGGCACCTGTACAAAAAGACTGGCTA

477 SPINK5 NM_006846.3 2596-2695 GAGCAATGACAAAGAGGATCTGTGTCGTGAATTTCGAAGCATGCAGAGAAATGGAAAGCTTATCTGCACCAGAGAAAATAACCCTGTTCGAGGCCCATAT

478 SPN NM_003123.3 2346-2445 AGTGCCTGCGTGTGTCCACTCGTGGGTGTGGTTTGTGTGCAAGAGCTGAGGATTTGGCGATGCTTGGGAGGGGTAGTTGTGGGTACAGACGGTGTGGGGG

479 SREBF1 NM_001005291.2 3985-4084 CCCCTCCTTGCTCTGCAGGCACCTTAGTGGCTTTTTTCCTCCTGTGTACAGGGAAGAGAGGGGTACATTTCCCTGTGCTGACGGAAGCCAACTTGGCTTT

480 SFRS4 NM_005626.4 2080-2179 TACTCATGGCCCACAGTAGAATATCCAAAACGCCTTGGCTTTCAGGCCTGGCCTTTCCTACAGGGAGCTCAGTAACCTGGACGGCTCTAAGGCTGGAATG

481 ST6GAL1 NM_003032.2 3783-3882 CTGATTTTAATCTTCGAATCATGACACTGAGTGCAGAGGAGGTGGCATTCCGACAGCAGGACATACATGTTGGTGTGAAGACTGGGACGACACTGGGTAG

482 STAG3 NM_012447.3 3424-3523 AAGTGCCTGCAGCATGTCTCCCAGGCACCTGGCCATCCCTGGGGCCCAGTCACCACCTACTGCCACTCCCTCAGCCCTGTGGAGAACACAGCAGAGACCA

483 STAMBP NM_006463.4 1926-2025 TTTCCTGTGGTTTATGGCAATATGAATGGAGCTTATTACTGGGGTGAGGGACAGCTTACTCCATTTGACCAGATTGTTTGGCTAACACATCCCGAAGAAT

484 STAT6 NM_003153.4 3725-3824 ACTGTGCCCAAGTGGGTCCAAGTGGCTGTGACATCTACGTATGGCTCCACACCTCCAATGCTGCCTGGGAGCCAGGGTGAGAGTCTGGGTCCAGGCCTGG

485 STIP1 NM_006819.2 1906-2005 CCCGGGGAAGACACAGAGACTCGTACCTGCGCTGTTTGTGCCGCCGCTGCCTCTGGGCCCTCCCAGCACACGCATGGTCTCTTCACCGCTGCCCTCGAGT

486 STK16 NM_003691.2 1420-1519 GGGGTAGCGGGGTCAGGACAATCATCTCAGTCCTGCATCTTTTCTTCTGCTTTCTTCCCTCCAAGAGCAAAACCTGGGCAAGGGGACTTACTGAGTGGGG

487 STK38 NM_007271.3 3269-3368 TTGTCAGTGAAACTACTTTGGATTTTAACCTCTTAGAGGAAGAAAAAAGGTTAGGGAAGTGTCAACTCTGGATGAAGGTGATGTGTTTGCCTCTCAGTCT

488 STOM NM_004099.5 2953-3052 TTCTGCCTTGTGAATTCGTAGTCCAATCAGCTGAAATTAAATCACTTGGGAGGGACGCATAGAAGGAGCTCTAGGAACACAGTGCCAGTGCAGAAGTTTC

489 SYNJ1 NM_003895.3 4746-4845 CCCTCTGCTCCCGCCCGGCACCAGCCCTCCAGTAGATCCTTTCACGACCTTGGCCTCTAAGGCTTCACCCACACTGGACTTTACAGAAAGATAACGCCAT

490 TAPBP NM_003190.4 3397-3496 CTTGCCCTCCCTGGGTCGCAGACGAGGTCGGCCTCGTCATTCCCCGCAGACCGCCGCGCGTCCCTCTTGTGCGGTTCACCACAGTTGTATTTAAGTGATC

491 TAX1BP1 NM_001079864.2 2081-2180 CAGCCAGCCTGCTCGAAACTTTAGTCGGCCTGATGGCTTAGAGGACTCTGAGGATAGCAAAGAAGATGAGAATGTGCCTACTGCTCCTGATCCTCCAAGT

492 TBC1D12 NM_015188.1 5451-5550 TTCCAAGGAATGCACTAAGCCTTCAGTCTTTTTAGACTGACAGTACTGGCAGCTAAAATATTGTACTGTATCTTCTCTTGAGCCCAGTATGTAGGAAATA

493 TBCE NM_001079515.2 1541-1640 TATGCTGAAAAACCAGCTACTAACACTGAAGATAAAATACCCTCATCAACTTGATCAGAAAGTCCTGGAGAAACAACTGCCGGGCTCCATGACAATTCAA

494 TBK1 NM_013254.2 1611-1710 ACCAGTCTTCAGGATATCGACAGCAGATTATCTCCAGGTGGATCACTGGCAGACGCATGGGCACATCAAGAAGGCACTCATCCGAAAGACAGAAATGTAG

495 TBP NM_003194.4 1441-1540 TGTAAGTGCCCACCGCGGGATGCCGGGAAGGGGCATTATTTGTGCACTGAGAACACCGCGCAGCGTGACTGTGAGTTGCTCATACCGTGCTGCTATCTGG

496 TCF20 NM_181492.2 6765-6864 CCAGGCCTGTGTTGCCAGAGCTGGCAGTGTGAGCTGTAGGCAGGGACGGGGAGGGACTGTCGCTGTGATCAGAGTGGGTTAAGCTGACCAGGAACACCCA

497 TCF7L2 NM_030756.4 2067-2166 GGCCCACCTGTCCATGATGCCTCCGCCACCCGCCCTCCTGCTCGCTGAGGCCACCCACAAGGCCTCCGCCCTCTGTCCCAACGGGGCCCTGGACCTGCCC

498 TCP1 NM_030752.2 254-353 GTGTTCGGTGACCGCAGCACTGGGGAAACGATCCGCTCCCAAAACGTTATGGCTGCAGCTTCGATTGCCAATATTGTAAAAAGTTCTCTTGGTCCAGTTG

499 TFCP2 NM_005653.4 2271-2370 CCTCTGAAAACGGCCCTCTTGAAGGGGGATATGAATGGAGATTTGAAGGTCTGCAAGAACCTGACTCGTCTGACTGTGTGTGGAGGAGTCCAGGCCATGG

500 TGIF1 NM_003244.2 1041-1140 ACCTCAACCAGGACTTCAGTGGATTTCAGCTTCTAGTGGATGTTGCACTCAAACGGGCTGCAGAGATGGAGCTTCAGGCAAAACTTACAGCTTAACCCAT

501 TGIF1 b NM_173208.1 691-790 CCCCGGGATCAGTTTTGGCTCGTCCATCAGTGATCTGCCATACCACTGTGACTGCATTGAAAGATGTCCCTTTCTCTCTCTGCCAGTCGGTCGGTGTGGG

502 TIAM1 NM_003253.2 5293-5392 CCTAACTCTGCCCACCCTCCTGTACCGTCGACAAGAATGTCCCCTTAGGTCGCGCTCTTGCACACACGGTTTTGGCAGCTGACTTGGTTCTGAAGCCATG

503
TIMM8B
ENST0000050
339-
GAATGACAGAAGCAAAGGACTT

4148.1
438
GTTACTAAGCAGATTTAAGGGTC

AGTGGGGGAAGGCTATCAACCC

ATTGTCAGATCAGCATCAGGCTG

TTATCAAGTC

504
TM2D2
NM_078473.2
2970-
ACCCATCATCCATCTGCCCACAA

3069
ACCTGGCCAAATGTGATACAACC

TGAAAACCTGATGGACTAAAGG

AGTACTATTTAACAATTGATTGC

CTTTGCACT

505
TM9SF1
NM_006405.6
1996-
CGCTGGTGGTGGCGATCTGTGCT

2095
GAGTGTTGGCTCCACCGGCCTCT

TCATCTTCCTCTACTCAGTTTTC

TATTATGCCCGGCGCTCCAACAT

GTCTGGGG

506
CCDC72
NM_015933.4
124-
GAGGAGCAGAAGAAACTCGAGG

223
AGCTAAAAGCGAAGGCCGCGGG

GAAGGGGCCCTTGGCCACAGGT

GGAATTAAGAAATCTGGCAAAA

AGTAAGCTGTTC

507
TMBIM6
NM_003217.2
2282-
CTCTCCCTATTCACAACCAGTGC

2381
ACAGTTTGACACAGTGGCCTCAG

GTTCACAGTGCACCATGTCACTG

TGCTATCCTACGAAATCATTTGT

TTCTAAGT

508
TMC8
NM_152468.4
2238-
AGGCCAATGCCAGGGCCATCCA

2337
CAGGCTCCGGAAGCAGCTGGTGT

GGCAGGTTCAGGAGAAGTGGCA

CCTGGTGGAGGACCTGTCGCGAC

TGCTGCCGGA

509
TMCO1
NM_019026.3
992-
TCATTTACATAAGTATTTTCTGT

1091
GGGACCGACTCTCAAGGCACTGT

GTATGCCCTGCAAGTTGGCTGTC

TATGAGCATTTAGAGATTTAGAA

GAAAAATT

510
TMEM
NM_00110082
7652-
AGGAGAATAAATGTTGGAGGGG

170B
9.2
7751
TAATACACAAAAACAAAGGCAT

ATTTGATGAAGTACCCTGTGTTA

TGTGAACACAATTTCCCCTTCTG

TTAAGACTAT

511
TMEM
NM_00108054
1313-
GCTCTGTGAAGGCAATGAGTGTC

218
6.2
1412
ACTTCCCTCTGCTCTAATAAAGC

AATAAATAATAGCTAAAGGGCT

GACTTTCACTTCGAACTCTTGGC

CACGGCTTT

512
TMEM70
NM_017866.5
1952-
GGTGGTTAGCTATACGGGAAATG

2051
GTAAGTAGTGTTGTCTTCAGTAT

CTTAATTTGTTTCTGCAACTGTG

CACTCCTCCCTTGGTGGCACCCT

ATGGGTGT

513
TMSB4X
NM_021109.3
286-
TTAACTTTGTAAGATGCAAAGAG

385
GTTGGATCAAGTTTAAATGACTG

TGCTGCCCCTTTCACATCAAAGA

ACTACTGACAACGAAGGCCGCG

CCTGCCTTT

514
TNFRSF
NM_001561.5
1848-
GCCTGGAGGAAGTTTTGGAAAG

9

1947
AGTTCAAGTGTCTGTATATCCTA

TGGTCTTCTCCATCCTCACACCT

TCTGCCTTTGTCCTGCTCCCTTT

TAAGCCAGG

515
TNFSF
NM_003808.3
811-
AGTCAGAGAGCCGGCACTCTCA

13

910
GTTGCCCTCTGGTTGAGTTGGGG

GGCAGCTCTGGGGGCCGTGGCTT

GTGCCATGGCTCTGCTGACCCAA

CAAACAGAG

516
TNFSF8
NM_001244.3
519-
CCCTCAAAGGAGGAAATTGCTCA

618
GAAGACCTCTTATGTATCCTGAA

AAGGGCTCCATTCAAGAAGTCAT

GGGCCTACCTCCAAGTGGCAAA

GCATCTAAA

517
TOMM7
NM_019059.2
251-
TCTGGCTCGGATAAGAGATGGG

350
ACATCATTCAGTCACTAGTTGGA

TGGCACAAGGCTCTTCACAGACG

CATCTGTAGCAGAGTGGATCTTG

TACTAACTT

518
TP53BP
NM_005657.2
5591-
TACTTCCTGTGCCTTGCCAGTGG

1

5690
GATTCCTTGTGTGTCTCATGTCT

GGGTCCATGATAGTTGCCATGCC

AACCAGCTCCAGAACTACCGTAA

TTATCTGT

519
TPR
NM_003292.2
7194-
TCTCCCCTCCACCAGCCAGGATC

7293
CTCCTTCTAGCTCATCTGTAGAT

ACTAGTAGTAGTCAACCAAAGCC

TTTCAGACGAGTAAGACTTCAGA

CAACATTG

520
TPT1
NM_003295.3
18-
GCCTGCGTCGCTTCCGGAGGCGC

117
AGCGGGCGATGACGTAGAGGGA

CGTGCCCTCTATATGAGGTTGGG

GAGCGGCTGAGTCGGCCTTTTCC

GCCCGCTCC

521
TRAF
NM_147686.3
2449-
GCCAGTGTCCCATATGTTCCTCC

3IP2

2548
TGACAGTTTGATGTGTCCATTCT

GGGCCTCTCAGTGCTTAGCAAGT

AGATAATGTAAGGGATGTGGCA

GCAAATGGA

522
TRAF6
NM_145803.1
1840-
CACCCGCTTTGACATGGGTAGCC

1939
TTCGGAGGGAGGGTTTTCAGCCA

CGAAGTACTGATGCAGGGGTAT

AGCTTGCCCTCACTTGCTCAAAA

ACAACTACC

523
LBA1
NM_014831.2
10132-
CTGGGAAACCTTCATGCCTCTCT

10231
GATGGTTACTGCCCACCCTTACC

CCACCCCTCAGCTCAGCCTGGTA

TGGAAAGCAAGGTGCACGTTGG

TCTTTGATT

524
TRIM21
NM_003141.3
1637-
TCTGCAGAGGCATCCGGATCCCA

1736
GCAAGCGAGCTTTAGCAGGGAA

GTCACTTCACCATCAACATTCCT

GCCCCAGATGGCTTTGTGATTCC

CTCCAGTGA

525
TRIM32
NM_012210.3
2681-
GTGCTACCAAAGGGGATACACA

2780
AGCCCTTTAGGAAGCAGTACCTC

TCGCCTGGAGGATCTGTGCCATC

TTGGATTGAGAATTGCAGATGTG

ACAGAATGG

526
TRIM39
NM_021253.3
3141-
CTGCTATTCGGGTAATCTTCACA

3240
GAAATGACTGAGAGAAGAATCT

GCAGTTTACTGAGGGCATTTCAG

TTCCTCCTACCACCTCAACAGGA

CTTTGTCCA

527
TRIM39
NM_172016.2
2841-
CTCTATACCAATAAGTCAGTCAC

b

2940
CTTGCTCCTCTCCAGAGGCAAAG

TGGAAGAGATCCTGCAAGACAC

ATCTATCCTTTCACAGTGTTCCC

AAGGGAACT

528
TRRAP
NM_003496.3
12169-
AGTTGATGAACCCATCATGCTGG

12268
TTTTTCTCTGAGCACAAAGTTTT

AGGCTGTACACAGCCAGCCTTGG

GAATCTCGTTGAGCGTTCGGCGT

GGATCCAC

529
TSC1
NM_000368.4
8068-
CCCCAGACCAACCCTTCCCTCCC

8167
TTTCCCCACCTCTTACAGTGTTT

GGACAGGAGGGTATGGTGCTGCT

CTGTGTAGCAAGTACTTTGGCTT

ATGAAAGA

530
TTC9
NM_015351.1
4050-
TACTAATCAGGCATCTGACCTGC

4149
ACTGTCATCCCCTGCCTGGACTT

TTGCGATGGACTCTTTGGGGGAA

AAACTAACGCTTTTTAATTATTG

TGAAAGCA

531
TTN
NM_133378.4
850-
TCGACTGCTCAGATCTCAGAATC

949
AAGACAAACCCGAATTGAAAAG

AAGATTGAAGCCCACTTTGATGC

CAGATCAATTGCAACAGTTGAGA

TGGTCATAG

532
TUBB
NM_178014.2
2223-
CAAAAAAGAATGAACACCCCTG

2322
ACTCTGGAGTGGTGTATACTGCC

ACATCAGTGTTTGAGTCAGTCCC

CAGAGGAGAGGGGAACCCTCCT

CCATCTTTTT

533
TUG1
NR_002323.2
7082-
TAAGCTAGAGGTCATGGTCACTG

7181
AAATTACTTTCCAAAGTGGAAGA

CAAAATGAAACAGGAACTGAGG

GAATATTTAAGATCCCACAGAAG

CGTAAAAAT

534
TXN
NM_003329.3
152-
TTGGATCCATTTCCATCGGTCCT

251
TACAGCCGCTCGTCAGACTCCAG

CAGCCAAGATGGTGAAGCAGATC

GAGAGCAAGACTGCTTTTCAGGA

AGCCTTGG

535
TXNDC
NM_032731.3
378-
TCATCTACTGCCAAGTAGGAGAA

17

477
AAGCCTTATTGGAAAGATCCAAA

TAATGACTTCAGAAAAAACTTGA

AAGTAACAGCAGTGCCTACACTA

CTTAAGTA

536
TXNRD1
NM_00109377
3348-
CTCAGTTGCAGCACTGAGTGGTC

1.2
3447
AAAATACATTTCTGGGCCACCTC

AGGGAACCCATGCATCTGCCTGG

CATTTAGGCAGCAGAGCCCCTGA

CCGTCCCC

537
TXNRD1
NM_182743.2
2438-
TGTTGCATGGAAGGGATAGTTTG

b

2537
GCTCCCTTGGAGGCTATGTAGGC

TTGTCCCGGGAAAGAGAACTGTC

CTGCAGCTGAAATGGACTGTTCT

TTACTGAC

538
U2AF2
NM_007279.2
2871-
TTTATGGCCAAACTATTTTGAAT

2970
TTTGTTGTCCGGCCCTCAGTGCC

CTGCCCTCTCCCTTACCAGGACC

ACAGCTCTGTTCCTTCGGCCTCT

GGTCCTCT

539
UBA1
NM_003334.3
3307-
CCGCCACGTGCGGGCGCTGGTGC

3406
TTGAGCTGTGCTGTAACGACGAG

AGCGGCGAGGATGTCGAGGTTC

CCTATGTCCGATACACCATCCGC

TGACCCCGT

540
UBC
NM_021009.3
1876-
TGCAGATCTTCGTGAAGACCCTG

1975
ACTGGTAAGACCATCACTCTCGA

AGTGGAGCCGAGTGACACCATT

GAGAATGTCAAGGCAAAGATCC

AAGACAAGGA

541
UBE2G1
NM_003342.4
685-
ACGCTGGCTCCCTATCCACACTG

784
TGGAAACCATCATGATTAGTGTC

ATTTCTATGCTGGCAGACCCTAA

TGGAGACTCACCTGCTAATGTTG

ATGCTGCG

542
UBE2I
NM_194259.2
288-
CTGCTCTGCTGACTGGGGAAGTC

387
ATCGTGCCACCCAGAACCTGAGT

GCGGGCCTCTCAGAGCTCCTTCG

TCCGTGGGTCTGCCGGGGACTGG

GCCTTGTC

543
UBTF
NM_00107668
2724-
GGGGGTCCCAAAGAGTTTGATG

3.1
2823
AGGCCCTCCACACCTGCGGCCCA

ATCCAAGGTGGGGTGGAAGCTT

GGGGAAGACCCATTCCTTCCCAG

AGGGGCCTGC

544
UQCRQ
NM_014402.4
97-
TGACGCGGATGCGGCATGTGATC

196
AGCTACAGCTTGTCACCGTTCGA

GCAGCGCGCCTATCCGCACGTCT

TCACTAAAGGAATCCCCAATGTT

CTGCGCCG

545
USP16
NM_00103241
2487-
TCTATTCCTTATATGGAGTTGTT

0.1
2586
GAACACAGTGGTACTATGAGGTC

GGGGCATTACACTGCCTATGCCA

AGGCAAGAACCGCAAATAGTCAT

CTCTCTAA

546
USP21
NM_012475.4
1499-
CCTTTTCACTAAGGAAGAAGAGC

1598
TAGAGTCGGAGAATGCCCCAGT

GTGTGACCGATGTCGGCAGAAA

ACTCGAAGTACCAAAAAGTTGA

CAGTACAAAGA

547
USP34
NM_014709.3
10104-
AGGAGCACACTGTAGACAGCTG

10203
CATCAGTGACATGAAAACAGAA

ACCAGGGAGGTCCTGACCCCAA

CGAGCACTTCTGACAATGAGACC

AGAGACTCCTC

548
USP5
NM_003481.2
2720-
AGAGCAGAGGGGCAGCGATAGA

2819
CTCTGGGGATGGAGCAGGACGG

GGACGGGAGGGGCCGGCCACCT

GTCTGTAAGGAGACTTTGTTGCT

TCCCCTGCCCC

549
USP9Y
NM_004654.3
86-
GGTGTGGAAAGACTTTTCTGGGC

185
TCAGAGGTGAAACTGACCCTTGT

GTATCAGCAGCATTTCTGACTGA

CTGAGAGAGTGTAGTGATTAACA

GAGTTGTG

550
VPS37C
NM_017966.4
2579-
TTATAAAGAGAAATCACTAATGG

2678
ACTCTACTGGTTTGAGTGCTTCT

GAGCTGGATGACCGACCGCCTGT

ATGTTTGTGTAATTAATTGCCAT

AATAAACT

551
WDR1
NM_005112.4
2325-
AACTGTTGCCTGTCAGTGTTTAC

2424
AAACTAGTGCGTTGACGGCACCG

TGTCCAAGTTTTTAGAACCCTTG

TTAGCCAGACCGAGGTGTCCTGG

TCACCGTT

552
WDR91
NM_014149.3
2777-
CAGGCTCTCCTGTTGCTTTGCCA

2876
TGGAGCCAGGTCAGCTCTCTGTC

TGTTCTGCTGGGTAACAAGGTTT

GGCAGTTCCTGTTTCTCTGGGCT

TAAGTCAA

553
XCL2
NM_003175.3
378-
GTAGTCTCTGGCACCCTGTCCGT

477
CTCCAGCCAGCCAGCTCATTTCA

CTTTACACCCTCATGGACTGAGA

TTATACTCACCTTTTATGAAAGC

ACTGCATG

554
XPC
NR_027299.1
3168-
CTGGATGGTGGTGCATCCGTGAA

3267
TGCGCTGATCGTTTCTTCCAGTT

AGAGTCTTCATCTGTCCGACAAG

TTCACTCGCCTCGGTTGCGGACC

TAGGACCA

555
YPEL1
NM_013313.4
3672-
GCTCATTTTTAAACCAAATGAAC

3771
AGACCATGAGCTGGCTTCAGGG

GAAGTGCTATTCACAGGACCATA

TCCACCACCCTCTTAAATTCCTA

AACAATATC

556
ZMIZ1
NM_020338.3
7171-
ATGATCACAGGTGATTCACACGT

7270
ACACACATAAACACACCCACCA

GTGCAGCCTGAAGTAACTCCCAC

AGAAACCATCATCGTCTTTGTAC

ATCGTATGT

557
ZNF143
NM_003442.5
2292-
TATCAGATCACAAACTCCTAGAG

2391
TCTACATGCAAGACTAGTAAAGT

CTTATGGAGTCTTATGATGGATT

TTTAACTTCCCGTGGAAAAAAAA

ATAAAGGC

558
ZNF239
NM_00109928
1496-
AGAGCTCCAACCTTCACATCCAC

3.1
1595
CAGCGGGTTCACAAGAAAGATC

CTCGCTAACTGACATTAGCCCAT

TCAGGTCTTCACAGCGCTCATAC

TGTAAAAAC

559
ZNF341
NM_032819.4
3247-
CAGACGGTTCCCCACAGCATCCT

3346
CAGACAGCTCTGTGATGTAGCTT

TTAGGAGGCACTCAGGTGTCACG

GCTAGACTGCAGCTATGAGACA

GATCTGGCT

C. Polymerase Chain Reaction (PCR) Techniques

Another suitable quantitative method is RT-PCR, which can be used to compare mRNA levels in different sample populations, in normal and tumor tissues, to characterize patterns of gene expression, to discriminate between closely related mRNAs, and to analyze RNA structure. The first step is the isolation of mRNA from a target sample (e.g., total RNA isolated from human PBMC). mRNA can be extracted, for example, from frozen or archived paraffin-embedded and fixed (e.g., formalin-fixed) tissue samples.

General methods for mRNA extraction are well known in the art and are described in standard textbooks of molecular biology. In particular, RNA isolation can be performed using a purification kit, buffer set, and protease from commercial manufacturers, according to the manufacturer's instructions. Exemplary commercial products include TRI-REAGENT, Qiagen RNeasy mini-columns, MASTERPURE Complete DNA and RNA Purification Kit (EPICENTRE®, Madison, Wis.), Paraffin Block RNA Isolation Kit (Ambion, Inc.) and RNA Stat-60 (Tel-Test). Conventional techniques such as cesium chloride density gradient centrifugation may also be employed.

After RNA isolation, the first step in gene expression profiling by RT-PCR is the reverse transcription of the RNA template into cDNA, followed by its exponential amplification in a PCR reaction. The two most commonly used reverse transcriptases are avian myeloblastosis virus reverse transcriptase (AMV-RT) and Moloney murine leukemia virus reverse transcriptase (MMLV-RT). The reverse transcription step is typically primed using specific primers, random hexamers, or oligo-dT primers, depending on the circumstances and the goal of expression profiling. See, e.g., the manufacturer's instructions accompanying the product GENEAMP RNA PCR kit (Perkin Elmer, Calif., USA). The derived cDNA can then be used as a template in the subsequent PCR reaction.

The PCR step generally uses a thermostable DNA-dependent DNA polymerase, such as Taq DNA polymerase, which has 5′-3′ nuclease activity but lacks 3′-5′ proofreading exonuclease activity. TAQMAN® PCR typically utilizes the 5′-nuclease activity of Taq or Tth polymerase to hydrolyze a hybridization probe bound to its target amplicon, but any enzyme with equivalent 5′ nuclease activity can be used. Two oligonucleotide primers are used to generate an amplicon, as in a typical PCR reaction. In one embodiment, the target sequences are those shown in Table III. A third oligonucleotide, or probe, is designed to detect a nucleotide sequence located between the two PCR primers. The probe is not extendible by Taq DNA polymerase and is labeled with a reporter fluorescent dye and a quencher fluorescent dye. Any laser-induced emission from the reporter dye is quenched by the quenching dye when the two dyes are located close together, as they are on the intact probe. During the amplification reaction, Taq DNA polymerase cleaves the probe in a template-dependent manner. The resulting probe fragments dissociate in solution, and signal from the released reporter dye is free from the quenching effect of the second fluorophore. One molecule of reporter dye is liberated for each new molecule synthesized, and detection of the unquenched reporter dye provides the basis for quantitative interpretation of the data.

TaqMan® RT-PCR can be performed using commercially available equipment. In a preferred embodiment, the 5′ nuclease procedure is run on a real-time quantitative PCR device such as the ABI PRISM 7900® Sequence Detection System®. The system amplifies samples in a 96-well format on a thermocycler. During amplification, laser-induced fluorescent signal is collected in real time through fiber optic cables for all 96 wells and detected by a CCD. The system includes software for running the instrument and for analyzing the data. 5′-Nuclease assay data are initially expressed as Ct, the threshold cycle. As discussed above, fluorescence values are recorded during every cycle and represent the amount of product amplified to that point in the amplification reaction. The cycle at which the fluorescent signal is first recorded as statistically significant is the threshold cycle (Ct).
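The threshold-cycle definition above can be illustrated with a minimal Python sketch. It assumes one common convention that is not stated in the source: the threshold is set a fixed number of standard deviations above the mean of early baseline cycles, and Ct is reported as a fractional cycle by linear interpolation between the two cycles that bracket the first upward crossing. Instrument software may define the baseline and threshold differently, and the curve values are hypothetical.

```python
from statistics import mean, stdev
from typing import Optional, Sequence


def baseline_threshold(fluorescence: Sequence[float], first: int = 3,
                       last: int = 15, k: float = 10.0) -> float:
    """Threshold = mean + k * SD of cycles first..last (1-based, inclusive).

    The default window and k are illustrative assumptions, not instrument settings.
    """
    baseline = fluorescence[first - 1:last]
    return mean(baseline) + k * stdev(baseline)


def threshold_cycle(fluorescence: Sequence[float], threshold: float) -> Optional[float]:
    """Fractional cycle at which fluorescence first rises through `threshold`.

    fluorescence[0] is the reading after cycle 1. Returns None if the curve
    never crosses the threshold (e.g., no amplification).
    """
    for i in range(1, len(fluorescence)):
        prev, curr = fluorescence[i - 1], fluorescence[i]  # cycles i and i + 1
        if prev < threshold <= curr:
            # Linear interpolation between the two bracketing cycles.
            return i + (threshold - prev) / (curr - prev)
    return None


# Hypothetical amplification curve: flat baseline, then product doubling each cycle.
curve = [0.0] * 10 + [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
print(threshold_cycle(curve, threshold=5.0))  # 13.25
```

Because each doubling of starting template shifts the crossing point by about one cycle, a lower Ct indicates more target mRNA in the sample.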

To minimize errors and the effect of sample-to-sample variation, RT-PCR is usually performed using an internal standard. The ideal internal standard is expressed at a constant level among different tissues, and is unaffected by the experimental treatment. RNAs most frequently used to normalize patterns of gene expression are mRNAs for the housekeeping genes glyceraldehyde-3-phosphate-dehydrogenase (GAPDH) and β-actin.
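Normalization against an internal standard is commonly carried out with the comparative Ct (2^-ΔΔCt) calculation. The source does not specify this formula, so the sketch below is an illustration under stated assumptions: the target and housekeeping genes (e.g., GAPDH and β-actin) amplify with equal efficiency, and several housekeeping Ct values are combined by their arithmetic mean (equivalent to a geometric mean of their quantities). The Ct values in the example are hypothetical.

```python
from statistics import mean
from typing import Sequence


def delta_ct(target_ct: float, housekeeping_cts: Sequence[float]) -> float:
    """ΔCt: target Ct minus the mean Ct of the internal-standard genes."""
    return target_ct - mean(housekeeping_cts)


def fold_change(sample_dct: float, control_dct: float, efficiency: float = 1.0) -> float:
    """Relative expression of sample vs. control, (1 + E) ** -ΔΔCt.

    efficiency=1.0 assumes perfect doubling each cycle (the usual 2 ** -ΔΔCt form).
    """
    ddct = sample_dct - control_dct
    return (1.0 + efficiency) ** (-ddct)


# Hypothetical Ct values: target gene, then GAPDH and beta-actin.
sample = delta_ct(24.0, [18.0, 20.0])   # 5.0
control = delta_ct(26.0, [18.5, 19.5])  # 7.0
print(fold_change(sample, control))     # 4.0: target about 4x higher in the sample
```

Subtracting the housekeeping Ct first removes differences in RNA input and reverse-transcription yield between samples, which is the purpose of the internal standard described above.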

Real-time PCR is compatible both with quantitative competitive PCR, where an internal competitor for each target sequence is used for normalization, and with quantitative comparative PCR using a normalization gene contained within the sample, or a housekeeping gene for RT-PCR.

In another PCR method, i.e., the MassARRAY-based gene expression profiling method (Sequenom, Inc., San Diego, Calif.), following the isolation of RNA and reverse transcription, the obtained cDNA is spiked with a synthetic DNA molecule (competitor), which matches the targeted cDNA region in all positions except a single base and serves as an internal standard. The cDNA/competitor mixture is PCR amplified and is subjected to a post-PCR shrimp alkaline phosphatase (SAP) enzyme treatment, which results in the dephosphorylation of the remaining nucleotides. After inactivation of the alkaline phosphatase, the PCR products from the competitor and cDNA are subjected to primer extension, which generates distinct mass signals for the competitor- and cDNA-derived PCR products. After purification, these products are dispensed on a chip array, which is pre-loaded with components needed for analysis by matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS). The cDNA present in the reaction is then quantified by analyzing the ratios of the peak areas in the mass spectrum generated.
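The final quantification step can be sketched as simple competitive-PCR arithmetic. This is an illustrative assumption rather than the vendor's analysis software: it supposes that the competitor and the cDNA amplify and extend with equal efficiency, so that their peak-area ratio equals their starting copy ratio, and that a known number of competitor copies was spiked in. The peak areas below are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def cdna_copies(competitor_copies: float, cdna_area: float, competitor_area: float) -> float:
    """Estimate starting cDNA copies from one spectrum's peak-area ratio."""
    if cdna_area < 0 or competitor_area <= 0:
        raise ValueError("peak areas must be non-negative and competitor area > 0")
    return competitor_copies * cdna_area / competitor_area


def mean_cdna_copies(competitor_copies: float,
                     spectra: Sequence[Tuple[float, float]]) -> float:
    """Average the estimate over replicate spectra given as (cdna_area, competitor_area)."""
    return mean(cdna_copies(competitor_copies, a, c) for a, c in spectra)


# Hypothetical replicate spectra for one transcript, 1,000 competitor copies spiked in.
print(mean_cdna_copies(1000, [(2.5, 1.0), (5.0, 2.0)]))  # 2500.0
```

Because the competitor differs from the target by a single base, it experiences nearly the same amplification conditions, which is what justifies reading copy number directly from the peak-area ratio.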

Still other PCR-based techniques known in the art that may be used for gene expression profiling include, e.g., differential display; amplified fragment length polymorphism (iAFLP); BeadArray™ technology (Illumina, San Diego, Calif.) using the commercially available Luminex100 LabMAP system and multiple color-coded microspheres (Luminex Corp., Austin, Tex.) in a rapid assay for gene expression; and high coverage expression profiling (HiCEP) analysis.

D. Microarrays

Differential gene expression can also be identified or confirmed using the microarray technique. Thus, the expression profile of lung cancer-associated genes can be measured in either fresh or paraffin-embedded tissue using microarray technology. In this method, polynucleotide sequences of interest (including cDNAs and oligonucleotides) are plated, or arrayed, on a microchip substrate. The arrayed sequences are then hybridized with specific DNA probes from cells or tissues of interest. Just as in the other methods and compositions herein, the source of mRNA is total RNA isolated from whole blood of controls and patient subjects.

In one embodiment of the microarray technique, PCR amplified inserts of cDNA clones are applied to a substrate in a dense array. In one embodiment, all 559 nucleotide sequences from Table III are applied to the substrate. The microarrayed genes, immobilized on the microchip, are suitable for hybridization under stringent conditions. Fluorescently labeled cDNA probes may be generated through incorporation of fluorescent nucleotides by reverse transcription of RNA extracted from tissues of interest. Labeled cDNA probes applied to the chip hybridize with specificity to each spot of DNA on the array. After stringent washing to remove non-specifically bound probes, the chip is scanned by confocal laser microscopy or by another detection method, such as a CCD camera. Quantitation of hybridization of each arrayed element allows for assessment of corresponding mRNA abundance. With dual color fluorescence, separately labeled cDNA probes generated from two sources of RNA are hybridized pairwise to the array. The relative abundance of the transcripts from the two sources corresponding to each specified gene is thus determined simultaneously. The miniaturized scale of the hybridization affords a convenient and rapid evaluation of the expression pattern for large numbers of genes. Such methods have been shown to have the sensitivity required to detect rare transcripts, which are expressed at a few copies per cell, and to reproducibly detect at least approximately two-fold differences in the expression levels. Microarray analysis can be performed by commercially available equipment, following manufacturer's protocols.
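The dual-color comparison described above can be sketched as a per-spot log2 ratio between the two channels. The background subtraction, the global median centering (which assumes most genes are not differentially expressed), and the two-fold cutoff (|log2 ratio| ≥ 1) are illustrative assumptions, not a protocol given in the source; the gene names and intensities are hypothetical.

```python
import math
from statistics import median
from typing import Dict, List


def log2_ratios(channel_1: Dict[str, float], channel_2: Dict[str, float],
                background: float = 0.0) -> Dict[str, float]:
    """Median-centered log2(channel_1 / channel_2) for spots above background in both channels."""
    raw = {}
    for gene in channel_1.keys() & channel_2.keys():
        a = channel_1[gene] - background
        b = channel_2[gene] - background
        if a > 0 and b > 0:  # skip spots with no usable signal
            raw[gene] = math.log2(a / b)
    if not raw:
        return {}
    center = median(raw.values())  # global normalization between the two dyes
    return {gene: r - center for gene, r in raw.items()}


def at_least_two_fold(ratios: Dict[str, float], min_abs_log2: float = 1.0) -> List[str]:
    """Genes whose normalized ratio differs by at least two-fold in either direction."""
    return sorted(g for g, r in ratios.items() if abs(r) >= min_abs_log2)


red = {"GENE_A": 400.0, "GENE_B": 100.0, "GENE_C": 100.0, "GENE_D": 25.0}
green = {"GENE_A": 100.0, "GENE_B": 100.0, "GENE_C": 100.0, "GENE_D": 100.0}
print(at_least_two_fold(log2_ratios(red, green)))  # ['GENE_A', 'GENE_D']
```

Working on the log2 scale makes up- and down-regulation symmetric: a four-fold increase and a four-fold decrease map to +2 and -2.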

Other useful methods, summarized in U.S. Pat. No. 7,081,340 and incorporated herein by reference, include Serial Analysis of Gene Expression (SAGE) and Massively Parallel Signature Sequencing (MPSS). Briefly, SAGE is a method that allows the simultaneous and quantitative analysis of a large number of gene transcripts without the need to provide an individual hybridization probe for each transcript. First, a short sequence tag (about 10 to 14 bp) is generated that contains sufficient information to uniquely identify a transcript, provided that the tag is obtained from a unique position within each transcript. Then, many tags are linked together to form long serial molecules that can be sequenced, revealing the identity of the multiple tags simultaneously. The expression pattern of any population of transcripts can be quantitatively evaluated by determining the abundance of individual tags and identifying the gene corresponding to each tag. For more details see, e.g., Velculescu et al., Science 270:484-487 (1995); and Velculescu et al., Cell 88:243-51 (1997), both of which are incorporated herein by reference.
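The counting step of SAGE, determining the abundance of each tag and assigning it to a gene, can be sketched in a few lines of Python. The sketch assumes tags have already been extracted from the sequenced concatemers; the 10-bp tags and the tag-to-gene lookup below are hypothetical placeholders, and a real analysis would rely on a curated tag-to-transcript mapping.

```python
from collections import Counter
from typing import Dict, Iterable


def tag_abundance(tags: Iterable[str], tag_to_gene: Dict[str, str]) -> Dict[str, float]:
    """Fraction of all sequenced tags attributed to each gene.

    Tags with no entry in the lookup are pooled under "unassigned".
    """
    tag_counts = Counter(tags)
    total = sum(tag_counts.values())
    gene_counts: Counter = Counter()
    for tag, n in tag_counts.items():
        gene_counts[tag_to_gene.get(tag, "unassigned")] += n
    return {gene: n / total for gene, n in gene_counts.items()}


# Hypothetical 10-bp tags and lookup table.
observed = ["GTTGTGGTTA"] * 3 + ["CCCATCGTCC"] * 1
lookup = {"GTTGTGGTTA": "GENE_X"}
print(tag_abundance(observed, lookup))  # {'GENE_X': 0.75, 'unassigned': 0.25}
```

Since each tag stands in for one transcript molecule, the tag fractions estimate relative transcript abundance without any per-gene hybridization probe.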

Gene Expression Analysis by Massively Parallel Signature Sequencing (MPSS), described by Brenner et al., Nature Biotechnology 18:630-634 (2000) (which is incorporated herein by reference), is a sequencing approach that combines non-gel-based signature sequencing with in vitro cloning of millions of templates on separate 5 μm diameter microbeads. First, a microbead library of DNA templates is constructed by in vitro cloning. This is followed by the assembly of a planar array of the template-containing microbeads in a flow cell at a high density (typically greater than 3×10⁶ microbeads/cm²). The free ends of the cloned templates on each microbead are analyzed simultaneously, using a fluorescence-based signature sequencing method that does not require DNA fragment separation. This method has been shown to simultaneously and accurately provide, in a single operation, hundreds of thousands of gene signature sequences from a yeast cDNA library.

E. Immunohistochemistry

Immunohistochemistry methods are also suitable for detecting the expression levels of the gene expression products of the informative genes described for use in the methods and compositions herein. Antibodies or antisera, preferably polyclonal antisera, and most preferably monoclonal antibodies, or other protein-binding ligands specific for each marker are used to detect expression. The antibodies can be detected by direct labeling of the antibodies themselves, for example, with radioactive labels, fluorescent labels, hapten labels such as biotin, or an enzyme such as horseradish peroxidase or alkaline phosphatase. Alternatively, unlabeled primary antibody is used in conjunction with a labeled secondary antibody, comprising antisera, polyclonal antisera or a monoclonal antibody specific for the primary antibody. Protocols and kits for immunohistochemical analyses are well known in the art and are commercially available.

III. COMPOSITIONS OF THE INVENTION

The methods for diagnosing lung cancer described herein, which utilize defined gene expression profiles, permit the development of simplified diagnostic tools for diagnosing lung cancer, e.g., distinguishing NSCLC from a non-cancerous nodule. Thus, a composition for diagnosing lung cancer in a mammalian subject as described herein can be a kit or a reagent. For example, one embodiment of a composition includes a substrate upon which said polynucleotides, oligonucleotides, or ligands are immobilized. In another embodiment, the composition is a kit containing the relevant 5 or more polynucleotides, oligonucleotides, or ligands, optional detectable labels for same, immobilization substrates, optional substrates for enzymatic labels, and other laboratory items. In still another embodiment, at least one polynucleotide, oligonucleotide, or ligand is associated with a detectable label.

In one embodiment, a composition for diagnosing lung cancer in a mammalian subject includes 5 or more PCR primer-probe sets. Each primer-probe set amplifies a different polynucleotide sequence from a gene expression product of 5 or more informative genes found in the blood of the subject. These informative genes are selected to form a gene expression profile or signature that is distinguishable between a subject having lung cancer and a subject having a non-cancerous nodule. Changes in the expression of the genes in the gene expression profile relative to a reference gene expression profile are correlated with lung cancer, such as non-small cell lung cancer (NSCLC).

In one embodiment of this composition, the informative genes are selected from among the genes identified in Table I. In another embodiment of this composition, the informative genes are selected from among the genes identified in Table II. These collections consist of genes whose expression products are altered (i.e., increased or decreased) relative to the expression of the same gene products in the blood of a reference control (i.e., a patient having a non-cancerous nodule). In one embodiment, polynucleotides, oligonucleotides, or ligands, i.e., probes, are generated to 5 or more informative genes from Table I or Table II for use in the composition (the CodeSet). An example of such a composition contains probes to a targeted portion of the 559 genes of Table I. In another embodiment, probes are generated to all 559 genes from Table I for use in the composition. In another embodiment, probes are generated to the first 539 genes from Table I for use in the composition. In another embodiment, probes are generated to the first 3 genes from Table I or Table II for use in the composition. In another embodiment, probes are generated to the first 5 genes from Table I or Table II for use in the composition. In another embodiment, probes are generated to the first 10 genes from Table I or Table II for use in the composition. In another embodiment, probes are generated to the first 15 genes from Table I or Table II for use in the composition. In another embodiment, probes are generated to the first 20 genes from Table I or Table II for use in the composition. In another embodiment, probes are generated to the first 25 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 30 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 35 genes from Table I or Table II for use in the composition.
In yet another embodiment, probes are generated to the first 40 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 45 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 50 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 60 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 65 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 70 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 75 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 80 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 85 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 90 genes from Table I or Table II for use in the composition. In yet another embodiment, probes are generated to the first 95 genes from Table I or Table II for use in the composition. In another embodiment, probes are generated to the first 100 genes from Table I or Table II for use in the composition. In another embodiment, probes are generated to the first 200 genes from Table I for use in the composition. In yet another embodiment, probes are generated to 300 genes from Table I for use in the composition. Still other embodiments employ probes to a targeted portion of other combinations of the genes in Table I or Table II. 
The selected genes from the Table need not be in rank order; rather, any combination that clearly shows a difference in expression between the reference control and the diseased patient is useful in such a composition.

In one embodiment of the compositions described above, the reference control is a non-healthy control (NHC) as described above. In other embodiments, the reference control may be any class of controls as described above in “Definitions”.

The compositions based on the genes selected from Table I or Table II described herein, optionally associated with detectable labels, can be presented in the format of a microfluidics card, a chip or chamber, or a kit adapted for use with the Nanostring, PCR, RT-PCR, or qPCR techniques described above. In one aspect, such a format is a diagnostic assay using TAQMAN® Quantitative PCR low density arrays. In another aspect, such a format is a diagnostic assay using the Nanostring nCounter platform.

For use in the above-noted compositions, the PCR primers and probes are preferably designed based upon intron sequences present in the gene(s) to be amplified selected from the gene expression profile. Exemplary target sequences are shown in Table III. The design of the primer and probe sequences is within the skill of the art once the particular gene target is selected. The particular methods selected for primer and probe design and the particular primer and probe sequences are not limiting features of these compositions. A ready explanation of primer and probe design techniques available to those of skill in the art is summarized in U.S. Pat. No. 7,081,340, with reference to publicly available tools such as DNA BLAST software, the Repeat Masker program (Baylor College of Medicine), Primer Express (Applied Biosystems), MGB assay-by-design (Applied Biosystems), and Primer3 (Steve Rozen and Helen J. Skaletsky (2000), Primer3 on the WWW for general users and for biologist programmers).

In general, optimal PCR primers and probes used in the compositions described herein are 17-30 bases in length and contain about 20-80% G+C bases, such as, for example, about 50-60%. Melting temperatures between 50 and 80° C., e.g., about 50 to 70° C., are typically preferred.
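The design ranges above can be screened programmatically. The sketch below is a rough check, not a replacement for dedicated tools such as Primer3: it uses the basic, salt-independent approximation Tm = 64.9 + 41 × (nG+C − 16.4) / N (an assumption; nearest-neighbor models are more accurate) together with the broad ranges stated above (17-30 bases, 20-80% G+C, Tm of 50-80° C.). The example oligonucleotide is a 23-base stretch of the SYNJ1 target sequence in Table III, chosen only for illustration.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in an oligonucleotide."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def basic_tm(seq: str) -> float:
    """Approximate melting temperature (deg C) from the basic GC-content formula."""
    s = seq.upper()
    n_gc = s.count("G") + s.count("C")
    return 64.9 + 41.0 * (n_gc - 16.4) / len(s)


def within_design_ranges(seq: str, length=(17, 30), gc=(0.20, 0.80),
                         tm=(50.0, 80.0)) -> bool:
    """True if length, G+C fraction, and approximate Tm all fall in the given ranges."""
    s = seq.upper()
    if not s or set(s) - set("ACGT"):
        raise ValueError("sequence must contain only A, C, G, T")
    return (length[0] <= len(s) <= length[1]
            and gc[0] <= gc_fraction(s) <= gc[1]
            and tm[0] <= basic_tm(s) <= tm[1])


candidate = "GCCCTCCAGTAGATCCTTTCACG"  # 23 nt, 13 G+C
print(round(gc_fraction(candidate), 3), round(basic_tm(candidate), 1))  # 0.565 58.8
print(within_design_ranges(candidate))  # True
```

Passing the narrower preferred ranges (e.g., `gc=(0.50, 0.60)`, `tm=(50.0, 70.0)`) tightens the screen without changing the code.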

In another aspect, a composition for diagnosing lung cancer in a mammalian subject contains a plurality of polynucleotides (genomic probes) immobilized on a substrate, wherein the plurality of genomic probes hybridize to 100 or more gene expression products of 100 or more informative genes selected from a gene expression profile in the blood of the subject, the gene expression profile comprising genes selected from Table I. In another embodiment, a composition for diagnosing lung cancer in a mammalian subject contains a plurality of polynucleotides (genomic probes) immobilized on a substrate, wherein the plurality of genomic probes hybridize to 10 or more gene expression products of 10 or more informative genes selected from a gene expression profile in the blood of the subject, the gene expression profile comprising genes selected from Table I or Table II. This type of composition relies on recognition of the same gene profiles as described above for the Nanostring compositions but employs the techniques of a cDNA array. Hybridization of the immobilized polynucleotides in the composition to the gene expression products present in the blood of the patient subject is employed to quantitate the expression of the informative genes selected from among the genes identified in Table I or Table II to generate a gene expression profile for the patient, which is then compared to that of a reference sample. As described above, depending upon the identification of the profile (i.e., that of genes of Table I or subsets thereof, or that of genes of Table II or subsets thereof), this composition enables the diagnosis and prognosis of NSCLC. Again, the selection of the polynucleotide sequences, their length, and the labels used in the composition are routine determinations made by one of skill in the art in view of the teachings of which genes can form the gene expression profiles suitable for the diagnosis and prognosis of lung cancers.
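The comparison of a patient's profile with reference profiles can take many forms, and the text here does not fix one. As a purely illustrative stand-in (not the classifier of this disclosure), the sketch below assigns a profile to whichever reference centroid it correlates with most strongly, using Pearson correlation across the informative genes. The gene names, centroid labels, and expression values are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant profile")
    return sxy / sqrt(sxx * syy)


def nearest_centroid(profile: Dict[str, float],
                     centroids: Dict[str, Dict[str, float]]) -> Tuple[str, Dict[str, float]]:
    """Return the best-correlated centroid label and all correlation scores."""
    genes = sorted(profile)  # fixed gene order shared by all vectors
    x = [profile[g] for g in genes]
    scores = {label: pearson(x, [c[g] for g in genes]) for label, c in centroids.items()}
    return max(scores, key=scores.get), scores


# Hypothetical log-scale expression values for three informative genes.
patient = {"G1": 2.0, "G2": 0.0, "G3": -1.0}
refs = {
    "cancer_centroid": {"G1": 1.5, "G2": 0.2, "G3": -0.8},
    "benign_nodule_centroid": {"G1": -1.0, "G2": 0.1, "G3": 1.0},
}
label, scores = nearest_centroid(patient, refs)
print(label)  # cancer_centroid
```

Correlation compares the shape of the profile rather than its absolute level, so a uniform shift in a patient's overall signal does not change the assignment.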

In yet another aspect, a composition or kit useful in the methods described herein contains a plurality of ligands that bind to 100 or more gene expression products of 100 or more informative genes selected from a gene expression profile in the blood of the subject. In another embodiment, a composition or kit useful in the methods described herein contains a plurality of ligands that bind to 10 or more gene expression products of 10 or more informative genes selected from a gene expression profile in the blood of the subject. The gene expression profile contains the genes of Table I or Table II, as described above for the other compositions. This composition enables detection of the proteins expressed by the genes in the indicated Tables. While the ligands are preferably antibodies to the proteins encoded by the genes in the profile, it would be evident to one of skill in the art that various forms of antibody, e.g., polyclonal, monoclonal, recombinant, and chimeric antibodies, as well as fragments and components (e.g., CDRs, single chain variable regions, etc.), may be used in place of antibodies. Such ligands may be immobilized on suitable substrates for contact with the subject's blood and analyzed in a conventional fashion. In certain embodiments, the ligands are associated with detectable labels. These compositions also enable detection of changes in proteins encoded by the genes in the gene expression profile from those of a reference gene expression profile. Such changes correlate with lung cancer in a manner similar to that for the PCR and polynucleotide-containing compositions described above.

For all of the above forms of diagnostic/prognostic compositions, the gene expression profile can, in one embodiment, include at least the first 25 of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 10 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 15 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 20 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 30 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 40 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 50 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 60 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 70 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 80 or more of the informative genes of Table I or Table II. 
In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 90 or more of the informative genes of Table I or Table II. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include all 100 of the informative genes of Table II. In one embodiment, for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include at least the first 100 of the informative genes of Table I. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 200 or more of the informative genes of Table I. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 300 or more of the informative genes of Table I. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 400 or more of the informative genes of Table I. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 500 or more of the informative genes of Table I. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include 539 or more of the informative genes of Table I. In another embodiment for all of the above forms of diagnostic/prognostic compositions, the gene expression profile can include all 559 of the informative genes of Table I.

These compositions may be used to diagnose lung cancers, such as stage I or stage II NSCLC. Further, these compositions are useful for providing a supplemental or original diagnosis in a subject having lung nodules of unknown etiology.

IV. DIAGNOSTIC METHODS OF THE INVENTION

All of the above-described compositions provide a variety of diagnostic tools that permit a blood-based, non-invasive assessment of disease status in a subject. Use of these compositions in diagnostic tests, which may be coupled with other screening tests such as a chest X-ray or CT scan, increases diagnostic accuracy and/or directs additional testing.

Thus, in one aspect, a method is provided for diagnosing lung cancer in a mammalian subject. This method involves identifying a gene expression profile in the blood of a mammalian, preferably human, subject. In one embodiment, the gene expression profile includes 100 or more gene expression products of 100 or more informative genes having increased or decreased expression in lung cancer. The gene expression profiles are formed by selection of 100 or more informative genes from the genes of Table I. In another embodiment, the gene expression profile includes 10 or more gene expression products of 10 or more informative genes having increased or decreased expression in lung cancer. The gene expression profiles are formed by selection of 10 or more informative genes from the genes of Table I. In another embodiment, the gene expression profiles are formed by selection of 10 or more informative genes from the genes of Table II. In another embodiment, the gene expression profile includes 10 or more gene expression products of 5 or more informative genes having increased or decreased expression in lung cancer. The gene expression profiles are formed by selection of 5 or more informative genes from the genes of Table I. In another embodiment, the gene expression profiles are formed by selection of 5 or more informative genes from the genes of Table II. Comparison of a subject's gene expression profile with a reference gene expression profile permits identification of changes in expression of the informative genes that correlate with a lung cancer (e.g., NSCLC). This method may be performed using any of the compositions described above. In one embodiment, the method enables a cancerous tumor to be distinguished from a benign nodule.

In another aspect, use of any of the compositions described herein is provided for diagnosing lung cancer in a subject.

The diagnostic compositions and methods described herein provide a variety of advantages over current diagnostic methods. Among such advantages are the following. As exemplified herein, subjects with cancerous tumors are distinguished from those with benign nodules. These methods and compositions provide a solution to the practical diagnostic problem of whether a patient who presents at a lung clinic with a small nodule has malignant disease. Patients with an intermediate-risk nodule would clearly benefit from a non-invasive test that would move the patient into either a very low-likelihood or a very high-likelihood category of disease risk. An accurate estimate of malignancy based on a genomic profile (e.g., estimating that a given patient has a 90% probability of having cancer versus estimating that the patient has only a 5% chance of having cancer) would result in fewer surgeries for benign disease, more early stage tumors removed at a curable stage, fewer follow-up CT scans, and a reduction in the significant psychological cost of worrying about a nodule. The economic impact would also likely be significant, for example by reducing the currently estimated cost of additional health care associated with CT screening for lung cancer, i.e., $116,000 per quality-adjusted life-year gained. A non-invasive blood genomics test with sufficient sensitivity and specificity would significantly alter the post-test probability of malignancy and, thus, the subsequent clinical care.
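How a test result shifts the probability of malignancy can be illustrated with the odds form of Bayes' theorem. The sketch below is illustrative only: the 30% pre-test probability is a hypothetical input, and the 90% sensitivity / 46% specificity operating point is borrowed from the 559 classifier performance reported in this section.

```python
def post_test_probability(pre_test: float, sensitivity: float,
                          specificity: float, positive: bool) -> float:
    """Update a pre-test probability of malignancy after a test result.

    Odds form of Bayes' theorem: post-test odds = pre-test odds * LR, where
    LR+ = sens / (1 - spec) for a positive result and
    LR- = (1 - sens) / spec for a negative result.
    """
    if not 0.0 < pre_test < 1.0:
        raise ValueError("pre_test must be strictly between 0 and 1")
    if positive:
        lr = sensitivity / (1.0 - specificity)
    else:
        lr = (1.0 - sensitivity) / specificity
    pre_odds = pre_test / (1.0 - pre_test)
    post_odds = pre_odds * lr
    return post_odds / (1.0 + post_odds)


if __name__ == "__main__":
    # Hypothetical intermediate-risk nodule with an assumed 30% pre-test probability.
    for result in (True, False):
        p = post_test_probability(0.30, sensitivity=0.90, specificity=0.46,
                                  positive=result)
        print(f"test {'positive' if result else 'negative'}: {p:.1%}")
```

With these assumed inputs, a negative result lowers the probability of malignancy from 30% to about 8.5%, while a positive result raises it to about 42%.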

A desirable advantage of these methods over existing methods is that they are able to characterize the disease state from a minimally invasive procedure, i.e., by taking a blood sample. In contrast, current practice for classifying cancerous tumors from gene expression profiles depends on a tissue sample, usually a sample from the tumor. For very small tumors, a biopsy is problematic, and if no tumor is known or visible, obtaining a sample from it is impossible. Moreover, no purification of tumor cells is required, as it is when tumor samples are analyzed. A recently published method depends on brushing epithelial cells from the lung during bronchoscopy, a procedure that is also considerably more invasive than taking a blood sample. Blood samples have the additional advantage that the material is easily prepared and stabilized for later analysis, which is important when messenger RNA is to be analyzed.

The 559 classifier described herein showed a ROC-AUC of 0.81 over all tested samples. In one embodiment, when the sensitivity is about 90%, the specificity is about 46%. When nodule classification accuracy is assessed by size, without using a specific threshold for sensitivity, the number of benign nodules classified as cancer increases as nodule size and the cancer risk factor increase. In one embodiment, the accuracy of the gene classifier is about 89% for nodules ≤8 mm. In another embodiment, the accuracy of the gene classifier is about 75% for nodules >8 to about ≤12 mm. In yet another embodiment, the accuracy of the gene classifier is about 68% for nodules >12 to about ≤16 mm. In another embodiment, the accuracy of the gene classifier is about 53% for nodules >16 mm. See examples below.

In one embodiment, for nodules of about <10 mm, the specificity is about 54% at about 90% sensitivity, and the ROC-AUC is about 0.85. In another embodiment, for larger nodules of about >10 mm, the specificity is about 24% at about 90% sensitivity, and the ROC-AUC is about 0.71.

The 100 Classifier described herein showed a ROC-AUC of 0.82 over all tested samples. In one embodiment, when the sensitivity is about 90%, the specificity is about 62%. In another embodiment, when the sensitivity is about 79%, the specificity is about 68%. In one embodiment, when the sensitivity is about 71%, the specificity is about 75%. See examples below.
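The two kinds of figures quoted above, ROC-AUC and specificity at a fixed sensitivity such as about 90%, can both be computed from per-sample classifier scores. The following is a minimal standard-library sketch with hypothetical scores; it assumes a higher score means more likely cancer and is not the software used to generate the reported figures.

```python
from typing import Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC-AUC as the Mann-Whitney probability that a randomly chosen cancer
    (label 1) scores higher than a randomly chosen benign nodule (label 0);
    tied scores count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def specificity_at_sensitivity(scores: Sequence[float], labels: Sequence[int],
                               target_sens: float = 0.90) -> Tuple[float, float, float]:
    """Scan thresholds from high to low (score >= threshold is called cancer)
    and return (threshold, sensitivity, specificity) for the highest threshold
    whose sensitivity reaches target_sens, i.e. the best specificity there."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both cancer and benign samples")
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        if tp / n_pos >= target_sens:
            return t, tp / n_pos, tn / n_neg
    raise ValueError("target sensitivity not reachable")


if __name__ == "__main__":
    # Hypothetical classifier scores; 1 = cancer, 0 = benign nodule.
    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]
    labels = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
    print(roc_auc(scores, labels))                     # 0.84
    print(specificity_at_sensitivity(scores, labels))  # (0.4, 1.0, 0.6)
```

Because specificity can only fall as the threshold is lowered, taking the first qualifying threshold in descending order gives the highest specificity compatible with the sensitivity target.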

These compositions and methods allow for more accurate diagnosis and treatment of lung cancer. Thus, in one embodiment, the methods described include treatment of the lung cancer. Treatment may include removal of the neoplastic growth, chemotherapy and/or any other treatment known in the art or described herein.

In one embodiment, a method for diagnosing the existence of, or evaluating, a lung cancer in a mammalian subject is provided, which includes identifying changes in the expression of 5, 10, 15 or more genes in a sample of said subject, said genes selected from the genes of Table I or the genes of Table II. The subject's gene expression levels are compared with the levels of the same genes in a reference or control, wherein changes in expression of the subject's genes from those of the reference correlate with a diagnosis or evaluation of a lung cancer.

In one embodiment, the diagnosis or evaluation comprises one or more of a diagnosis of a lung cancer, a diagnosis of a benign nodule, a diagnosis of a stage of lung cancer, a diagnosis of a type or classification of a lung cancer, a diagnosis or detection of a recurrence of a lung cancer, a diagnosis or detection of a regression of a lung cancer, a prognosis of a lung cancer, or an evaluation of the response of a lung cancer to a surgical or non-surgical therapy. In another embodiment, the changes comprise an upregulation of one or more selected genes in comparison to said reference or control, or a downregulation of one or more selected genes in comparison to said reference or control.
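One simple way to label individual genes as upregulated or downregulated relative to a reference cohort is sketched below. This is a hypothetical illustration: the statistic (a z-score of log2 expression against the reference subjects), the 2-standard-deviation cutoff, and the gene names `GENE_A` to `GENE_C` are all assumptions, and the classifiers described herein combine the genes through an SVM rather than through per-gene calls.

```python
import math
import statistics


def call_changes(subject: dict[str, float],
                 reference: dict[str, list[float]],
                 z_cutoff: float = 2.0) -> dict[str, str]:
    """Label each gene 'up', 'down' or 'unchanged' relative to a reference cohort.

    subject: normalized expression for one test subject, keyed by gene.
    reference: normalized expression of the same genes in reference subjects.
    A gene is called changed when the subject's log2 value lies more than
    z_cutoff reference standard deviations from the reference mean.
    """
    calls = {}
    for gene, value in subject.items():
        ref_log = [math.log2(v) for v in reference[gene]]
        mu = statistics.mean(ref_log)
        sd = statistics.stdev(ref_log)  # sample SD across reference subjects
        z = (math.log2(value) - mu) / sd
        if z > z_cutoff:
            calls[gene] = "up"
        elif z < -z_cutoff:
            calls[gene] = "down"
        else:
            calls[gene] = "unchanged"
    return calls


if __name__ == "__main__":
    # Hypothetical genes and expression values.
    reference = {"GENE_A": [100, 110, 90, 105, 95],
                 "GENE_B": [200, 190, 210, 205, 195],
                 "GENE_C": [50, 55, 45, 52, 48]}
    subject = {"GENE_A": 400, "GENE_B": 50, "GENE_C": 51}
    print(call_changes(subject, reference))
```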

In one embodiment, the method takes into account the size of a lung nodule in the subject. The specificity and sensitivity may vary with the size of the nodule. In one embodiment, the specificity is about 46% at about 90% sensitivity. In another embodiment, the specificity is about 54% at about 90% sensitivity for nodules <10 mm. In yet another embodiment, the accuracy is about 88% for nodules ≤8 mm, about 75% for nodules >8 mm and ≤12 mm, about 68% for nodules >12 mm and ≤16 mm, and about 53% for nodules >16 mm.

In another embodiment, the reference or control comprises the expression levels of three or more genes of Table I in a sample of at least one reference subject. The reference subject may be selected from the group consisting of: (a) a smoker with malignant disease, (b) a smoker with non-malignant disease, (c) a former smoker with non-malignant disease, (d) a healthy non-smoker with no disease, (e) a non-smoker who has chronic obstructive pulmonary disease (COPD), (f) a former smoker with COPD, (g) a subject with a solid lung tumor prior to surgery for removal of same; (h) a subject with a solid lung tumor following surgical removal of said tumor; (i) a subject with a solid lung tumor prior to therapy for same; and (j) a subject with a solid lung tumor during or following therapy for same. In one embodiment, the reference or control subject (a)-(j) is the same test subject at a temporally earlier timepoint.

The sample is selected from those described herein. In one embodiment, the sample is peripheral blood. The nucleic acids in the sample are, in some embodiments, stabilized prior to identifying changes in the gene expression levels. Such stabilization may be accomplished, e.g., using the PAXgene system described herein.

In one embodiment, the method of detecting lung cancer in a patient includes

a. obtaining a sample from the patient; and

b. detecting a change in expression in at least 10 genes selected from Table I or Table II in the patient sample as compared to a control by contacting the sample with a composition comprising oligonucleotides, polynucleotides or ligands specific for each different gene transcript or expression product of the at least 10 genes of Table I or Table II and detecting binding between the oligonucleotide, polynucleotide or ligand and the gene product or expression product.

In another embodiment, the method of diagnosing lung cancer in a subject includes

a. obtaining a blood sample from a subject;

b. detecting a change in expression in at least 10 genes selected from Table I or Table II in the patient sample as compared to a control by contacting the sample with a composition comprising oligonucleotides, polynucleotides or ligands specific for each different gene transcript or expression product of the at least 10 genes of Table I or Table II and detecting binding between the oligonucleotide, polynucleotide or ligand and the gene product or expression product; and

c. diagnosing the subject with cancer when changes in expression of the subject's genes from those of the reference are detected.

In yet another embodiment, the method includes

a. obtaining a blood sample from a subject;

b. detecting a change in expression in at least 10 genes selected from Table I or Table II in the patient sample as compared to a control by contacting the sample with a composition comprising oligonucleotides, polynucleotides or ligands specific for each different gene transcript or expression product of the at least 10 genes of Table I or Table II and detecting binding between the oligonucleotide, polynucleotide or ligand and the gene product or expression product;

c. diagnosing the subject with cancer when changes in expression of the subject's genes from those of the reference are detected; and

d. removing the neoplastic growth.

V. EXAMPLES

The invention is now described with reference to the following examples. These examples are provided for the purpose of illustration only and the invention should in no way be construed as being limited to these examples but rather should be construed to encompass any and all variations that become evident as a result of the teaching provided herein.

Example 1: Patient Population—Analysis A

For development of the gene classifier described herein, blood samples and clinical information were collected from 150 subjects, 73 having a diagnosis of lung cancer and 77 having a diagnosis of benign nodule. Patient characteristics are shown in FIG. 1.

Patients with lung cancer included newly diagnosed male and female patients with early stage lung cancer. They were in moderately good health (ambulatory), although they could have coexisting medical illnesses. They were excluded if they had had previous cancers, chemotherapy, radiation, or cancer surgery. They must have had a lung cancer diagnosis within the preceding 6 months, histologic confirmation, and no systemic therapy, such as chemotherapy, radiation therapy or cancer surgery, because biomarker levels may change with therapy. Thus, the majority of the cancer patients were early stage (i.e., Stage I and Stage II).

The “control” cohort was derived from patients with benign lung nodules (e.g. ground glass opacities, single nodules, granulomas or hamartomas). These patients were evaluated at pulmonary clinics, or underwent thoracic surgery for a lung nodule. All samples were collected prior to surgery.

Example 2: Patient Population—Analysis B

Further blood samples and clinical information were collected from 120 subjects, 60 having a diagnosis of lung cancer and 60 having a diagnosis of benign nodule. Patients with lung cancer included newly diagnosed male and female patients with early stage lung cancer. They were in moderately good health (ambulatory), although they could have coexisting medical illnesses. They were excluded if they had had previous cancers, chemotherapy, radiation, or cancer surgery. They must have had a lung cancer diagnosis within the preceding 6 months, histologic confirmation, and no systemic therapy, such as chemotherapy, radiation therapy or cancer surgery, because biomarker levels may change with therapy. Thus, the majority of the cancer patients were early stage (i.e., Stage I and Stage II).

The “control” cohort was derived from patients with benign lung nodules (e.g. granulomas or hamartomas). These patients were evaluated at pulmonary clinics, or underwent thoracic surgery for a lung nodule. All samples were collected prior to surgery.

Example 3: Sample Collection Protocols and Processing

Blood samples were collected in the clinic by the tissue acquisition technician. Blood samples were drawn directly into PAXgene Blood RNA Tubes using standard phlebotomy technique. These tubes contain a proprietary reagent that immediately stabilizes intracellular RNA, minimizing the ex vivo degradation or up-regulation of RNA transcripts. The ability to eliminate freezing, to batch samples, and to minimize the urgency of processing samples after collection greatly enhances lab efficiency and reduces costs.

Example 4—RNA Purification and Quality Assessment

PAXgene RNA is prepared using a standard commercially available kit from Qiagen™ that allows purification of mRNA. The resulting RNA is used for mRNA profiling. The RNA quality is determined using a Bioanalyzer. Only samples with RNA Integrity numbers >3 were used.

Briefly, RNA is isolated as follows. Turn the shaker-incubator on and set it to 55° C. before beginning. Unless otherwise noted, all steps in this protocol, including centrifugation steps, should be carried out at room temperature (RT; 15-25° C.). This protocol assumes that samples are stored at −80° C. Unfrozen samples that have been left at RT for a minimum of 2 hours, per the Qiagen protocol, should be processed in the same way.

Thaw Paxgene tubes upright in a plastic rack. Invert tubes at least 10 times to mix before starting isolation. Prepare all necessary tubes. For each sample, the following are needed: 2 numbered 1.5 ml Eppendorf tubes; 1 Eppendorf tube with the sample information (this is the final tube); 1 Lilac Paxgene spin column; 1 Red Paxgene Spin column; and 5 Processing tubes.

Centrifuge the PAXgene Blood RNA Tube for 10 minutes at 5000×g using a swing-out rotor in the Qiagen centrifuge (Sigma 4-15° C. centrifuge; Rotor: Sigma Nr. 11140, 7/01, 5500/min; Holder: Sigma 13115, 286 g 14/D; Inside tube holder: 18010, 125 g). Note: after thawing, ensure that the blood sample has been incubated in the PAXgene Blood RNA Tube for a minimum of 2 hours at room temperature (15-25° C.) in order to achieve complete lysis of blood cells.

Working under the hood, remove the supernatant by decanting it into bleach. When decanting the supernatant, take care not to disturb the pellet, and dry the rim of the tube with a clean paper towel. Dispose of the decanted material by placing any clotted blood into a bag and then into the infectious waste, and discard the fluid portion down the sink, flushing with plenty of water. Add 4 ml RNase-free water to the pellet, and close the tube using a fresh secondary Hemogard closure.

Vortex until the pellet is visibly dissolved. Weigh the tubes in the centrifuge holder again to ensure they are balanced, and centrifuge for 10 minutes at 5000×g using a swing-out rotor in the Qiagen centrifuge. Small debris remaining in the supernatant after vortexing but before centrifugation will not affect the procedure.

Remove and discard the entire supernatant. Leave tube upside-down for 1 min to drain off all supernatant. Incomplete removal of the supernatant will inhibit lysis and dilute the lysate, and therefore affect the conditions for binding RNA to the PAXgene membrane.

Add 350 μl Buffer BM1 and pipet up and down to lyse the pellet.

Pipet the re-suspended sample into a labeled 1.5 ml microcentrifuge tube. Add 300 μl Buffer BM2. Then add 40 μl proteinase K. Mix by vortexing for 5 seconds, and incubate for 10 minutes at 55° C. in a shaker-incubator at the highest possible speed (800 rpm on an Eppendorf Thermomixer). (If using a shaking water bath instead of a thermomixer, briefly vortex the samples every 2-3 minutes during the incubation; keep the vortexer next to the incubator.)

Pipet the lysate directly into a PAXgene Shredder spin column (lilac tube) placed in a 2 ml processing tube, and centrifuge for 3 minutes at 18,500×g and 24° C. in the TOMY Microtwin centrifuge. Carefully pipet the lysate into the spin column and visually check that the lysate is completely transferred to the spin column. To prevent damage to columns and tubes, do not exceed 20,000×g.

Carefully transfer the entire supernatant of the flow-through fraction to a fresh 1.5 ml microcentrifuge tube without disturbing the pellet in the processing tube. Discard the pellet in the processing tube.

Add 700 μl isopropanol (100%) to the supernatant. Mix by vortexing.

Pipet 690 μl of the sample into the PAXgene RNA spin column (red) placed in a 2 ml processing tube, and centrifuge for 1 minute at 10,000×g. Place the spin column in a new 2 ml processing tube, and discard the old processing tube containing flow-through.

Pipet the remaining sample into the PAXgene RNA spin column (red), and centrifuge for 1 minute at 18,500×g. Place the spin column in a new 2 ml processing tube, and discard the old processing tube containing flow-through. Carefully pipet the sample into the spin column and visually check that the sample is completely transferred to the spin column.

Pipet 350 μl Buffer BM3 into the PAXgene RNA spin column. Centrifuge for 15 sec at 10,000×g. Place the spin column in a new 2 ml processing tube, and discard the old processing tube containing flow-through.

Prepare the DNase I incubation mix for the on-column DNase digestion step that follows. Add 10 μl DNase I stock solution to 70 μl Buffer RDD in a 1.5 ml microcentrifuge tube. Mix by gently flicking the tube, and centrifuge briefly to collect residual liquid from the sides of the tube.

Pipet the DNase I incubation mix (80 μl) directly onto the PAXgene RNA spin column membrane, and place on the benchtop (20-30° C.) for 15 minutes. Ensure that the DNase I incubation mix is placed directly onto the membrane. DNase digestion will be incomplete if part of the mix is applied to and remains on the walls or the O-ring of the spin column.

Pipet 350 μl Buffer BM3 into the PAXgene RNA spin column, and centrifuge for 15 sec at 18,500×g. Place the spin column in a new 2 ml processing tube, and discard the old processing tube containing flow-through.

Pipet 500 μl Buffer BM4 to the PAXgene RNA spin column, and centrifuge for 15 sec at 10,000×g. Place the spin column in a new 2 ml processing tube, and discard the old processing tube containing flow-through.

Add another 500 μl Buffer BM4 to the PAXgene RNA spin column. Centrifuge for 2 minutes at 18,500×g.

Discard the tube containing the flow-through, and place the PAXgene RNA spin column in a new 2 ml processing tube. Centrifuge for 1 minute at 18,500×g.

Discard the tube containing the flow-through. Place the PAXgene RNA spin column in a labeled 1.5 ml microcentrifuge tube (final tube), and pipet 40 μl Buffer BR5 directly onto the PAXgene RNA spin column membrane. Centrifuge for 1 minute at 10,000×g to elute the RNA. It is important to wet the entire membrane with Buffer BR5 in order to achieve maximum elution efficiency.

Repeat the elution step as described, using 40 μl Buffer BR5 and the same microcentrifuge tube. Centrifuge for 1 minute at 20,000×g to elute the RNA.

Incubate the eluate for 5 minutes at 65° C. in the shaker-incubator without shaking. After incubation, chill immediately on ice. This incubation at 65° C. denatures the RNA for downstream applications. Do not exceed the incubation time or temperature.

If the RNA samples will not be used immediately, store at −20° C. or −70° C. Since the RNA remains denatured after repeated freezing and thawing, it is not necessary to repeat the incubation at 65° C.

Example 5: Measurement of RNA Levels

To provide a biomarker signature that can be used in clinical practice to diagnose lung cancer, a gene expression profile with the smallest number of genes that maintains satisfactory accuracy is provided by the use of 100 or more of the genes identified in Table I, as well as by the use of 10 or more of the genes identified in Table II. These gene profiles or signatures permit simpler and more practical tests that are easy to use in a standard clinical laboratory. Because the number of discriminating genes is sufficiently small, NanoString nCounter® assays were developed using these gene expression profiles.

A. Nanostring nCounter® Platform Gene Expression Assay Protocol

Total RNA was isolated from whole blood using the Paxgene Blood miRNA Kit, as described above, and samples were checked for RNA quality. Samples were analyzed with the Agilent 2100 Bioanalyzer on an RNA Nano chip, using the RIN score and electropherogram picture as indicators of good sample integrity. Samples were also quantitated on the Nanodrop (ND-1000 Spectrophotometer), where 260/280 and 260/230 readings were recorded and evaluated for Nanostring compatibility. Based on the concentrations measured by Nanodrop, total RNA samples were normalized to contain 100 ng in 5 μL, using nuclease-free water as diluent, in Nanostring-provided tube strips. An 8 μL aliquot of a mixture of the Nanostring nCounter Reporter CodeSet and Hybridization Buffer (70 μL Hybridization Buffer and 42 μL Reporter CodeSet per 12 assays) and 2 μL of Capture ProbeSet were added to each 5 μL RNA sample. Samples were hybridized for 19 hours at 65° C. in the Thermocycler (Eppendorf). During hybridization, Reporter Probes, which carry fluorescent barcodes specific to each mRNA of interest, and biotinylated Capture Probes bound to their associated target mRNA to create target-probe complexes. After hybridization was complete, samples were transferred to the nCounter Prep Station for processing using the Standard Protocol setting (Run Time: 2 hr 35 min). During the Standard Protocol, the Prep Station robot washed samples to remove excess Reporter and Capture Probes. Samples were moved to a streptavidin-coated cartridge where purified target-probe complexes were immobilized in preparation for imaging by the nCounter Digital Analyzer. Upon completion, the cartridge was sealed and placed in the Digital Analyzer using a Field of View (FOV) setting of 555. A fluorescent microscope tabulated the raw counts for each unique barcode associated with a target mRNA. The data collected were stored in .csv files and then transferred to the Bioinformatics Facility for analysis according to the manufacturer's instructions.
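The RNA input normalization and hybridization-mix volumes described above reduce to simple arithmetic, sketched here as hypothetical helper functions. The sketch assumes Nanodrop concentrations are in ng/μL and that the mix is prepared as stated (70 μL Hybridization Buffer plus 42 μL Reporter CodeSet per 12 assays).

```python
def dilution_for_nanostring(conc_ng_per_ul: float,
                            target_ng: float = 100.0,
                            final_ul: float = 5.0) -> tuple[float, float]:
    """Return (RNA volume, nuclease-free water volume) in microliters needed to
    place target_ng of total RNA in final_ul, given a Nanodrop concentration."""
    rna_ul = target_ng / conc_ng_per_ul
    if rna_ul > final_ul:
        # Too dilute: the required RNA volume alone exceeds the final volume.
        raise ValueError(f"{conc_ng_per_ul} ng/uL is too dilute: "
                         f"{rna_ul:.2f} uL exceeds {final_ul} uL")
    return rna_ul, final_ul - rna_ul


def master_mix_per_assay(hyb_ul: float = 70.0, reporter_ul: float = 42.0,
                         assays: int = 12) -> float:
    """Volume of Hybridization Buffer + Reporter CodeSet mix available per assay."""
    return (hyb_ul + reporter_ul) / assays


if __name__ == "__main__":
    print(dilution_for_nanostring(50.0))        # (2.0, 3.0)
    print(round(master_mix_per_assay(), 2))     # 9.33
```

For example, a sample measured at 50 ng/μL needs 2 μL of RNA plus 3 μL of water, while samples below 20 ng/μL cannot supply 100 ng in 5 μL. The stated mix provides about 9.3 μL per assay, roughly 1.3 μL more than the 8 μL aliquot dispensed.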

Example 6: Biomarker Selection

Support Vector Machines (SVM) can be applied to gene expression datasets for gene function discovery and classification. SVM has been found to be most efficient at distinguishing the more closely related cases and controls that reside in the margins. SVM-RFE (48, 54) was primarily used to develop gene expression classifiers that distinguish clinically defined classes of patients from clinically defined classes of controls (smokers, non-smokers, COPD, granuloma, etc.). SVM-RFE is an SVM-based model utilized in the art that recursively removes genes based on their contribution to the discrimination between the two classes being analyzed. The lowest-scoring genes by coefficient weight were removed, the remaining genes were scored again, and the procedure was repeated until only a few genes remained. This method has been used in several studies to perform classification and gene selection tasks. However, the choice of algorithm parameter values (penalty parameter, kernel function, etc.) can often influence performance.

SVM-RCE is a related SVM-based model in that, like SVM-RFE, it assesses the relative contributions of the genes to the classifier. However, SVM-RCE assesses the contributions of groups of correlated genes instead of individual genes. Additionally, although both methods remove the least important genes at each step, SVM-RCE scores and removes clusters of genes, while SVM-RFE scores and removes a single gene or a small number of genes in each round of the algorithm.
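The recursive elimination loop of SVM-RFE can be illustrated with scikit-learn's `RFE` wrapper around a linear SVM. This runs on synthetic data from `make_classification` and is not the pipeline used to build the classifiers herein; the kernel, penalty `C`, and number of retained genes are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

# Synthetic stand-in for an expression matrix: 120 samples x 50 "genes".
# With shuffle=False the 5 informative columns are the first 5.
X, y = make_classification(n_samples=120, n_features=50, n_informative=5,
                           n_redundant=0, shuffle=False, random_state=0)

# A linear SVM exposes coef_; RFE ranks genes by the squared weights.
svm = SVC(kernel="linear", C=1.0)

# Remove the lowest-weight gene each round (step=1) until 5 genes remain,
# refitting the SVM on the surviving genes after every removal.
selector = RFE(estimator=svm, n_features_to_select=5, step=1)
selector.fit(X, y)

selected = [i for i, keep in enumerate(selector.support_) if keep]
print("retained gene indices:", selected)
print("elimination ranking (1 = retained):", selector.ranking_.tolist())
```

Genes eliminated earlier receive larger ranks, so `ranking_` records the order in which the recursive procedure discarded them.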

The SVM-RCE method is briefly described here. Low-expressing genes (average expression less than 2× background) were removed, quantile normalization was performed, and then “outlier” arrays whose median expression values differed by more than 3 sigma from the median of the dataset were removed. The remaining samples were subjected to SVM-RCE using ten repetitions of 10-fold cross-validation. The genes were reduced by t-test (applied to the training set) to an experimentally determined optimal number, which produces the highest accuracy in the final result. These starting genes were clustered by K-means into clusters of correlated genes whose average size is 3-5 genes. SVM classification scoring was carried out on each cluster using 3-fold resampling repeated 5 times, and the worst-scoring clusters were eliminated. Accuracy was determined on the surviving pool of genes using the left-out 10% of samples (testing set), and the top-scoring 100 genes were recorded. The procedure was repeated from the clustering step to an end point of 2 clusters. The optimal gene panel was taken to be the minimal number of genes that gives the maximal accuracy, starting with the most frequently selected gene. The identity of the individual genes in this panel is not fixed, since the order reflects the number of times a given gene was selected in the top 100 informative genes, and this order is subject to some variation.
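The core cluster-elimination loop can be sketched as follows. This is a hypothetical, simplified illustration on synthetic data: it keeps the K-means clustering of genes, the per-cluster SVM scoring by 3-fold cross-validation repeated 5 times, and the 2-cluster end point, but omits the outer 10×10 cross-validation, the t-test prefilter, and the tracking of top-scoring genes. The cluster size of about 4 genes and the 10% drop fraction are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.svm import SVC


def svm_rce(X, y, cluster_size=4, drop_fraction=0.1, min_clusters=2, seed=0):
    """Simplified recursive cluster elimination; returns surviving column indices."""
    genes = np.arange(X.shape[1])
    # 3-fold resampling repeated 5 times for scoring each cluster.
    cv = RepeatedStratifiedKFold(n_splits=3, n_repeats=5, random_state=seed)
    while True:
        # Aim for clusters averaging about `cluster_size` correlated genes.
        k = max(min_clusters, len(genes) // cluster_size)
        if k <= min_clusters:
            return genes  # end point: only `min_clusters` clusters remain
        # Cluster genes (rows of X.T) by their expression pattern across samples.
        labels = KMeans(n_clusters=k, n_init=10,
                        random_state=seed).fit_predict(X[:, genes].T)
        scores = []
        for c in range(k):
            members = genes[labels == c]
            if members.size == 0:
                continue
            # Score a cluster by the CV accuracy of an SVM using only its genes.
            acc = cross_val_score(SVC(kernel="linear"), X[:, members], y, cv=cv).mean()
            scores.append((acc, c))
        # Eliminate the worst-scoring clusters; survivors are re-clustered next round.
        n_drop = max(1, round(drop_fraction * len(scores)))
        worst = [c for _, c in sorted(scores)[:n_drop]]
        genes = genes[~np.isin(labels, worst)]


# Synthetic stand-in: 60 samples x 40 genes, the first 4 informative.
X, y = make_classification(n_samples=60, n_features=40, n_informative=4,
                           n_redundant=0, shuffle=False, random_state=1)
surviving = svm_rce(X, y)
print("surviving genes:", surviving.tolist())
```

Unlike the single-gene steps of RFE, each elimination here discards a whole cluster of co-expressed genes, which is the distinction drawn between SVM-RCE and SVM-RFE above.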

A. Biomarker Selection.

Genes that scored highest (by SVM) in discriminating cancerous tumors from benign nodules were examined for their utility in clinical tests. Factors considered included large differences in expression levels between classes and low variability within classes. When selecting biomarkers for validation, an effort was made to select genes with distinct expression profiles, to avoid selecting correlated genes, and to identify genes whose differential expression levels were robust by alternative techniques, including PCR and/or immunohistochemistry.

B. Validation.

Three methods of validation were considered.

Cross-Validation: To minimize over-fitting within a dataset, K-fold cross-validation (K usually equal to 10) was used, in which the dataset is randomly split into K parts, and K−1 parts are used for training and 1 part for testing. Thus, for K=10 the algorithm was trained on a random selection of 90% of the patients and 90% of the controls and then tested on the remaining 10%. This was repeated until all of the samples had been employed as test subjects, so that the cumulative classifier makes use of all of the samples, but no sample is tested using a training set of which it is a part. To reduce the impact of randomization, the K-fold separation was performed M times, producing different combinations of patients and controls in each of the K folds each time. Therefore, for each dataset, M×K rounds of permuted selection of training and testing sets were used for each set of genes.
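The M×K resampling scheme can be sketched in standard-library Python. Dealing each class round-robin into the folds is one assumed way of keeping about 90% of the patients and 90% of the controls in every training set; scikit-learn's `RepeatedStratifiedKFold` is a library equivalent.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def repeated_stratified_kfold(labels: Sequence[int], k: int = 10, m: int = 10,
                              seed: int = 0) -> Iterator[Tuple[int, int, List[int], List[int]]]:
    """Yield (repeat, fold, train_idx, test_idx) for M repetitions of K-fold CV.

    Each class (e.g. 1 = patient, 0 = control) is shuffled and dealt round-robin
    into the K folds, so every test fold holds about 1/K of each class and the
    training set holds the remaining (K-1)/K.
    """
    rng = random.Random(seed)
    n = len(labels)
    for rep in range(m):
        folds: List[List[int]] = [[] for _ in range(k)]
        pos = 0  # keep dealing across classes so fold sizes stay balanced
        for cls in sorted(set(labels)):
            idx = [i for i, y in enumerate(labels) if y == cls]
            rng.shuffle(idx)  # a fresh random partition on every repetition
            for i in idx:
                folds[pos % k].append(i)
                pos += 1
        for f in range(k):
            test = sorted(folds[f])
            held_out = set(test)
            # No sample is ever in the training set it is tested against.
            train = [i for i in range(n) if i not in held_out]
            yield rep, f, train, test


if __name__ == "__main__":
    labels = [1] * 73 + [0] * 77  # cohort sizes of Example 1
    splits = list(repeated_stratified_kfold(labels, k=10, m=10))
    print(len(splits), "train/test rounds")
```

With the 150-subject cohort of Example 1 and K = M = 10, this yields 100 train/test rounds, each testing 15 subjects, and every subject is tested exactly once per repetition.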

Independent Validation: To estimate the reproducibility of the data and the generality of the classifier, one needs to examine the classifier that was built using one dataset and tested using another dataset to estimate the performance of the classifier. To estimate the performance, validation on the second set was performed using the classifier developed with the original dataset.

Resampling (permutation): To demonstrate the dependence of the classifier on the disease state, patients and controls from the dataset were chosen at random (permuted) and the classification was repeated. The accuracy of classification using randomized samples was compared to the accuracy of the developed classifier to determine the p value for the classifier, i.e., the probability that the classifier might have been chosen by chance. In order to test the generality of a classifier developed in this manner, it was used to classify independent sets of samples that were not used in developing the classifier. The cross-validation accuracies of the permuted and original classifiers were compared on independent test sets to confirm their validity in classifying new samples.
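A label-permutation p value of this kind can be obtained with scikit-learn's `permutation_test_score`, sketched below on hypothetical synthetic data. The linear SVM, the 10-fold stratified split, and the 200 permutations are assumptions for illustration, not the settings used herein.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, permutation_test_score
from sklearn.svm import SVC

# Hypothetical data standing in for case/control expression profiles.
X, y = make_classification(n_samples=100, n_features=20, n_informative=5,
                           random_state=0)

# The class labels are shuffled n_permutations times and the cross-validated
# accuracy is recomputed each time, giving a null distribution of accuracies.
score, perm_scores, p_value = permutation_test_score(
    SVC(kernel="linear"),
    X, y,
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
    n_permutations=200,
    scoring="accuracy",
    random_state=0,
)
# p_value = (1 + #{permuted accuracy >= observed accuracy}) / (n_permutations + 1)
print(f"CV accuracy {score:.2f}; mean permuted {perm_scores.mean():.2f}; p = {p_value:.4f}")
```

The smallest attainable p value is 1/(n_permutations + 1), so the number of permutations bounds how strongly chance can be excluded.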

C. Classifier Performance

Performance of each classifier was estimated by different methods, and several performance measurements were used to compare classifiers with one another. These measurements include accuracy, area under the ROC curve, sensitivity, specificity, true positive rate and true negative rate. Depending on the required properties of the classification of interest, different performance measurements can be used to pick the optimal classifier; e.g., a classifier used to screen the whole population would require higher specificity to compensate for the low (~1%) prevalence of the disease and thereby avoid a large number of false positive hits, while a diagnostic classifier for patients in a hospital should be more sensitive.

For distinguishing cancerous tumors from benign nodules, higher sensitivity is more desirable than higher specificity, because these patients are already at high risk.

Example 7: Testing of the Classifiers

Peripheral blood samples were all collected in PAXgene RNA stabilization tubes, and RNA was extracted according to the manufacturer's instructions. Samples were tested on a Nanostring nCounter™ (as described above) against a custom panel of 559 probes (Table III). In addition, they were tested against a 100-probe subset of the 559-probe panel.

For the 559 Classifier, 432 probes were selected based on previous microarray data, 107 probes were selected from Nanostring studies, and 20 were housekeeping genes. We analyzed 610 PAXgene RNA samples (278 cancers, 332 controls) derived from 5 collection sites. For QC, a Universal RNA standard (Agilent) was included in each batch of 36 samples tested. Probe expression values were normalized using the 20 housekeeping genes as well as the spike-in positive and negative controls supplied by Nanostring (included in the classifier). Z-scores were calculated for the probe count values and served as the input to a Support Vector Machine (SVM) classifier using a polynomial kernel. Classification performance was evaluated by 10-fold cross-validation of the samples.
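
The housekeeping normalization and Z-score steps can be sketched as below. The text does not give the normalization formulas, so this is a hedged sketch under stated assumptions: each sample is rescaled so that the geometric mean of its housekeeping-probe counts equals the cohort average (a common approach for nCounter data, assumed here), the positive and negative spike-in control steps are omitted, and Z-scores are taken per probe across samples. All function names are hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence


def geometric_mean(values: Sequence[float]) -> float:
    return math.exp(sum(math.log(v) for v in values) / len(values))


def housekeeping_normalize(
    counts: Sequence[Sequence[float]], hk_idx: Sequence[int]
) -> List[List[float]]:
    """Scale each sample so its housekeeping geometric mean equals the cohort average.

    counts: one row per sample, one strictly positive count per probe.
    hk_idx: column indices of the housekeeping probes.
    """
    hk_gm = [geometric_mean([row[i] for i in hk_idx]) for row in counts]
    reference = mean(hk_gm)
    return [[c * reference / g for c in row] for row, g in zip(counts, hk_gm)]


def zscore_by_probe(matrix: Sequence[Sequence[float]]) -> List[List[float]]:
    """Standardize each probe (column) to mean 0 and unit population SD across samples."""
    stats = []
    for j in range(len(matrix[0])):
        column = [row[j] for row in matrix]
        stats.append((mean(column), pstdev(column)))
    # A probe with zero variance carries no information and is set to 0.
    return [
        [(row[j] - mu) / sd if sd > 0 else 0.0 for j, (mu, sd) in enumerate(stats)]
        for row in matrix
    ]
```

The resulting matrix would then be passed to a polynomial-kernel SVM, for example scikit-learn's `SVC(kernel="poly")`; the degree and other hyperparameters are not stated in the text.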

A. 559 Classifier

As shown in FIGS. 2A-2B, the 559 Classifier developed on all the samples achieved a ROC-AUC of 0.81 (FIG. 2A). With sensitivity set at 90%, specificity was 46%. Similar performance was obtained on a balanced set of 556 samples (278 cancers, 278 nodules) (FIG. 2B). For both sets, UHR controls, post samples, and patients with other cancers were excluded.
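
Reading specificity off at a fixed sensitivity, as in the 90%-sensitivity figures above, can be sketched as follows. Assumptions: a higher classifier score means "more likely cancer", a sample is called cancer when its score is at or above the threshold, and the chosen threshold is the highest one that still reaches the target sensitivity. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def specificity_at_sensitivity(
    y_true: Sequence[int], scores: Sequence[float], target_sensitivity: float = 0.90
) -> Tuple[float, float]:
    """Return (threshold, specificity) at the highest cut-off meeting the target sensitivity.

    1 = cancer, 0 = benign nodule; score >= threshold is called cancer.
    """
    case_scores = sorted((s for s, t in zip(scores, y_true) if t == 1), reverse=True)
    control_scores = [s for s, t in zip(scores, y_true) if t == 0]
    # Smallest number of cancers that must be called positive
    # (the small epsilon guards against floating-point round-off).
    k = math.ceil(target_sensitivity * len(case_scores) - 1e-9)
    threshold = case_scores[k - 1]
    specificity = sum(1 for s in control_scores if s < threshold) / len(control_scores)
    return threshold, specificity
```

Because the threshold depends only on the cancer scores, lowering the target sensitivity raises the threshold and can only keep or increase specificity, which is the trade-off summarized by the ROC curve.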

When nodule classification accuracy was assessed by size without fixing a threshold for sensitivity, we found that as nodule size and the associated cancer risk increase, the number of benign nodules classified as cancer increases (FIG. 3). In this analysis, nodules ≤5 mm were correctly classified 85.0% of the time and nodules >5, ≤8 mm 88.9% of the time; accuracy was 75.5% for nodules >8, ≤12 mm, 68.0% for nodules >12, ≤16 mm, and 53.6% for nodules >16 mm. See Table IV below.

TABLE IV

Nodule size     Correct    Incorrect    Total    Specificity
≤5 mm           108        19           127      85.0%
>5, ≤8 mm       88         11           99       88.9%
>8, ≤12 mm      40         13           53       75.5%
>12, ≤16 mm     17         8            25       68.0%
>16 mm          15         13           28       53.6%
Total           268        64           332      80.7%
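
The per-size specificities in Table IV can be computed with a small binning helper. This sketch assumes one `(size_mm, correctly_called_benign)` record per benign nodule; the bin edges are the Table IV size limits, and the function names are hypothetical. Fed the Table IV counts, it reproduces the specificity column.

```python
from bisect import bisect_left
from typing import Iterable, List, Tuple

# Inclusive upper bounds (mm) of the Table IV size bins; a fifth bin holds nodules >16 mm.
SIZE_EDGES_MM = [5.0, 8.0, 12.0, 16.0]


def size_bin(size_mm: float) -> int:
    """0: <=5, 1: >5 and <=8, 2: >8 and <=12, 3: >12 and <=16, 4: >16 mm."""
    return bisect_left(SIZE_EDGES_MM, size_mm)


def specificity_by_size(nodules: Iterable[Tuple[float, bool]]) -> List[float]:
    """Fraction of benign nodules correctly called benign in each size bin."""
    n_bins = len(SIZE_EDGES_MM) + 1
    correct = [0] * n_bins
    total = [0] * n_bins
    for size_mm, called_benign in nodules:
        b = size_bin(size_mm)
        total[b] += 1
        correct[b] += int(called_benign)
    return [c / t if t else float("nan") for c, t in zip(correct, total)]
```

`bisect_left` places a nodule lying exactly on an edge (e.g. 8 mm) in the lower bin, matching the "≤" upper limits of the table.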

A second set of nodules was tested, and the accuracy of the classifier for each size group was determined separately for each sample group (cancer vs. benign nodule). As before, as nodule size and the associated cancer risk increase, the number of benign nodules classified as cancer increases (FIGS. 4A-4C). For cancers >5 mm, r=0.95; for nodules of all sizes, r=0.97. The charts show, in bar-graph form, the sensitivity and specificity of the classification of cancers and nodules by lesion size.

Because classification accuracy was found to be negatively correlated with benign nodule size, we reanalyzed the data using only nodules ≤10 mm (n=244) with sensitivity fixed at 90% (FIG. 5A); in this case, specificity rose to 54% and the ROC-AUC to 0.85. For larger nodules, >10 mm (n=88), specificity dropped to 24% and the ROC-AUC to 0.71 (FIG. 5B). See Table V below.

TABLE V

Nodule group                     Small (≤10 mm)   Large (>10 mm)   All nodules
N (nodules)                      244              88               332
Min size (mm)                    1                10.4             1
Max size (mm)                    10               90               90
Mean size (mm)                   6.07             17.8             8.7
Median size (mm)                 6                15               6
Std. dev. of size (mm)           1.73             10.6             7.13
ROC area                         0.85             0.71             0.81
Specificity at 90% sensitivity   54%              24%              46%
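
A quick arithmetic consistency check on the subgroup specificities can be sketched as follows. It rests on an assumption not stated in the text: the 90%-sensitivity cut-off is set by the same cancer samples in every analysis, so the ≤10 mm and >10 mm nodule subsets share one threshold, and the all-nodule specificity should then equal the nodule-count-weighted mean of the subgroup specificities. The function name is hypothetical.

```python
from typing import Sequence, Tuple


def pooled_specificity(groups: Sequence[Tuple[int, float]]) -> float:
    """Nodule-count-weighted mean of subgroup specificities at a shared threshold.

    groups: (number of benign nodules, specificity) for each subgroup.
    """
    n_total = sum(n for n, _ in groups)
    return sum(n * spec for n, spec in groups) / n_total
```

With the values reported in the text, `pooled_specificity([(244, 0.54), (88, 0.24)])` is about 0.46, which agrees with the 46% reported for all nodules.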

B. 100 Marker Classifier

We then reanalyzed the data from the 633 samples analyzed with the W559 panel on the Nanostring platform in order to identify the minimal number of probes required to maintain the performance attained with the whole panel. We used SVM-RFE for probe selection as previously described. SVM-RFE was applied to a training set comprising 75% of the data, and the performance of the top 100 probes selected by this process (Table II) was then tested on an independent testing set comprising the remaining 25% of the samples. Samples were randomly assigned to the training and testing sets (Table VI below). The accuracy obtained on the testing set is shown in FIG. 6: at a sensitivity of 90%, specificity was 62%; at a sensitivity of 79%, specificity was 68%; and at a sensitivity of 71%, specificity was 75%. In summary, the ROC-AUC was 0.82.

TABLE VI

Nodules                        Cancers
Size (mm)        n             Size (mm)        n
>0, ≤5           130           >0, ≤14          86
>5, ≤8           109           >14, ≤22         75
>8, ≤12.5        65            >22, ≤33         64
>12.5            57            >33              47
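
The probe-selection step can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: the linear SVM is trained with a simple Pegasos-style sub-gradient solver without a bias term (a swapped-in choice, since the text does not name a solver), features are ranked by their squared SVM weight as in the standard SVM-RFE formulation, and `train_linear_svm`, `svm_rfe` and `random_split` are hypothetical names.

```python
import random
from typing import List, Sequence, Tuple


def train_linear_svm(
    X: Sequence[Sequence[float]], y: Sequence[int],
    lam: float = 0.01, epochs: int = 20, seed: int = 0,
) -> List[float]:
    """Pegasos-style sub-gradient linear SVM without a bias term; y in {-1, +1}."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (L2 regularization)
            if margin < 1.0:  # hinge loss active: step toward this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def svm_rfe(
    X: Sequence[Sequence[float]], y: Sequence[int], n_keep: int,
    drop_per_round: int = 1, **svm_kwargs,
) -> List[int]:
    """Recursive feature elimination: retrain, then drop the features with the smallest w_j**2."""
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        X_active = [[row[j] for j in active] for row in X]
        w = train_linear_svm(X_active, y, **svm_kwargs)
        ranked = sorted(range(len(active)), key=lambda k: w[k] ** 2)
        n_drop = min(drop_per_round, len(active) - n_keep)
        dropped = {active[k] for k in ranked[:n_drop]}
        active = [j for j in active if j not in dropped]
    return active


def random_split(n: int, test_frac: float = 0.25, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly assign sample indices to training (75%) and testing (25%) sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(test_frac * n)
    return sorted(idx[n_test:]), sorted(idx[:n_test])
```

In use, one would call `random_split(len(y))`, run `svm_rfe` on the training rows with `n_keep=100` (a larger `drop_per_round` shortens the run for a 559-probe panel), and evaluate the retained probes on the held-out rows. scikit-learn's `RFE(SVC(kernel="linear"), n_features_to_select=100)` is an off-the-shelf alternative.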

Each and every patent, patent application, and publication, including the priority application, U.S. Provisional Patent Application No. 62/352,865, filed Jun. 21, 2017, and publicly available gene sequence cited throughout the disclosure is expressly incorporated herein by reference in its entirety. While this invention has been disclosed with reference to specific embodiments, it is apparent that other embodiments and variations of this invention may be devised by others skilled in the art without departing from the true spirit and scope of the invention. The appended claims are intended to include all such embodiments and equivalent variations.

US Referenced Citations

Number	Name	Date	Kind
6582908	Fodor	Jun 2003	B2
7081340	Baker et al.	Jul 2006	B2
20090317392	Nakamura et al.	Dec 2009	A1
20110201517	Kolman et al.	Aug 2011	A1
20120021946	Nakamura et al.	Jan 2012	A1
20140005065	Showe et al.	Jan 2014	A1
20150315643	O'Garra et al.	Nov 2015	A1
20210079479	Showe et al.	Mar 2021	A1

Foreign Referenced Citations

Number	Date	Country
WO-2007141004	Dec 2007	WO
WO-2009075799	Jun 2009	WO
WO-2010030697	Mar 2010	WO
WO 2012006632	Jan 2012	WO
WO-2012150275	Nov 2012	WO
WO-2013153130	Oct 2013	WO
WO-2016011068	Jan 2016	WO

Non-Patent Literature Citations
NCBI (2009) “PREDICTED: Homo sapiens similar to CG10103 (LOC728533), miscRNA” (Year: 2009).
NEB catalog (1998/1999), pp. 121, 284 (Year: 1999).
Geiss et al., “Direct multiplexed measurement of gene expression with color-coded probe pairs”, (2008) Nature Biotechnology 26(3): 317-325 (Year: 2008).
Ahern, “Biochemical, reagent kits offer scientists good return on investment”, (1995) The Scientist 9(15): 1-5 (Year: 1995).
Silvestri et al., A Bronchial Genomic Classifier for the Diagnostic Evaluation of Lung Cancer, N Engl J Med., vol. 373(3): 243-251, Jul. 2015.
Velculescu et al., Characterization of the yeast transcriptome, Cell, vol. 88(2):243-51, Jan. 1997.
Geiss et al., Direct multiplexed measurement of gene expression with color-coded probe pairs, Nat Biotechnol., vol. 26(3):317-25, Mar. 2008 (Epub Feb. 17, 2008).
Velculescu et al., Serial analysis of gene expression, Science, vol. 270(5235):484-487, 1995.
Brenner et al., Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays, Nature Biotechnology, vol. 18(6):630-634, Jun. 2000.
International Search Report and Written Opinion issued on International Patent Application No. PCT/US2017/038571, dated Nov. 30, 2017.
GEO Accession No. GPL96, Mar. 2002.
Rooney et al., AACR 104th Annual Meeting Abstract 2407: Expression profiling of FGF-receptor pathway genes in squamous NSCLC tissue by Nanostring, Cancer Research, vol. 73(8):6-10, Apr. 2013.
Extended Search Report issued in corresponding European Patent Application No. 17816148.5, dated Jan. 21, 2020.
Search Report and Written Opinion issued in corresponding Singapore Patent Application No. 11201810914V, dated Apr. 18, 2020.