The invention describes methods of identifying proteins and posttranslational modification of proteins specifically associated with a chromatin region.
It has long been appreciated that chromatin-associated proteins and epigenetic factors play central roles in gene regulation. Mis-regulation of chromatin structure and of post-translational modifications (PTMs) of histones is linked to cancer and other epigenetic diseases. The field of epigenomics has been transformed by chromatin immunoprecipitation approaches that provide for the localization of a defined protein or post-translationally modified protein to specific chromosomal sites. However, the hierarchy of chromatin-templated events orchestrating the formation and inheritance of different epigenetic states remains poorly understood at a molecular level; there are no current methodologies that allow for determination of all proteins present at a defined, small region of chromatin. Chromatin immunoprecipitation (ChIP) assays have allowed a better understanding of the genome-wide distribution of proteins and histone modifications at the nucleosome level. However, ChIP assays have several limitations: they are largely confined to examining single histone PTMs or proteins rather than simultaneously profiling multiple targets, they cannot determine the co-occupancy of particular histone PTMs, and they rely on prior identification of the molecular target. Other chromatin immunoprecipitation methodologies do not provide a mechanism for determining the specificity of protein interactions, or do not enrich for a small integrated genomic locus and cannot detect protein contamination in purified material. Therefore, there is a need for methods that allow for determination of all proteins and protein post-translational modifications specifically associated with a defined, small region of chromatin.
The application file contains at least one photograph executed in color. Copies of this patent application publication with color photographs will be provided by the Office upon request and payment of the necessary fee.
A method of isolating and identifying proteins associated with a target region of chromatin in a cell has been discovered. The method may also be used to identify post-translational modifications (PTMs) of proteins associated with a target chromatin in a cell. Advantageously, the method may be used to determine whether the association of the identified proteins with a chromatin in a cell is specific or non-specific. As used herein, “specifically associated” or “specific association” of a protein with a target chromatin refers to any protein in a cell that normally associates with a chromatin in a cell. In addition, and as illustrated in the examples, the method may be used to determine the role of proteins and post-translational modifications (PTMs) of proteins in chromatin function, including regulatory mechanisms of transcription, and the role of epigenomic factors in controlling chromatin function.
In some aspects, the invention provides methods of isolating and identifying proteins specifically associated with a target chromatin. As described in Example 1 and
To determine which of the identified proteins and post-translational modifications of proteins associated with a target chromatin isolated from a cell are specifically or non-specifically associated with the target chromatin, a method of the invention provides two cell samples, or lysates derived from two cell samples, comprising the target chromatin, wherein proteins in one, but not both, of the cell samples are metabolically labeled. Typically, the two cell samples are grown identically. In addition, the target chromatin in one of the cell samples or an extract from one of the cell samples is tagged. The two cell samples, or lysates derived from the cell samples of the invention, are combined. The tagged target chromatin is isolated in the presence of the other cell sample or an extract from the other cell sample. Therefore, if a target chromatin of the invention is tagged in the unlabeled cell sample, proteins specifically associated with the tagged chromatin are unlabeled, and will be isolated in the presence of labeled proteins from the labeled cell sample. Alternatively, if a target chromatin of the invention is tagged in the labeled cell sample, the proteins associated with the tagged chromatin are labeled, and will be isolated in the presence of unlabeled proteins from the unlabeled cell sample.
As such, determining if a certain identified protein associated with the target chromatin is labeled, unlabeled, or a combination of labeled and unlabeled may determine if the protein was specifically associated with a target chromatin of the invention. If an identified protein comprises a mixture of labeled and unlabeled proteins, then that protein became associated with a target chromatin during the chromatin isolation procedure, and association of that protein with the target chromatin is not specific. If a target chromatin of the invention is isolated from the unlabeled cell sample, only unlabeled identified proteins associated with the target chromatin are specifically associated with the target chromatin. Alternatively, if a target chromatin of the invention is isolated from the labeled cell sample, only labeled identified proteins associated with the target chromatin are specifically associated with the target chromatin.
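By way of non-limiting illustration, the decision rule described above may be sketched in Python as follows. The sketch assumes that each identified protein has been quantified as a pair of signal intensities, one for unlabeled ("light") peptides and one for metabolically labeled ("heavy") peptides. The class name, function names, and the `tolerance` cutoff are hypothetical and are not taken from the examples; the cutoff in particular would have to be chosen empirically.

```python
from dataclasses import dataclass


@dataclass
class ProteinObservation:
    """Hypothetical record of one identified protein."""
    name: str
    light_intensity: float  # signal from unlabeled peptides
    heavy_intensity: float  # signal from metabolically labeled peptides


def heavy_fraction(obs: ProteinObservation) -> float:
    """Fraction of the total signal that comes from labeled peptides."""
    total = obs.light_intensity + obs.heavy_intensity
    if total <= 0:
        raise ValueError("protein has no measurable signal")
    return obs.heavy_intensity / total


def classify_association(obs: ProteinObservation,
                         tagged_sample_is_labeled: bool,
                         tolerance: float = 0.1) -> str:
    """Classify a protein as 'specific' or 'non-specific'.

    A protein is called specific only when (almost) all of its signal
    carries the same label state as the sample in which the target
    chromatin was tagged. The tolerance is an assumed cutoff.
    """
    frac = heavy_fraction(obs)
    # Share of the signal that originates from the tagged sample.
    tagged_share = frac if tagged_sample_is_labeled else 1.0 - frac
    if tagged_share >= 1.0 - tolerance:
        return "specific"
    # A mixture of labeled and unlabeled protein indicates association
    # acquired during the isolation procedure.
    return "non-specific"
```

In this sketch, a protein whose signal is split between labeled and unlabeled forms is reported as non-specific regardless of which cell sample carried the tag, which mirrors the rule stated above.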
In some embodiments, a tagged target chromatin of the invention is isolated from an unlabeled cell sample, and unlabeled proteins associated with the target chromatin are specifically associated with the target chromatin. In other embodiments, a tagged target chromatin of the invention is isolated from a labeled cell sample, and labeled proteins associated with the target chromatin are specifically associated with the target chromatin.
A target nucleic acid sequence may be isolated from any cell comprising the target nucleic acid sequence of the invention. A cell may be an archaebacterium, a eubacterium, or a eukaryotic cell. For instance, a cell of the invention may be a methanogen, a halophile or a thermoacidophile archaebacterium, a gram positive, a gram negative, a cyanobacterium, a spirochaete, or a firmicute bacterium, a fungal cell, a moss cell, a plant cell, an animal cell, or a protist cell.
In some embodiments, a cell of the invention is a cell from an animal. A cell from an animal may be a cell from an embryo, a juvenile, or an adult. Suitable animals include vertebrates such as mammals, birds, reptiles, amphibians, and fish. Examples of suitable mammals include without limit rodents, companion animals, livestock, and primates. Non-limiting examples of rodents include mice, rats, hamsters, gerbils, and guinea pigs. Suitable companion animals include but are not limited to cats, dogs, rabbits, hedgehogs, and ferrets. Non-limiting examples of livestock include horses, goats, sheep, swine, cattle, llamas, and alpacas. Suitable primates include but are not limited to humans, capuchin monkeys, chimpanzees, lemurs, macaques, marmosets, tamarins, spider monkeys, squirrel monkeys, and vervet monkeys. Non-limiting examples of birds include chickens, turkeys, ducks, and geese. In some embodiments, a cell is a cell from a human.
In some embodiments, a cell may be from a model organism commonly used in laboratory research. For instance, a cell of the invention may be an E. coli, a Bacillus subtilis, a Caulobacter crescentus, a Mycoplasma genitalium, an Aliivibrio fischeri, a Synechocystis, or a Pseudomonas fluorescens bacterial cell; a Chlamydomonas reinhardtii, a Dictyostelium discoideum, a Tetrahymena thermophila, an Emiliania huxleyi, or a Thalassiosira pseudonana protist cell; an Ashbya gossypii, an Aspergillus nidulans, a Coprinus cinereus, a Cunninghamella elegans, a Neurospora crassa, a Saccharomyces cerevisiae, a Schizophyllum commune, a Schizosaccharomyces pombe, or an Ustilago maydis fungal cell; an Arabidopsis thaliana, a Selaginella moellendorffii, a Brachypodium distachyon, a Lotus japonicus, a Lemna gibba, a Zea mays, a Medicago truncatula, a Mimulus, a tobacco, a rice, a Populus, or a Nicotiana benthamiana plant cell; a Physcomitrella patens moss; an Amphimedon queenslandica sponge, an Arbacia punctulata sea urchin, an Aplysia sea slug, a Branchiostoma floridae deuterostome, a Caenorhabditis elegans nematode, a Ciona intestinalis sea squirt, a Daphnia spp. crustacean, a Drosophila fruit fly, a Euprymna scolopes squid, a Hydra cnidarian, a Loligo pealei squid, a Macrostomum lignano flatworm, a Mnemiopsis leidyi comb jelly, a Nematostella vectensis sea anemone, an Oikopleura dioica free-swimming tunicate, an Oscarella carmela sponge, a Parhyale hawaiensis crustacean, a Platynereis dumerilii marine polychaetous annelid, a Pristionchus pacificus roundworm, a Schmidtea mediterranea freshwater planarian, a stomatogastric ganglion of various arthropod species, a Strongylocentrotus purpuratus sea urchin, a Symsagittifera roscoffensis flatworm, a Tribolium castaneum beetle, a Trichoplax adhaerens placozoan, a Tubifex tubifex oligochaete, a laboratory mouse, a guinea pig, a chicken, a cat, a dog, a hamster, a lamprey, a medaka fish, a rat, a rhesus macaque, a cotton rat, a zebra finch, a Takifugu pufferfish, an African clawed frog, or a zebrafish. In exemplary embodiments, a cell is a Saccharomyces cerevisiae yeast cell. In particularly exemplary embodiments, a cell is a Saccharomyces cerevisiae W303a yeast cell.
A cell of the invention may be derived from a tissue or from a cell line grown in tissue culture. A cell line may be adherent or non-adherent, or a cell line may be grown under conditions that encourage adherent, non-adherent or organotypic growth using standard techniques known to individuals skilled in the art. Cell lines and methods of culturing cell lines are known in the art. Non-limiting examples of cell lines commonly cultured in a laboratory may include HeLa, a cell line from the National Cancer Institute's 60 cancer cell lines, DU145 (prostate cancer), LNCaP (prostate cancer), MCF-7 (breast cancer), MDA-MB-438 (breast cancer), PC3 (prostate cancer), T47D (breast cancer), THP-1 (acute myeloid leukemia), U87 (glioblastoma), SH-SY5Y human neuroblastoma cells, Saos-2 cells (bone cancer), Vero, GH3 (pituitary tumor), PC12 (pheochromocytoma), MC3T3 (embryonic calvarium), tobacco BY-2 cells, zebrafish ZF4 and AB9 cells, Madin-Darby canine kidney (MDCK), or Xenopus A6 kidney epithelial cells.
As described in Section (I) above, two cell samples, or lysates derived from two cell samples are combined, and a tagged target chromatin of the invention is isolated from the combined cells or combined cell lysates. Typically, cells in two cell samples of the invention are from the same type of cells or they may be derived from the same type of cells. For instance, cells may comprise a heterologous nucleic acid in a target chromatin, and may also comprise a heterologous protein expressed in a cell of the invention. The heterologous nucleic acid in a target chromatin may be used for tagging a chromatin of the invention, and the heterologous protein expressed in a cell may be used for tagging a target chromatin as described in Section I(d). In some embodiments, cells in two cell samples of the invention are from the same type of cells. In other embodiments, cells in the first cell sample are derived from the same cell type as cells in the second cell sample.
Two cell samples of the invention may be from the same genus, species, variety or strain of cells. In preferred embodiments, two cell samples of the invention are Saccharomyces cerevisiae yeast cells or derivatives of Saccharomyces cerevisiae yeast cells. In exemplary embodiments, two cell samples of the invention are Saccharomyces cerevisiae W303a yeast cells or derivatives of Saccharomyces cerevisiae W303a yeast cells. In exemplary embodiments, two cell samples of the invention are derivatives of Saccharomyces cerevisiae W303a yeast cells comprising the lexA binding site upstream of the GAL1 transcription start site, wherein protein A is expressed in one of the cell samples of derived Saccharomyces cerevisiae W303a yeast cells.
According to the invention, a metabolically labeled cell sample and an unlabeled cell sample are combined to generate a combined cell sample, or lysates derived from the two cell samples are combined to generate a combined cell lysate. Cell samples may be combined in a weight to weight (w/w) ratio of about 1:100 to about 100:1, about 1:50 to about 50:1, about 1:25 to about 25:1, preferably about 1:10 to about 10:1, and more preferably about 1:5 to about 5:1. In preferred embodiments, cell samples are combined in a w/w ratio of about 1:5 to about 5:1, about 1:2 to about 2:1, about 1:1.5 to about 1.5:1, or about 1:1. In exemplary embodiments, cell samples are combined in a w/w ratio of about 1:1. If cell lysates derived from two cell samples of the invention are combined, lysates derived from cell ratios described herein are combined. Individuals of ordinary skill in the art will recognize that ratios of cell samples or lysates derived from cell samples described herein may be subject to statistical confidence limits of actual cell weight. For instance, the ratio may be based on 85, 90, 95% or more confidence limits on cell weight.
The number of cells in a cell sample can and will vary depending on the type of cells, the abundance of a target chromatin in a cell, and the method of protein identification used, among other variables. For instance, if a cell of the invention is Saccharomyces cerevisiae, about 5×10^11 to about 5×10^12, more preferably, about 1×10^11 to about 1×10^12 cells may be used in a cell sample. In some embodiments, about 1×10^11 to about 1×10^12 Saccharomyces cerevisiae cells are used in a cell sample.
Two cell samples of the invention are typically grown identically. Growing the cell samples identically minimizes potential structural or functional differences at a target chromatin present in both cell samples. As used herein, “grown identically” refers to cultured cell samples grown using similar culture conditions, or cells from a tissue harvested using identical harvesting techniques. As described below, the two cell samples of the invention are grown identically in a manner that allows the metabolic labeling of proteins in one of the cell samples. For instance, the two cell samples of the invention are grown identically, except that one of the cell samples may be grown in the presence of a labeled amino acid as described in the examples, to generate a cell sample with metabolically labeled proteins.
Proteins in a cell sample are metabolically labeled. Methods of metabolically labeling proteins in a cell are known in the art and may comprise culturing a cell in the presence of at least one labeled analog of a biomolecule that is metabolized by a cell of the invention. When the labeled analog of a biomolecule is supplied to cells in culture instead of the unlabeled biomolecule, the labeled biomolecule is incorporated into all newly synthesized proteins. After a number of cell divisions, each instance of this particular biomolecule will be replaced by its labeled analog. Since there is hardly any chemical difference between the labeled biomolecule and the unlabeled biomolecule, the cells behave like the control cell population grown in the presence of unlabeled biomolecule. As such, up to 100% of the particular biomolecule in a cell may be labeled. In some embodiments, up to 10, 20, 30, 40, 50, 60, 70, 80, 90 or up to 100% of the particular biomolecule in a cell is labeled. In preferred embodiments, up to 50, 60, 70, 80, 90 or up to 100%, and more preferably up to 90 or up to 100% of the particular biomolecule in a cell is labeled. In preferred embodiments, up to 100% of the particular biomolecule in a cell is labeled.
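By way of non-limiting illustration, the replacement of an unlabeled biomolecule over successive cell divisions may be approximated with a simple dilution model, sketched below in Python. The model is an assumption and not part of the described method: it supposes that the labeled analog fully replaces the unlabeled biomolecule in the medium at time zero, that pre-existing unlabeled protein is halved at each division, and that there is no protein turnover or endogenous synthesis of the unlabeled biomolecule. The function names are hypothetical.

```python
def labeled_fraction(divisions: int) -> float:
    """Approximate labeled fraction after a number of cell divisions.

    Assumed dilution-only model: the unlabeled pool is halved at each
    division, so the labeled fraction is 1 - (1/2)**divisions.
    """
    if divisions < 0:
        raise ValueError("divisions must be non-negative")
    return 1.0 - 0.5 ** divisions


def divisions_needed(target_fraction: float) -> int:
    """Smallest number of divisions that reaches the target fraction."""
    if not 0.0 <= target_fraction < 1.0:
        raise ValueError("target_fraction must be in [0, 1)")
    n = 0
    while labeled_fraction(n) < target_fraction:
        n += 1
    return n
```

Under these assumptions, five divisions give a labeled fraction of 0.96875, and at least four divisions are needed to exceed 90% labeling. Actual incorporation would need to be measured, for instance by mass spectrometry.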
A cell may be labeled by culturing a cell in the presence of one or more than one labeled biomolecule. For instance, a cell may be cultured in the presence of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more labeled biomolecules. In some embodiments, a cell may be cultured in the presence of 1, 2, 3, 4, or 5 labeled biomolecules. In other embodiments, a cell may be cultured in the presence of 5, 6, 7, 8, 9, or 10 labeled biomolecules. In preferred embodiments, a cell may be cultured in the presence of 1 or 2 labeled biomolecules.
Non-limiting examples of a biomolecule that may be labeled and is metabolized by a cell of the invention may include an amino acid, a nucleic acid, a carbohydrate or a labeled molecule that may be incorporated into an amino acid, a nucleic acid, or a carbohydrate. Non-limiting examples of a labeled molecule that may be incorporated into an amino acid, a nucleic acid, a carbohydrate may include labeled ammonium sulfate, and labeled ammonium chloride. A labeled biomolecule may be a component of a cell culture medium such as a food source, e.g., glucose, sera or cell extracts. In some embodiments, a labeled biomolecule that is metabolized by a cell of the invention is a labeled nucleic acid. In other embodiments, a labeled biomolecule that is metabolized by a cell of the invention is a labeled carbohydrate such as [13C]glucose.
In preferred embodiments, a biomolecule that is metabolized by a cell of the invention is a labeled amino acid. In general, a labeled amino acid of the invention may be a labeled L-amino acid, a labeled D-amino acid or a mixture thereof. In preferred embodiments, a labeled amino acid is a labeled L-amino acid. A labeled amino acid may be a free amino acid or an amino acid salt. A labeled amino acid may also be in the form of intact protein or peptide, provided that the protein or peptide comprises a labeled amino acid of the invention. In some preferred embodiments, a labeled amino acid that may be used for metabolically labeling a cell of the invention may be a labeled L-Lysine, L-Arginine, L-Methionine, L-Tyrosine, or combinations thereof.
A labeled biomolecule may be labeled using a heavy isotope of one or more atoms of the biomolecule. Non-limiting examples of a heavy isotope of one or more atoms of a biomolecule may include heavy hydrogen, carbon, nitrogen, phosphorus, oxygen, or sulfur. A labeled biomolecule may be about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 Da or more heavier than an unlabeled biomolecule. In some embodiments, a labeled biomolecule is about 1, 2, 3, 4, or 5 Da heavier than an unlabeled biomolecule. In other embodiments, a labeled biomolecule is about 5, 6, 7, 8, 9, or 10 Da heavier than an unlabeled biomolecule. In yet other embodiments, a labeled biomolecule is about 10, 11, 12, 13, 14, or 15 Da heavier than an unlabeled biomolecule. In additional embodiments, a labeled biomolecule is about 15, 16, 17, 18, 19 or 20 Da heavier than an unlabeled biomolecule. In preferred embodiments, a labeled biomolecule is about 4, 5, 6, 7, 8, 9, or 10 Da heavier than an unlabeled biomolecule.
In preferred embodiments, a labeled biomolecule is a labeled amino acid. A labeled amino acid that may be used for metabolically labeling a cell of the invention may be a heavy analog of L-Lysine, L-Arginine, L-Methionine, L-Tyrosine, or combinations thereof. Non-limiting examples of heavy analogs of L-Lysine, L-Arginine, L-Methionine, and L-Tyrosine may include [13C6]-L-Lysine, [13C6, 15N2]-L-Lysine, [13C6, 15N2, D9]-L-Lysine, [15N2, D9]-L-Lysine, [4,4,5,5-D4]-L-Lysine, [15N2]-L-Lysine, [13C6]-L-Arginine, [U-13C6, 15N4]-L-Arginine, [U-13C6, 15N4, D7]-L-Arginine, [15N4, D7]-L-Arginine, [15N4]-L-Arginine, [13C6, 15N4]-L-Arginine, [1-13C, methyl-D3]-L-Methionine, [13C9; 9 Da]-L-Tyrosine, [15N]-L-Tyrosine, and [13C9, 15N]-L-Tyrosine. In an exemplary embodiment, a labeled amino acid used to metabolically label a cell of the invention is [13C6, 15N4]-L-Arginine.
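By way of non-limiting illustration, the mass difference between a heavy analog and its unlabeled counterpart may be estimated from the number of substituted atoms, as sketched below in Python. The per-atom mass differences are standard monoisotopic values rounded to seven decimal places; the dictionary and function names are hypothetical. The sketch assumes complete isotopic substitution at each listed position.

```python
from typing import Mapping

# Mass gained per substituted atom, in Da (heavy isotope minus light isotope).
ISOTOPE_SHIFT_DA = {
    "13C": 1.0033548,  # 13C minus 12C
    "15N": 0.9970349,  # 15N minus 14N
    "D": 1.0062767,    # 2H (deuterium) minus 1H
}


def mass_shift(label: Mapping[str, int]) -> float:
    """Mass shift in Da for a label such as {"13C": 6, "15N": 4}."""
    shift = 0.0
    for isotope, count in label.items():
        if count < 0:
            raise ValueError("atom counts must be non-negative")
        # A KeyError here means the isotope is not in the table above.
        shift += ISOTOPE_SHIFT_DA[isotope] * count
    return shift


# [13C6, 15N4]-L-Arginine, the exemplary heavy amino acid.
heavy_arginine_shift = mass_shift({"13C": 6, "15N": 4})
```

For [13C6, 15N4]-L-Arginine this gives a shift of approximately 10.008 Da, which falls within the preferred range of about 4 to about 10 Da stated above.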
A method of the invention comprises identification of a protein and post-translational modification of a protein associated with a target chromatin. Generally, chromatin refers to the combination of nucleic acids and proteins in the nucleus of a eukaryotic cell. However, it is contemplated that the term “chromatin” may also refer to the combination of any nucleic acid sequence and proteins associated with the nucleic acid sequence in any cell.
A chromatin of the invention may comprise single stranded nucleic acid, double stranded nucleic acid, or a combination thereof. In some embodiments, a chromatin comprises single stranded nucleic acid. In other embodiments, a chromatin comprises a combination of single stranded and double stranded nucleic acids. In yet other embodiments, a chromatin comprises double stranded nucleic acid.
A chromatin of the invention may comprise a ribonucleic acid (RNA), a deoxyribonucleic acid (DNA), or a combination of RNA and DNA. In some embodiments, a chromatin of the invention comprises a combination of an RNA sequence and proteins associated with the RNA sequence in a cell. Non-limiting examples of RNA sequences may include mRNA, and non-coding RNA such as tRNA, rRNA, snoRNAs, microRNAs, siRNAs, piRNAs and long noncoding RNA (lncRNA). In preferred embodiments, a chromatin of the invention comprises a combination of a DNA sequence and proteins associated with the DNA sequence in a cell. In other preferred embodiments, a chromatin of the invention comprises a combination of RNA and DNA sequences, and proteins associated with the RNA and DNA sequences in a cell. Non-limiting examples of chromatin that may comprise a combination of RNA and DNA may include genomic DNA undergoing transcription, or genomic DNA comprising non-coding RNA such as lncRNA.
A chromatin of the invention may be genomic chromatin such as, chromatin from a chromosome of a cell, or chromatin from an organelle in the cell. Alternatively, a chromatin may be chromatin from an extrachromosomal nucleic acid sequence. In some embodiments, a chromatin of the invention is chromatin from an organelle in the cell. Non-limiting examples of a chromatin from an organelle may include mitochondrial nucleic acid sequence in plant and animal cells, and a chloroplast nucleic acid sequence in plant cells. In some embodiments, a nucleic acid sequence of the invention is a mitochondrial nucleic acid sequence. In other embodiments, a nucleic acid sequence of the invention is a chloroplast nucleic acid sequence.
In some embodiments, a chromatin of the invention is chromatin from an extrachromosomal nucleic acid sequence. The term “extrachromosomal,” as used herein, refers to any nucleic acid sequence not contained within the cell's genomic nucleic acid sequence. An extrachromosomal nucleic acid sequence may comprise some sequences that are identical or similar to genomic sequences in the cell; however, an extrachromosomal nucleic acid sequence as used herein does not integrate with genomic sequences of the cell. Non-limiting examples of an extrachromosomal nucleic acid sequence may include a plasmid, a virus, a cosmid, and a phasmid.
In some preferred embodiments, a chromatin of the invention is genomic chromatin. In exemplary embodiments, a chromatin of the invention is genomic chromatin of a eukaryotic cell. A eukaryotic cell of the invention may be as described in Section I(a) above.
Primary functions of genomic chromatin of a eukaryotic cell may be to package DNA into a smaller volume to fit in the cell, to strengthen the DNA to allow mitosis, to prevent DNA damage, and to control gene expression and DNA replication. As described above, genomic chromatin of a eukaryotic cell may comprise DNA sequences and a plurality of DNA-binding proteins as well as certain RNA sequences, assembled into higher order structural or functional regions. As used herein, a “structural or functional feature of a chromatin” refers to a chromatin feature characterized by, or encoding, a function such as a regulatory function of a promoter, terminator, translation initiation, enhancer, etc., or a structural feature such as heterochromatin, euchromatin, a nucleosome, a telomere, or a centromere. A physical feature of a nucleic acid sequence may comprise a functional role and vice versa. As described below, a chromatin of the invention may be a chromatin fragment, and as such may comprise a fragment of a physical or functional feature of a chromatin, or no physical or functional features or known physical or functional features.
The primary protein components of genomic eukaryotic chromatin are histones that compact the DNA into a nucleosome. The nucleosome comprises an octamer of histone proteins around which is wound a stretch of double stranded DNA sequence of about 150 to about 250 bp in length. Histones H2A, H2B, H3 and H4 are part of the nucleosome while histone H1 may act to link adjacent nucleosomes together into a higher order structure. Histones are subject to post-translational modifications, which may affect their function in regulating chromatin function. Such modifications may include methylation, citrullination, acetylation, phosphorylation, SUMOylation, ubiquitination, and ADP-ribosylation.
Many further polypeptides and protein complexes interact with the nucleosome and the histones to regulate chromatin function. A “polypeptide complex” as used herein, is intended to describe proteins and polypeptides that assemble together to form a unitary association of factors. The members of a polypeptide complex may interact with each other via non-covalent or covalent bonds. Typically members of a polypeptide complex will cooperate to enable binding either to a nucleic acid sequence or to polypeptides and proteins already associated with or bound to a nucleic acid sequence in chromatin. Chromatin associated polypeptide complexes may comprise a plurality of proteins and/or polypeptides which each serve to interact with other polypeptides that may be permanently associated with the complex or which may associate transiently, dependent upon cellular conditions and position within the cell cycle. Hence, particular polypeptide complexes may vary in their constituent members at different stages of development, in response to varying physiological conditions or as a factor of the cell cycle. By way of example, in animals, polypeptide complexes with known chromatin remodelling activities include Polycomb group gene silencing complexes as well as Trithorax group gene activating complexes.
Additionally, a protein associated with a chromatin of the invention may be a protein normally expressed in a cell, or may be an exogenous heterologous protein expressed in a cell. In some embodiments, a protein associated with a chromatin of the invention is a protein normally expressed in a cell. In other embodiments, a protein associated with a chromatin of the invention is an exogenous heterologous protein expressed in a cell.
A chromatin of the invention may be an intact and complete chromatin from the cell, or may be a fragment of a chromatin in a cell. In some embodiments, a chromatin of the invention is an intact chromatin isolated from a cell. For instance, a chromatin of the invention may be a plasmid, a cosmid, or a phage chromatin or a complete organellar chromatin. In preferred embodiments, a chromatin of the invention is a fragment of a chromatin from a cell. In exemplary embodiments, a chromatin of the invention is a fragment of a genomic chromatin from a cell.
When a chromatin of the invention is a fragment of a chromatin in a cell, any method of fragmenting a chromatin known in the art may be used. Such methods may include physical methods of fragmenting a chromatin, or enzymatic digestion of a nucleic acid sequence of a chromatin. In some embodiments, a fragment of a chromatin may be generated using enzymatic digestion of a nucleic acid sequence in chromatin. Non-limiting examples of enzymatic digestion may include random or sequence specific enzymatic digestion using restriction enzymes, nucleases, combinations of restriction enzymes and nucleases, or combinations of nicking and other nucleases such as NEBNext™ fragmentase, which comprises a nicking enzyme that randomly generates nicks in double stranded DNA and another enzyme that cuts the strand opposite to the generated nicks.
In other embodiments, a fragment of a chromatin may be generated using a physical method of fragmenting a chromatin. Non-limiting examples of physical fragmenting methods that may be used to fragment a chromatin of the invention may include nebulization, sonication, and hydrodynamic shearing. In some embodiments, a fragment of a chromatin may be generated using nebulization. In other embodiments, a fragment of a chromatin may be generated using hydrodynamic shearing. In preferred embodiments, a fragment of a chromatin may be generated using sonication. During sonication, a sample comprising chromatin is subjected to ultrasonic waves, whose vibrations produce gaseous cavitations in the liquid that shear or break high molecular weight molecules such as chromatin through resonance vibration. Sonication methods that may be used to generate a chromatin of the invention are known in the art. A fragment of a chromatin of the invention may comprise a nucleic acid sequence fragment and may be about 10, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1550, 1600, 1650, 1700, 1750, 1800, 1850, 1900, 1950, 2000, 2050, 2100, 2150, 2200, 2250, 2300, 2350, 2400, 2450, 2500, 2550, 2600, 2650, 2700, 2750, 2800, 2850, 2900, 2950, 3000, 4000, 5000, 6000, 7000, 8000, 9000, or about 10000 bases long or more. In some embodiments, a chromatin of the invention may comprise a nucleic acid sequence fragment of about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450, 460, 470, 480, 490, or about 500 bases long.
In other embodiments, a chromatin of the invention may comprise a nucleic acid sequence fragment of about 500, 510, 520, 530, 540, 550, 560, 570, 580, 590, 600, 610, 620, 630, 640, 650, 660, 670, 680, 690, 700, 710, 720, 730, 740, 750, 760, 770, 780, 790, 800, 810, 820, 830, 840, 850, 860, 870, 880, 890, 900, 910, 920, 930, 940, 950, 960, 970, 980, 990, or about 1000 bases long. In yet other embodiments, a chromatin of the invention may comprise a nucleic acid sequence fragment of about 1000, 1010, 1020, 1030, 1040, 1050, 1060, 1070, 1080, 1090, 1100, 1110, 1120, 1130, 1140, 1150, 1160, 1170, 1180, 1190, 1200, 1210, 1220, 1230, 1240, 1250, 1260, 1270, 1280, 1290, 1300, 1310, 1320, 1330, 1340, 1350, 1360, 1370, 1380, 1390, 1400, 1410, 1420, 1430, 1440, 1450, 1460, 1470, 1480, 1490, or about 1500 bases long. In other embodiments, a chromatin of the invention may comprise a nucleic acid sequence fragment of about 1500, 1510, 1520, 1530, 1540, 1550, 1560, 1570, 1580, 1590, 1600, 1610, 1620, 1630, 1640, 1650, 1660, 1670, 1680, 1690, 1700, 1710, 1720, 1730, 1740, 1750, 1760, 1770, 1780, 1790, 1800, 1810, 1820, 1830, 1840, 1850, 1860, 1870, 1880, 1890, 1900, 1910, 1920, 1930, 1940, 1950, 1960, 1970, 1980, 1990, or about 2000 bases long. In additional embodiments, a chromatin of the invention may comprise a nucleic acid sequence fragment of about 2000, 2100, 2150, 2200, 2250, 2300, 2350, 2400, 2450, or about 2500 bases long. In other embodiments, a chromatin of the invention may comprise a nucleic acid sequence fragment of about 2000, 2050, 2100, 2150, 2200, 2250, 2300, 2350, 2400, 2450, or about 2500 bases long. In still other embodiments, a chromatin of the invention may comprise a nucleic acid sequence fragment of about 2500, 2550, 2600, 2650, 2700, 2750, 2800, 2850, 2900, 2950, 3000, 4000, 5000, 6000, 7000, 8000, 9000, or about 10000 bases long or more.
In some preferred embodiments, a chromatin fragment of the invention may comprise a nucleic acid sequence fragment of about 550, 560, 570, 580, 590, 600, 610, 620, 630, 640, 650, 660, 670, 680, 690, 700, 710, 720, 730, 740, 750, 760, 770, 780, 790, 800, 810, 820, 830, 840, 850, 860, 870, 880, 890, 900, 910, 920, 930, 940, 950, 960, 970, 980, 990, 1000, 1010, 1020, 1030, 1040, 1050, 1060, 1070, 1080, 1090, 1100, 1110, 1120, 1130, 1140, 1150, 1160, 1170, 1180, 1190, 1200, 1210, 1220, 1230, 1240, or about 1250 bases long. In a preferred embodiment, a chromatin of the invention may comprise a nucleic acid sequence fragment of about 750, 760, 770, 780, 790, 800, 810, 820, 830, 840, or about 850 bases long. In another preferred embodiment, a chromatin of the invention may comprise a nucleic acid sequence fragment of about 950, 960, 970, 980, 990, 1000, 1010, 1020, 1030, 1040, or about 1050 bases long.
In other preferred embodiments, a chromatin fragment of the invention may comprise a nucleic acid sequence fragment of about 750, 760, 770, 780, 790, 800, 810, 820, 830, 840, 850, 860, 870, 880, 890, 900, 910, 920, 930, 940, 950, 960, 970, 980, 990, 1000, 1010, 1020, 1030, 1040, 1050, 1060, 1070, 1080, 1090, 1100, 1110, 1120, 1130, 1140, 1150, 1160, 1170, 1180, 1190, 1200, 1210, 1220, 1230, 1240, 1250, 1260, 1270, 1280, 1290, 1300, 1310, 1320, 1330, 1340, 1350, 1360, 1370, 1380, 1390, 1400, 1410, 1420, 1430, 1440, 1450, 1460, 1470, 1480, 1490, or about 1500 bases long. In a preferred embodiment, a chromatin of the invention may comprise a nucleic acid sequence fragment of about 950, 960, 970, 980, 990, 1000, 1010, 1020, 1030, 1040, or about 1050 bases long. In another preferred embodiment, a chromatin of the invention may comprise a nucleic acid sequence fragment of about 1200, 1210, 1220, 1230, 1240, 1250, 1260, 1270, 1280, 1290, or about 1300 bases long.
As described in this section above, a chromatin of the invention may comprise one or more nucleosomes. As such, a chromatin fragment of the invention may comprise 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or about 20 nucleosomes. In some embodiments, a chromatin fragment of the invention may comprise about 1, 2, 3, 4, or about 5 nucleosomes. In other embodiments, a chromatin fragment of the invention may comprise about 5, 6, 7, 8, 9, or about 10 nucleosomes. In yet other embodiments, a chromatin fragment of the invention may comprise about 10, 11, 12, 13, 14, or about 15 nucleosomes. In other embodiments, a chromatin fragment of the invention may comprise about 15, 16, 17, 18, 19, or about 20 nucleosomes. In preferred embodiments, a chromatin fragment of the invention may comprise about 4 nucleosomes. In other preferred embodiments, a chromatin fragment of the invention may comprise about 5 nucleosomes.
A target chromatin fragment of the invention may comprise a structural or a functional feature of chromatin as described above, a fragment of a physical or functional feature, or no physical or functional features or known physical or functional features. In some embodiments, a target chromatin fragment of the invention comprises a structural feature of chromatin. In other embodiments, a target chromatin fragment of the invention comprises no physical or functional features or known physical or functional features. In yet other embodiments, a target chromatin fragment of the invention comprises a functional feature of chromatin. In exemplary embodiments, a functional feature of chromatin is a promoter. In particularly exemplary embodiments, a functional feature of chromatin is a GAL1 promoter of Saccharomyces cerevisiae.
A target chromatin is isolated from a combined cell lysate. A combined cell lysate comprises a lysate of two combined cell samples, or a combination of two cell lysates derived from two cell samples, wherein a target chromatin is tagged in one of the cell samples. Irrespective of whether one cell sample or a combined cell sample is lysed, a skilled practitioner of the art will appreciate that structural and functional features of a target chromatin must be preserved during cell lysis and isolation of the target chromatin. The association of proteins with a target chromatin may be preserved during cell lysis and isolation of the target chromatin using methods known in the art for preserving a complex of proteins with a nucleic acid sequence. For instance, lysing of a cell and isolation of a target chromatin may be performed under refrigeration or using cryogenic methods and buffer conditions capable of preserving association of proteins and nucleic acid sequences. In addition, a complex of proteins with a nucleic acid may be preserved by crosslinking protein and nucleic acid complexes in a cell prior to lysing and isolating a chromatin. Crosslinking protein and nucleic acid complexes in a cell may also capture, or preserve, transient protein-protein and protein-nucleic acid interactions.
In some embodiments, a complex of proteins with a nucleic acid may be preserved by crosslinking protein and nucleic acid complexes in a chromatin prior to lysing a cell and isolating the chromatin. Crosslinking is the process of joining two or more molecules, such as two proteins or a protein and a nucleic acid molecule, by a covalent bond. Molecules may be crosslinked by irradiation with ultraviolet light, or by using chemical crosslinking reagents. Chemical crosslinking reagents capable of crosslinking proteins and nucleic acids are known in the art and may include crosslinking reagents that target amines, sulfhydryls, carboxyls, carbonyls or hydroxyls; homobifunctional or heterobifunctional crosslinking reagents, variable spacer arm length or zero-length crosslinking reagents, cleavable or non-cleavable crosslinking reagents, and photoreactive crosslinking reagents. Non-limiting examples of crosslinking reagents that may be used to crosslink protein complexes and/or protein complexes and nucleic acids may include formaldehyde, glutaraldehyde, disuccinimidyl glutarate, disuccinimidyl suberate, a photoreactive amino acid such as photo-leucine or photo-methionine, and succinimidyl-diazirine. The degree of crosslinking can and will vary depending on the application of a method of the invention, and may be experimentally determined.
In a preferred embodiment, a complex of proteins with a nucleic acid in a chromatin of the invention may be preserved by crosslinking protein and nucleic acid complexes in a cell prior to lysing using formaldehyde. In an exemplary embodiment, a complex of proteins with a nucleic acid in a chromatin of the invention may be preserved by crosslinking protein and nucleic acid complexes in a cell prior to lysing using formaldehyde as described in the examples.
A skilled practitioner of the art will appreciate that protocols for lysing a cell can and will vary depending on the type of cell, the target chromatin of the invention, and the specific application of a method of the invention. Non limiting examples of methods that may be used to lyse a cell of the invention may include cell lysis using a detergent, an enzyme such as lysozyme, incubation in a hypotonic buffer which causes a cell to swell and burst, mechanical disruption such as liquid homogenization by forcing a cell through a narrow space, sonication, freeze/thaw, mortar and pestle, glass beads, and combinations thereof. In some embodiments, when a cell of the invention is a yeast cell, the cell may be cryogenically lysed under liquid nitrogen temperature with glass beads. In exemplary embodiments, when a cell of the invention is a yeast cell, the cell may be cryogenically lysed under liquid nitrogen temperature with glass beads as described in the examples.
Buffer conditions used during lysing and isolation of a chromatin of the invention can and will be altered to control stringent conditions during cell lysis and isolation to preserve association of proteins and nucleic acid sequences of a chromatin. “Stringent conditions” in the context of chromatin isolation are conditions capable of preserving specific association of proteins and nucleic acids of a chromatin, but minimizing non-specific association of proteins and nucleic acids. Stringent conditions can and will vary depending on the application of a method of the invention, the target chromatin of the invention, the nucleic acid sequence in a target chromatin, the proteins or protein complexes associated with a target chromatin of the invention, whether or not proteins, protein complexes and nucleic acid sequences are crosslinked, and the conditions used for crosslinking proteins, protein complexes and nucleic acid sequences of a target chromatin. For instance, more stringent buffer conditions may be used in a method of the invention wherein proteins, protein-protein complexes, and protein-nucleic acid complexes are crosslinked compared to a method of the invention wherein proteins, protein-protein complexes, and protein-nucleic acid complexes are not crosslinked. As such, stringent buffer conditions used during cell lysis and isolation of a nucleic acid sequence of the invention may be experimentally determined for each application wherein a method of the invention is used. Buffer conditions that may alter stringent conditions during cell lysis and isolation may include pH and salt concentration. In preferred embodiments, proteins, protein-protein complexes, and protein-nucleic acid complexes of a target chromatin of the invention are crosslinked, and stringent buffer conditions are used during lysis and isolation of a chromatin of the invention.
In exemplary embodiments, proteins, protein-protein complexes, and protein-nucleic acid complexes of a target chromatin of the invention are crosslinked, and stringent buffer conditions are used during lysis and isolation of a chromatin of the invention and are as described in the examples.
According to the invention, a tagged target chromatin is isolated from a combined cell lysate. As described in Sections I(a) and I(c) above, a combined cell lysate comprises a lysate of two combined cell samples, or a combination of two cell lysates derived from two cell samples, wherein a target chromatin is tagged in one of the lysates, or one of the cell samples. As such, a target chromatin is isolated from a cell lysate comprising a combination of a tagged target chromatin and an untagged target chromatin. The ratio of tagged target chromatin to untagged target chromatin reflects the ratio at which the two cell samples or the lysates derived from the two cell samples are combined. In addition, proteins in one of the cell samples or lysate derived from one of the cell samples are metabolically labeled. Therefore, when a tagged target chromatin is from a cell sample wherein proteins are metabolically labeled, a cell lysate of the invention comprises a combination of a tagged target chromatin comprising metabolically labeled proteins, and an untagged target chromatin comprising unlabeled proteins. Conversely, when a tagged target chromatin is from a cell sample wherein proteins are unlabeled, a cell lysate of the invention comprises a combination of a tagged target chromatin comprising unlabeled proteins, and an untagged target chromatin comprising labeled proteins.
A target chromatin may be isolated from a mixture of chromatins or chromatin fragments in a cell lysate as described in this section. As used herein, a target nucleic acid sequence is said to be “isolated” or “purified” when it is substantially free of proteins not associated with the target chromatin, nucleic acid sequences other than the nucleic acid sequences associated with the target chromatin, and other cell debris and cell contents resulting from extraction and preparation of the target chromatin from a cell. A target chromatin of the present invention may be purified to homogeneity or other degrees of purity. In general, the level of purity of an isolated target chromatin can and will vary depending on the cell type, the specific chromatin to be isolated, and the intended use of a target chromatin of the invention. The level of purity of an isolated target chromatin may be determined using methods known in the art. For instance, the level of purity of an isolated target chromatin may be determined by determining the level of purity of a nucleic acid sequence associated with a target chromatin, by determining the level of purity of a protein associated with a target chromatin, or by determining the level of enrichment of a target chromatin, compared to a non-target chromatin in a cell. In preferred embodiments, the level of purity of an isolated target chromatin is determined by determining the level of enrichment of a target chromatin, compared to a non-target chromatin in a cell. Determining the level of enrichment of a target chromatin, compared to a non-target chromatin in a cell may be as described in this section below.
A target chromatin of the invention may be isolated using methods known in the art, such as electrophoresis, molecular, immunological and chromatographic techniques, ion exchange, hydrophobic, affinity, and reverse-phase HPLC chromatography, size exclusion chromatography, precipitation, dialysis, chromatofocusing, ultrafiltration and diafiltration techniques, and combinations thereof. For general guidance in suitable purification techniques, see Scopes, R., Protein Purification, Springer-Verlag, NY (1982).
In general, a method of the invention comprises isolating a target chromatin by affinity purification, or affinity purification in combination with other methods of isolating chromatin described above. In a preferred embodiment, a method of the invention comprises isolating a target chromatin by affinity purification. Non limiting examples of affinity purification techniques that may be used to isolate a target chromatin of the invention may include affinity chromatography, immunoaffinity chromatography, size exclusion chromatography, and combinations thereof. See, for example, Roe (ed), Protein Purification Techniques: A Practical Approach, Oxford University Press, 2nd edition, 2001.
In essence, affinity purification of a target chromatin may comprise tagging a target chromatin by contacting the target chromatin of the invention with a tag capable of specifically recognizing and binding one or more portions of a target chromatin. As described in Section (I), two cell samples, or lysates derived from the cell samples of the invention, are combined, and a target chromatin in one of the cell samples or an extract from one of the cell samples is tagged. In addition, proteins in one cell sample, but not both of the cell samples, are metabolically labeled. As such, a target chromatin may be tagged in a cell or an extract from a cell wherein proteins are metabolically labeled, and proteins specifically associated with an isolated target chromatin are metabolically labeled. Alternatively, a target chromatin may be tagged in a cell or an extract from a cell wherein proteins are not metabolically labeled, and proteins specifically associated with an isolated target chromatin are not metabolically labeled. In some embodiments, a target chromatin is tagged in a cell or an extract from a cell wherein proteins are metabolically labeled. In other embodiments, a target chromatin is tagged in a cell or an extract from a cell wherein proteins are not metabolically labeled.
A tag may be capable of specifically recognizing and binding 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 components of a target chromatin. In preferred embodiments, a tag is capable of specifically recognizing and binding one component of a target chromatin.
A tag may be capable of specifically recognizing and binding a component in a target chromatin. A component in a target chromatin may be a nucleic acid sequence in a nucleic acid associated with a target chromatin, a protein associated with a target chromatin, or a chromatin structural or functional feature in a target chromatin. In some embodiments, a tag is capable of specifically recognizing and binding a protein associated with a target chromatin. In other embodiments, a tag is capable of specifically recognizing and binding a chromatin structural or functional feature in a target chromatin. In preferred embodiments, a tag is capable of specifically recognizing and binding a nucleic acid sequence associated with a target chromatin.
A nucleic acid sequence associated with a target chromatin that may be specifically recognized and bound by a tag of the invention may be a nucleic acid sequence normally found in a chromatin of a cell of the invention. Alternatively, a nucleic acid sequence associated with a target chromatin that may be specifically recognized and bound by a tag of the invention may be an exogenous nucleic acid sequence introduced into a cell to facilitate tagging a target chromatin of the invention. In some embodiments, a nucleic acid sequence that may be recognized and bound by a tag is a nucleic acid sequence normally found in a chromatin of a cell of the invention. In other embodiments, a nucleic acid sequence that may be recognized and bound by a tag of the invention is an exogenous nucleic acid sequence introduced into a cell of the invention to facilitate tagging a chromatin of the invention. Non limiting examples of an exogenous nucleic acid sequence introduced into a cell to facilitate tagging a target chromatin of the invention may be the lexA binding sequence, and the Lac operator. In a preferred embodiment, a heterologous nucleic acid sequence introduced into a cell to facilitate tagging a target nucleic acid sequence of the invention is the lexA binding sequence. In an exemplary embodiment, a heterologous nucleic acid sequence introduced into a cell to facilitate tagging a target nucleic acid sequence of the invention is the lexA binding sequence immediately upstream of the transcription start site.
Individuals of ordinary skill in the art will recognize that an exogenous chromatin component introduced into a cell to facilitate tagging a target chromatin of the invention should not disrupt a target chromatin, or a structural or functional feature of a target chromatin. Methods of designing a chromatin component and a tag capable of binding the chromatin component that do not disrupt a chromatin of the invention may depend on the particular application of a method of the invention, and may be determined experimentally. For instance, if an application of a method of the invention comprises promoter function, a tag may be designed to bind anywhere adjacent to the promoter, but without disrupting the promoter.
A tag of the invention may further comprise one or more affinity handles. As used herein, the term “affinity handle” may refer to any handle that may be bound by a substrate for affinity purification, as described below. A tag may comprise one or more than one affinity handle. The inclusion of more than one affinity handle in a tag of the invention may significantly increase the efficiency of affinity purification for a low copy number chromatin target. As such, a tag may further comprise 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more affinity handles. In a preferred embodiment, a tag of the invention comprises one affinity handle.
Affinity handles may include any affinity handle for which a cognate binding agent is readily available. An affinity handle may be an aptamer, an antibody, an antibody fragment, a double-stranded DNA sequence, modified nucleic acids and nucleic acid mimics such as peptide nucleic acids, locked nucleic acids, phosphorodiamidate morpholino oligomers (PMO), a ligand, a ligand fragment, a receptor, a receptor fragment, a polypeptide, a peptide, a coenzyme, a coregulator, an allosteric molecule, non-immunoglobulin scaffolds such as Affibodies, Anticalins, designed Ankyrin repeat proteins and others, an ion, or a small molecule for which a cognate binding agent is readily available. The term “aptamer” refers to a polypeptide or a polynucleotide capable of binding to a target molecule at a specific region. It is generally accepted that an aptamer, which is specific in its binding to any polypeptide, may be synthesized and/or identified by in vitro evolution methods. Non-limiting examples of handles that may be suitable for isolating a chromatin may include biotin or a biotin analogue such as desthiobiotin, digoxigenin, dinitrophenol or fluorescein, a macromolecule that binds to a nucleic acid or a nucleic acid binding protein such as the Lac repressor, a zinc finger protein, a transcription activator protein capable of binding a nucleic acid, or a transcription activator-like (TAL) protein, antigenic polypeptides such as protein A, or peptide ‘tags’ such as polyhistidine, FLAG, HA and Myc tags. In preferred embodiments, a tag of the invention comprises an antigenic polypeptide. In exemplary embodiments, a tag of the invention comprises the protein A antigenic polypeptide, or derivatives thereof. Protein A is capable of binding the lexA binding site, and comprises an affinity handle capable of binding IgG. As such, protein A may be used as an affinity purification tag for purifying a target chromatin comprising a lexA binding tag.
In some embodiments, a tag of the invention is a nucleic acid tag capable of binding a nucleic acid sequence component of a chromatin, wherein the nucleic acid sequence component of the chromatin is introduced into a cell of the invention. In some embodiments, a tag of the invention is a nucleic acid tag capable of binding a nucleic acid sequence component of a chromatin, wherein the nucleic acid sequence component of the chromatin is normally present in a cell of the invention. Non-limiting examples of nucleic acid tags capable of binding a nucleic acid sequence component of a chromatin include antisense RNA or DNA nucleic acid tags, and tags comprising modified nucleic acids and nucleic acid mimics such as peptide nucleic acids, locked nucleic acids, phosphorodiamidate morpholino oligomers (PMO). In some embodiments, a tag of the invention is a nucleic acid tag comprising locked nucleotides. For instance, a nucleic acid tag comprising locked nucleotides may be as described in US20110262908 or US20120040857, and a peptide nucleic acid tag may be as described in Boffa et al. 1995 PNAS 92:1901-1905, the disclosures of all of which are incorporated herein in their entirety.
In some preferred embodiments, a tag of the invention is a protein tag capable of binding a nucleic acid sequence component of a chromatin, wherein the nucleic acid sequence component of the chromatin is a nucleic acid sequence normally found in a chromatin of a cell of the invention. Non limiting examples of a protein tag capable of binding a nucleic acid sequence normally found in a chromatin of a cell may be a nucleic acid binding protein such as protein A, the Lac repressor, a zinc finger protein, a transcription activator protein capable of binding a nucleic acid, or a transcription activator-like (TAL) protein. In one embodiment, a tag of the invention is a transcription activator protein capable of binding a nucleic acid sequence normally found in a chromatin of a cell of the invention. In another embodiment, a tag of the invention is a zinc finger protein capable of binding a nucleic acid sequence normally found in a chromatin of a cell of the invention. In yet another embodiment, a tag of the invention is a transcription activator-like (TAL) protein capable of binding a nucleic acid sequence normally found in a chromatin of a cell of the invention.
A nucleic acid binding protein tag of the invention may be a wild type nucleic acid binding protein capable of binding a nucleic acid sequence normally found in a target chromatin. Alternatively, a nucleic acid binding protein tag of the invention may be engineered to have binding specificity for a nucleic acid sequence component normally found in a target chromatin of the invention. Individuals of ordinary skill in the art will recognize that nucleic acid binding proteins such as zinc finger proteins, transcription activator proteins, and transcription activator-like (TAL) proteins may be engineered to have novel nucleic acid binding specificity compared to naturally-occurring forms of the proteins. See, for example, U.S. Pat. Nos. 6,453,242 and 6,534,261, and U.S. Pat. Appl. Nos. 20110239315, 20120110685, and 20120270273, the disclosures of which are incorporated by reference herein in their entireties. In some embodiments, a nucleic acid binding protein tag of the invention is a wild type nucleic acid binding protein capable of binding a nucleic acid sequence normally found in a target chromatin. In other embodiments, a nucleic acid binding protein tag of the invention is a nucleic acid binding protein engineered to have binding specificity for a nucleic acid sequence component of a target chromatin of the invention. In a preferred embodiment, a nucleic acid binding protein tag of the invention is a zinc finger protein engineered to have binding specificity for a nucleic acid sequence component of a target chromatin of the invention. In another preferred embodiment, a nucleic acid binding protein tag of the invention is a TAL protein engineered to have binding specificity for a nucleic acid sequence component of a target chromatin of the invention.
In other preferred embodiments, a tag of the invention is a protein tag capable of binding a nucleic acid sequence component of a chromatin, wherein the nucleic acid sequence component of the chromatin is an exogenous nucleic acid sequence introduced into a cell of the invention. In exemplary embodiments, a tag of the invention is a protein A tag capable of binding the lexA exogenous nucleic acid sequence introduced in a cell of the invention. In an exemplary embodiment, a tag of the invention is a protein A tag capable of binding the lexA exogenous nucleic acid sequence introduced upstream of the transcriptional start site of the GAL1 promoter of a S. cerevisiae cell as described in the examples.
A target chromatin may be contacted with a tag at any time during a method of the invention leading to isolation of target chromatin. For instance, a target chromatin may be contacted with a protein tag during cell culture by expressing the protein tag in a cell of the invention. Alternatively, a target chromatin may be contacted with a tag after cell culture but before cell lysis, after cell lysis, or after fragmentation of chromatin to generate chromatin fragments comprising a target chromatin.
In some embodiments, a target chromatin is contacted with a tag after cell culture but before cell lysis. As such, a tag may be introduced into a cell before cell lysis. Methods of introducing a tag into a cell of the invention can and will vary depending on the type of cell, the tag, and the application of a method of the invention. For instance, a nucleic acid tag may be electroporated into a cell after culture. In other embodiments, a target chromatin is contacted with a tag after cell lysis. In yet other embodiments, a target chromatin is contacted with a tag after cell lysis and chromatin fragmentation. In preferred embodiments, a target chromatin is contacted with a tag during cell culture by expressing the tag in a cell of the invention during cell culture. In exemplary embodiments, a target chromatin comprises the lexA binding site, and the lexA binding site is contacted with a protein A tag during cell culture by expressing the protein A in a cell of the invention during cell culture. In an exemplary embodiment, a target chromatin comprises the lexA binding site, and the lexA binding site is contacted with a protein A tag during cell culture by expressing the protein A in a yeast cell of the invention during cell culture as described in the examples.
A target chromatin contacted and bound by a tag as described above may be isolated using an affinity handle of the tag. The term “isolated”, may be used herein to describe a purified preparation of a target chromatin that is enriched for the target chromatin, but wherein the target chromatin is not necessarily in a pure form. That is, an isolated target chromatin is not necessarily 100% pure, but may be about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80% or 90% pure. An isolated target chromatin may be enriched for the target chromatin, relative to a chromatin in the lysed preparation that was not contacted by a tag of the invention. An isolated target chromatin may be enriched by 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 fold relative to a chromatin that is not contacted by a tag of the invention. In some embodiments, an isolated target chromatin is enriched by 2, 3, 4, or 5 fold relative to a chromatin that was not contacted by a tag of the invention. In other embodiments, an isolated target chromatin is enriched by 5, 6, 7, 8, 9, or 10 fold relative to a chromatin that was not contacted by a tag of the invention. In an exemplary embodiment, an isolated target chromatin is enriched 4, 5, or 6 fold relative to a chromatin that was not contacted by a tag of the invention.
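By way of illustration only, the fold enrichment discussed above might be computed as in the following minimal sketch. It assumes that the abundance of a target chromatin and of a non-target chromatin has been quantified in both the starting lysate and the isolated material; the quantification method, the function name `fold_enrichment`, and the example values are hypothetical and are not taken from the examples.

```python
def fold_enrichment(target_isolated, nontarget_isolated,
                    target_input, nontarget_input):
    """Enrichment of target over non-target chromatin, normalized to the input lysate.

    All four arguments are abundance measurements in arbitrary but consistent
    units (hypothetical inputs; the measurement method is an assumption).
    """
    values = (target_isolated, nontarget_isolated, target_input, nontarget_input)
    if any(v <= 0 for v in values):
        raise ValueError("all abundance values must be positive")
    # Ratio of target to non-target chromatin after affinity purification.
    isolated_ratio = target_isolated / nontarget_isolated
    # Same ratio in the lysate before purification.
    input_ratio = target_input / nontarget_input
    return isolated_ratio / input_ratio


# Hypothetical values: target and non-target are equally abundant in the input,
# and the target is five times more abundant in the isolated material.
example = fold_enrichment(10.0, 2.0, 1.0, 1.0)
print(example)
```

Normalizing to the input ratio is one possible design choice; it keeps the result independent of differences in starting abundance between the target and non-target chromatin.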
A target chromatin contacted and bound by a tag as described above may be isolated using any affinity purification method known in the art. In short, a tagged target chromatin is bound to a substrate capable of binding the affinity handle. The substrate comprising a bound target chromatin may then be washed to remove non-target chromatin and other cell debris, and the target chromatin may be released from substrate. Methods of affinity purification of material comprising an affinity handle are known in the art and may include binding the affinity handle to a substrate capable of binding the affinity handle. The substrate may be a gel matrix such as gel beads, the surface of a container, or a chip. The tagged target chromatin bound to the substrate may then be purified. Methods of purifying tagged molecules are known in the art and will vary depending on the target molecule, the tag, and the substrate. For instance, if the tag is a protein A tag bound to a lexA binding site in a target chromatin, the target chromatin may be bound to a magnetic bead substrate comprising IgG, and purified using a magnet.
Proteins and peptides associated with an isolated target chromatin are extracted from the isolated target chromatin. Methods of extracting proteins from chromatin are generally known in the art of protein biochemistry. Generally, any extraction protocol suitable for isolating proteins and known to those of skill in the art may be used. Extracted proteins may also be further purified before protein identification. For instance, protein extracts may be further purified by differential precipitation, differential solubilization, ultracentrifugation, using chromatographic methods such as size exclusion chromatography, hydrophobic interaction chromatography, ion exchange chromatography, affinity chromatography, metal binding, immunoaffinity chromatography, HPLC, or gel electrophoresis such as SDS-PAGE and QPNC-PAGE. In a preferred embodiment, extracted proteins are further purified using SDS-PAGE.
Extracted and purified intact proteins and post-translational modification of proteins may then be identified. Alternatively, extracted and purified intact proteins may be further digested, and the resulting peptide fragments are identified. In some embodiments, intact extracted proteins are identified. In preferred embodiments, extracted proteins are further digested, and the resulting peptide fragments are identified. For instance, protein extracts may be fragmented by enzymatically digesting the proteins using a protease such as trypsin. In exemplary embodiments, extracted proteins are further digested as described in the examples.
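For illustration, the peptide fragments expected from a trypsin digestion may be predicted computationally before identification. The following is a minimal sketch, assuming the commonly used rule that trypsin cleaves on the C-terminal side of lysine (K) or arginine (R) except when the next residue is proline (P). The function name `tryptic_digest`, the `missed_cleavages` parameter, and the example sequence are hypothetical and are not part of the examples.

```python
def tryptic_digest(sequence, missed_cleavages=0):
    """Return predicted tryptic peptides of a protein sequence (one-letter codes).

    Assumed rule: cleave after K or R unless the following residue is P.
    missed_cleavages is the maximum number of skipped cleavage sites per peptide.
    """
    sequence = sequence.upper()
    if not sequence:
        return []
    # Collect the cleavage boundaries, including both ends of the sequence.
    sites = [0]
    for i, residue in enumerate(sequence[:-1]):
        if residue in "KR" and sequence[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(sequence))
    # Join adjacent fragments to allow up to `missed_cleavages` skipped sites.
    peptides = []
    for start in range(len(sites) - 1):
        last = min(start + missed_cleavages + 2, len(sites))
        for end in range(start + 1, last):
            peptides.append(sequence[sites[start]:sites[end]])
    return peptides


# Hypothetical sequence: the R at position 3 is followed by P and is not cleaved.
print(tryptic_digest("AKRPGKLR"))
```

Allowing one or more missed cleavages is a common choice when matching predicted peptides against mass spectrometry data, since enzymatic digestion is rarely complete.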
Methods of identifying proteins or protein fragments are known in the art and may include mass spectrometry (MS) analysis, or a combination of mass spectrometry with a chromatographic technique. Non-limiting examples of mass spectrometry techniques may include tandem mass spectrometry (MS/MS), matrix-assisted laser desorption/ionization source with a time-of-flight mass analyzer (MALDI-TOF), inductively coupled plasma-mass spectrometry (ICP-MS), accelerator mass spectrometry (AMS), thermal ionization-mass spectrometry (TIMS), isotope ratio mass spectrometry (IRMS), and spark source mass spectrometry (SSMS). Chromatographic techniques that may be used with MS may include gas chromatography, liquid chromatography, and ion mobility spectrometry. In a preferred embodiment, proteins may be identified using tandem mass spectrometry in combination with liquid chromatography (LC-MS/MS). In another preferred embodiment, post-translational modification of proteins may be identified using tandem mass spectrometry in combination with liquid chromatography (LC-MS/MS).
As described above, proteins isolated with a chromatin of the invention may be labeled, unlabeled or a combination of labeled and unlabeled proteins. As described in Section I(d), if a target chromatin is tagged in a cell or an extract from a cell wherein proteins are metabolically labeled, proteins specifically associated with an isolated target chromatin are metabolically labeled, whereas unlabeled proteins, or proteins comprising a combination of labeled and unlabeled proteins, are not specifically associated with the target chromatin. Alternatively, if a target chromatin is tagged in a cell or an extract from a cell wherein proteins are not metabolically labeled, proteins specifically associated with an isolated target chromatin are unlabeled, whereas labeled proteins, or proteins comprising a combination of labeled and unlabeled proteins, are not specifically associated with the target chromatin.
When an isolated and identified protein is a combination of labeled and unlabeled protein, the ratio of labeled to unlabeled protein may reflect a ratio at which a metabolically labeled cell sample and an unlabeled cell sample are combined to generate a combined cell sample, or lysates derived from the two cell samples are combined to generate a combined cell lysate. For instance, if a metabolically labeled cell sample and an unlabeled cell sample, or lysates derived from the two cell samples, are combined at a ratio of 1:1, the ratio of labeled to unlabeled isolated protein may be 1:1.
However, since the ratio of labeled to unlabeled isolated protein depends on the rate of exchange of the identified protein during extraction and processing of a cell sample, a ratio of labeled to unlabeled isolated protein may differ from the ratio at which a metabolically labeled cell sample and an unlabeled cell sample are combined to generate a combined cell sample, or lysates derived from the two cell samples are combined to generate a combined cell lysate. For example, if a metabolically labeled cell sample and an unlabeled cell sample, or lysates derived from the two cell samples, are combined at a ratio of 1:1, a ratio of labeled to unlabeled isolated protein may deviate from a ratio of 1:1. As such, a ratio of labeled to unlabeled isolated protein may be compared to a baseline for non-specifically associated proteins. For instance, a baseline for non-specifically associated proteins may be a ratio of labeled to unlabeled of one or more proteins in a combined lysate, wherein the one or more proteins are not associated with a chromatin. Non-limiting examples of proteins not associated with a chromatin may include enzymes required for metabolism, receptors, and ribosomal proteins. In preferred embodiments, proteins not associated with a chromatin are ribosomal proteins, and a baseline for non-specifically associated proteins is a ratio of a labeled to unlabeled ribosomal protein, or an average of ratios of labeled to unlabeled ribosomal proteins. In a preferred embodiment, proteins not associated with a chromatin are 20 ribosomal proteins, and a baseline for non-specifically associated proteins is an average of ratios of the 20 labeled to unlabeled ribosomal proteins.
Isolated proteins with a ratio of labeled to unlabeled isolated protein may be specifically associated with a chromatin if the ratio of labeled to unlabeled isolated protein is significantly different from a baseline ratio. A significantly different ratio may be a ratio of labeled to unlabeled isolated protein greater than about 1, 2, 3, 4, 5, or more standard deviations than a baseline ratio. In some embodiments, a significantly different ratio is a ratio of labeled to unlabeled isolated protein greater than about 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, or more standard deviations than a baseline ratio. In other embodiments, a significantly different ratio is a ratio of labeled to unlabeled isolated protein greater than about 1, 1.5, 2, or about 2.5 standard deviations than a baseline ratio. In preferred embodiments, a significantly different ratio is a ratio of labeled to unlabeled isolated protein greater than about 1, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9 or about 3 standard deviations than a baseline ratio. In exemplary embodiments, a significantly different ratio is a ratio of labeled to unlabeled isolated protein greater than about 1.5, 1.6, 1.7, 1.8, 1.9, 2, 2.1, 2.2, 2.3, 2.4, or about 2.5 standard deviations than a baseline ratio.
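The comparison against a baseline ratio described above may be sketched as follows. This is an illustrative example only, under stated assumptions: the baseline is the mean labeled-to-unlabeled ratio of a set of non-chromatin (for instance ribosomal) proteins, the standard deviation is the sample standard deviation of those same baseline ratios, and the tagged chromatin originates from the metabolically labeled sample, so that specifically associated proteins show a higher ratio. The function name `classify_by_ratio`, the parameter `n_sd`, and all numerical values are hypothetical.

```python
from statistics import mean, stdev


def classify_by_ratio(protein_ratios, baseline_ratios, n_sd=2.0):
    """Flag proteins whose labeled:unlabeled ratio exceeds the baseline.

    protein_ratios: dict mapping protein name -> labeled/unlabeled ratio.
    baseline_ratios: ratios of proteins assumed not to be chromatin-associated.
    n_sd: number of standard deviations above the baseline mean required
          (an adjustable threshold; 2.0 is only an illustrative default).
    """
    baseline = mean(baseline_ratios)
    spread = stdev(baseline_ratios)  # sample standard deviation
    cutoff = baseline + n_sd * spread
    return {name: ratio > cutoff for name, ratio in protein_ratios.items()}


# Made-up ratios for illustration; not experimental data.
ribosomal = [0.9, 1.0, 1.1, 1.0]
candidates = {"protein_1": 3.0, "protein_2": 1.05}
print(classify_by_ratio(candidates, ribosomal, n_sd=2.0))
```

If the tagged chromatin instead originates from the unlabeled sample, the comparison would be made in the opposite direction, with ratios below the baseline indicating specific association.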
Methods of determining if a protein or a protein fragment is labeled can and will vary depending on the type of label. For instance, if a protein is labeled using a tag, labeling may be determined using methods designed to detect the tag. For example, determining if a protein comprising a his-tag is tagged, untagged, or a combination of tagged and untagged may be by detecting the proteins comprising the his tag. If a protein is labeled using a radioactive isotope, labeling may be determined by determining the degree of radioactivity of isolated proteins or protein fragments. Alternatively, if a protein is labeled using a heavy isotope, MS analysis may be used to determine if a protein or a protein fragment is labeled or unlabeled. Advantageously, when a protein is labeled using a heavy isotope, MS analysis may be used to identify a protein or a protein fragment as described above, and to derive the MS data to determine if a protein or a protein fragment is labeled, unlabeled, or a combination of labeled and unlabeled protein or protein fragment.
In preferred embodiments, a protein is labeled using a heavy isotope, and MS analysis is used to identify a protein or a protein fragment, and to determine if a protein or a protein fragment is labeled, unlabeled, or a combination of labeled and unlabeled protein or protein fragment. Methods of deriving MS data to determine if a protein or a protein fragment is labeled, unlabeled, or a combination of labeled and unlabeled protein or protein fragment are known in the art, and may include using known computational techniques to distill MS data such as Mascot Distiller, Rosetta Elucidator, and MaxQuant. In some embodiments, MS data is derived using Rosetta Elucidator. In other embodiments, MS data is derived using MaxQuant. In preferred embodiments, MS data is derived using Mascot Distiller.
A method of the invention may be used for any application wherein a determination of chromatin structure or function may be required. For instance, a method of the invention may be used to determine rearrangement in chromatin structure, genome metabolism, epigenetic regulatory mechanisms, transient association of proteins with chromatin, initiation or silencing of expression of a nucleic acid sequence, identify proteins transiently associated with a chromatin, or post-translational modification of proteins associated with a chromatin or chromatin rearrangement. An application of a method of the invention may include determining changes in chromatin function and structure in response to changing growth conditions, exposure to a drug or small molecule, or during stages of cell cycles.
In some embodiments, a method of the invention is used to determine differences in chromatin structure and function between a transcriptionally silent and a transcriptionally active state of a genomic locus. As such, proteins specifically associated with a genomic locus, and post-translational modifications of proteins associated with a chromatin comprising the genomic locus may be determined in cells comprising a transcriptionally silent state of a genomic locus, and in cells comprising a transcriptionally active state of a genomic locus. In preferred embodiments, a method of the invention is used to determine differences in chromatin structure and function between a transcriptionally silent and a transcriptionally active state of a Saccharomyces cerevisiae GAL1 genomic locus.
In some aspects, the invention provides methods of isolating and identifying proteins specifically associated with a target chromatin. As described in Example 4 and
To determine which of the identified proteins and posttranslational modifications of proteins associated with a target chromatin isolated from a cell are specifically or non-specifically associated with the target chromatin, a method of high-resolution mass spectrometry coupled with label-free proteomics was used. One with skill in the art will appreciate that label-free quantitative proteomics methods include the following fundamental steps: (i) sample preparation including protein extraction, reduction, alkylation, and digestion; (ii) sample separation by liquid chromatography (LC or LC/LC) and analysis by MS/MS; and (iii) data analysis including peptide/protein identification, quantification, and statistical analysis. A method of the invention provides two cell samples, or lysates derived from two cell samples, comprising the target chromatin, wherein the target chromatin in one cell sample, but not both of the cell samples, is tagged. With label-free quantitative methods, each sample is separately prepared, then subjected to individual LC-MS/MS or LC/LC-MS/MS runs. As reviewed in Zhu et al., J Biomed Biotechnol 2010, and incorporated by reference herein, protein quantification is generally based on two categories of measurements. In the first are the measurements of ion intensity changes such as peptide peak areas or peak heights in chromatography. The second is based on spectral counting of identified proteins after MS/MS analysis. Peptide peak intensity or spectral count is measured for individual LC-MS/MS or LC/LC-MS/MS runs and changes in protein abundance are calculated via a direct comparison between different analyses.
In the present invention, the method of spectral counting is used to categorize whether proteins enriched with a section of chromatin are specific or contaminant. As such, determining the abundance of an identified protein in a tagged chromatin sample compared to the same protein in an untagged chromatin sample, may determine if the protein was specifically associated with the target chromatin of the invention. If a protein associated with a target chromatin is enriched in a tagged chromatin sample compared to the same protein in an untagged chromatin sample, then the protein is specifically associated with the target chromatin. If an identified protein is not enriched in a tagged chromatin sample compared to an untagged chromatin sample, then association of that protein with the target chromatin is not specific.
In the present invention, to measure enrichment of a protein, the normalized spectral abundance factor (NSAF) is calculated for each protein in each lane of an SDS-PAGE gel by dividing the number of spectral counts (normalized for the size of the protein) of a given protein by the sum of all normalized spectral counts of all proteins in the gel lane. The enrichment level for each protein is identified by calculating the fold enrichment (tagged chromatin/untagged chromatin) using the NSAF values.
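The NSAF and fold enrichment calculations described above may be illustrated with the following minimal sketch. It assumes that "normalized for the size of the protein" means dividing the spectral count by the protein length in amino acids; molecular weight could be substituted. The function names `nsaf` and `fold_enrichment`, the handling of proteins absent from the untagged lane (reported as infinite enrichment), and all counts and lengths are hypothetical and serve only as an example.

```python
def nsaf(spectral_counts, lengths):
    """Compute the normalized spectral abundance factor for one gel lane.

    spectral_counts: dict mapping protein -> spectral count in the lane.
    lengths: dict mapping protein -> protein length (assumed size measure).
    """
    # Length-normalized spectral count for each protein.
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    # Divide by the sum over all proteins in the lane.
    return {p: value / total for p, value in saf.items()}


def fold_enrichment(tagged_nsaf, untagged_nsaf):
    """Return tagged/untagged NSAF for each protein in the tagged lane."""
    result = {}
    for p, value in tagged_nsaf.items():
        reference = untagged_nsaf.get(p, 0.0)
        # A protein not detected in the untagged lane has no finite ratio.
        result[p] = value / reference if reference > 0 else float("inf")
    return result


# Made-up values for illustration; not experimental data.
lengths = {"P1": 100, "P2": 200, "P3": 400}
tagged = nsaf({"P1": 30, "P2": 20, "P3": 40}, lengths)
untagged = nsaf({"P1": 10, "P2": 20, "P3": 40}, lengths)
print(fold_enrichment(tagged, untagged))
```

In this made-up example P1 has a fold enrichment of about 1.8 and would be a candidate for specific association, whereas P2 and P3, at about 0.6, would not. Because the NSAF values in a lane sum to one, a strong increase in one protein lowers the apparent enrichment of the others.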
A target nucleic acid sequence may be isolated from any cell comprising the target nucleic acid sequence of the invention. A cell may be an archaebacterium, a eubacterium, or a eukaryotic cell. For instance, a cell of the invention may be a methanogen, a halophile or a thermoacidophile archaebacterium, a gram positive, a gram negative, a cyanobacterium, a spirochaete, or a firmicute bacterium, a fungal cell, a moss cell, a plant cell, an animal cell, or a protist cell.
In some embodiments, a cell of the invention is a cell from an animal. A cell from an animal may be a cell from an embryo, a juvenile, or an adult. Suitable animals include vertebrates such as mammals, birds, reptiles, amphibians, and fish. Examples of suitable mammals include without limit rodents, companion animals, livestock, and primates. Non-limiting examples of rodents include mice, rats, hamsters, gerbils, and guinea pigs. Suitable companion animals include but are not limited to cats, dogs, rabbits, hedgehogs, and ferrets. Non-limiting examples of livestock include horses, goats, sheep, swine, cattle, llamas, and alpacas. Suitable primates include but are not limited to humans, capuchin monkeys, chimpanzees, lemurs, macaques, marmosets, tamarins, spider monkeys, squirrel monkeys, and vervet monkeys. Non-limiting examples of birds include chickens, turkeys, ducks, and geese. In some embodiments, a cell is a cell from a human.
In some embodiments, a cell may be from a model organism commonly used in laboratory research. For instance, a cell of the invention may be an E. coli, a Bacillus subtilis, a Caulobacter crescentus, a Mycoplasma genitalium, an Aliivibrio fischeri, a Synechocystis, or a Pseudomonas fluorescens bacterial cell; a Chlamydomonas reinhardtii, a Dictyostelium discoideum, a Tetrahymena thermophila, an Emiliania huxleyi, or a Thalassiosira pseudonana protist cell; an Ashbya gossypii, an Aspergillus nidulans, a Coprinus cinereus, a Cunninghamella elegans, a Neurospora crassa, a Saccharomyces cerevisiae, a Schizophyllum commune, a Schizosaccharomyces pombe, or an Ustilago maydis fungal cell; an Arabidopsis thaliana, a Selaginella moellendorffii, a Brachypodium distachyon, a Lotus japonicus, a Lemna gibba, a Zea mays, a Medicago truncatula, a Mimulus, a tobacco, a rice, a Populus, or a Nicotiana benthamiana plant cell; a Physcomitrella patens moss; an Amphimedon queenslandica sponge, an Arbacia punctulata sea urchin, an Aplysia sea slug, a Branchiostoma floridae deuterostome, a Caenorhabditis elegans nematode, a Ciona intestinalis sea squirt, a Daphnia spp. 
crustacean, a Drosophila fruit fly, a Euprymna scolopes squid, a Hydra cnidarian, a Loligo pealei squid, a Macrostomum lignano flatworm, a Mnemiopsis leidyi comb jelly, a Nematostella vectensis sea anemone, an Oikopleura dioica free-swimming tunicate, an Oscarella carmela sponge, a Parhyale hawaiensis crustacean, a Platynereis dumerilii marine polychaetous annelid, a Pristionchus pacificus roundworm, a Schmidtea mediterranea freshwater planarian, a stomatogastric ganglion of various arthropod species, a Strongylocentrotus purpuratus sea urchin, a Symsagittifera roscoffensis flatworm, a Tribolium castaneum beetle, a Trichoplax adhaerens placozoan, a Tubifex tubifex oligochaete, a laboratory mouse, a guinea pig, a chicken, a cat, a dog, a hamster, a lamprey, a medaka fish, a rat, a rhesus macaque, a cotton rat, a zebra finch, a Takifugu pufferfish, an African clawed frog, or a zebrafish. In exemplary embodiments, a cell is a Saccharomyces cerevisiae yeast cell. In particularly exemplary embodiments, a cell is a Saccharomyces cerevisiae W303a yeast cell.
A cell of the invention may be derived from a tissue or from a cell line grown in tissue culture. A cell line may be adherent or non-adherent, or a cell line may be grown under conditions that encourage adherent, non-adherent or organotypic growth using standard techniques known to individuals skilled in the art. Cell lines and methods of culturing cell lines are known in the art. Non-limiting examples of cell lines commonly cultured in a laboratory may include HeLa, a cell line from the National Cancer Institute's 60 cancer cell lines, DU145 (prostate cancer), LNCaP (prostate cancer), MCF-7 (breast cancer), MDA-MB-438 (breast cancer), PC3 (prostate cancer), T47D (breast cancer), THP-1 (acute myeloid leukemia), U87 (glioblastoma), SH-SY5Y human neuroblastoma cells, Saos-2 cells (bone cancer), Vero, GH3 (pituitary tumor), PC12 (pheochromocytoma), MC3T3 (embryonic calvarium), Tobacco BY-2 cells, Zebrafish ZF4 and AB9 cells, Madin-Darby canine kidney (MDCK), or Xenopus A6 kidney epithelial cells.
As described in Section (II) above, two cell samples, or lysates derived from two cell samples may be subjected to mass-spectrometry coupled with label-free proteomics, one sample of which contains a tagged target chromatin of the invention. Typically, cells in two cell samples of the invention are from the same type of cells or they may be derived from the same type of cells. For instance, cells may comprise a heterologous protein expressed in a cell of the invention. The heterologous protein expressed in a cell may be used for tagging a target chromatin as described in Section II(d). In some embodiments, cells in two cell samples of the invention are from the same type of cells. In other embodiments, cells in the first cell sample are derived from the same cell type as cells in the second cell sample.
Two cell samples of the invention may be from the same genus, species, variety or strain of cells. In preferred embodiments, two cell samples of the invention are Saccharomyces cerevisiae yeast cells or derivatives of Saccharomyces cerevisiae yeast cells. In exemplary embodiments, two cell samples of the invention are Saccharomyces cerevisiae W303a yeast cells or derivatives of Saccharomyces cerevisiae W303a yeast cells. In exemplary embodiments, two cell samples of the invention are derivatives of Saccharomyces cerevisiae W303a yeast cells, wherein protein A tagged transcription activator-like (TAL) protein engineered to bind upstream of the GAL1 transcription start site is expressed in one of the cell samples of derived Saccharomyces cerevisiae W303a yeast cells.
The number of cells in a cell sample can and will vary depending on the type of cells, the abundance of a target chromatin in a cell, and the method of protein identification used, among other variables. For instance, if a cell of the invention is Saccharomyces cerevisiae, about 5×10¹¹ to about 5×10¹², more preferably, about 1×10¹¹ to about 1×10¹² cells may be used in a cell sample. In some embodiments, about 1×10¹¹ to about 1×10¹² Saccharomyces cerevisiae cells are used in a cell sample.
Two cell samples of the invention are typically grown identically. Growing the cell samples identically minimizes potential structural or functional differences at a target chromatin present in both cell samples. As used herein, “grown identically” refers to cultured cell samples grown using similar culture conditions, or cells from a tissue harvested using identical harvesting techniques.
A method of the invention comprises identification of a protein and post-translational modification of a protein associated with a target chromatin. Generally, chromatin refers to the combination of nucleic acids and proteins in the nucleus of a eukaryotic cell. However, it is contemplated that the term “chromatin” may also refer to the combination of any nucleic acid sequence and proteins associated with the nucleic acid sequence in any cell.
Chromatin of the invention may be as described in Section I(b) above.
A target chromatin is isolated from a cell lysate derived from a cell sample, wherein a target chromatin is tagged in the cell sample. The method of isolating a target chromatin is also performed on a cell lysate derived from a cell sample, wherein a target chromatin is untagged in the cell sample. A skilled practitioner of the art will appreciate that structural and functional features of a target chromatin must be preserved during cell lysis and isolation of the target chromatin. The association of proteins with a target chromatin may be preserved during cell lysis and isolation of the target chromatin using methods known in the art for preserving a complex of proteins with a nucleic acid sequence. For instance, lysing of a cell and isolation of a target chromatin may be performed under refrigeration or using cryogenic methods and buffer conditions capable of preserving association of proteins and nucleic acid sequences. In addition, a complex of proteins with a nucleic acid may be preserved by crosslinking protein and nucleic acid complexes in a cell prior to lysing and isolating a chromatin. Crosslinking protein and nucleic acid complexes in a cell may also capture, or preserve, transient protein-protein and protein-nucleic acid interactions.
In some embodiments, a complex of proteins with a nucleic acid may be preserved by crosslinking protein and nucleic acid complexes in a chromatin prior to lysing a cell and isolating the chromatin. Crosslinking is the process of joining two or more molecules, such as two proteins or a protein and a nucleic acid molecule, by a covalent bond. Molecules may be crosslinked by irradiation with ultraviolet light, or by using chemical crosslinking reagents. Chemical crosslinking reagents capable of crosslinking proteins and nucleic acids are known in the art and may include crosslinking reagents that target amines, sulfhydryls, carboxyls, carbonyls or hydroxyls; homobifunctional or heterobifunctional crosslinking reagents, variable spacer arm length or zero-length crosslinking reagents, cleavable or non-cleavable crosslinking reagents, and photoreactive crosslinking reagents. Non-limiting examples of crosslinking reagents that may be used to crosslink protein complexes and/or protein complexes and nucleic acids may include formaldehyde, glutaraldehyde, disuccinimidyl glutarate, disuccinimidyl suberate, a photoreactive amino acid such as photo-leucine or photo-methionine, and succinimidyl-diazirine. The degree of crosslinking can and will vary depending on the application of a method of the invention, and may be experimentally determined.
In a preferred embodiment, a complex of proteins with a nucleic acid in a chromatin of the invention may be preserved by crosslinking protein and nucleic acid complexes in a cell prior to lysing using formaldehyde. In an exemplary embodiment, a complex of proteins with a nucleic acid in a chromatin of the invention may be preserved by crosslinking protein and nucleic acid complexes in a cell prior to lysing using formaldehyde as described in the examples.
A skilled practitioner of the art will appreciate that protocols for lysing a cell can and will vary depending on the type of cell, the target chromatin of the invention, and the specific application of a method of the invention. Non limiting examples of methods that may be used to lyse a cell of the invention may include cell lysis using a detergent, an enzyme such as lysozyme, incubation in a hypotonic buffer which causes a cell to swell and burst, mechanical disruption such as liquid homogenization by forcing a cell through a narrow space, sonication, freeze/thaw, mortar and pestle, glass beads, and combinations thereof. In some embodiments, when a cell of the invention is a yeast cell, the cell may be cryogenically lysed under liquid nitrogen temperature with glass beads. In exemplary embodiments, when a cell of the invention is a yeast cell, the cell may be cryogenically lysed under liquid nitrogen temperature with glass beads as described in the examples.
Buffer conditions used during lysing and isolation of a chromatin of the invention can and will be altered to control stringent conditions during cell lysis and isolation to preserve association of proteins and nucleic acid sequences of a chromatin. “Stringent conditions” in the context of chromatin isolation are conditions capable of preserving specific association of proteins and nucleic acids of a chromatin, but minimizing non-specific association of proteins and nucleic acids. Stringent conditions can and will vary depending on the application of a method of the invention, the target chromatin of the invention, the nucleic acid sequence in a target chromatin, the proteins or protein complexes associated with a target chromatin of the invention, whether or not proteins, protein complexes and nucleic acid sequences are crosslinked, and the conditions used for crosslinking proteins, protein complexes and nucleic acid sequences of a target chromatin. For instance, more stringent buffer conditions may be used in a method of the invention wherein proteins, protein-protein complexes, and protein-nucleic acid complexes are crosslinked compared to a method of the invention wherein proteins, protein-protein complexes, and protein-nucleic acid complexes are not crosslinked. As such, stringent buffer conditions used during cell lysis and isolation of a nucleic acid sequence of the invention may be experimentally determined for each application wherein a method of the invention is used. Buffer conditions that may alter stringent conditions during cell lysis and isolation may include pH and salt concentration. In preferred embodiments, proteins, protein-protein complexes, and protein-nucleic acid complexes of a target chromatin of the invention are crosslinked, and stringent buffer conditions are used during lysis and isolation of a chromatin of the invention.
In exemplary embodiments, proteins, protein-protein complexes, and protein-nucleic acid complexes of a target chromatin of the invention are crosslinked, and stringent buffer conditions are used during lysis and isolation of a chromatin of the invention and are as described in the examples.
According to the invention, the method of isolating a target chromatin is performed on cell lysates derived from cell samples, wherein one sample comprises a target chromatin that is tagged in the cell sample and one sample comprises a target chromatin that is untagged in the cell sample. As described in Sections II(a) and II(c) above, a cell lysate comprises a lysate of a cell sample, wherein a target chromatin is tagged in one of the lysates, or one of the cell samples. A cell lysate also comprises a lysate of a cell sample, wherein a target chromatin is not tagged in one of the lysates, or one of the cell samples.
A target chromatin may be isolated from a mixture of chromatins or chromatin fragments in a cell lysate as described in this section. As used herein, a target nucleic acid sequence is said to be “isolated” or “purified” when it is substantially free of proteins not associated with the target chromatin, nucleic acid sequences other than the nucleic acid sequences associated with the target chromatin, and other cell debris and cell contents resulting from extraction and preparation of the target chromatin from a cell. A target chromatin of the present invention may be purified to homogeneity or other degrees of purity. In general, the level of purity of an isolated target chromatin can and will vary depending on the cell type, the specific chromatin to be isolated, and the intended use of a target chromatin of the invention. The level of purity of an isolated target chromatin may be determined using methods known in the art. For instance, the level of purity of an isolated target chromatin may be determined by determining the level of purity of a nucleic acid sequence associated with a target chromatin, by determining the level of purity of a protein associated with a target chromatin, or by determining the level of enrichment of a target chromatin, compared to a non-target chromatin in a cell. In preferred embodiments, the level of purity of an isolated target chromatin is determined by determining the level of enrichment of a target chromatin, compared to a non-target chromatin in a cell. Determining the level of enrichment of a target chromatin, compared to a non-target chromatin in a cell may be as described in this section below.
A target chromatin of the invention may be isolated using methods known in the art, such as electrophoresis, molecular, immunological and chromatographic techniques, ion exchange, hydrophobic, affinity, and reverse-phase HPLC chromatography, size exclusion chromatography, precipitation, dialysis, chromatofocusing, ultrafiltration and diafiltration techniques, and combinations thereof. For general guidance in suitable purification techniques, see Scopes, R., Protein Purification, Springer-Verlag, NY (1982).
In general, a method of the invention comprises isolating a target chromatin by affinity purification, or affinity purification in combination with other methods of isolating chromatin described above. In a preferred embodiment, a method of the invention comprises isolating a target chromatin by affinity purification. Non-limiting examples of affinity purification techniques that may be used to isolate a target chromatin of the invention may include affinity chromatography, immunoaffinity chromatography, size exclusion chromatography, and combinations thereof. See, for example, Roe (ed), Protein Purification Techniques: A Practical Approach, Oxford University Press, 2nd edition, 2001.
In essence, affinity purification of a target chromatin may comprise tagging a target chromatin by contacting the target chromatin of the invention with a tag capable of specifically recognizing and binding one or more portions of a target chromatin. As described in Section (II), a target chromatin from one cell sample, or lysate derived from the cell sample of the invention, but not both of the cell samples, is tagged.
A tag may be capable of specifically recognizing and binding 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 components of a target chromatin. In preferred embodiments, a tag is capable of specifically recognizing and binding one component of a target chromatin.
A tag may be capable of specifically recognizing and binding a component in a target chromatin. A component in a target chromatin may be a nucleic acid sequence in a nucleic acid associated with a target chromatin, a protein associated with a target chromatin, or a chromatin structural or functional feature in a target chromatin. In some embodiments, a tag is capable of specifically recognizing and binding a protein associated with a target chromatin. In other embodiments, a tag is capable of specifically recognizing and binding a chromatin structural or functional feature in a target chromatin. In preferred embodiments, a tag is capable of specifically recognizing and binding a nucleic acid sequence associated with a target chromatin.
A nucleic acid sequence associated with a target chromatin that may be specifically recognized and bound by a tag of the invention may be a nucleic acid sequence normally found in a chromatin of a cell of the invention. In some embodiments, a nucleic acid sequence that may be recognized and bound by a tag is a nucleic acid sequence normally found in a chromatin of a cell of the invention.
Individuals of ordinary skill in the art will recognize that an exogenous chromatin component introduced into a cell to facilitate tagging a target chromatin of the invention cannot and will not disrupt a target chromatin, or a structural or functional feature of a target chromatin. Methods of designing a chromatin component and a tag capable of binding the chromatin component that do not disrupt a chromatin of the invention may depend on the particular application of a method of the invention, and may be determined experimentally. For instance, if an application of a method of the invention comprises promoter function, a tag may be designed to bind anywhere adjacent to the promoter, but without disrupting the promoter.
A tag of the invention may further comprise one or more affinity handles. As used herein, the term “affinity handle” may refer to any handle that may be bound by a substrate for affinity purification, as described below. A tag may comprise one or more than one affinity handle. The inclusion of more than one affinity handle in a tag of the invention may significantly increase the efficiency of affinity purification for a low copy number chromatin target. As such, a tag may further comprise 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more affinity handles. In a preferred embodiment, a tag of the invention comprises one affinity handle.
Affinity handles may include any affinity handle for which a cognate binding agent is readily available. An affinity handle may be an aptamer, an antibody, an antibody fragment, a double-stranded DNA sequence, modified nucleic acids and nucleic acid mimics such as peptide nucleic acids, locked nucleic acids, phosphorodiamidate morpholino oligomers (PMO), a ligand, a ligand fragment, a receptor, a receptor fragment, a polypeptide, a peptide, a coenzyme, a coregulator, an allosteric molecule, non-immunoglobulin scaffolds such as Affibodies, Anticalins, designed Ankyrin repeat proteins and others, an ion, or a small molecule for which a cognate binding agent is readily available. The term “aptamer” refers to a polypeptide or a polynucleotide capable of binding to a target molecule at a specific region. It is generally accepted that an aptamer, which is specific in its binding to any polypeptide, may be synthesized and/or identified by in vitro evolution methods. Non limiting examples of handles that may be suitable for isolating a chromatin may include biotin or a biotin analogue such as desthiobiotin, digoxigenin, dinitrophenol or fluorescein, a macromolecule that binds to a nucleic acid or a nucleic acid binding protein such as the Lac repressor, a zinc finger protein, a transcription activator protein capable of binding a nucleic acid, or a transcription activator-like (TAL) protein, antigenic polypeptides such as protein A, or peptide ‘tags’ such as polyhistidine, FLAG, HA and Myc tags. In preferred embodiments, a tag of the invention comprises an antigenic polypeptide. In other preferred embodiments, a tag of the invention comprises the protein A tagged TAL protein, or derivatives thereof. The TAL protein can be engineered to have binding specificity for a nucleic acid sequence component of a target chromatin of the invention. As such, TAL may be used as an affinity purification tag for purifying a target chromatin. 
Protein A comprises an affinity handle capable of binding IgG. In exemplary embodiments, a tag of the invention comprises the protein A tagged TAL protein engineered to bind upstream of the GAL1 transcription start site.
In some embodiments, a tag of the invention is a nucleic acid tag capable of binding a nucleic acid sequence component of a chromatin, wherein the nucleic acid sequence component of the chromatin is introduced into a cell of the invention. In some embodiments, a tag of the invention is a nucleic acid tag capable of binding a nucleic acid sequence component of a chromatin, wherein the nucleic acid sequence component of the chromatin is normally present in a cell of the invention. Non-limiting examples of nucleic acid tags capable of binding a nucleic acid sequence component of a chromatin include antisense RNA or DNA nucleic acid tags, and tags comprising modified nucleic acids and nucleic acid mimics such as peptide nucleic acids, locked nucleic acids, phosphorodiamidate morpholino oligomers (PMO). In some embodiments, a tag of the invention is a nucleic acid tag comprising locked nucleotides. For instance, a nucleic acid tag comprising locked nucleotides may be as described in US20110262908 or US20120040857, and a peptide nucleic acid tag may be as described in Boffa et al. 1995 PNAS 92:1901-1905, the disclosures of all of which are incorporated herein in their entirety.
In some preferred embodiments, a tag of the invention is a protein tag capable of binding a nucleic acid sequence component of a chromatin, wherein the nucleic acid sequence component of the chromatin is a nucleic acid sequence normally found in a chromatin of a cell of the invention. Non-limiting examples of a protein tag capable of binding a nucleic acid sequence normally found in a chromatin of a cell may be a nucleic acid binding protein such as protein A, the Lac repressor, a zinc finger protein, a transcription activator protein capable of binding a nucleic acid, or a transcription activator-like (TAL) protein. In one embodiment, a tag of the invention is a transcription activator protein capable of binding a nucleic acid sequence normally found in a chromatin of a cell of the invention. In another embodiment, a tag of the invention is a zinc finger protein capable of binding a nucleic acid sequence normally found in a chromatin of a cell of the invention. In an exemplary embodiment, a tag of the invention is a protein A tagged transcription activator-like (TAL) protein capable of binding a nucleic acid sequence normally found in a chromatin of a cell of the invention.
A nucleic acid binding protein tag of the invention may be a wild type nucleic acid binding protein capable of binding a nucleic acid sequence normally found in a target chromatin. Alternatively, a nucleic acid binding protein tag of the invention may be engineered to have binding specificity for a nucleic acid sequence component normally found in a target chromatin of the invention. Individuals of ordinary skill in the art will recognize that nucleic acid binding proteins such as zinc finger proteins, transcription activator proteins, and transcription activator-like (TAL) proteins may be engineered to have novel nucleic acid binding specificity compared to naturally-occurring forms of the proteins. See, for example, U.S. Pat. Nos. 6,453,242 and 6,534,261, and U.S. Pat. Appl. Nos. 20110239315, 20120110685, and 20120270273, the disclosures of which are incorporated by reference herein in their entireties. In some embodiments, a nucleic acid binding protein tag of the invention is a wild type nucleic acid binding protein capable of binding a nucleic acid sequence normally found in a target chromatin. In other embodiments, a nucleic acid binding protein tag of the invention is a nucleic acid binding protein engineered to have binding specificity for a nucleic acid sequence component of a target chromatin of the invention. In a preferred embodiment, a nucleic acid binding protein tag of the invention is a zinc finger protein engineered to have binding specificity for a nucleic acid sequence component of a target chromatin of the invention. In an exemplary embodiment, a nucleic acid binding protein tag of the invention is a TAL protein engineered to have binding specificity for a nucleic acid sequence component of a target chromatin of the invention.
In other preferred embodiments, a tag of the invention is a protein tag capable of binding a nucleic acid sequence component of a chromatin, wherein the nucleic acid sequence component of the chromatin is a nucleic acid sequence normally found in a cell of the invention. In exemplary embodiments, a tag of the invention is a protein A tagged TAL protein capable of binding a nucleic acid sequence normally found in a cell of the invention. In an exemplary embodiment, a tag of the invention is a protein A tagged TAL protein capable of binding a nucleic acid sequence normally found in a cell upstream of the transcriptional start site of the GAL1 promoter of an S. cerevisiae cell as described in the examples.
A target chromatin may be contacted with a tag at any time during a method of the invention leading to isolation of target chromatin. For instance, a target chromatin may be contacted with a protein tag during cell culture by expressing the protein tag in a cell of the invention. Alternatively, a target chromatin may be contacted with a tag after cell culture but before cell lysis, after cell lysis, or after fragmentation of chromatin to generate chromatin fragments comprising a target chromatin.
In some embodiments, a target chromatin is contacted with a tag after cell culture but before cell lysis. As such, a tag may be introduced into a cell before cell lysis. Methods of introducing a tag into a cell of the invention can and will vary depending on the type of cell, the tag, and the application of a method of the invention. For instance, a nucleic acid tag may be electroporated into a cell after culture. In other embodiments, a target chromatin is contacted with a tag after cell lysis. In yet other embodiments, a target chromatin is contacted with a tag after cell lysis and chromatin fragmentation. In preferred embodiments, a target chromatin is contacted with a tag during cell culture by expressing the tag in a cell of the invention during cell culture. In exemplary embodiments, a target chromatin is contacted with a protein A tagged TAL protein during cell culture by expressing the protein A tagged TAL protein in a cell of the invention during cell culture. In an exemplary embodiment, a target chromatin is contacted with a protein A tagged TAL protein during cell culture by expressing the protein A tagged TAL protein in a yeast cell of the invention during cell culture as described in the examples.
A target chromatin contacted and bound by a tag as described above may be isolated using an affinity handle of the tag. The term “isolated” may be used herein to describe a purified preparation of a target chromatin that is enriched for the target chromatin, but wherein the target chromatin is not necessarily in a pure form. That is, an isolated target chromatin is not necessarily 100% pure, but may be about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80% or 90% pure. An isolated target chromatin may be enriched for the target chromatin, relative to a chromatin in the lysed preparation that was not contacted by a tag of the invention. An isolated target chromatin may be enriched by 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 fold relative to a chromatin that is not contacted by a tag of the invention. In some embodiments, an isolated target chromatin is enriched by 2, 3, 4, or 5 fold relative to a chromatin that was not contacted by a tag of the invention. In other embodiments, an isolated target chromatin is enriched by 5, 6, 7, 8, 9, or 10 fold relative to a chromatin that was not contacted by a tag of the invention. In an exemplary embodiment, an isolated target chromatin is enriched 4, 5, or 6 fold relative to a chromatin that was not contacted by a tag of the invention.
A target chromatin contacted and bound by a tag as described above may be isolated using any affinity purification method known in the art. In short, a tagged target chromatin is bound to a substrate capable of binding the affinity handle. The substrate comprising a bound target chromatin may then be washed to remove non-target chromatin and other cell debris, and the target chromatin may be released from substrate. Methods of affinity purification of material comprising an affinity handle are known in the art and may include binding the affinity handle to a substrate capable of binding the affinity handle. The substrate may be a gel matrix such as gel beads, the surface of a container, or a chip. The tagged target chromatin bound to the substrate may then be purified. Methods of purifying tagged molecules are known in the art and will vary depending on the target molecule, the tag, and the substrate. For instance, if the tag is a TAL-protein A tag bound to a site in a target chromatin, the target chromatin may be bound to a magnetic bead substrate comprising IgG, and purified using a magnet.
Proteins and peptides associated with an isolated target chromatin are extracted from the isolated target chromatin. Methods of extracting proteins from chromatin are generally known in the art of protein biochemistry. Generally, any extraction protocol suitable for isolating proteins and known to those of skill in the art may be used. Extracted proteins may also be further purified before protein identification. For instance, protein extracts may be further purified by differential precipitation, differential solubilization, ultracentrifugation, using chromatographic methods such as size exclusion chromatography, hydrophobic interaction chromatography, ion exchange chromatography, affinity chromatography, metal binding, immunoaffinity chromatography, HPLC, or gel electrophoresis such as SDS-PAGE and QPNC-PAGE. In a preferred embodiment, extracted proteins are further purified using SDS-PAGE.
Extracted and purified intact proteins and post-translational modification of proteins may then be identified. Alternatively, extracted and purified intact proteins may be further digested, and the resulting peptide fragments are identified. In some embodiments, intact extracted proteins are identified. In preferred embodiments, extracted proteins are further digested, and the resulting peptide fragments are identified. For instance, protein extracts may be fragmented by enzymatically digesting the proteins using a protease such as trypsin. In exemplary embodiments, extracted proteins are further digested as described in the examples.
Methods of identifying proteins or protein fragments are known in the art and may include mass spectrometry (MS) analysis, or a combination of mass spectrometry with a chromatographic technique. Non-limiting examples of mass spectrometry techniques may include tandem mass spectrometry (MS/MS), matrix-assisted laser desorption/ionization source with a time-of-flight mass analyzer (MALDI-TOF), inductively coupled plasma-mass spectrometry (ICP-MS), accelerator mass spectrometry (AMS), thermal ionization-mass spectrometry (TIMS), isotope ratio mass spectrometry (IRMS), and spark source mass spectrometry (SSMS). Chromatographic techniques that may be used with MS may include gas chromatography, liquid chromatography, and ion mobility spectrometry. In a preferred embodiment, proteins may be identified using tandem mass spectrometry in combination with liquid chromatography (LC-MS/MS). In another preferred embodiment, post-translational modification of proteins may be identified using tandem mass spectrometry in combination with liquid chromatography (LC-MS/MS).
In the present invention, the method of label-free proteomics is used to categorize whether proteins enriched with a section of chromatin are specific or contaminant. Label-free methods of quantifying proteins or protein fragments are known in the art. In label-free quantitative proteomics, each sample is separately prepared, then subjected to individual methods of identifying proteins or protein fragments which may include LC-MS/MS or LC/LC-MS/MS. According to the invention, one sample comprises a target chromatin that is tagged in the cell sample and one sample comprises a target chromatin that is untagged in the cell sample. Label-free protein quantification is generally based on two categories of measurement. In the first are the measurements of ion intensity changes such as peptide peak areas or peak heights in chromatography. The second is based on the spectral counting of identified proteins after MS/MS analysis. Peptide peak intensity or spectral count is measured for individual LC-MS/MS or LC/LC-MS/MS runs and changes in protein abundance are calculated via a direct comparison between different analyses. In a preferred embodiment, the proteins identified using mass spectrometry are quantified and identified as enriched in the sample containing the tagged target chromatin compared to the sample containing the untagged target chromatin using label-free proteomics. In an exemplary embodiment, the proteins identified using mass spectrometry are quantified and identified as enriched in the sample containing the tagged target chromatin compared to the sample containing the untagged target chromatin using spectral counting.
The method of protein quantification by spectral count is known in the art and is reviewed in Zhu et al., J Biomed Biotechnol 2010, which is incorporated by reference herein. In spectral counting, relative protein quantification is achieved by comparing the number of identified MS/MS spectra from a protein of one sample to the same protein in the other sample. In the present invention, one sample comprises a target chromatin that is tagged and another sample comprises a target chromatin that is untagged. Protein quantification in spectral counting utilizes the fact that an increase in protein abundance typically results in an increase in the number of its proteolytic peptides, and vice versa. This increased number of (tryptic) digests then usually results in an increase in protein sequence coverage, the number of identified unique peptides, and the number of identified total MS/MS spectra (spectral count) for each protein.
As such, determining the abundance of an identified protein in a tagged chromatin sample compared to the same protein in an untagged chromatin sample may determine if the protein was specifically associated with a target chromatin of the invention. If an identified protein associated with a target chromatin is enriched in a tagged chromatin sample compared to the same protein in an untagged chromatin sample, then the protein was specifically associated with a target chromatin of the invention. If an identified protein is not enriched in a tagged chromatin sample compared to an untagged chromatin sample, then the protein is non-specifically associated with a target chromatin of the invention.
A skilled artisan in spectral counting will appreciate that normalization and statistical analysis of spectral counting datasets are necessary for accurate and reliable detection of protein changes. Since large proteins tend to contribute more peptide/spectra than small ones, a normalized spectral abundance factor (NSAF) is defined to account for the effect of protein length on spectral count. NSAF is calculated as the number of spectral counts (SpC) identifying a protein, divided by the protein's length (L), divided by the sum of SpC/L for all proteins in the experiment. NSAF allows the comparison of abundance of individual proteins in multiple independent samples and has been applied to quantify the expression changes in various complexes.
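The NSAF calculation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the inventors' implementation: the function name `nsaf`, the protein identifiers, and all spectral counts and protein lengths are hypothetical.

```python
def nsaf(spectral_counts, lengths):
    """Return {protein: NSAF} for a single sample.

    spectral_counts: {protein: number of MS/MS spectra (SpC)}
    lengths:         {protein: protein length L, in residues}
    NSAF = (SpC / L) / sum of (SpC / L) over all proteins in the sample.
    """
    # Length-normalized spectral count (SpC / L) for each protein
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("sample contains no spectral counts")
    return {p: value / total for p, value in saf.items()}


# Hypothetical example data (not taken from the specification)
counts = {"protA": 30, "protB": 10, "protC": 20}
lengths = {"protA": 300, "protB": 100, "protC": 400}
values = nsaf(counts, lengths)
```

In this hypothetical sample, protA and protB receive the same NSAF although protA has three times as many spectra, because protA is also three times as long. The NSAF values of one sample sum to 1, which is what makes them comparable across independent samples.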
In the present invention, to measure enrichment of a protein, the normalized spectral abundance factor (NSAF) is calculated for each protein in each lane of an SDS-PAGE gel by dividing the number of spectral counts (normalized for the size of the protein) of a given protein by the sum of all normalized spectral counts of all proteins in the gel lane. The enrichment level for each protein is identified by calculating the fold enrichment (tagged chromatin/untagged chromatin) using the NSAF values. In an exemplary embodiment, proteins enriched in a sample containing a tagged target chromatin compared to a sample containing an untagged target chromatin are enriched by at least about 2 fold. In other embodiments, proteins enriched in a sample containing a tagged target chromatin compared to a sample containing the untagged target chromatin are enriched by at least about 1.5 fold. In other embodiments, proteins enriched in a sample containing a tagged target chromatin compared to a sample containing an untagged target chromatin are enriched by at least about 3 fold, about 4 fold, about 5 fold, about 6 fold, about 7 fold, about 8 fold, about 9 fold, about 10 fold, about 11 fold, about 12 fold, about 13 fold, about 14 fold, about 15 fold, about 16 fold, about 17 fold, about 18 fold, about 19 fold or about 20 fold. As such, a protein enriched by at least about 2 fold in a tagged chromatin sample compared to an untagged chromatin sample, is specifically associated with the chromatin. For instance, a baseline for non-specifically associated proteins may be proteins enriched by less than about 1.5 fold in a tagged chromatin sample compared to an untagged chromatin sample, wherein one or more proteins are not associated with chromatin. Non-limiting examples of proteins not associated with a chromatin may include enzymes required for metabolism, receptors, and ribosomal proteins. 
In preferred embodiments, proteins not associated with a chromatin are ribosomal proteins, and a baseline for non-specifically associated proteins is an enrichment less than about 1.5 fold in a tagged chromatin sample compared to an untagged chromatin sample. In an exemplary embodiment, proteins or protein fragments enriched by at least 15 fold in a tagged chromatin sample compared to an untagged chromatin sample are specifically associated with a target chromatin.
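The fold-enrichment comparison between tagged and untagged samples can be sketched as follows. The cutoffs of 2 fold and 1.5 fold are taken from the embodiments above; everything else is an assumption made for illustration, including the function names, the NSAF values, the "indeterminate" label for proteins falling between the two cutoffs, and the treatment of proteins absent from the untagged sample as infinitely enriched.

```python
def fold_enrichment(nsaf_tagged, nsaf_untagged):
    """Return {protein: tagged NSAF / untagged NSAF}.

    Assumption: a protein with no NSAF value in the untagged sample
    is reported as infinitely enriched rather than dropped.
    """
    result = {}
    for protein, tagged in nsaf_tagged.items():
        untagged = nsaf_untagged.get(protein, 0.0)
        result[protein] = tagged / untagged if untagged > 0 else float("inf")
    return result


def classify(fold, specific_cutoff=2.0, baseline_cutoff=1.5):
    """Label a fold-enrichment value using the cutoffs from the text."""
    if fold >= specific_cutoff:
        return "specific"
    if fold < baseline_cutoff:
        return "nonspecific"
    # Assumption: the text does not name this intermediate range
    return "indeterminate"


# Hypothetical NSAF values (not taken from the specification)
tagged = {"histone": 0.30, "ribosomal": 0.11, "other": 0.17, "factor": 0.05}
untagged = {"histone": 0.10, "ribosomal": 0.10, "other": 0.10}

folds = fold_enrichment(tagged, untagged)
labels = {protein: classify(fold) for protein, fold in folds.items()}
```

Other embodiments use different cutoffs (for example 15 fold), which can be passed through the `specific_cutoff` parameter.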
In preferred embodiments, a target chromatin is tagged in one cell sample and a target chromatin is untagged in a second cell sample, and MS analysis is used to identify proteins or protein fragments isolated during affinity purification of each sample, and label-free proteomics is used to determine if a protein or a protein fragment is specifically or non-specifically associated with the target chromatin. Methods of deriving MS data to identify proteins or protein fragments are known in the art, and may include using known computational techniques to distill MS data such as Mascot Distiller, Rosetta Elucidator, and MaxQuant. In some embodiments, MS data is derived using Rosetta Elucidator. In other embodiments, MS data is derived using MaxQuant. In preferred embodiments, MS data is derived using Mascot Distiller.
Applications of the invention may be as described in Section I(f) above.
In other aspects, the present invention provides kits for isolating and identifying proteins specifically associated with a chromatin. The kits may comprise, for example, a growth medium comprising a metabolic label, or a metabolic label that may be added to a growth medium, and cells comprising a tagged target chromatin, and instructions describing a method of the invention. A kit may further comprise material necessary for affinity purification of a tagged target chromatin, and a sample comprising metabolically labeled and unlabeled non-specifically associated proteins for determination of a baseline for non-specifically associated proteins. A kit may also comprise material necessary for affinity purification of a tagged target chromatin, and instructions describing a method of the invention.
Cells and methods of the invention may be as described in Section I and Section II above.
The following examples are included to demonstrate preferred embodiments of the invention. It should be appreciated by those of skill in the art that the techniques disclosed in the examples that follow represent techniques discovered by the inventors to function well in the practice of the invention, and thus can be considered to constitute preferred modes for its practice. However, those of skill in the art should, in light of the present disclosure, appreciate that many changes can be made in the specific embodiments which are disclosed and still obtain a like or similar result without departing from the spirit and scope of the invention.
It has long been appreciated that chromatin-associated proteins and epigenetic factors play central roles in cell-fate reprogramming of genotypically identical stem cells through lineage-specific transcription or repression of precise genes and large chromosomal regions (Martin, 1981; Ho and Crabtree, 2010; Rossant, 2008). However, the hierarchy of chromatin-templated events orchestrating the formation and inheritance of different epigenetic states remains poorly understood at a molecular level. Since misregulation of chromatin structure and post-translational modification of histones (PTMs) is linked to cancer and other epigenetic diseases (Jones and Baylin, 2007; Chi et al., 2010), it is imperative to establish new methodologies that will allow comprehensive studies and unbiased screens for participants in epigenetic mechanisms. Unfortunately, defining how chromatin regulators collectively assemble and operate on a precise region of the genome is difficult to elucidate; there are no current methodologies that allow for determination of all proteins present at a defined, small region of chromatin.
Technical challenges have precluded the ability to determine positioning of chromatin factors along the chromosome. Chromatin immunoprecipitation (ChIP) assays have been used to better understand genome-wide distribution of proteins and histone modifications within a genome at the nucleosome level (Dedon et al., 1991; Ren et al., 2000; Pokholok et al., 2005; Robertson et al., 2007; Johnson et al., 2007; Barski et al., 2007; Mikkelsen et al., 2007). However, major drawbacks of ChIP-based chromatin enrichment methods include experiments that are largely confined to examining singular histone PTMs or proteins rather than simultaneous profiling of multiple targets, the inability to determine the co-occupancy of particular histone PTMs, and that ChIP is reliant on the previous identification of the molecular target. Affinity purification approaches have been devised for the isolation of a chromatin region (Griesenbeck et al., 2003; Agelopoulos et al., 2012); however, these approaches were not done at a level for proteomic analysis and they do not provide a mechanism for determining the specificity of protein interactions. More recently, groups biochemically enriching for intact chromatin have reported characterization of proteins associated with large chromatin structures such as telomeres (Dejardin and Kingston, 2009) and engineered plasmids (Akiyoshi et al., 2009; Unnikrishnan et al., 2010); however, these approaches do not enrich for a small integrated genomic locus and do not employ specialized mass spectrometric techniques to detect protein contamination in purified material.
We sought to compare differences in chromatin between the transcriptionally active and silent states of a single genomic locus, and developed a technology, called chromatin affinity purification with mass spectrometry (ChAP-MS). ChAP-MS provides for the site-specific enrichment of a given ˜1,000 base pair section of a chromosome followed by unambiguous identification of both proteins and histone PTMs associated with this chromosome section using highly selective mass spectrometry. Using ChAP-MS, we were able to purify chromatin at the Saccharomyces cerevisiae GAL1 locus in transcriptionally silent and active states. We identified proteins and combinatorial histone PTMs unique to each of these functional states and validated these findings with ChIP. The ChAP-MS technique will greatly improve the field of epigenomics as an unbiased approach to study regulatory mechanisms on chromatin.
The GAL1 gene is present at one copy per haploid cell; due to the relatively low abundance of the targeted chromatin region in cellular lysates, it was fully anticipated that proteins nonspecifically associating with GAL1 chromatin would complicate analysis of the resulting purified material. Copurification of nonspecifically associating proteins is one of the major complications of affinity purifications; however, isotopic labeling of media provides a means to gauge in vivo protein-protein interactions and quantitate differences in peptide abundance (Smart et al., 2009; Tackett et al., 2005a). The inventors had previously developed a variation of this labeling technique called iDIRT (isotopic differentiation of interactions as random or targeted) that provides a solution for determining which coenriched proteins are specifically or nonspecifically associated with a complex of proteins (Smart et al., 2009; Tackett et al., 2005a). The iDIRT technique was adapted (as described in
To provide for enrichment of a specific chromosome section, a DNA affinity handle was engineered at the GAL1 gene in S. cerevisiae (
To determine the effectiveness of isolation of GAL1 chromatin, the stringency and specificity of different purification conditions were analyzed. Purification of protein complexes under increasing stringencies such as high salt levels provides for the isolation of fewer nonspecifically interacting proteins (Smart et al., 2009; Taverna et al., 2006). Since the proteins purified with GAL1 chromatin will be chemically crosslinked, the stringency of the purification can potentially be quite high. Indeed, ChIP-qPCR against GAL1 showed that the PrA-based purification can survive relatively stringent conditions (
Strain LEXA::GAL1 pLexA-PrA was subjected to the ChAP-MS procedure as outlined in
Once proteins were identified, a baseline was established for nonspecifically associated proteins in accordance with the iDIRT approach (Smart et al., 2009). Nonspecifically enriching ribosomal proteins were used to establish the nonspecifically associating baseline (Smart et al., 2009). The average percent isotopically light peptides from 20 ribosomal proteins from the glucose and galactose growth conditions was used to establish this nonspecifically associating baseline (Table 3). This resulted in a nonspecifically associating baseline of 49.93%±2.12% light for the glucose ChAP-MS and 66.8%±7.1% light for the galactose ChAP-MS (
The ChAP-MS analyses of GAL1 chromatin revealed association of Gal3, Spt16, Rpb1, Rpb2, H3K14ac, H3K9acK14ac, H3K18acK23ac, H4K5acK8ac, and H4K12acK16ac under transcriptionally active conditions, while transcriptionally repressive conditions showed the enrichment of H3K36me3. In order to validate the ChAP-MS approach, standard ChIP was performed on specific interactions detected in the transcriptionally active and silent chromatin states at GAL1 (
The chromatin biology and epigenomics research communities have been limited to biased technologies that restrict targeted genome localization studies to previously identified proteins or histone PTMs. Here, a newly developed technology, called ChAP-MS, is described that circumvents this limitation by providing for isolation of a ˜1,000 base pair section of a chromosome for proteomic identification of specifically bound proteins and PTMs. In essence, the ChAP-MS approach allows one to take a “molecular snapshot” of chromatin dynamics at a specific genomic locus. Furthermore, employing this approach to target other chromatin regions will likely provide unprecedented insight on a variety of epigenetic regulatory mechanisms, chromatin structure, and genome metabolism.
The ChAP-MS approach was validated on the well-studied GAL1 locus in S. cerevisiae. The GAL1 gene is activated for gene transcription in the presence of galactose, while glucose represses transcription. Accordingly, it was rationalized that a purified ˜1,000 base pair section of chromatin at the 5′ end of the GAL1 gene from cells grown in galactose would contain histone PTMs correlated with active transcription and cellular machinery necessary for transcription, while the same chromatin section from cells grown in glucose would be enriched with histone PTMs associated with transcriptional repression. Prior publications have documented that H3 acetylation is enriched on the 5′ end of the active galactose-induced GAL1 gene, while in the presence of glucose it contains H3K36me3 (Shukla et al., 2006; Houseley et al., 2008). Results presented in the Examples herein with ChAP-MS, support each of these prior findings (
The ChAP-MS technology presented here demonstrates the ability to purify a unique chromosome section on the order of four to five nucleosomes in length from an in vivo source that can subsequently be subjected to sensitive proteomic studies. ChAP-MS has numerous advantages relative to traditional ChIP, including the ability to detect proteins/PTMs at a specific genomic locus in an unbiased manner and the identification of combinatorial histone modifications on a single histone molecule. Furthermore, ChAP-MS requires only approximately an order of magnitude more cells than biased ChIP studies, which is a considerable advantage once more than ten blind ChIP studies at a given region are factored in (in practice, one would likely invest heavily in antibodies against many proteins while trying to guess a specifically bound protein/PTM). In this regard, ChAP-MS is a more cost-effective option for characterizing specifically bound proteins and histone PTMs relative to ChIP. Future derivations of this technology may employ targeted mass spectrometric approaches for better determination of combinatorial histone PTMs as well as identification of other regulatory PTMs on nonhistone proteins from these isolated sections (Taverna et al., 2007). Given the sensitivity of the mass spectrometry analysis employed and the relatively modest biological starting material, the findings presented in the Examples herein also establish a framework for applying ChAP-MS to profile across entire regions of chromosomes or investigate higher eukaryotic systems. Regardless, any advances that permit ChAP-MS analysis of in vivo untagged or unaltered samples, like tissues, will undoubtedly have valuable applications for investigating altered gene transcription mechanisms in human disease states, as this technique could provide a comprehensive way to intelligently identify targets for therapeutics.
Construction of the LEXA::GAL1 pLexA-PrA Strain
The LEXA::GAL1 pLexA-PrA strain used to affinity enrich GAL1 chromatin was designed to have a LexA DNA binding site just upstream of the GAL1 start codon and contains a plasmid constitutively expressing a LexA-PrA fusion protein. In S. cerevisiae from the W303a background, the GAL1 gene was genomically replaced with URA3 using homologous recombination. Next, the GAL1 gene (+50 base pairs up- and downstream) was PCR amplified with primers that incorporated a LexA DNA binding site (5′-CACTTGATACTGTATGAGCATACAGTATAATTGC) immediately upstream of the GAL1 start codon. This LEXA::GAL1 cassette was transformed into the gal1::URA3 strain and selected for growth with 5-fluoroorotic acid, which is lethal in URA3 expressing cells. Positive transformants were sequenced to ensure homologous recombination of the cassette to create the LEXA::GAL1 strain. A plasmid that constitutively expresses LexA-PrA fusion protein with TRP selection was created by amplification of the PrA sequence from template pOM60 via PCR and subcloning into the SacI/SmaI ends of the expression plasmid pLexA-C. Transforming this plasmid into the LEXA::GAL1 strain gave rise to the LEXA::GAL1 pLexA-PrA strain. Additionally, a control used in these studies was W303a S. cerevisiae transformed only with pLexA-PrA.
Strains LEXA::GAL1 pLexA-PrA and pLexA-PrA were grown in yeast synthetic media lacking tryptophan to mid-log phase at 30° C. The LEXA::GAL1 pLexA-PrA strain was grown with isotopically light lysine, while strain pLexA-PrA was cultured exclusively with isotopically heavy 13C6,15N2-lysine. For each strain, 12 L of media containing either 2% glucose or 3% galactose were grown to yield ~5×10^11 cells per growth condition. At mid-log phase, the cultures were crosslinked with 1.25% formaldehyde for 5 min at room temperature and then quenched with 125 mM glycine for 5 min at room temperature. Cells were harvested by centrifugation (2,500×g) and frozen in liquid nitrogen as pellets in suspension with 20 mM HEPES (pH 7.4), 1.2% polyvinylpyrrolidone (1 ml/10 g of cell pellet). Frozen cell pellets were mixed at 1:1 cell weight ratios as follows: (1) LEXA::GAL1 pLexA-PrA isotopically light in glucose plus pLexA-PrA isotopically heavy control in glucose, and (2) LEXA::GAL1 pLexA-PrA isotopically light in galactose plus pLexA-PrA isotopically heavy control in galactose. Cell mixtures were cryogenically lysed at liquid nitrogen temperature with a Retsch MM301 ball mill (Smart et al., 2009; Tackett et al., 2005a).
Each of the following two cell lysates was processed for purification of GAL1 chromatin: (1) LEXA::GAL1 pLexA-PrA isotopically light in glucose plus pLexA-PrA isotopically heavy control in glucose, referred to as the glucose ChAP-MS, and (2) LEXA::GAL1 pLexA-PrA isotopically light in galactose plus pLexA-PrA isotopically heavy control in galactose, referred to as the galactose ChAP-MS. Twenty grams of frozen cell lysate (~5×10^11 cells) was used for each of the glucose and galactose ChAP-MS analyses. ChAP-MS steps were performed at 4° C. unless otherwise noted. Lysates were resuspended in 20 mM HEPES (pH 7.4), 1 M NaCl, 2 mM MgCl2, 1 M urea, 0.1% Tween 20, and 1% Sigma fungal protease inhibitor cocktail with 5 ml buffer per gram of frozen lysate. Lysates were subjected to sonication with a Diagenode Bioruptor UCD-200 (low setting, 30 s on/off cycle, 12 min total time) in 20 ml aliquots to yield ~1 kb chromatin fragments. Supernatants from sonicated lysates were collected by centrifugation at 2,000×g for 10 min. Dynabeads (80 mg) coated with rabbit IgG were added to the lysates and incubated for 4 hr with constant agitation (Byrum et al., 2012a). Dynabeads were collected with a magnet and washed 5 times with the purification buffer listed above and 3 times with 20 mM HEPES (pH 7.4), 2 mM MgCl2, 10 mM NaCl, 0.1% Tween 20. Washed Dynabeads were treated with 0.5 N ammonium hydroxide/0.5 mM EDTA for 5 min at room temperature to elute proteins. Eluants were lyophilized with a Savant SpeedVac Concentrator. Lyophilized proteins were resuspended in Laemmli SDS-PAGE loading buffer, heated to 95° C. for 20 min, resolved with 4%-20% tris-glycine Invitrogen precast gels, and visualized by colloidal Coomassie staining.
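The resuspension arithmetic in the preceding paragraph can be sketched as follows. This is only an illustration: the function name `buffer_plan` is hypothetical, and the sketch assumes the aliquot count is driven by buffer volume alone, ignoring any volume contributed by the lysate itself.

```python
import math

def buffer_plan(lysate_g, ml_per_g=5.0, aliquot_ml=20.0):
    """Return (total buffer volume in ml, number of sonication aliquots).

    Defaults follow the text: 5 ml buffer per gram of frozen lysate,
    sonicated in 20 ml aliquots. The volume of the lysate itself is
    ignored, which is an assumption of this sketch.
    """
    total_ml = lysate_g * ml_per_g
    n_aliquots = math.ceil(total_ml / aliquot_ml)
    return total_ml, n_aliquots

# Twenty grams of frozen lysate, as used for each ChAP-MS analysis.
total_ml, n_aliquots = buffer_plan(20.0)
print(f"{total_ml:.0f} ml buffer in {n_aliquots} aliquots")
```

Under these assumptions, 20 g of lysate corresponds to 100 ml of buffer, or five 20 ml aliquots per analysis.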
Gel lanes were sliced into 2 mm sections and subjected to in-gel trypsin digestion (Byrum et al., 2011a, Byrum et al., 2011b, Byrum et al., 2012a; Tackett et al., 2005b). Peptides were analyzed with a Thermo Velos Orbitrap mass spectrometer coupled to a Waters nanoACQUITY liquid chromatography system (Byrum et al., 2011b). Using a data-dependent mode, the most abundant 15 peaks were selected for MS2 from a high-resolution MS scan. Proteins were identified and the ratio of isotopically light/heavy lysine-containing tryptic peptide intensity was determined with Mascot and Mascot Distiller. The search parameters included: precursor ion tolerance 10 ppm, fragment ion tolerance 0.65 Da, fixed modification of carbamidomethyl on cysteine, variable modification of oxidation on methionine, and two missed cleavages possible with trypsin. A threshold of 95% confidence for protein identification, 50% confidence for peptide identification and at least two identified peptides per protein was used, which gave a 2% peptide false discovery rate. All specifically associating protein identifications and ratios were manually validated.
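As an illustration of the acceptance criteria stated above, the following sketch shows how a 10 ppm precursor tolerance and the protein/peptide confidence thresholds could be applied. It is not the Mascot or Mascot Distiller implementation; the `ProteinHit` structure and all function names are hypothetical and introduced only for this example.

```python
from dataclasses import dataclass, field

# Thresholds taken from the text above.
PRECURSOR_TOL_PPM = 10.0   # precursor ion tolerance
MIN_PROTEIN_CONF = 95.0    # percent confidence, protein identification
MIN_PEPTIDE_CONF = 50.0    # percent confidence, peptide identification
MIN_PEPTIDES = 2           # identified peptides required per protein

def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

def precursor_within_tolerance(observed_mz, theoretical_mz,
                               tol_ppm=PRECURSOR_TOL_PPM):
    """True if the precursor m/z lies within the ppm tolerance."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm

@dataclass
class ProteinHit:
    """Hypothetical container for one protein identification."""
    name: str
    protein_conf: float
    peptide_confs: list = field(default_factory=list)

def passes_filter(hit):
    """Apply the protein, peptide and peptide-count thresholds."""
    confident = [c for c in hit.peptide_confs if c >= MIN_PEPTIDE_CONF]
    return (hit.protein_conf >= MIN_PROTEIN_CONF
            and len(confident) >= MIN_PEPTIDES)
```

For example, a precursor observed at m/z 500.004 against a theoretical m/z of 500.000 has an error of about 8 ppm and would fall within the 10 ppm tolerance.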
A baseline was established for nonspecifically associated proteins with nonspecifically enriched ribosomal proteins (Smart et al., 2009). The average percentage of isotopically light peptides from 20 ribosomal proteins in the glucose and galactose growth conditions was used to establish this nonspecifically associated baseline. This resulted in a nonspecifically associated baseline of 49.93%±2.12% light for the glucose ChAP-MS and 66.8%±7.1% light for the galactose ChAP-MS. Proteins were categorized as specifically associating if the percent light was greater than 2 SDs above the ribosomal level (Tables 4 and 5) (Smart et al., 2009). Duplicate ChAP-MS procedures showed Pearson and Spearman correlation coefficient p values of <0.001.
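The categorization rule above can be sketched as follows. This is a minimal illustration, not the analysis code used in the Examples: the function names are hypothetical, percent light is assumed to be light/(light + heavy) intensity, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev

def percent_light(light_intensity, heavy_intensity):
    """Percent of total peptide intensity that is isotopically light."""
    return 100.0 * light_intensity / (light_intensity + heavy_intensity)

def baseline_from_controls(percent_light_values):
    """Mean and SD of percent light for nonspecific (ribosomal) proteins.

    Sample SD is an assumption of this sketch.
    """
    return mean(percent_light_values), stdev(percent_light_values)

def specific_threshold(baseline_mean, baseline_sd, n_sd=2.0):
    """Percent light above which a protein is called specific."""
    return baseline_mean + n_sd * baseline_sd

def is_specific(pct_light, baseline_mean, baseline_sd, n_sd=2.0):
    """True if percent light exceeds baseline by more than n_sd SDs."""
    return pct_light > specific_threshold(baseline_mean, baseline_sd, n_sd)

# Baselines reported in the text (mean, SD) in percent light.
GLUCOSE_BASELINE = (49.93, 2.12)
GALACTOSE_BASELINE = (66.8, 7.1)

for label, baseline in (("glucose", GLUCOSE_BASELINE),
                        ("galactose", GALACTOSE_BASELINE)):
    print(f"{label}: specific if > {specific_threshold(*baseline):.2f}% light")
```

With the reported baselines, the cutoffs work out to 54.17% light for the glucose ChAP-MS and 81.0% light for the galactose ChAP-MS, so a protein at 75% light would be categorized as specific in glucose but not in galactose.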
ChIP and gene transcription assays were performed as previously reported (Tackett et al., 2005b; Taverna et al., 2006). Assays were performed in triplicate and analyzed by real time PCR.
One of the most compositionally diverse structures in a eukaryotic cell is a chromosome. A multitude of macromolecular protein interactions and epigenetic modifications must properly occur on chromatin to drive functional aspects of chromosome biology like gene transcription, DNA replication, recombination, repair and sister chromatid segregation. Analyzing how proteins interact in vivo with chromatin to direct these activities and how epigenetics factors into these mechanisms remains a significant challenge owing to the lack of technologies to comprehensively analyze protein associations and epigenetics at specific native chromosome sites. Chromatin immunoprecipitation (ChIP) assays have traditionally been used to better understand genome-wide distributions of chromatin-associated proteins and histone post-translational modifications (PTMs) at the nucleosome level (Cermak et al., 2011). However, major drawbacks of current ChIP-based methods include their confinement to examining singular histone PTMs or proteins rather than simultaneous profiling of multiple targets, their inability to directly determine the co-occupancy of particular histone PTMs, and their reliance on the previous identification of the molecular target and the development of affinity reagents against it. A more comprehensive and unbiased approach would be the biochemical isolation of a specific native genomic locus for proteomic identification of associated proteins and histone PTMs. Similar approaches have been performed for large structures like telomeres, engineered plasmids or engineered loci (Griesenbeck et al., 2003; Dejardin and Kingston, 2009; Hoshino and Fujii, 2009; Akiyoshi et al., 2009; Unnikrishnan et al., 2010; Byrum et al., 2012b); however, the proteomic analysis of a small native genomic region without genomic engineering has yet to be performed. To work toward proteomic studies of native chromatin regions (i.e. sections of chromatin that are unaltered genetically and spatially in the genome), we recently developed a technique termed Chromatin Affinity Purification with Mass Spectrometry (ChAP-MS) that provides for the enrichment of a native 1-kb section of a chromosome for site-specific identification of protein interactions and associated histone PTMs (Byrum et al., 2012b). This ChAP-MS approach uses the association of an ectopically expressed affinity-tagged LexA protein with a genomically incorporated LexA DNA binding site for site-specific chromatin enrichment. The ChAP-MS approach provides for the isolation of chromatin from the native site in the chromosome; however, one must genomically engineer a LexA DNA binding site, which could alter the native state of the chromatin and which requires a biological system readily amenable to genomic engineering.
To alleviate genomic engineering for affinity enrichment of chromatin sections, we report the use of modified transcription activator-like (TAL) effector proteins to site-specifically target a native section of a chromosome for purification and proteomic analysis. We term this approach TAL-ChAP-MS (
One of the major complications for studying specific protein associations with purified protein complexes or with chromatin is the co-enrichment of non-specifically associating proteins. This particularly becomes an issue when studying low copy number entities such as a single genomic locus. Even with the advancement of high-resolution and sensitive mass spectrometry in recent years, it has been suggested that >10^9 cell equivalents are needed to study single genomic loci with proteomic approaches (Chait, 2011). In agreement, our ChAP-MS studies used 10^11 cells for isolation of GAL1 promoter chromatin at levels sufficient for proteomic analysis (Byrum et al., 2012b). When scaling up purifications of low copy entities to meet the sensitivity necessary for high-resolution mass spectrometric analysis, the issue of co-purifying abundant non-specific proteins becomes a major challenge. In the ChAP-MS approach (Byrum et al., 2012b), we used an isotope-labeling strategy to categorize whether a protein co-enriching with a section of chromatin was specifically associated or a contaminant. Limitations of isotope-labeling approaches are cost and the need for biological systems that are amenable to stable isotope-labeling with amino acids. To circumvent the use of isotope-labeling, we now have incorporated label-free quantitative mass spectrometry in the TAL-ChAP-MS workflow. The described TAL-ChAP-MS approach can therefore provide for the purification of a native chromatin region for label-free quantitative proteomic analysis, which will greatly simplify studies of how proteins and combinatorial histone PTMs regulate chromosome metabolism.
A schematic of the TAL-ChAP-MS approach to purify native chromatin for proteomic analysis is shown in
Saccharomyces cerevisiae cells were transformed with pTAL-PrA, and protein expression was validated by western blotting (
As detailed in the Experimental Procedures for Examples 4-5 section, chromatin from the transcriptionally active GAL1 promoter was enriched with TAL-PrA and resolved by SDS-PAGE (
In addition to protein associations with the GAL1 promoter, the following single histone PTMs were identified under transcriptionally active conditions: H3K14ac, H3K56ac, H3K79me1/me2/me3, H2BK17ac and H2AK7ac; and the following combinatorial histone PTMs: H3K9acK14ac, H3K18acK23ac, H2BK6acK11ac and H2BK11acK17ac (
We describe a novel approach called TAL-ChAP-MS that provides for the biochemical isolation of 1-kb native chromatin sections for proteomic identification of specifically associated proteins and combinatorial histone PTMs. The described TAL-ChAP-MS approach overcomes limitations of the ChAP-MS approach (Byrum et al., 2012b), as genomic engineering is not necessary for TAL-based affinity enrichment and because protein enrichment with a given locus can now be determined with label-free proteomics. Even without genomic engineering of the DNA, the TAL-ChAP-MS approach does require targeting of a DNA-binding affinity enrichment reagent (i.e. the TAL protein), which has the potential to perturb the chromatin state. However, the data in
pTAL-PrA Plasmid, Real-Time rtPCR and ChIP
For affinity enrichment of chromatin from the promoter region of the GAL1 gene in S. cerevisiae, a TAL protein was designed (by the GeneArt Precision TAL services of Life Technologies) to bind a unique 18-nt sequence (GGGGTAATTAATCAGCGA) 193 base pairs upstream of the GAL1 open-reading frame (
To test the TAL-ChAP-MS approach at the promoter region of GAL1, wild-type and wild-type (+pTAL-PrA) S. cerevisiae (W303 matA) cells were cultured to mid-log phase in 3% galactose-containing media, subjected to 1.25% formaldehyde cross-linking, cryogenically lysed and subjected to sonication to shear genomic DNA to ~1 kb [as detailed in (Byrum et al., 2012b; Byrum et al., 2011a; Byrum et al., 2011b)]. Immunoglobulin G (IgG)-coated Dynabeads were added to lysate from ~10^11 cells from each growth separately [as detailed in (Byrum et al., 2012b)]. Proteins co-enriching with the TAL-PrA (wild-type cells +pTAL-PrA lysate) or proteins non-specifically binding to the Dynabeads (wild-type cell lysate) were resolved by SDS-PAGE/Coomassie-staining (
cerevisiae GN = PDC1 PE = 1 SV = 7
cerevisiae GN = FBA1 PE = 1 SV = 3
cerevisiae GN = SSE1 PE = 1 SV = 4
cerevisiae GN = PGI1 PE = 1 SV = 3
cerevisiae GN = FPP1 PE = 1 SV = 2
cerevisiae GN = PDC6 PE = 1 SV = 3
cerevisiae (strain YJM789) GN = TIF1 PE = 3 SV = 1
cerevisiae GN = RPL21A PE = 1 SV = 1
cerevisiae GN = PMA1 PE = 1 SV = 2
cerevisiae GN = RLI1 PE = 1 SV = 1
cerevisiae GN = FAS2 PE = 1 SV = 2
cerevisiae GN = GPP1 PE = 1 SV = 3
cerevisiae GN = HSP60 PE = 1 SV = 1
cerevisiae GN = SES1 PE = 1 SV = 2
cerevisiae GN = TEF4 PE = 1 SV = 1
cerevisiae GN = VMA2 PE = 1 SV = 2
cerevisiae GN = NAP1 PE = 1 SV = 2
cerevisiae GN = CPR1 PE = 1 SV = 3
cerevisiae GN = RPL11A PE = 1 SV = 2
cerevisiae GN = RFA3 PE = 1 SV = 1
cerevisiae GN = VMA5 PE = 1 SV = 4
cerevisiae GN = CDC48 PE = 1 SV = 3
cerevisiae GN = PRE6 PE = 1 SV = 1
cerevisiae GN = PFK1 PE = 1 SV = 1
cerevisiae GN = TRP2 PE = 1 SV = 4
cerevisiae GN = VAS1 PE = 1 SV = 2
cerevisiae GN = RPS17A PE = 1 SV = 1
cerevisiae GN = RPL33A PE = 1 SV = 3
cerevisiae GN = ERG13 PE = 1 SV = 1
cerevisiae GN = RPP0 PE = 1 SV = 1
cerevisiae GN = RPL33B PE = 1 SV = 2
cerevisiae GN = RPS27A PE = 1 SV = 1
cerevisiae GN = MBF1 PE = 1 SV = 2
cerevisiae GN = RPS26B PE = 1 SV = 1
cerevisiae GN = ACS2 PE = 1 SV = 1
cerevisiae GN = YKR011C PE = 1 SV = 2
cerevisiae (strain YJM789) GN = DBP1 PE = 3 SV = 1
cerevisiae GN = SPO75 PE = 1 SV = 1
cerevisiae GN = RPL16B PE = 1 SV = 3
cerevisiae GN = CDC37 PE = 1 SV = 2
cerevisiae GN = FAS1 PE = 1 SV = 2
cerevisiae GN = TOR1 PE = 1 SV = 3
cerevisiae GN = RPS22A PE = 1 SV = 2
cerevisiae GN = ADY4 PE = 1 SV = 1
cerevisiae GN = SEC18 PE = 1 SV = 2
cerevisiae GN = SEC23 PE = 1 SV = 1
cerevisiae GN = IRA1 PE = 1 SV = 2
cerevisiae GN = SSA1 PE = 1 SV = 4
cerevisiae GN = PDC1 PE = 1 SV = 7
cerevisiae GN = SSE1 PE = 1 SV = 4
cerevisiae GN = GAL10 PE = 1 SV = 2
cerevisiae GN = SSB1 PE = 1 SV = 3
cerevisiae GN = SES1 PE = 1 SV = 2
cerevisiae GN = SSA2 PE = 1 SV = 3
cerevisiae GN = VAS1 PE = 1 SV = 2
cerevisiae GN = FBA1 PE = 1 SV = 3
cerevisiae GN = FAS1 PE = 1 SV = 2
cerevisiae GN = IPP1 PE = 1 SV = 4
cerevisiae GN = UBA1 PE = 1 SV = 2
cerevisiae GN = RPS20 PE = 1 SV = 3
cerevisiae GN = ARG1 PE = 1 SV = 2
cerevisiae GN = RPL4A PE = 1 SV = 4
cerevisiae GN = SEC23 PE = 1 SV = 1
cerevisiae GN = LEU1 PE = 1 SV = 3
cerevisiae GN = RPL33A PE = 1 SV = 3
cerevisiae GN = TEF4 PE = 1 SV = 1
cerevisiae GN = ACS2 PE = 1 SV = 1
cerevisiae GN = YNK1 PE = 1 SV = 1
cerevisiae GN = PGI1 PE = 1 SV = 3
cerevisiae GN = RPL30 PE = 1 SV = 3
cerevisiae GN = HSP12 PE = 1 SV = 1
cerevisiae GN = ERG6 PE = 1 SV = 4
cerevisiae GN = FPP1 PE = 1 SV = 2
cerevisiae GN = CDC48 PE = 1 SV = 3
cerevisiae GN = TPI1 PE = 1 SV = 2
cerevisiae GN = RPL11A PE = 1 SV = 2
cerevisiae GN = TRP2 PE = 1 SV = 4
cerevisiae GN = RPL27A PE = 1 SV = 1
cerevisiae (strain YJM789) GN = SUB2 PE = 3 SV = 1
cerevisiae GN = YKL215C PE = 1 SV = 2
cerevisiae GN = BFR1 PE = 1 SV = 1
cerevisiae GN = RPP0 PE = 1 SV = 1
cerevisiae GN = VMA2 PE = 1 SV = 2
cerevisiae GN = CYS4 PE = 1 SV = 1
cerevisiae GN = PBI2 PE = 1 SV = 3
cerevisiae GN = VMA4 PE = 1 SV = 4
cerevisiae GN = SSA3 PE = 1 SV = 3
cerevisiae GN = CCT8 PE = 1 SV = 1
cerevisiae GN = RPS26B PE = 1 SV = 1
cerevisiae GN = NEW1 PE = 1 SV = 1
cerevisiae GN = RPS14A PE = 1 SV = 5
cerevisiae GN = RLI1 PE = 1 SV = 1
cerevisiae GN = PRE9 PE = 1 SV = 1
cerevisiae GN = RPL8A PE = 1 SV = 4
cerevisiae GN = SAM2 PE = 1 SV = 3
cerevisiae GN = WTM1 PE = 1 SV = 1
cerevisiae GN = YKL054C PE = 1 SV = 1
cerevisiae GN = RPL22A PE = 1 SV = 3
cerevisiae GN = SMX2 PE = 1 SV = 1
cerevisiae GN = RPS9B PE = 1 SV = 4
cerevisiae GN = RPS15 PE = 1 SV = 1
cerevisiae GN = RPS11A PE = 1 SV = 3
cerevisiae GN = GPM1 PE = 1 SV = 3
cerevisiae GN = DUG1 PE = 1 SV = 1
cerevisiae GN = RPS25B PE = 1 SV = 1
cerevisiae GN = SEC24 PE = 1 SV = 1
cerevisiae GN = RPS22A PE = 1 SV = 2
cerevisiae (strain YJM789) GN = SCY_2952 PE = 3 SV = 1
cerevisiae GN = SEC21 PE = 1 SV = 2
cerevisiae (strain RM11-1a) GN = RPS1A PE = 3 SV = 1
cerevisiae GN = ARG5, 6 PE = 1 SV = 1
cerevisiae GN = ECM29 PE = 1 SV = 1
cerevisiae GN = PRK1 PE = 1 SV = 1
cerevisiae GN = RPL25 PE = 1 SV = 4
cerevisiae GN = MFB1 PE = 1 SV = 1
cerevisiae GN = FAS2 PE = 1 SV = 2
cerevisiae GN = YBT1 PE = 1 SV = 2
cerevisiae GN = GLN4 PE = 1 SV = 2
cerevisiae GN = GDB1 PE = 1 SV = 1
cerevisiae GN = SRV2 PE = 1 SV = 1
cerevisiae GN = YGL117W PE = 1 SV = 1
cerevisiae
TTATACATTAATCAGCGA
AAATTAATCAGCGGTGAC
cerevisiae GN = LEU9 PE = SV = 1
cerevisiae GN = RIB4 PE = 1 SV = 2
cerevisiae GN = TRP1 PE = 1 SV = 2
cerevisiae GN = NSR1 PE = 1 SV = 1
cerevisiae GN = YGR054W PE = 1 SV = 1
cerevisiae (strain YJM789) GN = TOM70 PE = 3 SV = 1
cerevisiae GN = CBF5 PE = 1 SV = 1
cerevisiae GN = PRT1 PE = 1 SV = 1
cerevisiae GN = MES1 PE = 1 SV = 4
cerevisiae GN = TIF35 PE = 1 SV = 1
cerevisiae GN = TUP1 PE = 1 SV = 2
cerevisiae GN = RPT6 PE = 1 SV = 4
cerevisiae GN = RPN2 PE = 1 SV = 4
cerevisiae GN = GDH3 PE = 1 SV = 1
cerevisiae GN = TIF3 PE = 1 SV = 1
cerevisiae GN = PHO88 PE = 1 SV = 1
cerevisiae GN = RPN9 PE = 1 SV = 1
cerevisiae GN = CBP6 PE = 1 SV = 1
cerevisiae GN = TIF32 PE = 1 SV = 1
cerevisiae GN = RPA1 PE = 1 SV = 2
cerevisiae GN = PEP1 PE = 1 SV = 1
cerevisiae GN = OM45 PE = 1 SV = 2
cerevisiae GN = TOM5 PE = 1 SV = 1
cerevisiae GN = TIF11 PE = 1 SV = 1
cerevisiae GN = HFD1 PE = 1 SV = 1
cerevisiae GN = PEM2 PE = 1 SV = 1
cerevisiae GN = ERV25 PE = 1 SV = 1
cerevisiae GN = DLD3 PE = 1 SV = 1
cerevisiae GN = DLD3 PE = 1 SV = 1
cerevisiae GN = RPT3 PE = 1 SV = 1
cerevisiae GN = SNU13 PE = 1 SV = 1
cerevisiae GN = TPD3 PE = 1 SV = 2
cerevisiae GN = GAL7 PE = 1 SV = 4
cerevisiae GN = PDR16 PE = 1 SV = 1
cerevisiae GN = YNL010W PE = 1 SV = 1
cerevisiae GN = MNP1 PE = 1 SV = 1
cerevisiae GN = TIF4631 PE = 1 SV = 2
cerevisiae GN = ARB1 PE = 1 SV = 1
cerevisiae GN = RPN8 PE = 1 SV = 3
cerevisiae GN = EMC1 PE = 1 SV = 1
cerevisiae GN = OLA1 PE = 1 SV = 1
cerevisiae GN = PPZ2 PE = 1 SV = 4
cerevisiae (strain YJM789) GN = TMA22 PE = 3 SV = 1
cerevisiae GN = ARC18 PE = 1 SV = 1
cerevisiae GN = ILS1 PE = 1 SV = 1
cerevisiae (strain YJM789) GN = TIF34 PE = 3 SV = 1
cerevisiae GN = DPM1 PE = 1 SV = 3
cerevisiae GN = YET1 PE = 1 SV = 1
cerevisiae GN = YIL005W PE = 1 SV = 1
cerevisiae GN = YDJ1 PE = 1 SV = 1
cerevisiae GN = STO1 PE = 1 SV = 2
cerevisiae GN = UTP22 PE = 1 SV = 1
cerevisiae GN = GAS3 PE = 1 SV = 1
cerevisiae GN = RPA2 PE = 1 SV = 1
cerevisiae GN = YML6 PE = 1 SV = 1
cerevisiae GN = MRPL3 PE = 1 SV = 2
cerevisiae GN = NIP7 PE = 1 SV = 1
cerevisiae GN = HYP2 PE = 1 SV = 3
cerevisiae GN = YOR285W PE = 1 SV = 1
cerevisiae GN = PAT1 PE = 1 SV = 3
cerevisiae GN = YET3 PE = 1 SV = 1
cerevisiae GN = SNF1 PE = 1 SV = 1
cerevisiae GN = SNF1 PE = 1 SV = 1
cerevisiae GN = NCL1 PE = 1 SV = 1
cerevisiae GN = LYS2 PE = 1 SV = 2
cerevisiae GN = SHM1 PE = 1 SV = 2
cerevisiae GN = GLC7 PE = 1 SV = 1
cerevisiae GN = GPD1 PE = 1 SV = 4
cerevisiae GN = GPT2 PE = 1 SV = 1
cerevisiae GN = NOP2 PE = 1 SV = 1
cerevisiae GN = YPK2 PE = 1 SV = 1
cerevisiae GN = RVS161 PE = 1 SV = 1
cerevisiae GN = ARC15 PE = 1 SV = 1
cerevisiae GN = GPG1 PE = 1 SV = 1
cerevisiae GN = POR1 PE = 1 SV = 4
cerevisiae GN = ILV3 PE = 1 SV = 2
cerevisiae (strain YJM789) GN = IML2 PE = 3 SV = 1
cerevisiae GN = PRO2 PE = 1 SV = 1
cerevisiae GN = MRPL1 PE = 1 SV = 1
cerevisiae GN = NAB3 PE = 1 SV = 1
cerevisiae GN = THI3 PE = 1 SV = 1
cerevisiae GN = YIL108W PE = 1 SV = 1
cerevisiae GN = RPN13 PE = 1 SV = 1
cerevisiae GN = SER3 PE = 1 SV = 1
cerevisiae GN = RPN3 PE = 1 SV = 4
cerevisiae GN = PEX11 PE = 1 SV = 2
cerevisiae GN = PRS5 PE = 1 SV = 1
cerevisiae (strain YJM789) GN = LSM6 PE = 3 SV = 1
cerevisiae GN = RPT2 PE = 1 SV = 3
cerevisiae GN = UTP21 PE = 1 SV = 1
cerevisiae GN = POL1 PE = 1 SV = 2
cerevisiae GN = COX15 PE = 1 SV = 1
cerevisiae GN = LSM5 PE = 1 SV = 1
cerevisiae GN = ZWF1 PE = 1 SV = 4
cerevisiae GN = RPN6 PE = 1 SV = 3
cerevisiae GN = UBP6 PE = 1 SV = 1
cerevisiae GN = PUF6 PE = 1 SV = 1
cerevisiae (strain YJM789) GN = OM14 PE = 3 SV = 1
cerevisiae GN = SRP72 PE = 1 SV = 2
cerevisiae GN = PEM1 PE = 1 SV = 1
cerevisiae GN = RPN12 PE = 1 SV = 3
cerevisiae GN = URB1 PE = 1 SV = 2
cerevisiae GN = MPM1 PE = 1 SV = 1
cerevisiae GN = CTR9 PE = 1 SV = 2
cerevisiae GN = SSZ1 PE = 1 SV = 2
cerevisiae GN = FET5 PE = 1 SV = 1
cerevisiae GN = ESS1 PE = 1 SV = 3
cerevisiae GN = YKL100C PE = 1 SV = 1
cerevisiae GN = YBL032W PE = 1 SV = 1
cerevisiae GN = TOM22 PE = 1 SV = 3
cerevisiae GN = EMC4 PE = 1 SV = 1
cerevisiae GN = MNN10 PE = 1 SV = 1
cerevisiae GN = SHM2 PE = 1 SV = 2
cerevisiae GN = MRP1 PE = 1 SV = 2
cerevisiae GN = TMA20 PE = 1 SV = 1
cerevisiae GN = RPB9 PE = 1 SV = 1
cerevisiae GN = NHP2 PE = 1 SV = 2
cerevisiae GN = SER33 PE = 1 SV = 1
cerevisiae GN = YML125C PE = 1 SV = 1
cerevisiae GN = ARC35 PE = 1 SV = 1
cerevisiae GN = COX14 PE = 1 SV = 1
cerevisiae GN = SRP14 PE = 1 SV = 1
cerevisiae GN = GUS1 PE = 1 SV = 3
cerevisiae GN = THS1 PE = 1 SV = 2
cerevisiae GN = TY2B-LR1 PE = 3 SV = 1
cerevisiae GN = PMA1 PE = 1 SV = 2
cerevisiae (strain RM11-1a) GN = RPS1B PE = 3 SV = 1
cerevisiae (strain RM11-1a) GN = RPS1A PE = 3 SV = 1
cerevisiae GN = ARG1 PE = 1 SV = 2
cerevisiae GN = PDC6 PE = 1 SV = 3
cerevisiae GN = BFR1 PE = 1 SV = 1
cerevisiae GN = RPL18A PE = 1 SV = 1
cerevisiae GN = RPS2 PE = 1 SV = 3
cerevisiae GN = SPT16 PE = 1 SV = 1
cerevisiae GN = RPS8A PE = 1 SV = 3
cerevisiae GN = SEC23 PE = 1 SV = 1
cerevisiae GN = RRP5 PE = 1 SV = 1
cerevisiae (strain YJM789) GN = MCR1 PE = 2 SV = 1
cerevisiae GN = GLT1 PE = 1 SV = 2
cerevisiae GN = UGA1 PE = 1 SV = 2
cerevisiae GN = URA1 PE = 1 SV = 1
cerevisiae GN = RPL32 PE = 1 SV = 1
cerevisiae GN = RPL15A PE = 1 SV = 3
cerevisiae GN = RPL8A PE = 1 SV = 4
cerevisiae GN = RPL14B PE = 1 SV = 1
cerevisiae GN = ARO1 PE = 1 SV = 1
cerevisiae GN = RPL8B PE = 1 SV = 3
cerevisiae GN = RHO1 PE = 1 SV = 3
cerevisiae GN = PDR15 PE = 1 SV = 1
cerevisiae GN = RPL28 PE = 1 SV = 2
cerevisiae GN = RPL15B PE = 1 SV = 2
cerevisiae GN = RPS23A PE = 1 SV = 1
cerevisiae GN = CCT8 PE = 1 SV = 1
cerevisiae GN = YHR087W PE = 1 SV = 1
cerevisiae GN = YME2 PE = 1 SV = 1
cerevisiae GN = PAA1 PE = 1 SV = 1
cerevisiae GN = RPS22B PE = 1 SV = 3
cerevisiae GN = RPS26A PE = 1 SV = 1
cerevisiae GN = GCN1 PE = 1 SV = 1
cerevisiae GN = STT3 PE = 1 SV = 2
cerevisiae GN = ACB1 PE = 1 SV = 3
cerevisiae GN = ARG3 PE = 1 SV = 1
cerevisiae (strain YJM789) GN = YME2 PE = 3 SV = 1
cerevisiae GN = CCT4 PE = 1 SV = 2
cerevisiae GN = GAL10 PE = 1 SV = 2
cerevisiae GN = YBT1 PE = 1 SV = 2
cerevisiae GN = SEC21 PE = 1 SV = 2
cerevisiae GN = CDC53 PE = 1 SV = 1
cerevisiae GN = CCT6 PE = 1 SV = 1
cerevisiae GN = GSF2 PE = 1 SV = 1
cerevisiae GN = RPS29A PE = 1 SV = 3
cerevisiae GN = SEC24 PE = 1 SV = 1
cerevisiae GN = VMA6 PE = 1 SV = 2
cerevisiae GN = YJL171C PE = 1 SV = 1
cerevisiae GN = POB3 PE = 1 SV = 1
cerevisiae GN = YRA2 PE = 1 SV = 2
cerevisiae GN = TCP1 PE = 1 SV = 2
cerevisiae GN = ERG28 PE = 1 SV = 1
cerevisiae GN = TPA1 PE = 1 SV = 1
cerevisiae GN = CCT7 PE = 1 SV = 1
cerevisiae GN = CBR1 PE = 1 SV = 2
cerevisiae GN = GDB1 PE = 1 SV = 1
cerevisiae GN = KRE33 PE = 1 SV = 1
cerevisiae GN = RPL43A PE = 1 SV = 2
cerevisiae GN = RPT4 PE = 1 SV = 4
cerevisiae GN = RPL3 PE = 1 SV = 4
cerevisiae GN = CKA2 PE = 1 SV = 2
cerevisiae GN = YNL247W PE = 1 SV = 1
cerevisiae (strain YJM789) GN = CBR1 PE = 2 SV = 2
cerevisiae GN = STH1 PE = 1 SV = 1
cerevisiae GN = PUF3 PE = 1 SV = 1
cerevisiae GN = AIP1 PE = 1 SV = 1
cerevisiae GN = ERG1 PE = 1 SV = 2
cerevisiae GN = SPO21 PE = 1 SV = 1
cerevisiae GN = CDC42 PE = 1 SV = 2
cerevisiae GN = RPL24A PE = 1 SV = 1
cerevisiae GN = RPL35A PE = 1 SV = 1
cerevisiae GN = MDJ1 PE = 1 SV = 1
cerevisiae GN = ACS2 PE = 1 SV = 1
cerevisiae GN = RPL24B PE = 1 SV = 1
cerevisiae GN = RPS29B PE = 1 SV = 3
cerevisiae GN = SEC72 PE = 1 SV = 3
cerevisiae GN = HEM15 PE = 1 SV = 1
cerevisiae GN = CBP3 PE = 1 SV = 1
cerevisiae GN = PRE9 PE = 1 SV = 1
cerevisiae GN = RPS26B PE = 1 SV = 1
cerevisiae GN = IKI3 PE = 1 SV = 1
cerevisiae GN = DNM1 PE = 1 SV = 1
cerevisiae GN = RHO3 PE = 1 SV = 2
cerevisiae GN = ERV29 PE = 1 SV = 1
cerevisiae GN = PRO3 PE = 1 SV = 1
cerevisiae GN = RPL34A PE = 1 SV = 1
cerevisiae GN = RPL19A PE = 1 SV = 5
cerevisiae GN = CDC10 PE = 1 SV = 1
cerevisiae GN = RSP5 PE = 1 SV = 1
cerevisiae GN = EXG1 PE = 1 SV = 1
cerevisiae GN = SEC61 PE = 1 SV = 1
cerevisiae GN = PRE8 PE = 1 SV = 1
cerevisiae GN = YCF1 PE = 1 SV = 2
cerevisiae GN = RNA1 PE = 1 SV = 2
cerevisiae GN = RPS10B PE = 1 SV = 1
cerevisiae GN = RPS10A PE = 1 SV = 1
cerevisiae GN = ACO1 PE = 1 SV = 2
cerevisiae GN = TUF1 PE = 1 SV = 1
cerevisiae GN = SMT3 PE = 1 SV = 1
cerevisiae GN = DPL1 PE = 1 SV = 1
cerevisiae GN = SSS1 PE = 1 SV = 2
cerevisiae GN = NCE102 PE = 1 SV = 1
cerevisiae GN = RPL37A PE = 1 SV = 2
cerevisiae GN = PUS1 PE = 1 SV = 1
cerevisiae GN = TRP2 PE = 1 SV = 4
cerevisiae GN = YPL260W PE = 1 SV = 1
cerevisiae GN = FAA1 PE = 1 SV = 1
cerevisiae GN = SAR1 PE = 1 SV = 1
cerevisiae GN = NAM7 PE = 1 SV = 1
cerevisiae GN = PRE2 PE = 1 SV = 3
cerevisiae GN = NOC2 PE = 1 SV = 2
cerevisiae GN = YKL054C PE = 1 SV = 1
cerevisiae GN = ADE4 PE = 1 SV = 1
cerevisiae GN = HCH1 PE = SV = 1
cerevisiae GN = NAP1 PE = 1 SV = 2
cerevisiae GN = SSH1 PE = 1 SV = 1
cerevisiae GN = CYC3 PE = 1 SV = 1
cerevisiae GN = RPL37B PE = 1 SV = 2
cerevisiae GN = YPL225W PE = 1 SV = 1
cerevisiae GN = STM1 PE = 1 SV = 3
cerevisiae GN = GPM2 PE = 1 SV = 1
cerevisiae GN = HAL2 PE = 1 SV = 1
cerevisiae GN = NEW1 PE = 1 SV = 1
cerevisiae GN = CCT2 PE = 1 SV = 1
cerevisiae GN = UTR2 PE = 1 SV = 3
cerevisiae GN = SEC28 PE = 1 SV = 3
cerevisiae GN = RPS28A PE = 1 SV = 1
cerevisiae GN = ADE12 PE = 1 SV = 3
cerevisiae GN = YBL036C PE = 1 SV = 1
cerevisiae GN = RPL29 PE = 1 SV = 3
cerevisiae GN = EMP24 PE = 1 SV = 1
cerevisiae GN = PRE10 PE = 1 SV = 2
cerevisiae GN = ASF1 PE = 1 SV = 1
cerevisiae GN = APL4 PE = 1 SV = 1
cerevisiae GN = SEC31 PE = 1 SV = 2
cerevisiae GN = CDS1 PE = 1 SV = 1
cerevisiae GN = ARC1 PE = 1 SV = 2
cerevisiae GN = TRA1 PE = 1 SV = 1
cerevisiae GN = RPL10 PE = 1 SV = 1
cerevisiae GN = SCO2 PE = 1 SV = 1
cerevisiae GN = IFA38 PE = 1 SV = 1
cerevisiae GN = ELO3 PE = 1 SV = 1
cerevisiae GN = NOG1 PE = 1 SV = 1
cerevisiae GN = PMC1 PE = 1 SV = 1
cerevisiae GN = YMR027W PE = 1 SV = 1
cerevisiae GN = YJL217W PE = 1 SV = 1
cerevisiae GN = HXT5 PE = 1 SV = 1
cerevisiae GN = CYB2 PE = 1 SV = 1
cerevisiae GN = ALO1 PE = 1 SV = 1
cerevisiae GN = PTC3 PE = 1 SV = 3
cerevisiae GN = CKA1 PE = 1 SV = 1
cerevisiae GN = RPT5 PE = 1 SV = 3
cerevisiae GN = RTG2 PE = 1 SV = 2
cerevisiae GN = YDR476C PE = 1 SV = 1
cerevisiae GN = VMA9 PE = 1 SV = 1
cerevisiae GN = CDC28 PE = 1 SV = 1
cerevisiae GN = YPT31 PE = 1 SV = 3
cerevisiae GN = FPR3 PE = 1 SV = 2
cerevisiae GN = DAP2 PE = 2 SV = 2
cerevisiae GN = FAA3 PE = 1 SV = 1
cerevisiae GN = SEC13 PE = 1 SV = 1
cerevisiae GN = YMR178W PE = 1 SV = 1
cerevisiae GN = RPS25A PE = 1 SV = 1
cerevisiae GN = FPP1 PE = 1 SV = 2
cerevisiae GN = RPS3 PE = 1 SV = 5
cerevisiae GN = TEF4 PE = 1 SV = 1
This application is a US non-provisional that claims priority to U.S. provisional application 61/726,936 filed Nov. 15, 2012 and U.S. provisional application 61/875,969 filed Sep. 10, 2013, each of which is hereby incorporated by reference in its entirety.
This invention was made with government support under R01DA025755, F32GM093614, P20RR015569, P20RR016460, U54RR020839, and UL1 TR000039 awarded by the National Institutes of Health. The government has certain rights in the invention.
Number | Date | Country
---|---|---
61726936 | Nov 2012 | US
61875969 | Sep 2013 | US