This application contains a Sequence Listing, which has been submitted electronically as an ASCII text file and is hereby incorporated by reference in its entirety. Said Sequence Listing, created on Aug. 3, 2023, is named VOSSP0131US_ST25.txt and is 22,573 bytes in size.
The present invention relates to a method for monitoring the effect of an environmental factor on the proteome of a cell. Furthermore, cell populations are provided that comprise multiple cells each comprising an inserted tag sequence in an intron of said cell, wherein the tag is inserted in-frame with the preceding exonic sequence and wherein the intron into which the tag is inserted is different between cells.
While most currently available pharmacological agents, including small molecule pharmaceuticals or pharmacologically active biologics, act as inhibitors of enzymes or as modulators of receptors and transporters, drugs may also exert other functions, like (but not limited to) the inhibition or induction of protein-protein interactions and the stabilization or degradation of target proteins.
Currently available methods for the unbiased discovery and for the elucidation of biological/pharmacological functions of bioactive compounds ("mechanisms of action") and/or some of the known methods for screening substances for their potential use as pharmaceuticals are based on monitoring the biological and/or pharmacological effects on proteomes and transcriptomes; see, inter alia, Rix (2009) Nat Chem Biol 5, 616-24; Martinez Molina (2013) Science 341, 84-7; Savitski (2014) Science 346, 1255784; Drewes (2015) Trends Biotechnol 36, 1275-1286; Huber (2015) Nat Methods 12, 1055-7; Subramanian (2017) Cell 171, 1437-1452 e17; or Lamb (2006) Science 313, 1929-35.
Yet, the costs and sample preparation requirements associated with these methods preclude their application in large-scale screenings and/or their use on a large number of drugs/drug candidates at multiple concentrations and/or time points of assessment. Furthermore, other high-content screening approaches that monitor drug effects on cell morphology, as disclosed in Bray (2016) Nat Protoc 11, 1757-74, and/or protein localization approaches by microscopy, for example by staining or fluorescent-tagging approaches, are hampered by the fact that these methods allow the monitoring of only one or a few selected proteins.
The prior art has, in this context, seen approaches in which fluorescently tagged reporter cells are generated either by overexpression to non-physiological levels or by targeting a single gene with a homologous recombination template. "Genetrap" approaches have also been applied in this context; see, e.g., Morin (2001) Proc Natl Acad Sci USA 98, 15050-5. However, these "genetrap virus approaches" employ viral constructs to generate tagged cell pools and are therefore limited by integration site biases. Since the employed viruses have strong integration site biases, namely towards the first intron, some genes are targeted much more efficiently by these viral constructs than others. Furthermore, these approaches provide no means to select specific gene sets or specific introns to be targeted.
Serebrenik and colleagues proposed a technology for tagging endogenous genes by homology-independent intron targeting, combining intron-based protein trapping with homology-independent repair-based integration of a generic donor; see Serebrenik (2019) Genome Research 29, 1322-28. In this approach, homology-independent CRISPR-Cas9 editing places a fluorescent tag as a synthetic exon into introns of individual target genes: a generic sgRNA excises the fluorescent tag, flanked by splice acceptor and donor sites, from a generic donor plasmid, while a gene-specific intron-targeting sgRNA is co-expressed. Because this technology employs generic donors, it is speculated that it would enable the generation of multiple fusion cell lines, albeit only by cloning additional intron-targeting sgRNAs. However, the technology provided by Serebrenik offers no efficient way to determine which cell expresses which tagged protein.
Accordingly, there is a need in the art for means and methods for characterizing factors that influence individual proteins of the whole proteome, whereby the whole proteome, or at least a substantial part thereof, is or can be assessed.
The technical problem is solved by the embodiments as characterized in the claims and as provided herein.
In a first aspect, the present invention relates to a method for monitoring the effect of an environmental factor on the proteome or parts thereof of a cell, the method comprising the steps of:
In a second aspect, the present invention relates to a method for monitoring the effect of an environmental factor on the proteome or parts thereof of a cell, the method comprising the steps of:
Accordingly, within the present invention, the tag sequence may be cloned into a donor plasmid or minicircle DNA, which is then used as a vector for transduction. The term "cloning identified gRNA sequences and tag sequence into transduction vectors" as used herein is therefore to be understood as comprising cloning into a donor plasmid or minicircle DNA as a vector.
As shown in the appended examples, the methods of the present invention comprise a step of identifying gRNA sequences suitable for inserting a tag in introns in the genome of a cell. In the methods of the present invention, it is preferred that each cell of a population of cells that is to be tagged receives a single tag. The general principle of intron tagging is described by Serebrenik et al. (2019), loc. cit. The strategy of Serebrenik et al. relies on a single generic sgRNA excising a single fluorescent tag flanked by splice acceptor and donor sites from a generic donor plasmid, which is co-expressed with a single gene-specific intron-targeting sgRNA specifying the single integration site.
However, as shown in the appended examples, the methods of the present invention generate an intron-tagged cell pool that is then used to characterize/monitor effects of an environmental factor on the proteome or parts thereof of a cell type. This is achieved by tagging cells in a population of cells of the same cell type, whereby each cell receives a single tag. The population thus comprises cells tagged at different genomic sites and, as a whole, provides a tagged proteome or tagged parts thereof. Accordingly, and in contrast to the technology provided by Serebrenik et al. (2019), loc. cit., the present invention provides means and methods wherein the whole proteome (or at least substantial parts thereof) can be monitored in one individual cell population. The present invention therefore provides a "one shot" analysis of the whole proteome (or substantial parts thereof). As such, the present invention for the first time allows the analysis of the whole proteome (or at least substantial parts thereof) in one experiment, by using different gRNA sequences for a multitude of tagged proteins combined with in situ sequencing. In this regard, the inventors found that using, as transduction vector, a sequencing-enabling vector that expresses the gRNA as part of a transcript detectable by in situ sequencing, such as a CROPseq vector, allows identification of the individual gRNA sequence, which corresponds to the tagged protein, in each clone of the pool, e.g. using an imaging technique such as microscopy.
While most currently available pharmacological agents, including small molecule pharmaceuticals or pharmacologically active biologics, act as inhibitors of enzymes or as modulators of receptors and transporters, drugs may also exert other functions, like (but not limited to) the inhibition or induction of protein-protein interactions and the stabilization or degradation of target proteins. In the context of this invention, a scalable strategy is provided to discover, in real time, the effects that drugs exert on the levels and subcellular localizations of large subsets of the proteome. To illustrate the present invention, CRISPR-Cas9 based intron tagging was employed to generate cell pools expressing hundreds of GFP-fusion proteins at endogenous levels, to monitor drug effects on protein levels and localization by time-lapse microscopy, and to identify targeted introns by in situ sequencing. This is also documented in the appended figures and examples. From the pool of tag-positive cells (here, illustratively, GFP-positive cells), more than 500 individual clones were isolated and analyzed/imaged by fluorescence microscopy, revealing the subcellular localization of many proteins whose subcellular localization had not previously been characterized. Furthermore, the inventive pool of cells may also be used to study protein dynamics in response to various metabolic perturbations, in either an arrayed or a pooled format, and strategies to identify individual drug-responsive clones in the pool are provided.
As disclosed herein, the present invention provides methods for monitoring the effect of an environmental factor on the proteome or parts thereof of a cell. In the context of this invention, the term "part(s) of the proteome" relates to a substantial part of the proteome, i.e. at least 100, at least 200, at least 300, at least 400, at least 500, at least 600, at least 700 and more preferably at least 900 expressed genes (coding for proteins).
In contrast to the prior art, in particular Serebrenik et al., the invention provided herein is scalable and enables pooled protein tagging of a multitude of metabolic enzymes and epigenetic modifiers. As shown in the appended Examples, more than 900 metabolic enzymes were targeted. The GFP-tagged cells can be exposed to compounds to monitor drug effects on the localization and levels of hundreds of proteins in real time in a pooled format, and responding clones can then be identified by in situ sequencing of the expressed intron-targeting sgRNA that corresponds to the tagged protein, as shown in
It is preferred that the methods of the present invention further comprise a sequencing step subsequent to step (d).
As detailed above, the sequencing step allows individual cells to be associated with tagged proteins by sequencing the individual gRNA. This may be achieved by sequencing the gRNA insert, so that it is not necessary to sequence the protein directly. This can be done either at the level of the whole population or based on the expressed proteins, for example subsequent to a cell sorting step based on the expressed tag. Accordingly, in the methods of the present invention, the gRNA insert, or a part thereof, of (a) cell(s) of the population is sequenced in the genome of said cell(s) or in the transcriptome of said cell(s).
In a further preferred embodiment, the sequencing step is subsequent to step (f) of the methods of the invention.
Sequencing of the gRNA insert in the transcriptome preferably further comprises a step of reverse transcription and the use of a sequencing vector as transduction vector. An exemplary vector suitable for sequencing is a CROP-seq vector. In an exemplary embodiment, the procedure may be as in
As shown in the appended examples, the introns to be targeted are selected based on the reading frame of the upstream exonic sequence, wherein the sequence to be inserted is in-frame with the exonic sequence. As shown in
Design of sgRNA sequences is further based on cutting efficiency. Thus, in a further embodiment, the gRNA sequences suitable for inserting a tag in the selected introns in the genome of the cell are identified according to Cas9 cutting efficiency, Cpf1 cutting efficiency or Cas12b cutting efficiency. Additionally, or alternatively, the gRNA sequences suitable for inserting a tag in the selected introns in the genome of the cell are identified according to their occurrence in the genome of the cell, preferably wherein the target sequence occurs only once in the genome (occurrence of 1).
The vector further encodes a tag that is to be inserted into the intron. The tag can be any tag allowing detection subsequent to integration or expression. Preferably, the tag is a fluorescent tag (preferably green fluorescent protein (GFP), enhanced GFP (EGFP), yellow fluorescent protein (YFP) or red fluorescent protein (RFP)) or a tag suitable for detection by covalent (e.g. Halo tag, Clip tag, Snap tag) or non-covalent (e.g. Strep-tag, HA tag, dTag) binding to a detection reagent, enabling detection by fluorescence or luminescence microscopy.
Once the sgRNA sequences and the tag sequences have been selected and cloned, the library of sgRNA vectors is contacted with a population of cells to integrate the tag into the selected introns.
To ensure that each cell is infected with only a single vector and thus receives only one intron-targeting sgRNA, cells are preferably infected at a multiplicity of infection below 1, preferably 0.8, 0.6 or 0.4, more preferably 0.2. The term "multiplicity of infection" is to be understood as meaning that only 80%, 60%, 40% or 20%, respectively, of the cells are infected.
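Selecting gRNAs whose target sequence occurs only once in the genome can be illustrated with the following minimal sketch. It assumes an SpCas9-style target (a 20-nt spacer followed by an NGG PAM) and a genome supplied as a plain uppercase string; the helper names are hypothetical, and real designs rely on dedicated tools with mismatch-tolerant off-target searches rather than exact matching.

```python
import re

# Complement table for uppercase DNA (N maps to N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_target_sites(spacer: str, genome: str) -> int:
    """Count exact spacer+NGG occurrences on the forward and reverse strands.

    Assumption: SpCas9 PAM (NGG); mismatched off-target sites are ignored.
    """
    spacer, genome = spacer.upper(), genome.upper()
    # Zero-width lookahead so that overlapping sites are also counted.
    pattern = re.compile("(?=" + re.escape(spacer) + "[ACGT]GG)")
    forward = len(pattern.findall(genome))
    reverse = len(pattern.findall(reverse_complement(genome)))
    return forward + reverse


def unique_guides(spacers, genome):
    """Keep only spacers whose target site occurs exactly once (occurrence of 1)."""
    return [s for s in spacers if count_target_sites(s, genome) == 1]
```

A toy genome containing one copy of a spacer followed by AGG keeps that spacer; adding a second copy with a valid PAM removes it.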
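The rationale for a low multiplicity of infection can be made quantitative with a short sketch. It assumes the common Poisson model of viral transduction, in which MOI is the mean number of vector copies per cell; this is a different reading from the operational definition given above, and the function names are illustrative only.

```python
import math


def poisson_pmf(k: int, moi: float) -> float:
    """Probability that a cell receives exactly k vector copies (Poisson model)."""
    return math.exp(-moi) * moi ** k / math.factorial(k)


def transduction_summary(moi: float) -> dict:
    """Fractions of uninfected, infected and single-copy cells at a given MOI."""
    p0 = poisson_pmf(0, moi)
    p1 = poisson_pmf(1, moi)
    infected = 1.0 - p0
    return {
        "uninfected": p0,
        "infected": infected,
        "single_copy": p1,
        # Share of infected cells that carry exactly one sgRNA vector.
        "single_among_infected": p1 / infected,
    }


for moi in (0.8, 0.6, 0.4, 0.2):
    s = transduction_summary(moi)
    print(f"MOI {moi}: {s['infected']:.1%} infected, "
          f"{s['single_among_infected']:.1%} of infected cells single-copy")
```

Under this model, about 18% of cells are infected at MOI 0.2 and roughly 90% of those carry a single vector, compared with about 65% at MOI 0.8, which is why lower values favour one intron-targeting sgRNA per cell.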
In order to select for infected cells, a selection marker can be comprised in the vector. An exemplary marker is the puromycin selection marker also present on the vector, e.g. the CROP-seq vector.
In an exemplary method, transient transfection can subsequently be used to introduce a plasmid for expression of Cas9 (which introduces a cut specifically in the one intron specified by the sgRNA/gRNA previously introduced into the same cell with the CROP-seq vector) and of a generic sgRNA/gRNA. A second plasmid can also be introduced that acts as a generic donor plasmid providing the tag sequence to be integrated into the intron, for example an EGFP sequence. This plasmid contains a Cas9 cut site (targeted by the generic sgRNA encoded on the Cas9 plasmid), a splice acceptor site, the tag sequence (e.g. EGFP), a splice donor site and a second Cas9 cut site (likewise targeted by the generic sgRNA encoded on the Cas9 plasmid). As an alternative to the generic donor plasmid with two Cas9 cut sites, minicircle DNA may be used that lacks a plasmid backbone and comprises a single Cas9 cut site, a splice acceptor site, the tag sequence (e.g. GFP or EGFP) and a splice donor site. Using a minicircle increases intron tagging efficiency, because there is no plasmid backbone that could be integrated at the intronic integration site instead of the fragment containing the tag sequence. The methods may further comprise selecting for successfully transfected cells (e.g. via a blasticidin marker on the Cas9 plasmid) and expanding the cells, for example over a period of 5 days.
The methods of the present invention may further comprise a step of separating tagged cells from non-tagged cells. The separation method depends on the tag that is used. In a preferred embodiment, the cells are fluorescence-tagged and are separated using FACS; FACS or an alternative separation method can thus be used to sort out targeted cells. In addition, tagged proteins can be selected according to expression levels: because all proteins are expressed at endogenous levels, some proteins are expressed at very low levels. To differentiate between expression levels, a further parameter may be used (e.g. a further channel during selection), e.g. to account for cell-specific background fluorescence, so that cells enriched for the tag, for example GFP or EGFP, can be sorted (
The sorted pool of tagged cells or the unsorted pool comprising tagged cells may further be characterized using a suitable method based on the introduced tag. For example, protein expression, protein localization, surface expression, protein-protein interaction, protein stability, and/or protein mobility may be monitored using a suitable detection method.
As such, the invention also relates to a population of cells comprising multiple cells each comprising an inserted tag sequence in an intron of said cell, wherein the tag is inserted in-frame with the preceding exonic sequence and wherein the intron into which the tag is inserted is different between cells.
As detailed above, the cells may also be characterized by sequencing. In one exemplary embodiment, the intron-tagged cell pool can be characterized by PCR amplification of the integrated sgRNA sequence from genomic DNA, next-generation sequencing and mapping of the reads back to the sequences of the designed sgRNA library. In this way, it was determined that more highly expressed genes were more likely to be successfully tagged (
In a further exemplary embodiment, the intron-tagged cell pool can be further characterized by diluting it to single cells and growing these into large colonies in a 96-well plate. These single-cell derived clones can be characterized by imaging (
It is thus an object of the present invention to monitor effects of environmental factors on the proteome or parts thereof of a cell. The environmental factor may be selected from radiation, a chemical compound, a biological compound, temperature, nutrient depletion, ion concentrations or combinations thereof. In an exemplary embodiment, the method may comprise plating cell pools at approximately 7,000 cells per well in a 384-well plate (
Within the present invention, the cell may be a HAP1 cell, K562 cell, HeLa cell, KBM7 cell, BT474 cell, MG-63 cell, SKNAS cell, A427 cell, A375 cell, A498 cell, RCH-ACV cell, HEK293T cell, A673 cell, SK-N-MC cell, A549 cell, SKMES1 cell, NCIH727 cell, THP1 cell, NB4 cell, MOLM13 cell, KASUMI-1 cell, HEL cell, NB-4 cell, HL-60 cell, RS4-11 cell, MOLT7 cell, αTC1 cell, βTC3 cell, MIN6 cell or another cell line, preferentially an adherent, non-migratory cell line.
The invention furthermore relates to a population of cells comprising multiple cells each comprising an inserted tag sequence in an intron of said cell, wherein the tag is inserted in-frame with the preceding exonic sequence and wherein the intron into which the tag is inserted is different between cells. Accordingly, the present invention provides a novel and inventive cell population comprising multiple cells, each individually tagged in one intron of a gene coding for an (expressed) protein. The population of cells of the present invention thus provides a novel and inventive tool for whole-proteome analysis as well as a valuable tool for cell-based drug screening. In a further embodiment, the present invention comprises a kit comprising said novel and inventive cell population.
Such a kit is also particularly useful as a means for drug screening, drug evaluation and treatment monitoring, and as a research tool in basic science. Further uses of the inventive cell population are within the capabilities of the skilled artisan.
In a preferred embodiment, the population of cells of the present invention is obtained by a method comprising the steps of:
The population of cells provided herein and, in particular, as obtained by the method of the invention, can be used e.g. for screening purposes, in particular drug screening. For example, the population of cells may be used in a method for screening a drug for its effect on the proteome of the cell population. This may be used, inter alia, in the screening of a drug suitable for downregulating/upregulating a specific protein or group of proteins that is/are part of the proteome of the population of cells.
The invention is further illustrated by the following non-limiting figures and examples:
To design an intron-targeting sgRNA library for metabolic enzymes and epigenetic modifiers, a list of 2,889 genes was generated by combining a published list of all classic metabolic enzymes (see Corcoran (2017) Am J Physiol Renal Physiol 312, F533-F542), most genes in a human CRISPR metabolic gene knockout library (see Birsoy (2015) Cell 162, 540-51) and genes annotated with the GO terms "Histone modification", "DNA methylation" or "DNA demethylation". The Ensembl BioMart data mining tool was then used to obtain chromosomal coordinates of the introns of the primary transcripts of those genes, and only those introns were selected where integration of the donor plasmid does not lead to frameshift mutations after splicing, since the donor plasmid starts with a full codon and is not compatible with all exon-exon junctions. Using Ensembl BioMart, this filtering was done by selecting only introns that are preceded by an exon with the attribute "End phase=0". GuideScan (Perez, 2017, Nat Biotechnol 35, 347-349) was then used to obtain the top 20 guides for each selected intronic region based on the GuideScan cutting efficiency score. These 20 guides were then ranked based on a combined on- and off-target score using the scores provided by GuideScan. For genes with only one targetable intron, up to three sgRNAs per intron were selected; for genes with two or three targetable introns, up to two sgRNAs per intron were selected; and for genes with more than three targetable introns, the top-ranked sgRNA of each intron was selected. Using that strategy, 14,049 sgRNAs targeting 11,614 introns of 2,387 genes were selected. In addition, 75 non-targeting sgRNAs from the human Brunello CRISPR KO library (Doench, 2016, Nat Biotechnol 34, 184-191) were added to the library.
For cloning of the library into the CROPseq-Guide-Puro vector (Addgene #86708) using Gibson Assembly, adapter sequences were added to the sgRNA sequences and 74-nucleotide oligos were ordered as an oligo pool (Twist Biosciences). Additional adapters were added to the pooled oligos by PCR (8 cycles, NEB Q5) to generate 140-nucleotide fragments that were purified (QIAGEN MinElute PCR Purification) before being used for Gibson Assembly. The vector was digested with BsmBI (NEB), size-selected by agarose gel electrophoresis and gel-purified (QIAGEN QIAquick Gel Extraction Kit), followed by an additional column purification (QIAGEN QIAquick PCR Purification Kit). Four Gibson Assembly reactions (10 μl NEBuilder HiFi DNA Assembly, 60 ng vector, 10 ng insert) were prepared and incubated at 50° C. for 45 minutes. Reactions were pooled and purified (QIAGEN MinElute PCR Purification) before being used for transformation of Lucigen Endura electrocompetent bacteria (four reactions, 25 μl each). Bacteria were plated on four 245×245×25 mm Bioassay dishes and on dilution plates (1:10,000) and incubated at 32° C. for 16 h. Cells were scraped off the plates and plasmid DNA was extracted using multiple QIAGEN Plasmid Plus Midi kits. Library coverage was 211× and was estimated based on the number of colonies on the dilution plates.
The GFP donor plasmid, with the coding sequence of EGFP flanked by generic sgRNA targeting sites, splice acceptor and splice donor sites and 20 amino acid linkers, was assembled from 4 fragments using Gibson Assembly to generate a donor plasmid similar in design to a previously published donor plasmid that can be used for intron tagging; see Feldman (2019) Cell 179, 787-799 e17. The DNA fragment with a 25-nucleotide overlap to the pUC19 vector and a 32-nucleotide overlap to the N-terminus of EGFP was generated from overlapping oligos (Sigma) and comprises a generic sgRNA targeting site that is not present in the human genome (He, 2016, Nucleic Acids Res 44, e85), followed by a splice acceptor site (Guzzardo, 2017, Sci Rep 7, 16770) and a flexible 20 amino acid glycine-serine linker. This fragment is followed by a fragment with the coding sequence of EGFP without a start or stop codon, which was generated by PCR. The third fragment, which has a 27-nucleotide overlap to the C-terminus of EGFP and a 25-nucleotide overlap to the pUC19 vector, was generated from overlapping oligos (Sigma) and comprises a flexible 20 amino acid glycine-serine linker followed by a splice donor site (Guzzardo, 2017, loc. cit.) and the generic sgRNA targeting site. The pUC19 vector was linearized by PCR for Gibson Assembly (NEBuilder HiFi DNA Assembly) with the other three fragments.
The pX330 plasmid expressing Cas9 and the generic sgRNA targeting the donor plasmid was generated by digesting pU6-(BbsI)_CBh-Cas9-T2A-mCherry (Addgene #64324; see also Chu, 2015, Nat Biotechnol 33, 543-8) with BbsI, followed by ligation with an annealed oligo duplex as described before; see Ran (2013) Nat Protoc 8, 2281-2308. mCherry was replaced with a blasticidin resistance gene (BSD) using Gibson Assembly.
For the generation of lentiviral particles, HEK293T cells were transiently transfected with the intron-targeting library and the packaging plasmids psPAX2 and pMD2.G using PEI transfection. After 12 h, the medium was replaced with IMDM supplemented with 10% FBS and P/S. Viral supernatant was collected 48 h after transfection and stored at −80° C. HAP1 cells were transduced with virus and selected with puromycin for three days. The multiplicity of infection (MOI) was 0.2, and transduction was done at a coverage of 500×. After puromycin selection, cells were grown for one day in medium without puromycin before being seeded for transfection (8 million cells per 15 cm dish, 48 million cells in total). One day after seeding, each dish was co-transfected with 20 μg pX330 expressing Cas9-BSD and the generic sgRNA and 10 μg EGFP donor plasmid using 90 μl Turbofection in 2.5 ml OptiMEM as described by the manufacturer. Transfection efficiency was approximately 10%, as determined by a transfection done in parallel with pX330 Cas9-mCherry and the EGFP donor plasmid at the same ratio. The next day, cells were subjected to a transient selection with blasticidin (10 μg/ml) for 24 h. After selection, cells were maintained in full medium without blasticidin and sorted five days after transfection by flow cytometry using a Sony Cell Sorter SH800ZD. 0.03% of cells were GFP-positive; in total, 24,300 of these GFP-positive cells were sorted, and the cell population was expanded for 7 days before DNA was isolated to determine sgRNA abundance in the cell population.
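The intron filter and the per-gene guide allocation described above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline that was used: the input format (tuples of gene, intron identifier and upstream exon end phase, plus a dictionary of guides already ranked best-first by the combined GuideScan score) is hypothetical, and no GuideScan or BioMart API is called.

```python
from collections import defaultdict


def guides_per_intron(n_targetable_introns: int) -> int:
    """Guides kept per intron: 3 for one intron, 2 for two or three, else 1."""
    if n_targetable_introns == 1:
        return 3
    if n_targetable_introns <= 3:
        return 2
    return 1


def select_guides(introns, ranked_guides):
    """Select sgRNAs per gene.

    introns: iterable of (gene, intron_id, upstream_exon_end_phase) tuples
        (hypothetical format; Ensembl reports end phase as 0, 1, 2 or -1).
    ranked_guides: dict intron_id -> list of guides, best first
        (assumed to be pre-ranked by a combined on-/off-target score).
    """
    # Keep only introns preceded by an exon with end phase 0, so that a donor
    # starting with a full codon stays in frame after splicing.
    by_gene = defaultdict(list)
    for gene, intron_id, end_phase in introns:
        if end_phase == 0:
            by_gene[gene].append(intron_id)

    selected = {}
    for gene, intron_ids in by_gene.items():
        # An intron counts as targetable only if at least one guide exists.
        targetable = [i for i in intron_ids if ranked_guides.get(i)]
        if not targetable:
            continue
        n_keep = guides_per_intron(len(targetable))
        selected[gene] = [g for i in targetable for g in ranked_guides[i][:n_keep]]
    return selected
```

For example, a gene with one phase-0 intron contributes up to three guides from that intron, whereas a gene with four phase-0 introns contributes the top guide of each.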
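The coverage estimate from dilution plates amounts to simple arithmetic, sketched below. The sketch assumes that the number of colonies on a 1:10,000 dilution plate multiplied by the dilution factor approximates the total number of transformants, and that the library comprises the 14,049 targeting plus 75 non-targeting sgRNAs; the colony counts used in the example are hypothetical, not measured values.

```python
import math

# Targeting plus non-targeting sgRNAs, as described above.
LIBRARY_SIZE = 14_049 + 75


def estimate_coverage(colonies_on_dilution_plate: int, dilution_factor: int,
                      library_size: int = LIBRARY_SIZE) -> float:
    """Estimated transformants per sgRNA (assumes colonies x dilution = total)."""
    total_transformants = colonies_on_dilution_plate * dilution_factor
    return total_transformants / library_size


def colonies_needed(target_coverage: float, dilution_factor: int,
                    library_size: int = LIBRARY_SIZE) -> int:
    """Minimum dilution-plate colony count implying at least `target_coverage`."""
    return math.ceil(target_coverage * library_size / dilution_factor)


# Hypothetical count: 300 colonies on a 1:10,000 plate gives roughly 212x coverage.
print(round(estimate_coverage(300, 10_000), 1))
```

Under the same assumption, a coverage of 211× for this library corresponds to at least 299 colonies on the dilution plate.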
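Why the donor must start with a full codon and be inserted only after exons with end phase 0 can be captured in a small check. In this sketch, the tag is read correctly only if its starting phase matches the end phase of the upstream exon, and the downstream exons keep their frame only if the synthetic exon adds whole codons. The EGFP length of 714 nt (238 codons without start and stop codons) is an assumption for illustration, and the function is a hypothetical helper rather than part of the described method.

```python
def tag_in_frame(upstream_end_phase: int, cassette_nt: int,
                 cassette_start_phase: int = 0) -> bool:
    """True if a synthetic exon keeps both the tag and downstream exons in frame.

    upstream_end_phase: Ensembl-style end phase of the preceding exon
        (0, 1 or 2; -1 for non-coding ends).
    cassette_nt: coding length between splice acceptor and splice donor.
    cassette_start_phase: phase at which the cassette begins (0 = full codon).
    """
    # The tag is translated correctly only if it starts where the exon leaves off.
    tag_read_in_frame = upstream_end_phase == cassette_start_phase
    # Downstream exons keep their reading frame only if whole codons are added.
    downstream_preserved = cassette_nt % 3 == 0
    return tag_read_in_frame and downstream_preserved


LINKER_NT = 20 * 3  # 20 amino acid glycine-serine linker on each side
EGFP_NT = 714       # assumption: EGFP coding sequence without start and stop codons
cassette = LINKER_NT + EGFP_NT + LINKER_NT
print(tag_in_frame(0, cassette), tag_in_frame(1, cassette))
```

With these assumed lengths, the cassette is 834 nt, so it is compatible with phase-0 junctions but not with phase-1 or phase-2 junctions, matching the "End phase=0" filter.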
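The scale of the transduction and sorting steps follows from the numbers above, as shown in this sketch. It reads MOI 0.2 as "20% of cells infected", following the operational definition given earlier in this description, and uses exact fractions to avoid rounding artifacts; the function names are hypothetical.

```python
import math
from fractions import Fraction

# Targeting plus non-targeting sgRNAs, as described above.
LIBRARY_SIZE = 14_049 + 75


def cells_to_transduce(library_size: int, coverage: int,
                       infected_fraction: Fraction) -> int:
    """Cells to expose to virus so that ~`coverage` infected cells carry each sgRNA."""
    return math.ceil(library_size * coverage / infected_fraction)


def cells_through_sorter(positives_wanted: int, positive_fraction: Fraction) -> int:
    """Cells that must be analyzed to collect `positives_wanted` tag-positive cells."""
    return math.ceil(positives_wanted / positive_fraction)


# MOI 0.2 read as "20% of cells infected", at 500x coverage.
print(cells_to_transduce(LIBRARY_SIZE, 500, Fraction(1, 5)))  # 35310000
# 0.03% GFP-positive cells, 24,300 cells collected.
print(cells_through_sorter(24_300, Fraction(3, 10_000)))      # 81000000
```

If MOI is instead interpreted as the mean copy number per cell under a Poisson model, only about 18% of cells are infected at MOI 0.2, which raises the first estimate to roughly 39 million cells.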
In order to generate an NGS library, genomic DNA from one million cells of the GFP-positive cell population was isolated and the sgRNA region was amplified by PCR (two reactions using 500 ng genomic DNA, NEB Q5 High-Fidelity Polymerase). Illumina adapter ligation and sequencing were done by a commercial sequencing service. To determine sgRNA abundance, sgRNA sequences were extracted from NGS reads using Cutadapt, and sgRNA read counts were determined using the MAGeCK count function to match the extracted reads to the sgRNA library. Of the 14,049 sgRNAs in the library, we considered 1,777 highly enriched, as these sgRNAs accounted for 90% of the obtained sequencing reads, while the majority of sgRNAs were no longer detectable. The remaining 10% of sequencing reads comprise an additional 1,622 sgRNAs, which we do not consider enriched, as each of them is supported by only a few sequencing reads that might result from cells transduced with two sgRNAs or from off-target integration and expression of the GFP tag. Our library also includes 75 non-targeting sgRNAs, making up 0.53% of the sgRNAs in the library. As expected, they are depleted in the pool of GFP-positive cells, making up 0.15% of the sequencing reads, with only 3 non-targeting sgRNAs among the 1,777 sgRNAs we consider enriched.
To obtain clonal cell lines, cells were seeded at a concentration of 0.7 cells per well in 96-well cell culture plates. After 9 days of clonal expansion, 768 colonies were harvested using trypsin, and the cell suspensions were transferred in equal amounts to eight 96-well imaging plates (Perkin Elmer CellCarrier Ultra) and eight corresponding 96-well cell culture plates. After 24 h, cells on the imaging plates were imaged on a Perkin Elmer Opera Phenix High Content Screening System (5 fields of view per well, 63× water-immersion objective, confocal mode, excitation: 488 nm, emission: nm, 700 ms). Images were processed using CellProfiler. To identify the intron-targeting sgRNAs expressed in the imaged cells, multiplexed amplicon sequencing of the sgRNA regions was performed on the corresponding clones on the eight 96-well cell culture plates. Cells were lysed, and the lysates were used for PCR to amplify the sgRNA region of each clone using barcoded primers flanking the sgRNA region (36 different 5-mers added to the 5′ end of the forward primer and 24 different 5-mers added to the 5′ end of the reverse primer; 768 of all 864 possible combinations were used). PCR reactions were pooled and column-purified before being sent for sequencing by a commercial sequencing service. NGS reads were demultiplexed using Cutadapt (see Martin (2011) EMBnet.journal 17, 10-12), and sgRNA read counts for each individual well were obtained using MAGeCK (see Li (2014) Genome Biol 15, 554). For further analysis, clones were excluded if no cells were observed in any of the 5 imaged fields of view, if no sequencing reads were obtained for the corresponding well, or if a polyclonal cell population was observed, as determined by imaging or by detection of multiple sgRNAs per well. Using that strategy, images of 335 clones were obtained for which the expressed intron-targeting sgRNA corresponding to the tagged protein could be identified.
Subcellular localizations of the GFP-tagged proteins in the 335 clones were compared to the localization patterns annotated in The Human Protein Atlas as described previously for the comparison of N- or C-terminally GFP-tagged proteins to IF-based annotations in The Human Protein Atlas; see Stadler (2013) Nat Methods 10, 315-23. Briefly, the overlap was defined as 'identical' if the main and additional localization(s) in the intron-tagged clone were the same as in The Human Protein Atlas, 'similar' if one localization was the same in the clone and in The Human Protein Atlas, with additional localization(s) observed either in the clone or in The Human Protein Atlas, or 'dissimilar' if there were no common subcellular localization patterns. Extended localization annotations such as nucleoplasm, nuclear speckles or nucleoli were considered as "nuclear"; these finer distinctions were not taken into account.
Live cell imaging was performed on a PerkinElmer Opera Phenix microscope using a 488 nm excitation laser, a 500-550 nm emission filter and a 700 ms exposure time.
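The enrichment criterion above (the smallest set of most abundant sgRNAs that together account for 90% of reads) and the read share of non-targeting controls can be sketched in plain Python. This is an illustrative re-implementation on a hypothetical read-count dictionary, not the Cutadapt/MAGeCK workflow itself.

```python
def enriched_guides(counts: dict, read_fraction: float = 0.9) -> list:
    """Smallest set of most abundant sgRNAs reaching `read_fraction` of all reads."""
    total = sum(counts.values())
    selected, cumulative = [], 0
    for guide, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        if cumulative >= read_fraction * total:
            break
        selected.append(guide)
        cumulative += n
    return selected


def read_share(counts: dict, guides) -> float:
    """Fraction of all reads assigned to the given sgRNAs (e.g. non-targeting)."""
    total = sum(counts.values())
    return sum(counts.get(g, 0) for g in guides) / total


# Hypothetical counts: three guides cover 95 of 100 reads, passing the 90% mark.
example = {"g1": 50, "g2": 30, "g3": 15, "g4": 4, "g5": 1}
print(enriched_guides(example))
```

Applied to the figures reported above, non-targeting sgRNAs fall from 0.53% of the library to 0.15% of reads, a roughly 3.5-fold depletion.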
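Combinatorial barcoding with 36 forward and 24 reverse 5-mers can be illustrated as follows. This sketch assigns the first 768 of the 864 possible pairs to eight 96-well plates and groups read pairs by well; the assignment order, the assumption that each read begins with its 5-mer barcode, and the function names are hypothetical, and the described analysis used Cutadapt for demultiplexing.

```python
from itertools import product


def assign_wells(fwd_barcodes, rev_barcodes, n_plates=8):
    """Map (forward, reverse) barcode pairs to (plate, well) positions.

    Assumption: pairs are assigned in product order to plates filled row-wise.
    """
    rows, cols = "ABCDEFGH", range(1, 13)
    wells = [(p, f"{r}{c}") for p in range(1, n_plates + 1) for r in rows for c in cols]
    pairs = list(product(fwd_barcodes, rev_barcodes))
    if len(pairs) < len(wells):
        raise ValueError("not enough barcode combinations for all wells")
    # zip stops at the number of wells, leaving surplus combinations unused.
    return dict(zip(pairs, wells))


def demultiplex(read_pairs, layout, bc_len=5):
    """Group read pairs by well using the leading 5-mer of each read."""
    by_well = {}
    for r1, r2 in read_pairs:
        well = layout.get((r1[:bc_len], r2[:bc_len]))
        if well is not None:  # reads with unknown barcode pairs are dropped
            by_well.setdefault(well, []).append((r1[bc_len:], r2[bc_len:]))
    return by_well
```

With 36 × 24 = 864 combinations available, 768 wells (eight plates of 96) can be addressed uniquely, leaving 96 combinations unused.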
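The three-way classification can be expressed as a comparison of localization sets, as in this sketch. It assumes that sub-nuclear annotations are collapsed into a single "nuclear" term before comparison, which is one reading of the rule above; the mapping and function names are illustrative and not taken from The Human Protein Atlas tooling.

```python
# Illustrative mapping: sub-nuclear annotations are treated as "nuclear".
NUCLEAR_SUBCOMPARTMENTS = {"nucleoplasm", "nuclear speckles", "nucleoli"}


def normalize(locations) -> set:
    """Collapse sub-nuclear annotations into 'nuclear'."""
    return {"nuclear" if loc in NUCLEAR_SUBCOMPARTMENTS else loc for loc in locations}


def compare_localization(clone_locs, hpa_locs) -> str:
    """Classify clone vs. Human Protein Atlas localizations."""
    clone, hpa = normalize(clone_locs), normalize(hpa_locs)
    if clone == hpa:
        return "identical"   # all localizations shared
    if clone & hpa:
        return "similar"     # at least one shared, extra ones on either side
    return "dissimilar"      # no common localization
```

Under this mapping, a clone annotated "nucleoli" and an atlas entry "nucleoplasm" count as identical, because both reduce to "nuclear".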
Identification of the expressed sgRNAs by in situ sequencing was performed by following and modifying published protocols, see, e.g., Feldman (2019) loc. cit; Ke (2013) Nat Methods 10, 857-60; and Larsson (2010) Nat Methods 7, 395-7.
After live-cell imaging after treatment with MTX or dBET6, cells were fixed with 4% paraformaldehyde for 30 minutes, washed with PBS, permeabilized with 70% ethanol for 30 minutes and washed with PBS-T (PBS+0.05% Tween-20) twice. Reverse transcription mix (1× RevertAid RT buffer, 250 μM dNTPs, 0.2 mg/mL BSA, 1 μM RT primer, 0.8 U/mL Ribolock RNase inhibitor, and 4.8 U/mL RevertAid H minus reverse transcriptase) was added to the sample and incubated for 16 hours at 37° C. Following reverse transcription, cells were washed 5 times with PBS-T and post-fixed with 3% paraformaldehyde and 0.1% glutaraldehyde for 30 minutes at room temperature and washed 5 times with PBS-T. Cells were incubated in a padlock probe and extension-ligation reaction mix (1× Ampligase buffer, 0.4 U/mL RNase H, 0.2 mg/mL BSA, 100 nM padlock probe, 0.02 U/mL KlenTaq polymerase, 0.5 U/mL Ampligase and 50 nM dNTPs) for 5 minutes at 37° C. and 90 minutes at 45° C., and then washed 2 times with PBS-T. Circularized padlocks were amplified with rolling circle amplification mix (1× Phi29 buffer, 250 μM dNTPs, 0.2 mg/mL BSA, 5% glycerol, and 1 U/mL Phi29 DNA polymerase) at 30° C. for 4 hours. Rolling circle amplicons were prepared for sequencing by hybridizing a mix containing sequencing primer oSBS_CROP-seq (1 μM primer in 2×SSC+10% formamide) for 30 minutes at room temperature. Barcodes were read out using sequencing-by-synthesis reagents from the Illumina NextSeq 500/550 kit v2 (Illumina 15057934). First, samples were washed with incorporation buffer (NextSeq 500/550 buffer cartridge, position 35) and incubated for 4 minutes in incorporation mix (NextSeq 500/550 reagent cartridge, position 31) at 60° C. Samples were then washed with incorporation buffer (4 washes, 60° C. for 4 minutes at the last wash) and placed in scan mix (NextSeq 500/550 reagent cartridge, position 30) for imaging. 
Imaging was performed on a PerkinElmer Opera Phenix microscope in confocal mode using a 63× water immersion objective and two channels: excitation laser 561 nm, emission filter 570-630 nm, exposure 500 ms; and excitation laser 640 nm, emission filter 650-760 nm, exposure 500 ms. Bases were identified as follows: T, signal in the 561 nm channel; C, signal in the 640 nm channel; A, (weaker) signal in both channels; G, no signal. Following each imaging cycle, samples were washed once with cleavage mix (NextSeq 500/550 reagent cartridge, position 29) and then incubated with cleavage mix for 4 minutes at 60° C. to remove the dye terminators. Samples were washed 5 times with incorporation buffer before the next cycle was started.
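The two-channel encoding of the four bases described above can be expressed as a simple per-spot base caller. The following is a minimal sketch, assuming background-subtracted, per-channel normalized spot intensities; the threshold value of 0.5 is a hypothetical placeholder, and in practice it would likely need calibration per cycle and per field (especially because base A gives a weaker signal in both channels).

```python
from __future__ import annotations

from typing import Sequence

# Hypothetical threshold on normalized, background-subtracted intensities.
THRESHOLD = 0.5

def call_base(i561: float, i640: float, threshold: float = THRESHOLD) -> str:
    """Call one base from the two-color encoding.

    T: signal in the 561 nm channel only
    C: signal in the 640 nm channel only
    A: signal in both channels (typically weaker)
    G: no signal in either channel
    """
    on561 = i561 >= threshold
    on640 = i640 >= threshold
    if on561 and on640:
        return "A"
    if on561:
        return "T"
    if on640:
        return "C"
    return "G"

def call_read(cycles: Sequence[tuple[float, float]], threshold: float = THRESHOLD) -> str:
    """Concatenate per-cycle calls for one rolling circle amplicon into a read."""
    return "".join(call_base(i561, i640, threshold) for i561, i640 in cycles)

# Made-up (561 nm, 640 nm) intensities for four cycles of a single spot.
print(call_read([(0.9, 0.1), (0.1, 0.8), (0.6, 0.55), (0.05, 0.1)]))  # TCAG
```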
The present invention relates to the provision of a large cell pool that comprises individual intron-tagged proteins.
As an illustrative, non-limiting example of the means and methods of the present invention, pooled GFP (green fluorescent protein) intron tagging of metabolic enzymes is provided herein. A CRISPR/Cas9-mediated intron tagging approach is employed to generate a large pool of cells with more than 900 tagged proteins, in which each cell comprises one tagged protein, i.e. a “one protein per cell” approach. The means and methods of the present invention offer the following advantages: (i) target genes can be chosen as desired by designing the corresponding sgRNAs; (ii) different introns of the same gene can be chosen by sgRNA design, which allows tagging within functionally important domains to be avoided; and (iii) very homogeneous cell distributions can be generated, with roughly equal numbers of clones for each targeted protein.
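How homogeneous a pool is can be summarized with a single number. The sketch below uses the ratio of the 90th to the 10th percentile of cell counts per tagged protein, a common evenness metric for pooled CRISPR libraries; it is an illustrative choice, not a metric stated in this document, and the counts are hypothetical.

```python
from __future__ import annotations

import math

def percentile(sorted_values: list[int], q: float) -> int:
    """Nearest-rank percentile (q in (0, 100]) of an ascending list."""
    rank = math.ceil(q / 100 * len(sorted_values))
    return sorted_values[max(rank, 1) - 1]

def skew_ratio(counts_per_protein: dict[str, int]) -> float:
    """90th/10th percentile ratio of cell counts; 1.0 means perfectly even."""
    values = sorted(counts_per_protein.values())
    p10 = percentile(values, 10)
    if p10 == 0:
        return math.inf  # some tagged proteins are missing from the pool
    return percentile(values, 90) / p10

# Hypothetical cell counts for ten tagged proteins.
counts = {f"protein_{i:02d}": n for i, n in enumerate([8, 9, 10, 10, 11, 12, 12, 13, 14, 16])}
print(skew_ratio(counts))  # 1.75
```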
A second key aspect of the inventive method is the application of in situ sequencing. Following exposure of the inventive cell pool to molecules to be screened (for example, drugs and/or pharmacologically relevant molecules), some cells respond with changes in protein localization or protein abundance, measured by fluorescence microscopy of the GFP tag fused to the protein. Using a CROP-seq vector, as provided and illustrated herein, for the intron-targeting sgRNA library allows the expressed sgRNA, and thus the tagged intron, to be identified in each cell by in situ sequencing. To render this compatible with the illustrative GFP-tagged cell pool, the in situ sequencing protocol was adapted to a two-color system.
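Once a read has been obtained for a cell, the tagged intron can be inferred by matching the read to the sgRNA library. The following is a minimal sketch, assuming that the in situ read covers the first bases of the sgRNA spacer and that at most one mismatch is tolerated; the library entries below (sequences and the label "GENE_X intron 3") are hypothetical, and the mismatch tolerance is an illustrative choice.

```python
from __future__ import annotations

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def assign_intron(read: str, library: dict[str, str], max_mismatches: int = 1) -> str | None:
    """Map an in situ read to the intron targeted by the closest library sgRNA.

    `library` maps sgRNA spacer sequences (at least as long as the read) to
    intron labels. Only the first len(read) bases are compared, since only as
    many bases are read as sequencing cycles were performed. Reads whose best
    match exceeds `max_mismatches`, or that tie between two sgRNAs, are left
    unassigned (None) rather than guessed.
    """
    scored = sorted((hamming(read, seq[: len(read)]), label) for seq, label in library.items())
    if not scored or scored[0][0] > max_mismatches:
        return None
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None  # ambiguous: two sgRNAs are equally close
    return scored[0][1]

# Hypothetical 6-cycle reads and a two-entry library with made-up sequences.
library = {"ACGTACGGTT": "CANX intron 14", "TTGCAGCATG": "GENE_X intron 3"}
print(assign_intron("ACGTTC", library))  # CANX intron 14 (1 mismatch)
print(assign_intron("GGGGGG", library))  # None
```

Discarding ties and distant matches trades a lower assignment rate for fewer misassigned cells, which seems preferable when each cell is supposed to carry exactly one tagged protein.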
Accordingly, CRISPR-Cas9-based intron tagging is employed herein to generate cell pools expressing hundreds of tagged fusion proteins at endogenous levels and to monitor drug effects on protein levels and/or localization by time-lapse microscopy. The means and methods of the present invention further comprise the identification of the targeted introns by in situ sequencing. Accordingly, the present invention provides a pooled protein tagging approach that allows the localization and even the expression levels of hundreds of proteins to be monitored in individual cells in real time; see also illustrative
In the context of the present invention, 2,889 genes were selected for targeting, comprising all classic metabolic enzymes and epigenetic modifiers; see Corcoran (2017) Am J Physiol Renal Physiol 312, F533-F542; Birsoy (2015) Cell 162, 540-51. For the 2,387 genes from this set that harbor targetable introns in the selected reading frame, a library comprising 14,049 sgRNAs targeting 11,614 introns (
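An intron is "targetable in the selected reading frame" only if its phase matches the frame in which the tag cassette is designed to be read. The sketch below computes intron phases from the coding lengths of the exons, assuming that this is how introns are filtered; the actual selection criteria (e.g. availability of suitable sgRNA sites) are not detailed here, and the donor phase of 1 and the exon lengths are hypothetical.

```python
from __future__ import annotations

def intron_phases(cds_exon_lengths: list[int]) -> list[int]:
    """Phase of each intron, i.e. (coding bases upstream of the intron) mod 3.

    Phase 0: the intron lies between codons; phase 1 or 2: the intron
    interrupts a codon after its first or second base. There is one intron
    between each pair of consecutive coding exons.
    """
    phases = []
    upstream = 0
    for length in cds_exon_lengths[:-1]:
        upstream += length
        phases.append(upstream % 3)
    return phases

def targetable_introns(cds_exon_lengths: list[int], donor_phase: int) -> list[int]:
    """1-based indices of introns whose phase matches the tag cassette."""
    return [i + 1 for i, p in enumerate(intron_phases(cds_exon_lengths)) if p == donor_phase]

# Hypothetical gene with five coding exons (coding lengths in bp).
exons = [100, 62, 90, 118, 200]
print(intron_phases(exons))          # [1, 0, 0, 1]
print(targetable_introns(exons, 1))  # [1, 4]
```

Because the tag exon itself must be a multiple of 3 in length, inserting it into an intron of matching phase keeps both the tag and all downstream exons in frame.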
It was reasoned that the highly diverse pool of cells expressing GFP-tagged proteins can be used to identify compounds that change the levels or localization of any of the tagged proteins. To test this, the cell pool was treated with the BRD4-targeting PROTAC dBET6 (Winter (2017) Mol Cell 67, 5-18 e19), and high-content live-cell imaging was used to track the dynamics of GFP-tagged proteins over 9 hours in approximately 7,000 cells in a single well of a 384-well plate (
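For a degrader such as dBET6, activity shows up as loss of the tagged protein's signal over time. The following is a minimal sketch of flagging such cells from per-cell GFP intensity traces, assuming that cells have already been segmented and tracked across frames; the 50% loss threshold, the cell IDs and the intensities are hypothetical. A real analysis would also likely need untreated control wells to correct for photobleaching and cell division.

```python
from __future__ import annotations

def normalized_trace(intensities: list[float]) -> list[float]:
    """Scale a per-cell GFP intensity time series to its first time point."""
    baseline = intensities[0]
    if baseline <= 0:
        raise ValueError("baseline intensity must be positive")
    return [x / baseline for x in intensities]

def degraded_cells(traces: dict[str, list[float]], min_fraction_lost: float = 0.5) -> list[str]:
    """IDs of cells whose tagged-protein signal fell by at least min_fraction_lost.

    The mean of the last two time points is used to damp single-frame noise.
    """
    hits = []
    for cell_id, trace in traces.items():
        tail = normalized_trace(trace)[-2:]
        final = sum(tail) / len(tail)
        if 1.0 - final >= min_fraction_lost:
            hits.append(cell_id)
    return hits

# Hypothetical mean GFP intensities per cell over five imaging time points.
traces = {
    "cell_1": [1000, 800, 450, 300, 280],  # signal lost: candidate degradation
    "cell_2": [500, 510, 495, 505, 490],   # stable signal
}
print(degraded_cells(traces))  # ['cell_1']
```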
It was then tested whether the cell pool also reveals complex cellular responses to compounds that act by conventional mechanisms. To this end, the cell pool was treated with methotrexate (MTX), an antimetabolite that inhibits tetrahydrofolate metabolism, thereby impairing DNA and RNA synthesis and causing DNA damage. Changes in the localization of several proteins were observed in the cell pool (
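Localization changes can be quantified per cell by comparing GFP intensity between compartments before and after treatment. The sketch below uses the nuclear-to-cytoplasmic (N/C) intensity ratio as one possible readout; it assumes nuclear and cytoplasmic masks are available (e.g. from a nuclear stain), which is not specified here, and the 2-fold threshold (|log2 fold change| ≥ 1) and intensities are hypothetical. Relocalization to other compartments would need other masks.

```python
from __future__ import annotations

import math

def nc_ratio(nuclear_mean: float, cytoplasmic_mean: float) -> float:
    """Nuclear-to-cytoplasmic ratio of mean GFP intensity for one cell."""
    return nuclear_mean / cytoplasmic_mean

def localization_shift(before: tuple[float, float], after: tuple[float, float]) -> float:
    """log2 fold change of the N/C ratio (positive = shift toward the nucleus).

    `before` and `after` are (nuclear_mean, cytoplasmic_mean) pairs.
    """
    return math.log2(nc_ratio(*after) / nc_ratio(*before))

def is_relocalized(before: tuple[float, float], after: tuple[float, float],
                   min_abs_log2fc: float = 1.0) -> bool:
    """Flag cells whose N/C ratio changed by at least 2**min_abs_log2fc-fold."""
    return abs(localization_shift(before, after)) >= min_abs_log2fc

# Hypothetical cell: mostly cytoplasmic before treatment, mostly nuclear after.
print(localization_shift((200, 400), (600, 300)))  # 2.0
print(is_relocalized((200, 400), (600, 300)))      # True
```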
The generation of targeted GFP-tagged cell pools enables, inter alia, the identification of cellular drug responses by time-lapse microscopy. Future applications of the present invention, including deep learning and image recognition as well as direct in situ sequencing, are expected to further accelerate the assignment of targeted clones directly from the screening well. Importantly, the low cost and fast timescales of imaging-based approaches enable applications both in large-scale screening and in the rapid optimization of doses and response kinetics in a cellular system. This approach is especially useful for the discovery and development of PROTACs and molecular glue degraders, whose activity can easily be determined by the disappearance of the tagged protein; however, it is also documented herein that the means and methods of the present invention can be employed to verify and/or confirm known drug actions and/or to discover new effects of known drugs. Importantly, intron tagging can easily be applied to other sets of genes beyond metabolic enzymes, and potentially in a genome-wide manner, to study protein dynamics at scale in response not only to drug treatment but also to other physiological perturbations.
For protein tagging with an intron tagging strategy, a generic sgRNA excises a fluorescent tag, flanked by splice acceptor and donor sites, from a generic donor plasmid. This excision is achieved by cutting the donor plasmid twice, which releases a fragment containing the coding sequence of the tag flanked by a splice acceptor and a splice donor (
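The excision step can be illustrated in silico. SpCas9 (as expressed from pX330) cleaves 3 bp upstream of an NGG PAM, so two copies of the generic target site release the fragment between the two cut positions. The following is a minimal sketch that searches only the top strand; the protospacer, the flanking bases and the short stand-in for the splice acceptor/tag/splice donor cassette are all hypothetical sequences.

```python
from __future__ import annotations

import re

def cas9_cut_sites(seq: str, protospacer: str) -> list[int]:
    """0-based SpCas9 cut positions for a target site on the top strand.

    SpCas9 cleaves 3 bp upstream of an NGG PAM, i.e. between positions 17
    and 18 of a 20-nt protospacer. A lookahead allows overlapping matches.
    """
    seq, protospacer = seq.upper(), protospacer.upper()
    pattern = re.compile("(?=" + re.escape(protospacer) + "[ACGT]GG)")
    return [m.start() + len(protospacer) - 3 for m in pattern.finditer(seq)]

def excised_fragment(seq: str, protospacer: str) -> str:
    """Fragment released when the donor is cut at exactly two target sites."""
    cuts = cas9_cut_sites(seq, protospacer)
    if len(cuts) != 2:
        raise ValueError(f"expected 2 target sites, found {len(cuts)}")
    return seq[cuts[0]:cuts[1]]

# Hypothetical donor: flank + site(AGG PAM) + cassette stand-in + site(TGG PAM) + flank.
protospacer = "GACGTACGTTAGCATCGATC"
cassette = "CCCCATGCCCC"  # placeholder for splice acceptor, tag and splice donor
donor = "TTTT" + protospacer + "AGG" + cassette + protospacer + "TGG" + "TTTT"
print(cas9_cut_sites(donor, protospacer))  # [21, 55]
print(excised_fragment(donor, protospacer))
```

With a single target site on a circular molecule, such as a minicircle, one cut is already sufficient to linearize the whole cassette, which is why the minicircle needs only one generic sgRNA site.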
To compare the conventional donor plasmid with a minicircle, tagging of CANX at intron 14 was attempted using either a GFP donor plasmid containing two generic sgRNA sites or a GFP minicircle containing only one generic sgRNA site and no plasmid backbone. A tagging rate of 3.0% was achieved with the GFP donor plasmid, as determined by flow cytometry analysis of transfected cells (
In a second, independent experiment, similar improvements were observed with the minicircle (a 4-fold increase in GFP-positive cells when using the same amount of GFP minicircle DNA as GFP donor plasmid, and a 3-fold increase when using ⅓ of the amount of minicircle DNA), but the overall tagging rates in this experiment were lower because of lower transfection efficiency (less than 10% of the analyzed cells were transfected,
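Why fold changes remain comparable even when absolute tagging rates drop with transfection efficiency can be shown with a short calculation. In the sketch below, all numbers are hypothetical (they are not the measured values of these experiments), and it is assumed that both conditions in one experiment were transfected with similar efficiency.

```python
def rate_among_transfected(gfp_positive_fraction: float, transfected_fraction: float) -> float:
    """GFP-positive fraction of all analyzed cells, rescaled to transfected cells."""
    if not 0 < transfected_fraction <= 1:
        raise ValueError("transfected_fraction must be in (0, 1]")
    return gfp_positive_fraction / transfected_fraction

def fold_change(condition: float, reference: float) -> float:
    """Ratio of two tagging rates measured within the same experiment."""
    return condition / reference

# Hypothetical values: 8% of cells transfected; 0.1% GFP-positive cells with
# the donor plasmid and 0.4% with the minicircle (fractions of all cells).
transfected = 0.08
donor, minicircle = 0.001, 0.004
print(round(rate_among_transfected(donor, transfected), 4))       # 0.0125
print(round(rate_among_transfected(minicircle, transfected), 4))  # 0.05
print(round(fold_change(minicircle, donor), 2))                   # 4.0
```

Dividing both rates by the same transfected fraction cancels out in the ratio, so the fold change is unaffected, while the absolute rates depend directly on how many cells received DNA.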
Minicircle DNA was produced with a commercial minicircle production kit (SBI MC-Easy™ Minicircle DNA Production Kit). First, a parental production plasmid was generated by cloning into the pMC.BESPX-MCS1 production plasmid a DNA fragment consisting of the generic sgRNA target site followed by a splice acceptor, a 20-amino-acid linker sequence, the coding sequence of EGFP, another 20-amino-acid linker sequence and a splice donor site. The DNA fragment was generated by PCR using the GFP donor plasmid as a template; pMC.BESPX-MCS1 was digested with EcoRV, and the fragment was integrated at the restriction site via Gibson Assembly. The E. coli producer strain ZYCY10P3S2T was transformed with the assembly reaction, and clonal bacterial colonies were picked for isolation and sequencing of the parental plasmid. A colony containing the correct parental plasmid was used for minicircle production as described by the manufacturer. In brief, bacteria were grown overnight in the provided growth medium, and induction medium was added the next day to induce attB/attP recombination and degradation of the parental plasmid backbone. Minicircle DNA was isolated from bacterial pellets using multiple Qiagen Plasmid Plus Midi kits, and the resulting minicircle was analyzed by restriction enzyme digestion and gel electrophoresis.
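Because the linker-EGFP-linker segment becomes an extra exon inside the host protein, its coding part should be a multiple of 3 in length and free of in-frame stop codons; otherwise the downstream exons would be shifted out of frame or truncated. The following is a minimal sketch of such a check, assuming the coding segment is given in its reading frame. The (GGGS)5 linker and the six-codon EGFP fragment used below are hypothetical stand-ins, since the actual linker sequence is not given in this section.

```python
from __future__ import annotations

STOP_CODONS = {"TAA", "TAG", "TGA"}

def check_tag_exon(coding: str) -> list[str]:
    """Return problems that would break an in-frame intron tag (empty list = OK)."""
    coding = coding.upper()
    problems = []
    if len(coding) % 3 != 0:
        problems.append(f"length {len(coding)} is not a multiple of 3")
    # Scan only complete codons, in frame 0 of the coding segment.
    for index in range(len(coding) // 3):
        codon = coding[3 * index:3 * index + 3]
        if codon in STOP_CODONS:
            problems.append(f"in-frame stop codon {codon} at codon {index + 1}")
    return problems

# Hypothetical 20-aa (GGGS)5 linker and the first six codons of EGFP as a stand-in.
linker = "GGCGGAGGCTCT" * 5   # 60 nt = 20 amino acids
egfp_stub = "ATGGTGAGCAAGGGCGAG"
print(check_tag_exon(linker + egfp_stub + linker))  # []
print(check_tag_exon(linker + "TAA" + linker))      # ['in-frame stop codon TAA at codon 21']
```

The same reasoning implies that the EGFP coding sequence used in such a cassette would be included without its own stop codon.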
For intron tagging experiments, A549 cells were seeded in a 12-well plate and co-transfected using Lipofectamine 3000 according to the manufacturer's instructions with 400 ng of the CROPseq plasmid expressing the sgRNA targeting intron 14 of the CANX gene, 400 ng of the pX330 plasmid expressing Cas9-mCherry and the donor-targeting sgRNA, and either 200 ng of the GFP donor plasmid or 200 ng of the GFP minicircle. In samples with ⅓ of the amount of GFP minicircle, cells were co-transfected with 467 ng of the CROPseq plasmid expressing the sgRNA targeting intron 14 of the CANX gene, 467 ng of the pX330 plasmid expressing Cas9-mCherry and the donor-targeting sgRNA, and 67 ng of the GFP minicircle. To enrich for transfected cells, mCherry-positive cells were sorted 48 h after transfection and expanded for one week before GFP-positive cells were sorted. In an independent experiment, a pX330 plasmid expressing Cas9-BSD instead of Cas9-mCherry was used; cells were not enriched for transfected cells, and GFP-positive cells were sorted 48 h after transfection.
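The DNA amounts above appear consistent with keeping the total DNA mass at about 1,000 ng per well (400 + 400 + 200 ng, and 467 + 467 + 67 ng, the latter differing by 1 ng through rounding) and splitting the mass not used for the donor equally between the two other plasmids. This reading of the numbers is an assumption, not a rule stated in the protocol; the sketch below only reproduces the arithmetic.

```python
from __future__ import annotations

def transfection_amounts(total_ng: float, donor_ng: float,
                         n_other_plasmids: int = 2) -> tuple[float, float]:
    """Return (ng per other plasmid, ng donor) for a fixed total DNA mass.

    The donor (GFP donor plasmid or GFP minicircle) receives donor_ng; the
    remaining mass is split equally between the other plasmids, here the
    CROPseq sgRNA plasmid and the pX330 Cas9 plasmid.
    """
    if not 0 < donor_ng < total_ng:
        raise ValueError("donor_ng must lie between 0 and total_ng")
    return (total_ng - donor_ng) / n_other_plasmids, donor_ng

standard = transfection_amounts(1000, 200)
reduced = transfection_amounts(1000, 200 / 3)  # one third of the minicircle mass
print(standard)                          # (400.0, 200)
print(tuple(round(x) for x in reduced))  # (467, 67)
```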
Annotation and sequence of the parental GFP minicircle production plasmid:
Only the sequence between the attB and attP sites is circularized and remains in the final GFP minicircle.
Only the part between the attB and attP sites was designed. The parental producer plasmid backbone is part of the commercial SBI MC-Easy™ Minicircle DNA Production Kit.
Number | Date | Country | Kind |
---|---|---|---|
19211077.3 | Nov 2019 | EP | regional |
The present application is a national phase application under 35 U.S.C. § 371 of International Application No. PCT/EP2020/082295, filed Nov. 16, 2020, the entire contents of which are hereby incorporated by reference. International Application No. PCT/EP2020/082295 claims the priority benefit of European Application No. 19211077.3, filed Nov. 22, 2019.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2020/082295 | 11/16/2020 | WO |