The present invention relates generally to biomolecular condensates or membraneless compartments in cells and the design and application of phase separation sensors capable of targeting or associating with biomolecular condensates. The phase separation sensors comprise at least two domains including one or more accessory protein or molecule and an artificial client protein or intrinsically disordered sequence. The invention also relates to methods and applications of the sensors.
Biomolecular condensates are two- and three-dimensional compartments in eukaryotic cells that concentrate specific collections of molecules without an encapsulating lipid-based membrane. Condensate formation has emerged as a fundamental mechanism for the organization of biomolecules within the nucleus and cytosol and at membranes (Hyman A A et al Annu Rev Cell Dev Biol 30 (2014) 39-58; Banani S F et al Nat. Rev. Mol. Cell Biol. 18 (2017) 285-298; Shin Y et al Science 357 (2017); Alberti S Curr. Biol. 27 (2017) R1097-R1102). Many condensates behave as dynamic liquids and appear to form through liquid-liquid phase separation (LLPS) driven by weak, multivalent interactions between macromolecules. There are numerous manifestations of this multivalency, including sticky ultra-weak interactions between intrinsically disordered proteins (IDPs), arrays of modular protein domains (Li P et al Nature 483 (2012) 336-340; Fromm S A et al Angew Chem Int Ed Engl 53 (2014) 7354-7359; Banjade S et al eLife 3 (2014), e04123, doi.org/10.7554/eLife.04123; Zeng M et al Cell 166 (2016) 1163-1175.e12; Su X et al Science 352 (2016) 595-599; Sun D et al Cell Res. 28 (2018) 405-415), distributed weakly adhesive motifs separated by intrinsically disordered regions (IDRs) of proteins (Lin Y et al Mol. Cell 60 (2015) 208-219; Nott T J et al Mol Cell 57 (2015) 936-947; Patel A et al Cell 162 (2015) 1066-1077; Molliex A et al Cell 163 (2015) 123-133; Murakami T et al Neuron 88 (2015) 678-690; Xiang S et al Cell 163 (2015) 829-839), and repetitive base-pairing elements in RNA and DNA (Jain A et al Nature (2017) doi.org/10.1038/nature22386; Langdon E M et al Science (2018), doi.org/10.1126/science.aar7432).
Specific interactions, such as interactions between modular binding domains and nucleic acid base pairing, weaker interactions between intrinsically disordered regions, and nonspecific interactions, such as electrostatic interactions and hydrophobic interactions, influence condensate formation and composition.
Representative and recognized biomolecular condensates include PML nuclear bodies, P-bodies, stress granules, the nucleolus, and two-dimensional membrane-localized LAT and nephrin clusters. Individual condensates can contain hundreds of distinct molecular components. For example, promyelocytic leukemia protein (PML) bodies can contain over 200 unique proteins (Van Damme E et al Int. J. Biol. Sci. 6 (2010) 51-67, doi.org/10.7150/ijbs.6.51), the nucleolus can contain over 4500 unique proteins (Ahmad Y et al (2009) Nucleic Acids Res 37:D181-D184), and stress granules can contain over 100 proteins as well as over 1000 RNA transcripts (Khong A et al Mol. Cell 68 (2017) 808-820.e5, doi.org/10.1016/j.molcel.2017.10.015; Jain S et al Cell 164 (2016) 487-498; Markmiller S et al Cell 172 (2018), doi.org/10.1016/j.cell.2017.12.032 (590-604.e13); Youn J et al Mol. Cell 69 (2018) 517-532). Some of these components are unique to a specific condensate, but others can be shared between different types, particularly among the various RNA-containing structures (Langdon E M et al Science (2018), doi.org/10.1126/science.aar7432; Markmiller S et al Cell 172 (2018), doi.org/10.1016/j.cell.2017.12.032 (590-604.e13); Youn J et al Mol Cell 69 (2018) 517-532; Gopal J J et al Proc Natl Acad Sci 114 (2017) E2466-E2475, doi.org/10.1073/pnas.1614462114; Buchan J R et al Mol Cell 36 (2009) 932-941). Composition can vary dramatically under different cellular conditions and can rapidly change in response to signals (Markmiller S et al Cell 172 (2018), doi.org/10.1016/j.cell.2017.12.032 (590-604.e13); Youn J et al Mol. Cell 69 (2018) 517-532; Buchan J R et al Mol Cell 36 (2009) 932-941; Fong K et al J Cell Biol 203 (2013) 149-164; Weidtkamp-Peters S et al J. Cell Sci. 121 (2008) 2731-2743).
Even in the absence of stimuli, many condensate components or residents rapidly exchange between the condensate and the surrounding cytoplasm or nucleoplasm (Molliex A et al Cell 163 (2015) 123-133; Brangwynne C P et al Proc Natl Acad Sci 108 (2011) 4334-4339; Woodruff J B et al Cell 169 (2017), doi.org/10.1016/j.cell.2017.05.028 (1066-1077.e10); Schwarz-Romond T et al J Cell Sci 120 (2007) 2402-2412; Dundr M et al Biochem. J. 356 (2001) 297-310).
Often only a few components or residents are necessary to form the condensate and deletion or depletion of these molecules decreases the size and/or number of the structures in a cell, while overexpression can have the opposite effect (Clemson C M et al Mol Cell 33 (2009) 717-726; Ishov A M et al J Cell Biol 147 (1999) 221-234; Teixeira D et al Mol Biol Cell 18 (2007) 2274-2287; Rao B S et al Proc Natl Acad Sci USA 114 (2017) E9569-E9578,doi.org/10.1073/pnas.1712396114). These resident elements are referred to as scaffolds. PML is an example of a scaffold—knocking out PML abolishes PML nuclear body formation while increasing PML expression results in an increased number of PML nuclear bodies (Ishov A M et al J. Cell Biol. 147 (1999) 221-234; Zhong S et al Blood 95 (2000) 2748-2752; de Stanchina E et al Mol Cell 13 (2004) 523-535). Other condensate residents are concentrated within the structure, often by direct interactions with scaffolds, but are not required for condensate formation, and these are referred to as clients. Examples of clients include PML nuclear body proteins Sp100 and BLM, and it has been shown that knocking out either protein does not ablate PML nuclear body formation (Ishov A M et al J. Cell Biol. 147 (1999) 221-234; Zhong S et al Oncogene 18 (1999) 7941).
Despite remarkable progress, the study of cellular phase separation remains challenging and often relies on truncated protein mutants, reconstituted systems in non-physiological buffers, and overexpression/knock-in of tagged fusions that can alter a protein's phase separation behavior (Alberti S et al Cell 176, 419-434 (2019); Schmidt H B et al Nature communications 10, 1-14 (2019); Bracha D et al Cell 175, 1467-1480.e1413 (2018)).
Therefore, it should be apparent that there still exists a need in the art for methods and approaches to evaluate, detect, monitor, target, and assess cellular phase separation and biomolecular condensates. There are deficiencies in the present knowledge and available tools to be able to assess, monitor and manipulate biomolecular condensates, particularly in living cells and in vivo, especially in instances where the condensates are involved in critical aspects of cellular physiology or provide markers of or targets for disease or conditions.
The citation of references herein shall not be construed as an admission that such is prior art to the present invention.
In its most general embodiment, the present invention extends to biomolecular condensates or membraneless compartments in cells, and the ability to detect, target, monitor, assess and modulate biomolecular condensates, including in vitro or ex vivo in cells or tissues and in vivo in animals, including humans, and in animal model systems. The invention provides novel phase separation sensors capable of targeting or associating with biomolecular condensates, including nascent or preassembled biomolecular condensates. These sensors are designed to preferentially target or associate with target biomolecular condensates. The sensors comprise at least two domains, wherein a first domain includes one or more accessory protein or molecule and a second domain includes an artificial client protein or intrinsically disordered sequence. The artificial client protein or intrinsically disordered sequence is uniquely capable of interacting with one or more component protein, particularly one or more scaffold protein, in a target biomolecular condensate.
In an initial embodiment of the invention, a phase separation sensor is provided wherein the sensor is capable of targeting or associating with a biomolecular condensate and comprises at least two protein domains, wherein the first domain comprises one or more accessory protein and the second domain comprises an artificial client protein having intrinsic disorder and capable of engaging in ultra-weak phase separation-specific amino acid interactions with one or more component protein in the condensate.
In an embodiment, the phase separation sensor lacks independent phase separation behavior when expressed in the cell. In an embodiment, the phase separation sensor lacks independent phase separation behavior when expressed in the cell at reasonably high levels.
In another embodiment, the phase separation sensor associates with the biomolecular condensate without disrupting the condensate.
In an embodiment of the invention, the artificial client protein is an intrinsically disordered protein having low complexity sequence. In an embodiment, the artificial client protein contains one or more disordered region that provides one or more or multiple weakly adhesive sequence elements. In an embodiment, the artificial client protein sequence lacks recognized protein three-dimensional structural aspects. In an embodiment, the artificial client protein sequence contains repeated sequence elements. In an embodiment, the artificial client protein sequence contains low complexity sequence elements. In a particular such embodiment, the low complexity sequence elements provide a basis for multivalent weakly adhesive intermolecular interactions. In a particular embodiment, the low complexity sequence elements provide a basis for multivalent weakly adhesive intermolecular interactions with a target scaffold protein.
In accordance with another embodiment of the invention, the sensor artificial client protein sequence comprises similar compositional bias or comprises related sequence patterns with low sequence identity to the amino acid sequence of a naturally-occurring intrinsically disordered protein or protein region within a larger protein or target protein. In one embodiment, the larger/target protein is a component of a biomolecular condensate. In one such embodiment, this similar compositional bias or related sequence patterns contributes to or is responsible for driving assembly of said biomolecular condensate.
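By way of illustration only, the distinction between a shared compositional bias and a low sequence identity can be sketched computationally. The following Python sketch is not part of the disclosed sequences or methods: the function names, the use of cosine similarity as a composition measure, the ungapped position-by-position identity measure, and the toy sequence are all assumptions chosen for the example.

```python
from collections import Counter
from math import sqrt


def composition(seq):
    """Fractional amino acid composition of a one-letter protein sequence."""
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return {aa: n / len(seq) for aa, n in counts.items()}


def composition_similarity(a, b):
    """Cosine similarity between the composition vectors of two sequences.

    A value near 1.0 indicates a similar compositional bias. This measure is
    an illustrative assumption, not a metric defined in the specification.
    """
    ca, cb = composition(a), composition(b)
    dot = sum(ca.get(aa, 0.0) * cb.get(aa, 0.0) for aa in set(ca) | set(cb))
    norm_a = sqrt(sum(v * v for v in ca.values()))
    norm_b = sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b)


def positional_identity(a, b):
    """Fraction of identical residues at matching positions (ungapped)."""
    if len(a) != len(b):
        raise ValueError("this simple comparison needs equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical toy sequence; it is not taken from any SEQ ID NO.
target = "SGRGSHQSSGRH"
candidate = target[::-1]

print(composition_similarity(target, candidate))
print(positional_identity(target, candidate))
```

In this toy case the candidate has the same composition as the target while matching it at only a minority of positions, which is the kind of relationship the embodiment describes. A real comparison would normally use a proper alignment rather than the ungapped measure assumed here.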
In an embodiment of the invention, the phase separation sensor's artificial client protein sequence is related to the native or target intrinsically disordered protein (IDP) sequence. In an embodiment of the invention, the phase separation sensor's artificial client protein sequence is related to the native or target IDP sequence by reversing the native or target IDP amino acid sequence. In accordance with this embodiment, the sensor's artificial client protein sequence is generated by reading the original, native or target IDP sequence in the non-natural C-terminal to N-terminal direction. This provides an absolutely distinct non-native sequence for the artificial client protein, including wherein the chirality and orientation and/or the structure of the molecule in space is absolutely distinct from the target IDP amino acid sequence. In an embodiment, the native or target IDP sequence's compositional bias, overall amino acid sequence and charge are maintained in the artificial client protein. In an embodiment, the artificial client protein sequence is a randomized or jumbled sequence corresponding to or based on the sequence of the target IDP sequence.
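The reversal and randomization approaches described above can be sketched as simple string operations. The following Python sketch is illustrative only: the function names, the fixed random seed, the simplified net-charge estimate (lysine and arginine counted as +1, aspartate and glutamate as -1, histidine ignored) and the toy sequence are assumptions and do not represent any sequence or procedure disclosed herein.

```python
import random
from collections import Counter

POSITIVE = set("KR")  # assumption: histidine is ignored in this rough estimate
NEGATIVE = set("DE")


def reverse_sequence(seq):
    """Read a sequence in the C-terminal to N-terminal direction."""
    return seq[::-1]


def scramble_sequence(seq, seed=0):
    """Return a randomized (jumbled) sequence with the same residues.

    The seed is fixed only so that the example is reproducible.
    """
    rng = random.Random(seed)
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def net_charge(seq):
    """Simplified net charge: (K + R) minus (D + E)."""
    positive = sum(aa in POSITIVE for aa in seq)
    negative = sum(aa in NEGATIVE for aa in seq)
    return positive - negative


# Hypothetical toy sequence; it is not taken from any SEQ ID NO.
target_idp = "SGRGSHQDSEGRH"
reversed_client = reverse_sequence(target_idp)
scrambled_client = scramble_sequence(target_idp)

# Both derived sequences keep the composition and simplified net charge.
for client in (reversed_client, scrambled_client):
    assert Counter(client) == Counter(target_idp)
    assert net_charge(client) == net_charge(target_idp)
```

Because both operations only reorder residues, amino acid composition and the simplified net charge are preserved by construction, while the linear order of residues differs from the target.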
In another and alternative embodiment, the sensor's artificial client protein sequence is generated de novo without reference to any target sequence. In this embodiment, the artificial client protein sequence is intrinsically disordered and may comprise a repeated sequence which is a low complexity sequence comprising a limited number of amino acids. In certain embodiments, the artificial client protein sequence is intrinsically disordered and may comprise a repeated sequence which is a low complexity sequence comprising a limited number of amino acids in a repeating sequence pattern.
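As a non-limiting illustration of a de novo, repeated, low complexity sequence, the sketch below builds a sequence from a short motif and estimates its complexity as the Shannon entropy of its amino acid composition. The motif, the function names and the choice of Shannon entropy as a complexity measure are assumptions made for this example; the specification does not prescribe a particular measure.

```python
from collections import Counter
from math import log2


def repeat_sequence(motif, copies):
    """Build a repeated sequence from a short motif."""
    if not motif or copies < 1:
        raise ValueError("need a non-empty motif and at least one copy")
    return motif * copies


def shannon_entropy(seq):
    """Shannon entropy (bits per residue) of the amino acid composition.

    Lower values indicate a more restricted amino acid alphabet. A sequence
    using all 20 amino acids equally would reach log2(20), about 4.32 bits.
    """
    counts = Counter(seq)
    total = len(seq)
    return -sum((n / total) * log2(n / total) for n in counts.values())


# Hypothetical motif drawn from three amino acids; not a disclosed sequence.
client = repeat_sequence("GRGS", 10)

print(len(client))
print(round(shannon_entropy(client), 3))
```

Composition entropy captures only the limited amino acid alphabet; it does not by itself establish intrinsic disorder, which would need a dedicated predictor or experimental characterization.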
The invention contemplates a sensor molecule or protein which provides a functional, active, visible or detectable label or marker. In one such embodiment, the sensor comprises a reporter molecule or protein which provides a functional, active, visible or detectable label or marker. The invention contemplates a sensor molecule or protein which provides a function, including an enzymatic activity or other activity or capability. In an embodiment of the invention, the sensor comprises in a first domain, or in one or more embodiment or portion of a first domain, one or more accessory protein wherein at least one accessory protein provides a detectable or functional label.
In one or more embodiment, the at least one accessory protein may be selected from fluorescent protein, protease, nuclease, ligase, peroxidase, phosphatase, kinase and protein capable of modifying a protein or nucleic acid.
In one such embodiment, at least one accessory protein is a fluorescent protein. In embodiments thereof, the fluorescent protein may be a GFP protein. In an embodiment, the GFP protein is a GFP protein with positively-charged amino acids exposed on the protein surface. In an embodiment, the GFP protein may be +15GFP. In an embodiment, the reporter molecule is a GFP with net charge +15 and is selected from +15sfGFP (SEQ ID NO:28) and +15sfGFPK (SEQ ID NO:29).
In an embodiment, one or more accessory protein may be an enzyme. In one embodiment, the enzyme may be a protease, nuclease, ligase, peroxidase, phosphatase, or kinase.
In an embodiment, one or more accessory protein may comprise a label. In an embodiment, the label may include a radioactive element. In one such embodiment, the sensor may thereby introduce a label or radioactive element into a cell, particularly into a biomolecular condensate in a cell. The label or element may then be examined by known techniques, which may vary with the nature of the label or element attached. In the instance where a radioactive label is used, it may be selected from isotopes such as the isotopes 3H, 14C, 32P, 35S, 36Cl, 51Cr, 57Co, 58Co, 59Fe, 90Y, 125I, 131I, and 186Re.
In accordance with a further embodiment, at least one accessory protein may be capable of tagging one or more biomolecular condensate component with a detectable or functional molecule, peptide or marker.
In an embodiment of the invention, the sensor is a functionalized sensor and at least one accessory protein is capable of modifying a target component protein in the condensate.
In another embodiment, the sensor is a functionalized sensor and at least one accessory protein is capable of delivering a compound or agent to the condensate or to a target component protein in the condensate.
In embodiments of the invention, the two or more domains comprising the sensor may be directly linked or may be separated in each or any instance by one or more linker sequence. In a particular embodiment, one or more accessory protein(s) and/or the accessory protein(s) and the artificial client protein are separated by a flexible linker sequence. The flexible linker sequence may comprise between 2 and 10, 10 and 20, 20 and 40, 2 and 20, 2 and 30, 2 and 40, up to 10, up to 20, up to 30, up to 40 amino acid residues. The flexible linker sequence may comprise between 2 and 10 amino acid residues. In a preferred embodiment, one or more short flexible linkers of 2 to 10 residues in length is utilized. In an embodiment, the linker sequence lacks charged residues. In an embodiment, the linker sequence contains charged residues. In an embodiment, the linker sequence contains charged residues and is zwitterionic, having equal numbers of positively-charged and negatively-charged residues. In exemplary embodiments and sequences hereof, linkers of 2, 4 and 10 residues are utilized. In embodiments, linker sequences GSPG (SEQ ID NO: 59) and/or GRSDGVPGSG (SEQ ID NO: 60), as examples, are utilized.
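The linker properties recited above (length, presence of charged residues, and zwitterionic character) lend themselves to a simple check. The following Python sketch is illustrative only; the function name and the convention of counting only lysine and arginine as positively charged and only aspartate and glutamate as negatively charged are assumptions. The two linker sequences are those recited above as SEQ ID NO: 59 and SEQ ID NO: 60.

```python
POSITIVE = set("KR")  # assumption: histidine is not counted as charged
NEGATIVE = set("DE")


def describe_linker(linker):
    """Summarize length and charge properties of a linker sequence."""
    positive = sum(aa in POSITIVE for aa in linker)
    negative = sum(aa in NEGATIVE for aa in linker)
    return {
        "length": len(linker),
        "charged": positive + negative > 0,
        # Zwitterionic here means charged with equal positive and negative counts.
        "zwitterionic": positive > 0 and positive == negative,
        # The preferred short flexible linker range of 2 to 10 residues.
        "short_linker": 2 <= len(linker) <= 10,
    }


for name, linker in (("SEQ ID NO: 59", "GSPG"), ("SEQ ID NO: 60", "GRSDGVPGSG")):
    print(name, describe_linker(linker))
```

Under these assumptions, GSPG is reported as a 4-residue linker lacking charged residues, and GRSDGVPGSG as a 10-residue linker with one positively-charged and one negatively-charged residue.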
In a particular embodiment of the invention, a phase separation sensor is provided wherein the target component protein is a filaggrin family protein or paralog protein. In one such embodiment, the sensor artificial client protein sequence is derived from or based on a filaggrin protein sequence. In one embodiment, the artificial client protein sequence is derived from or based on human filaggrin protein sequence or on a mouse filaggrin protein sequence. In an embodiment, the artificial client protein sequence is derived from or based on a filaggrin protein sequence provided in TABLE 1, or a mouse or human filaggrin protein sequence including as provided in SEQ ID NO: 1 or SEQ ID NO: 56. In a particular embodiment, the artificial client protein sequence is derived from or based on a filaggrin protein repeat component sequence.
In embodiments of the invention, exemplary filaggrin-based or filaggrin-targeting phase separation sensors are provided herein, including in TABLE 3 and in Examples 1 and 2 hereof. These sequences include artificial client protein sequences designed based on the filaggrin target sequence and tested herein. Phase separation sensor designs and examples are provided and described herein and include SEQ ID NO: 26, 27, 50, 51, 52, 53 and 54. Phase separation sensors include Sensor A (SEQ ID NO:26), Sensor B (SEQ ID NO:27), Apex2-Sensor A (SEQ ID NO:50), Apex2-Sensor B (SEQ ID NO:51), Sensor C (SEQ ID NO:52) and Sensor D (SEQ ID NO:53). An additional phase separation sensor is provided in Sensor Apex2-excluded (SEQ ID NO: 54).
In a further embodiment, phase separation sensors are contemplated and provided herein that are directed to one or more biomolecular condensate in a cell or in vivo in an animal. The sensor(s) of the invention may target or associate with one or more biomolecular condensate in the cytoplasm of a cell and/or in the nucleus of a cell. In one such embodiment, the condensate is a keratohyalin granule (KG) in the epidermis or in one or more skin cell. In embodiments, one or more phase separation sensor is provided that targets a biomolecular condensate selected from P granule, Germ granule, Lewy bodies, synaptic condensates, stress granule, P bodies, T cell signalosome, crystalline condensates of the lens fibers, and other cytoplasmic condensates or membraneless organelles assembled through liquid-liquid phase separation. In further embodiments, one or more phase separation sensor is provided that targets a biomolecular condensate in the nucleus. In an embodiment, nuclear condensates may be selected from Nucleoli, Paraspeckles, Histone Locus Bodies, Cajal Bodies, Heterochromatin, and super-enhancer domains. The biomolecular condensate may be an RNA-protein granule or an RNA-containing condensate. In an embodiment, the target condensate protein may be an RNA-binding protein.
In any embodiment of the invention wherein the target condensate or condensate protein is a cytoplasmic condensate or cytoplasmic condensate protein, the phase separation sensor may include one or more nuclear export signal (NES). NES sequences are known and available to one skilled in the art. NES sequences described and provided herein include LELLEDLTL (SEQ ID NO: 57) and SGLELLEDLTL (SEQ ID NO: 58). In one such embodiment, the NES prevents nuclear localization and targets the protein or sensor to the cytoplasm. In any embodiment of the invention wherein the target condensate or condensate protein is a nuclear condensate or a condensate or condensate protein located in the nucleus, the phase separation sensor may include one or more nuclear localization signal (NLS), so as to promote or limit localization to the nucleus. In an embodiment, a sensor of the invention lacks a nuclear localization signal and also lacks a nuclear export signal and thereby may function, may be expressed in, or may localize to either of or both of the nucleus and cytoplasm.
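The choice among an NES, an NLS, or neither, depending on the location of the target condensate, can be sketched as a small selection routine. This Python sketch is illustrative only. The function names, the compartment labels, and the C-terminal placement of the signal are assumptions; the NES sequences are those recited above, and no NLS sequence is assumed, so one must be supplied by the user.

```python
# NES sequences recited above as SEQ ID NO: 57 and SEQ ID NO: 58.
NES_SEQUENCES = ("LELLEDLTL", "SGLELLEDLTL")


def localization_signal(target_compartment, nls=None):
    """Pick a localization signal for the compartment holding the target.

    "cytoplasm" -> a nuclear export signal (NES)
    "nucleus"   -> a user-supplied nuclear localization signal (NLS)
    "either"    -> no signal, so the sensor may localize to both compartments
    """
    if target_compartment == "cytoplasm":
        return NES_SEQUENCES[0]
    if target_compartment == "nucleus":
        if nls is None:
            raise ValueError("an NLS sequence must be supplied by the user")
        return nls
    if target_compartment == "either":
        return ""
    raise ValueError(f"unknown compartment: {target_compartment!r}")


def add_signal(sensor_sequence, signal):
    """Append the signal at the C-terminus (placement is an assumption)."""
    return sensor_sequence + signal


# Hypothetical placeholder for a sensor sequence; not a disclosed sequence.
sensor = "MGSGS"
print(add_signal(sensor, localization_signal("cytoplasm")))
```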
In an embodiment of the invention, a phase separation sensor is provided to investigate or assess phase separation of a putative or candidate condensate, including to determine whether a target protein is incorporated in a biomolecular condensate. In an embodiment of the invention, a phase separation sensor is provided to investigate or assess phase separation of a putative or candidate condensate, including to randomly or indirectly characterize the proteins in a putative or candidate condensate. Thus in an embodiment of the invention, a phase separation sensor is designed which generically or relatively non-specifically associates with biomolecular condensates by virtue of ultra-weak interactions and not by target sequence-based derivation. Provided that the interaction is sufficient and the accessory protein label is adequate, a condensate may be generally or generically targeted and tagged or monitored by association with the sensor. In accordance with an embodiment of the invention, a phase separation sensor of the invention may identify, monitor and characterize a biomolecular condensate of previously unknown nature, composition or purpose.
In an embodiment of the invention, a sensor is designed to generally or generically recognize and monitor the phase behavior of an intrinsically disordered protein (IDP) or sequence, including wherein the IDP is predicted to undergo liquid-liquid phase separation. In an embodiment, phase behavior can be monitored by virtue of a tag or label comprised in, provided in or with the sensor.
The invention includes compositions of the phase separation sensors provided herein. The compositions include pharmaceutical compositions, optionally further comprising one or more vehicle, carrier or diluent. In embodiments of the invention, compositions including pharmaceutical compositions, may include one or more of the phase sensors in combination with an agent or compound for a diagnostic or therapeutic purpose or intent. In an embodiment, such compositions may provide targeting or delivery of an agent or compound to a biomolecular condensate, including a target biomolecular condensate.
In another embodiment, the invention provides nucleic acids encoding a phase separation sensor hereof. In an embodiment, a sensor may comprise a nucleic acid sequence, such as an RNA or DNA sequence. DNA molecules comprising the nucleic acids are an embodiment of the invention. Further, a vector comprising the nucleic acids or DNA molecules of the invention is also provided.
In additional embodiments, methods are provided herein based on the characteristics and capabilities of the phase separation sensors. In one such embodiment, a method is provided for targeting a biomolecular condensate in a cell or tissue comprising administering to the cell or tissue or otherwise expressing in the cell or tissue one or more phase separation sensor of the invention.
In an embodiment, a method is provided for targeting a biomolecular condensate in a cell comprising transfecting or transducing the cell with a nucleic acid or with a vector comprising nucleic acid encoding a sensor of the invention or otherwise capable of expressing the sensor of the invention in a cell.
In another embodiment, a method is provided for detecting or visualizing a biomolecular condensate in a cell or tissue comprising administering to the cell or tissue or otherwise expressing in the cell or tissue one or more sensor of the invention as provided herein. In one such method embodiment, the sensor comprises at least one accessory protein comprising a detectable or functional label or marker, or a protein capable of tagging the condensate with a detectable or functional label or marker, including for example by association with or localization in the condensate. In a method embodiment, the sensor comprises at least one accessory protein suitable for tagging the condensate, such as a fluorescent protein, a radioactive dye or label, a protein that creates contrast suitable for electron microscopy, or a protein otherwise capable of tagging the condensate with a detectable or functional label or marker. In a method embodiment, the sensor comprises at least one accessory protein selected from a fluorescent protein, a protein that creates contrast suitable for electron microscopy, or a protein capable of tagging the condensate with a detectable or functional label or marker.
Another method embodiment of the invention is provided in a method for monitoring biomolecular condensates in a cell comprising administering to the cell or otherwise expressing in the cell or tissue one or more sensor described and provided herein wherein the sensor is capable of tagging the condensate with a detectable or functional label or marker. In an embodiment, the sensor is capable of tagging or labeling a protein in the condensate via a chemical interaction or enzymatic reaction. In an embodiment, the sensor is capable of tagging or labeling a protein in the condensate via ultra-weak bonding or by association with or localization in the condensate. In an embodiment, the sensor is capable of tagging the condensate with a detectable or functional label or marker without significantly altering the condensate or any condensate protein. In an embodiment, the sensor is capable of tagging the condensate with a detectable or functional label or marker without altering the condensate or any condensate protein.
A kit for evaluation of biomolecular condensates in cells or tissues is provided in another embodiment of the invention, wherein the kit comprises a phase separation sensor as described and provided herein, a nucleic acid encoding a sensor hereof, or a vector comprising a nucleic acid or otherwise capable of expressing one or more sensor hereof in a cell.
In alternative methods of the invention, the phase separation sensors provided herein may be utilized in monitoring phase separation dynamics. The sensors can monitor the formation of condensates and their disassembly, including in a cell, tissue or organ. In method embodiments, a phase separation sensor can monitor the formation and/or disassembly of a target biomolecular condensate in a cell, tissue or organ, such as in skin.
Further method embodiments include use and application of one or more phase separation sensor to evaluate or screen compounds, drugs or agents for their effect on a condensate. This is particularly relevant wherein the formation of a condensate, the size or location of a condensate, or the component makeup is altered in or associated with a disease or condition, or is involved in a cellular response in an animal, particularly in a human. In one such embodiment, the sensors are utilized in screening for drugs that promote assembly or disassembly of target condensates.
Other objects and advantages will become apparent to those skilled in the art from a review of the following description which proceeds with reference to the following illustrative drawings.
In accordance with the present invention there may be employed conventional molecular biology, microbiology, and recombinant DNA techniques within the skill of the art. Such techniques are explained fully in the literature. See, e.g., Sambrook et al, “Molecular Cloning: A Laboratory Manual” (1989); “Current Protocols in Molecular Biology” Volumes I-III [Ausubel, F. M., ed. (1994)]; “Cell Biology: A Laboratory Handbook” Volumes I-III [J. E. Celis, ed. (1994)]; “Current Protocols in Immunology” Volumes I-III [Coligan, J. E., ed. (1994)]; “Oligonucleotide Synthesis” (M. J. Gait ed. 1984); “Nucleic Acid Hybridization” [B. D. Hames & S. J. Higgins eds. (1985)]; “Transcription And Translation” [B. D. Hames & S. J. Higgins, eds. (1984)]; “Animal Cell Culture” [R. I. Freshney, ed. (1986)]; “Immobilized Cells And Enzymes” [IRL Press, (1986)]; B. Perbal, “A Practical Guide To Molecular Cloning” (1984).
Therefore, if appearing herein, the following terms shall have the definitions set out below.
The amino acid residues described herein are preferred to be in the “L” isomeric form. However, residues in the “D” isomeric form can be substituted for any L-amino acid residue. NH2 refers to the free amino group present at the amino terminus of a polypeptide. COOH refers to the free carboxy group present at the carboxy terminus of a polypeptide. In keeping with standard and recognized polypeptide nomenclature, abbreviations for amino acid residues are shown in the following Table of Correspondence:
It should be noted that all amino-acid residue sequences are represented herein by formulae whose left and right orientation is in the conventional direction of amino-terminus to carboxy-terminus. Furthermore, it should be noted that a dash at the beginning or end of an amino acid residue sequence indicates a peptide bond to a further sequence of one or more amino-acid residues. The above Table is presented to correlate the three-letter and one-letter notations which may appear alternately herein.
A “replicon” is any genetic element (e.g., plasmid, chromosome, virus) that functions as an autonomous unit of DNA replication in vivo; i.e., capable of replication under its own control.
A “vector” is a replicon, such as plasmid, phage or cosmid, to which another DNA segment may be attached so as to bring about the replication of the attached segment.
A “DNA molecule” refers to the polymeric form of deoxyribonucleotides (adenine, guanine, thymine, or cytosine) in either its single-stranded form or a double-stranded helix. This term refers only to the primary and secondary structure of the molecule, and does not limit it to any particular tertiary forms. Thus, this term includes double-stranded DNA found, inter alia, in linear DNA molecules (e.g., restriction fragments), viruses, plasmids, and chromosomes. In discussing the structure of particular double-stranded DNA molecules, sequences may be described herein according to the normal convention of giving only the sequence in the 5′ to 3′ direction along the nontranscribed strand of DNA (i.e., the strand having a sequence homologous to the mRNA).
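The strand convention described above can be illustrated with a short sketch. In the following Python example, a sequence written 5′ to 3′ along the nontranscribed strand is converted to the corresponding mRNA (by replacing thymine with uracil) and to the transcribed strand (its reverse complement, also written 5′ to 3′). The function names and the toy sequence are assumptions made for illustration only.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand, written in the 5' to 3' direction."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def mrna_from_nontranscribed_strand(dna):
    """The mRNA matches the nontranscribed strand, with U in place of T."""
    return dna.upper().replace("T", "U")


# Hypothetical toy sequence given 5' to 3' along the nontranscribed strand.
nontranscribed = "ATGGCC"

print(mrna_from_nontranscribed_strand(nontranscribed))  # AUGGCC
print(reverse_complement(nontranscribed))               # GGCCAT
```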
An “origin of replication” refers to those DNA sequences that participate in DNA synthesis.
A DNA “coding sequence” is a double-stranded DNA sequence which is transcribed and translated into a polypeptide in vivo when placed under the control of appropriate regulatory sequences. The boundaries of the coding sequence are determined by a start codon at the 5′ (amino) terminus and a translation stop codon at the 3′ (carboxyl) terminus. A coding sequence can include, but is not limited to, prokaryotic sequences, cDNA from eukaryotic mRNA, genomic DNA sequences from eukaryotic (e.g., mammalian) DNA, and synthetic DNA sequences. A polyadenylation signal and transcription termination sequence will usually be located 3′ to the coding sequence.
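The boundaries of a coding sequence as defined above, a start codon at the 5′ terminus, a translation stop codon at the 3′ terminus, and an uninterrupted reading frame between them, can be checked with a short routine. The following Python sketch is illustrative only; it assumes the standard genetic code stop codons (TAA, TAG, TGA) and an ATG start codon, and the function name and toy sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def check_coding_sequence(dna):
    """Return a list of problems found; an empty list means none were found.

    The sequence is read 5' to 3' along the nontranscribed strand.
    """
    dna = dna.upper()
    problems = []
    if len(dna) % 3 != 0:
        problems.append("length is not a multiple of three")
    if not dna.startswith("ATG"):
        problems.append("does not begin with an ATG start codon")
    # Split into complete codons, ignoring any trailing partial codon.
    usable = len(dna) - len(dna) % 3
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("contains an internal in-frame stop codon")
    return problems


# Hypothetical toy sequences, not taken from any disclosed construct.
print(check_coding_sequence("ATGGCCTAA"))
print(check_coding_sequence("ATGTAAGCCTAA"))
```

A check of this kind addresses only the boundaries and frame of the coding sequence; it says nothing about the regulatory sequences under whose control the coding sequence must be placed for expression.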
Transcriptional and translational control sequences are DNA regulatory sequences, such as promoters, enhancers, polyadenylation signals, terminators, and the like, that provide for the expression of a coding sequence in a host cell.
A “promoter sequence” is a DNA regulatory region capable of binding RNA polymerase in a cell and initiating transcription of a downstream (3′ direction) coding sequence. For purposes of defining the present invention, the promoter sequence is bounded at its 3′ terminus by the transcription initiation site and extends upstream (5′ direction) to include the minimum number of bases or elements necessary to initiate transcription at levels detectable above background. Within the promoter sequence will be found a transcription initiation site (conveniently defined by mapping with nuclease S1), as well as protein binding domains (consensus sequences) responsible for the binding of RNA polymerase. Eukaryotic promoters will often, but not always, contain “TATA” boxes and “CAT” boxes. Prokaryotic promoters contain Shine-Dalgarno sequences in addition to the −10 and −35 consensus sequences.
An “expression control sequence” is a sequence, including a DNA sequence, that controls and regulates the transcription and translation of another DNA sequence. A coding sequence is “under the control” of transcriptional and translational control sequences in a cell when RNA polymerase transcribes the coding sequence into mRNA, which is then translated into the protein encoded by the coding sequence.
A nucleic acid sequence, including a DNA sequence is “operatively linked” to an expression control sequence when the expression control sequence controls and regulates the transcription and translation of that nucleic acid or DNA sequence. The term “operatively linked” may include having an appropriate start signal (e.g., ATG) in front of the nucleic acid or DNA sequence to be expressed and maintaining the correct reading frame to permit expression of the nucleic acid or DNA sequence under the control of the expression control sequence and production of the desired product encoded by the nucleic acid or DNA sequence.
A “signal sequence” can be included before the coding sequence. This sequence encodes a signal peptide, N-terminal to the polypeptide, that communicates to the host cell to direct the polypeptide to the cell surface or secrete the polypeptide into the media, and this signal peptide is clipped off by the host cell before the protein leaves the cell. Signal sequences can be found associated with a variety of proteins native to prokaryotes and eukaryotes.
The term “oligonucleotide,” as used herein in referring to the probe of the present invention, is defined as a molecule comprised of two or more ribonucleotides, preferably more than three. Its exact size will depend upon many factors which, in turn, depend upon the ultimate function and use of the oligonucleotide.
The term “primer” as used herein refers to an oligonucleotide, produced synthetically, which is capable of acting as a point of initiation of synthesis when placed under conditions in which synthesis of a primer extension product, which is complementary to a nucleic acid strand, is induced, i.e., in the presence of nucleotides and an inducing agent such as a DNA polymerase and at a suitable temperature and pH. The primer may be single-stranded and must be sufficiently long to prime the synthesis of the desired extension product in the presence of the inducing agent. The exact length of the primer will depend upon many factors, including temperature, source of primer and use of the method. For example, for diagnostic applications, depending on the complexity of the target sequence, the oligonucleotide primer typically contains 15-25 or more nucleotides, although it may contain fewer nucleotides.
The primers herein are selected to be “substantially” complementary to different strands of a particular target DNA sequence. This means that the primers must be sufficiently complementary to hybridize with their respective strands. Therefore, the primer sequence need not reflect the exact sequence of the template. For example, a non-complementary nucleotide fragment may be attached to the 5′ end of the primer, with the remainder of the primer sequence being complementary to the strand. Alternatively, non-complementary bases or longer sequences can be interspersed into the primer, provided that the primer sequence has sufficient complementarity with the sequence of the strand to hybridize therewith and thereby form the template for the synthesis of the extension product.
A cell has been “transformed” or “transduced” by exogenous or heterologous nucleic acid or DNA when such nucleic acid or DNA has been introduced inside the cell. The transforming or transducing nucleic acid or DNA may or may not be integrated (covalently linked) into chromosomal DNA making up the genome of the cell. In prokaryotes, yeast, and mammalian cells for example, the transforming or transducing nucleic acid or DNA may be maintained on an episomal element such as a plasmid. With respect to eukaryotic cells, a stably transformed or transduced cell is one in which the transforming or transducing nucleic acid or DNA has become integrated into a chromosome or otherwise incorporated so that it is inherited by daughter cells through chromosome replication. This stability is demonstrated by the ability of the eukaryotic cell to establish cell lines or clones comprised of a population of daughter cells containing the transforming or transducing nucleic acid or DNA.
Two DNA sequences are “substantially homologous” when at least about 75% (preferably at least about 80%, and most preferably at least about 90 or 95%) of the nucleotides match over the defined length of the DNA sequences. Sequences that are substantially homologous can be identified by comparing the sequences using standard software available in sequence data banks, or in a Southern hybridization experiment under, for example, stringent conditions as defined for that particular system. Defining appropriate hybridization conditions is within the skill of the art. See, e.g., Maniatis et al., supra; DNA Cloning, Vols. I & II, supra; Nucleic Acid Hybridization, supra.
It should be appreciated that also within the scope of the present invention are nucleic acids including DNA sequences encoding a phase separation sensor hereof which code for a phase separation sensor having the same amino acid sequence as provided herein, including in the Tables and Examples or sequences provided, but which are degenerate to one another. By “degenerate to” is meant that a different three-letter codon is used to specify a particular amino acid. It is well known in the art that the following codons can be used interchangeably to code for each specific amino acid (RNA codons are provided, however as would be recognized by one skilled in the art, for a DNA sequence a T should be substituted for a U in the codon sequence):
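As an illustrative sketch only, the relationship described above can be checked computationally: two coding sequences are degenerate to one another when they differ in nucleotide sequence yet translate to the same amino acid sequence. The Python below assumes the standard genetic code. The function names and the two example DNA sequences are hypothetical and are not sequences of the invention.

```python
from itertools import product

# Standard genetic code in the DNA alphabet. Codons are enumerated in the
# order TTT, TTC, TTA, TTG, TCT, ... so that they line up with AMINO_ACIDS.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def normalize(seq: str) -> str:
    """Upper-case the sequence and read RNA input as DNA (U becomes T)."""
    return seq.upper().replace("U", "T")


def translate(seq: str) -> str:
    """Translate an in-frame coding sequence into one-letter amino acids."""
    dna = normalize(seq)
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def are_degenerate(seq_a: str, seq_b: str) -> bool:
    """True when the sequences differ but encode the same amino acids."""
    return normalize(seq_a) != normalize(seq_b) and translate(seq_a) == translate(seq_b)


# Hypothetical example: two different DNA sequences that both encode GSPG.
example_a = "GGTTCTCCAGGT"
example_b = "GGCAGCCCGGGC"
print(translate(example_a), translate(example_b), are_degenerate(example_a, example_b))
```

A sequence is not reported as degenerate to itself, which reflects the requirement that a different codon be used for at least one amino acid.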
The phase separation sensors of the invention extend to those proteins having the amino acid sequence data, characteristics and sequences described herein and presented in the Tables and Examples herein, and the profile of activities set forth herein and in the Claims. Accordingly, proteins displaying substantially equivalent activity are likewise contemplated. Further, proteins displaying somewhat altered activity but remaining active and capable of targeting or associating with biomolecular condensates are likewise contemplated. These modifications may be deliberate, for example, such as modifications obtained through site-directed mutagenesis, or through random mutagenesis, or may be accidental, such as those obtained through mutations in hosts. Also, the phase separation sensors, including the specific sensors exemplified by noted sequence herein, are intended to include within their scope proteins specifically recited herein as well as substantially homologous analogs and variants, including allelic variations, particularly wherein the analogs or variants remain active and capable of targeting or associating with biomolecular condensates, particularly with a target biomolecular condensate.
Mutations can be made in the phase separation sequences and nucleic acid sequences provided and contemplated herein such that a particular codon is changed to a codon which codes for a different amino acid. Such a mutation is generally made by making the fewest nucleotide changes possible. A substitution mutation of this sort can be made to change an amino acid in the resulting protein in a non-conservative manner (i.e., by changing the codon from an amino acid belonging to a grouping of amino acids having a particular size or characteristic to an amino acid belonging to another grouping) or in a conservative manner (i.e., by changing the codon from an amino acid belonging to a grouping of amino acids having a particular size or characteristic to an amino acid belonging to the same grouping). Such a conservative change generally leads to less change in the structure and function of the resulting protein or peptide. A non-conservative change is more likely to alter the structure, activity or function of the resulting protein. The present invention should be considered to include sequences containing conservative changes which do not significantly alter the activity or binding characteristics of the resulting protein or peptide.
The following provides one example of various groupings of amino acids:
Amino acids with nonpolar R groups: Alanine, Valine, Leucine, Isoleucine, Proline, Phenylalanine, Tryptophan, Methionine; Amino acids with uncharged polar R groups: Glycine, Serine, Threonine, Cysteine, Tyrosine, Asparagine, Glutamine; Amino acids with charged polar R groups (negatively charged at pH 6.0): Aspartic acid, Glutamic acid; Basic amino acids (positively charged at pH 6.0): Lysine, Arginine, Histidine (at pH 6.0).
Another grouping may be those amino acids with phenyl groups: Phenylalanine, Tryptophan, Tyrosine
Another grouping may be according to molecular weight (i.e., size of R groups): Glycine 75; Alanine 89; Serine 105; Proline 115; Valine 117; Threonine 119; Cysteine 121; Leucine 131; Isoleucine 131; Asparagine 132; Aspartic acid 133; Glutamine 146; Lysine 146; Glutamic acid 147; Methionine 149; Histidine(at pH 6.0) 155; Phenylalanine 165; Arginine 174; Tyrosine 181; Tryptophan 204.
Particularly preferred substitutions are:
Amino acid substitutions may also be introduced to substitute an amino acid with a particularly preferable property. For example, a Cys may be introduced as a potential site for disulfide bridges with another Cys. A His may be introduced as a particularly “catalytic” site (i.e., His can act as an acid or base and is the most common amino acid in biochemical catalysis). Pro may be introduced because of its particularly planar structure, which induces β-turns in the protein's structure.
Two amino acid sequences are “substantially homologous” when at least about 70% of the amino acid residues (preferably at least about 80%, and most preferably at least about 90 or 95%) are identical, or represent conservative substitutions.
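The definition above can be illustrated with a short calculation. The following Python is a minimal sketch under stated assumptions: the two sequences are already aligned, are of equal length and contain no gaps, and a substitution is treated as conservative when both residues fall in the same R-group grouping of the example grouping given above. The function names and the second example sequence are hypothetical.

```python
# Groupings by R group, following the example grouping given above:
# nonpolar, uncharged polar, negatively charged, basic.
GROUPS = ("AVLIPFWM", "GSTCYNQ", "DE", "KRH")
GROUP_OF = {aa: index for index, members in enumerate(GROUPS) for aa in members}


def percent_match(seq_a: str, seq_b: str, count_conservative: bool = False) -> float:
    """Percentage of aligned positions that are identical, optionally also
    counting substitutions within the same grouping as matches."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b:
            matches += 1
        elif count_conservative and a in GROUP_OF and GROUP_OF[a] == GROUP_OF.get(b):
            matches += 1
    return 100.0 * matches / len(seq_a)


def is_substantially_homologous(seq_a: str, seq_b: str, threshold: float = 70.0) -> bool:
    """Apply the 'at least about 70%' criterion (identical or conservative)."""
    return percent_match(seq_a, seq_b, count_conservative=True) >= threshold


# Hypothetical comparison: the second sequence carries R->K and D->E changes.
reference = "GRSDGVPGSG"
variant = "GKSEGVPGSG"
print(percent_match(reference, variant))
print(percent_match(reference, variant, count_conservative=True))
```

Sequences of unequal length would first need to be aligned with a standard alignment tool, which this sketch does not attempt.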
In embodiments hereof, variant peptide sequences having substantial identity to the sequences provided herein are contemplated. Variants having different amino acid sequences, wherein the sequence has at least 75%, at least 80%, at least 85%, at least 90%, or at least 95% amino acid sequence identity to a sequence provided herein, are included in the invention. Variants are and can be selected for maintaining the purpose and characteristics of the parent sequence from which they are derived. Thus, suitable variant artificial client protein sequences will retain the characteristic(s) of intrinsic disorder and remain capable of engaging in ultra-weak phase separation-specific amino acid interactions with one or more component protein, particularly one or more target component protein in the condensate.
A “heterologous” region of a nucleic acid or of a DNA construct is an identifiable segment of nucleic acid, including DNA, within a larger nucleic acid or DNA molecule that is not found in association with the larger molecule in nature. Thus, when the heterologous region encodes a mammalian gene, the gene will usually be flanked by DNA that does not flank the mammalian genomic DNA in the genome of the source organism. Another example of a heterologous coding sequence is a construct where the coding sequence itself is not found in nature (e.g., a cDNA where the genomic coding sequence contains introns, or synthetic sequences having codons different than the native gene).
An “antibody” is any immunoglobulin, including antibodies and fragments thereof, that binds a specific epitope. The term encompasses polyclonal, monoclonal, and chimeric antibodies, the last mentioned described in further detail in U.S. Pat. Nos. 4,816,397 and 4,816,567.
An “antibody combining site” is that structural portion of an antibody molecule comprised of heavy and light chain variable and hypervariable regions that specifically binds antigen.
The phrase “antibody molecule” in its various grammatical forms as used herein contemplates both an intact immunoglobulin molecule and an immunologically active portion of an immunoglobulin molecule.
Exemplary antibody molecules are intact immunoglobulin molecules, substantially intact immunoglobulin molecules and those portions of an immunoglobulin molecule that contain the paratope, including those portions known in the art as Fab, Fab′, F(ab′)2 and F(v), which portions are preferred for use in the therapeutic methods described herein. Fab and F(ab′)2 portions of antibody molecules are prepared by the proteolytic reaction of papain and pepsin, respectively, on substantially intact antibody molecules by methods that are well-known.
The phrase “monoclonal antibody” in its various grammatical forms refers to an antibody having only one species of antibody combining site capable of immunoreacting with a particular antigen. A monoclonal antibody thus typically displays a single binding affinity for any antigen with which it immunoreacts. A monoclonal antibody may therefore contain an antibody molecule having a plurality of antibody combining sites, each immunospecific for a different antigen; e.g., a bispecific (chimeric) monoclonal antibody.
The phrase “pharmaceutically acceptable” refers to molecular entities and compositions that are physiologically tolerable and do not typically produce an allergic or similar untoward reaction, such as gastric upset, dizziness and the like, when administered to a human.
The phrase “therapeutically effective amount” is used herein to mean an amount sufficient to prevent, and preferably reduce by at least about 30 percent, more preferably by at least 50 percent, more preferably by at least 70 percent, most preferably by at least 90 percent, a clinically significant change in the mitotic or enzymatic activity of a target cell, or alter or modify a feature of pathology such as a characteristic of one or more biomolecular condensate or target protein or component as may attend its presence and activity.
Notably, previous conceptions of the assembly of multidomain macromolecules and biological macromolecules, including those predominating in biomolecular condensates, have focused largely on the networks created by strong, specific interactions, without consideration of the extremely weak, nonspecific interactions that govern solubility, or of how those interactions would be affected by the assembly process. In fact, as demonstrated herein, considering the coupling between the strong and weak interactions, and therefore the ability of multivalency to promote phase separation, is essential to understanding the behavior of multivalent biological molecules and to being in a position to specifically assess, target and modulate or modify them. Thus, in some systems, such as disordered proteins, the important or relevant interactions may lie at an intermediate point on the spectrum between strong, stereospecific contacts and weak, nonspecific contacts. As disordered polymers become less soluble and as they grow, the presence of multiple points of contact between molecules provides an important driving force for phase separation. In this invention, the weaker and nonspecific interactions in biomolecular condensates are exploited in designing, applying and enabling novel phase separation sensors comprising an accessory protein or molecule and an artificial client sequence, whereby the sensors are incorporated via the weaker and nonspecific (yet enriched across phase-separating proteins) interactions in biomolecular condensates.
In a general embodiment, the present invention relates to biomolecular condensates or membraneless compartments in cells, and the ability to detect, target, monitor, assess and modulate biomolecular condensates, including in vitro in cells and in vivo in animals. Novel phase separation sensors are provided which are uniquely capable of targeting or associating with a biomolecular condensate, particularly a specific and target biomolecular condensate by design. The sensors comprise at least two domains, wherein a first domain includes one or more accessory protein or molecule and a second domain includes an artificial client protein or intrinsically disordered sequence. The artificial client protein or intrinsically disordered sequence is uniquely capable of interacting with one or more component protein in a target biomolecular condensate.
Phase separation sensors of the invention include wherein the sensor is capable of targeting or associating with a biomolecular condensate and wherein the sensor comprises at least two protein domains. The first domain comprises one or more accessory protein or molecule. The first domain may thus include one or more subdomains or one or more proteins or peptides. The second domain comprises an artificial client protein having intrinsic disorder and capable of engaging in ultra-weak phase separation-specific amino acid interactions with one or more component protein in the condensate.
Phase separation sensors thus comprise at least one accessory protein or molecule domain and an additional domain comprising an artificial client protein having intrinsic disorder. As described and provided herein, two-domain sensors were designed, constructed and evaluated based on a two-domain structure, with the sensors having a first domain comprising a fluorescent protein or marker (the at least one accessory protein or molecule) and a second domain comprising an IDP sensing domain or an artificial client protein sequence. A general two-domain sensor architecture is as follows:
Two-domain: [Fluorescent marker/Enzyme/Cargo]-[Optional Linker]-[IDP sensing domain]
In further studies, multi-domain sensors were designed, constructed and evaluated based on a three-domain structure (see for example
Three-domain: [Fluorescent marker]-[Optional linker]-[Enzyme/Cargo]-[Optional Linker]-[IDP sensing domain]
In particular, the phase separation sensor lacks independent phase separation behavior when expressed in the cell, including whereby the sensor associates with the biomolecular condensate without disrupting the condensate. Thus, the sensor does not independently form a condensate, but can sufficiently interact with a condensate target or target protein or sequence so as to be incorporated in a condensate.
The two or more domains comprising the sensor of the invention may be directly linked or may be separated in each or any instance by one or more linker sequence. In a particular embodiment, one or more accessory protein(s) and/or the accessory protein(s) and the artificial client protein are separated by a flexible linker sequence. The flexible linker sequence may comprise between 2 and 10, 10 and 20, 20 and 40, 2 and 20, 2 and 30, 2 and 40, up to 10, up to 20, up to 30, or up to 40 amino acid residues. The flexible linker sequence may comprise between 2 and 10 amino acid residues. In a preferred embodiment, one or more short flexible linkers of 2 to 10 residues in length is utilized. In an embodiment, the linker sequence lacks charged residues. In an embodiment, the linker sequence contains charged residues. In an embodiment, the linker sequence contains charged residues and is zwitterionic, having equal numbers of positively-charged and negatively-charged residues. In exemplary embodiments and sequences hereof, linkers of 2, 4 and 10 residues are utilized. Exemplary linker sequences are provided herein, including in the Examples, and alternative sequences are well known to or could be designed by those of skill in the art. For example, Chen et al describe various useful and flexible linkers (Chen X et al (2013) Adv Drug Deliv Rev 65(10):1357-1369). In embodiments, linker sequences GSPG (SEQ ID NO: 59) and/or GRSDGVPGSG (SEQ ID NO: 60), as examples, are utilized.
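By way of illustration only, the two-domain and three-domain architectures and the linker properties described above can be sketched in Python. The linker sequences GSPG and GRSDGVPGSG are taken from the text. The domain strings are placeholders rather than real protein sequences, the function names are hypothetical, and counting only Lys/Arg as positive and Asp/Glu as negative (ignoring His) is a simplifying assumption.

```python
from typing import Optional, Sequence

POSITIVE = set("KR")  # assumption: histidine is not counted as charged
NEGATIVE = set("DE")


def classify_linker(linker: str) -> str:
    """Label a linker as 'uncharged', 'zwitterionic' or 'charged'."""
    positives = sum(residue in POSITIVE for residue in linker)
    negatives = sum(residue in NEGATIVE for residue in linker)
    if positives == 0 and negatives == 0:
        return "uncharged"
    if positives == negatives:
        return "zwitterionic"
    return "charged"


def assemble_sensor(domains: Sequence[str], linkers: Sequence[Optional[str]]) -> str:
    """Join domains N- to C-terminal, inserting an optional linker at each
    junction. A linker of None (or '') means the domains are directly linked."""
    if len(domains) < 2:
        raise ValueError("a sensor needs at least two domains")
    if len(linkers) != len(domains) - 1:
        raise ValueError("provide one linker slot per domain junction")
    parts = [domains[0]]
    for linker, domain in zip(linkers, domains[1:]):
        if linker:
            if not 2 <= len(linker) <= 40:
                raise ValueError("linker length outside the 2 to 40 residue range")
            parts.append(linker)
        parts.append(domain)
    return "".join(parts)


# Placeholder domain strings (hypothetical, not real sequences).
marker, cargo, idp_domain = "MMMM", "CCCC", "SSSS"

two_domain = assemble_sensor([marker, idp_domain], ["GSPG"])
three_domain = assemble_sensor([marker, cargo, idp_domain], ["GSPG", "GRSDGVPGSG"])
print(two_domain, three_domain)
print(classify_linker("GSPG"), classify_linker("GRSDGVPGSG"))
```

Under these assumptions GSPG is classified as lacking charged residues, while GRSDGVPGSG, with one Arg and one Asp, is classified as zwitterionic.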
In a particular embodiment, a phase separation sensor is provided wherein the target component protein is a filaggrin family protein or paralog protein. In one such embodiment, the sensor artificial client protein sequence is derived from or based on a filaggrin protein sequence. The artificial client protein sequence may be derived from or based on a human filaggrin protein sequence or on a mouse filaggrin protein sequence. The artificial client protein sequence may be derived from or based on a filaggrin protein sequence provided in TABLE 1, or the mouse or human filaggrin protein as provided in TABLE 1 or set out in SEQ ID NO: 1 or in SEQ ID NO: 56. Notably, the mouse and human filaggrin sequences when compared directly have about 34% identity in amino acid sequence. In particular, the artificial client protein sequence may be derived from or based on a filaggrin protein repeat component sequence. Exemplary filaggrin-based or filaggrin-targeting phase separation sensors are provided herein, including in TABLE 3 and in Examples 1 and 2 hereof. Phase separation sensors include Sensor A (SEQ ID NO:26), Sensor B (SEQ ID NO:27), Apex2-Sensor A (SEQ ID NO:50), Apex2-Sensor B (SEQ ID NO:51), Sensor C (SEQ ID NO:52) and Sensor D (SEQ ID NO:53).
Phase separation sensors are contemplated and provided herein that are directed to one or more biomolecular condensate in vivo in an animal. The sensor(s) of the invention may target or associate with one or more biomolecular condensate in the cytoplasm of a cell or in the nucleus of a cell. In an exemplary embodiment, the condensate is a keratohyalin granule (KG) in the epidermis or in one or more skin cell. In embodiments, one or more phase separation sensor is provided that targets a biomolecular condensate selected from P granule, Germ granule, Lewy bodies, synaptic condensates, stress granule, P bodies, T cell signalosome, crystalline condensates of the lens fibers, and other cytoplasmic condensates or membraneless organelles assembled through liquid-liquid phase separation. In further embodiments, one or more phase separation sensor is provided that targets a biomolecular condensate in the nucleus. In an embodiment, nuclear condensates may be selected from Nucleoli, Paraspeckles, Histone Locus Bodies, Cajal Bodies and Heterochromatin. The biomolecular condensate may be an RNA-protein granule or an RNA-containing condensate. In an embodiment, the target condensate protein may be an RNA-binding protein.
Wherein the target condensate or condensate protein is a cytoplasmic condensate or cytoplasmic condensate protein, the phase separation sensor may include one or more nuclear export signal (NES). In particular, the NES prevents nuclear localization and targets the protein or sensor to the cytoplasm. Wherein the target condensate or condensate protein is a nuclear condensate or a condensate protein located in the nucleus, the phase separation sensor may include one or more nuclear localization signal (NLS), so as to promote or limit localization to the nucleus. Exemplary NES and NLS sequences are provided herein and are recognized and known in the art. Alternatively, a sensor of the invention may lack a nuclear localization signal and also lack a nuclear export signal and thereby may function, be expressed in or localize to either of or both of the nucleus and cytoplasm.
In an embodiment of the invention, a phase separation sensor is provided to investigate or assess phase separation of a putative or candidate condensate, including to determine whether a target protein is incorporated in a biomolecular condensate. Thus, a phase separation sensor is provided to investigate or assess phase separation of a putative or candidate condensate, including to randomly or indirectly characterize the proteins in a putative or candidate condensate. Thus, a phase separation sensor is designed which generically or relatively non-specifically associates with biomolecular condensates by virtue of ultra-weak interactions and not by target sequence-based derivation. Provided that the interaction is sufficient and the accessory protein label is adequate, a condensate may be generally or generically targeted and tagged or monitored by association with the sensor. In accordance with an embodiment of the invention, a phase separation sensor of the invention may identify, monitor and characterize a biomolecular condensate of previously unknown nature, composition or purpose. In an embodiment of the invention, a sensor is designed to generally or generically recognize and tag an intrinsically disordered protein (IDP) or sequence, including wherein the IDP is predicted to undergo liquid-liquid phase separation.
In designing and implementing a sensor for targeting or associating with an unknown or non-specified biomolecular condensate, the artificial client protein sequence is designed to generically associate with one or more intrinsically disordered protein sequence, or an intrinsically disordered repeat, by virtue of weak and non-specific interactions over the repeat or IDR sequence, and by selecting amino acids or a compositional character that will permit non-specific weak interactions.
The invention provides nucleic acids encoding a phase separation sensor hereof. In an embodiment, a sensor may comprise a nucleic acid sequence, such as an RNA or DNA sequence. DNA molecules comprising the nucleic acids are an embodiment of the invention. Further, a vector comprising the nucleic acids or DNA molecules of the invention is also provided.
An important aspect of the phase separation sensors of the invention is that the domain component comprising an artificial client protein, by virtue of its ability to interact and associate with target component sequences, such as via intrinsically disordered sequence, thereby delivers or brings along the other domain component or components, particularly one or more accessory protein, to the biomolecular condensate or within associative distance of one or more target component sequences. The other domain of the instant phase separation sensors may comprise one or more accessory protein, peptide or molecule. The accessory protein may provide a label or marker, such as a fluorescent protein, such that the biomolecular condensate can be visualized or monitored. Alternatively, or in addition, the accessory protein may provide an activity, such as an enzymatic activity, to or in the vicinity of the biomolecular condensate.
The invention contemplates a sensor molecule or protein which provides an active, useful, visible or detectable label or marker, particularly via one or more accessory protein or molecule in a first domain. The invention contemplates a sensor molecule or protein which provides a function, enzyme or capability. Thus, the sensor comprises in a first domain, or in one or more embodiment or portion of a first domain, one or more accessory protein wherein at least one accessory protein provides a detectable or functional label.
In embodiments of the invention, the at least one accessory protein is a fluorescent protein. The fluorescent protein may be selected from a protein known in the art, provided that the fluorescent protein does not detract from or interfere with the sensor's ability to target or associate with a target condensate component protein or the biomolecular condensate. Numerous suitable and applicable fluorescent proteins are known and available in the art. The fluorescent protein may be selected from one or more of a blue/UV protein, a cyan protein, a green protein, a yellow protein, an orange protein, a red protein, a far-red protein, a near-IR protein, a long Stokes shift protein, a photoactivatable protein, a photoconvertible protein and a photoswitchable protein. Examples of blue/UV fluorescent proteins include TagBFP and Sapphire. Examples of cyan proteins include ECFP and derivatives thereof, Cerulean, TagCFP and mTFP1. Examples of green proteins include GFP and derivatives thereof, Emerald and monomeric azami green. Examples of yellow proteins include EYFP and derivatives thereof, and examples of orange proteins include monomeric kusabira orange and derivatives thereof. Red fluorescent proteins are known in the art and include for example RFP and derivatives thereof, mRaspberry, mCherry, mStrawberry and mRuby. The fluorescent protein may particularly be a GFP protein. In an embodiment, the GFP protein is a GFP protein with positively-charged amino acids exposed on the protein surface. The fluorescent protein may be a supercharged protein, wherein the protein sequence is altered, mutated or modified to have additional positively charged residues. For example, the GFP protein may be a supercharged GFP protein. Supercharged GFP proteins are described for instance in US 2011/0112040A1 and in U.S. Pat. No. 9,221,886. In a preferred embodiment and aspect, the GFP protein may be +15GFP.
The invention contemplates and includes wherein more than one phase separation sensor is introduced in a cell, whereby distinct sensors target different component proteins and/or carry different accessory proteins, such as different fluorescent proteins, such that multiple and distinct components of a biomolecular condensate are targeted and can be monitored or evaluated simultaneously.
Other relevant and useful accessory proteins in accordance with the invention are enzymes. One or more enzyme may be selected from a protease, nuclease, ligase, peroxidase, phosphatase, kinase and protein capable of modifying a protein or nucleic acid.
One or more accessory protein may comprise a label. In an embodiment, the label may include a radioactive element. In one such embodiment, the sensor may thereby introduce a label or radioactive element into a cellular sample. The label or element may then be examined by known techniques, which may vary with the nature of the label attached. In the instance where a radioactive label is used, it may be selected from isotopes such as 3H, 14C, 32P, 35S, 36Cl, 51Cr, 57Co, 58Co, 59Fe, 90Y, 125I, 131I, and 186Re.
Enzyme labels are likewise useful, and can be detected by any of the presently utilized colorimetric, spectrophotometric, fluorospectrophotometric, amperometric or gasometric techniques. The enzyme is conjugated to the selected particle by reaction with bridging molecules such as carbodiimides, diisocyanates, glutaraldehyde and the like. Many enzymes which can be used in these procedures are known and can be utilized. The preferred are peroxidase, β-glucuronidase, β-D-glucosidase, β-D-galactosidase, urease, glucose oxidase plus peroxidase and alkaline phosphatase.
In accordance with a further embodiment, at least one accessory protein may be capable of tagging one or more biomolecular condensate component with a detectable or functional molecule, peptide or marker. The examples provided herein exemplify sensors wherein one or more accessory protein is a peroxidase, including wherein the enzyme is capable of biotinylating one or more target component protein, for instance within a certain reaction distance from the enzyme protein molecule. The peroxidase Apex2 and the biotin ligase BioID, for example, have been utilized.
In an embodiment of the invention, the sensor is a functionalized sensor and at least one accessory protein is capable of modifying a target component protein in the condensate. Said accessory protein may be capable of modifying condensate components, through covalent or non-covalent crosslinking of condensate components, to alter the material properties of the condensate. Crosslinking may be triggered by exogenous or endogenous stimuli to cells containing said condensates and accessory proteins.
In another embodiment, the sensor is a functionalized sensor and at least one accessory protein is capable of delivering a compound or agent to the condensate or to a target component protein in the condensate.
Experts in the art will further recognize, for example, that the peroxidase Apex2 domain in the phase separation sensors provided herein may be further modified to include other or alternative compounds or agents, such as enzymes or proteins of interest for example, so as to exploit said phase separation sensors as vehicles that deliver cargo of interest to biomolecular condensates. Said cargo may include but is not limited to fluorescent proteins, proteases, nucleases, ligases, peroxidases, phosphatases, kinases and other proteins capable of modifying proteins and nucleic acids.
In accordance with the invention, the artificial client protein aspect or domain of the phase separation sensor is an intrinsically disordered protein having low complexity sequence. Thus, the artificial client protein contains one or more disordered region that provides one or more weakly adhesive sequence elements. In an embodiment, the artificial client protein sequence lacks recognized protein three-dimensional structural aspects. In an embodiment, the artificial client protein sequence contains repeated sequence elements. In an embodiment, the artificial client protein sequence contains low complexity sequence elements. In a particular such embodiment, the repeated sequence or low complexity elements provide a basis for multivalent weakly adhesive intermolecular interactions.
In accordance with another embodiment of the invention, the sensor artificial client protein sequence comprises similar compositional bias or comprises related sequence patterns with low sequence identity to amino acid sequence of a naturally-occurring intrinsically disordered protein or protein region within a larger protein. This may be achieved in certain embodiments by reordering or by shuffling or randomizing the sequence of a naturally-occurring intrinsically disordered protein. In one such embodiment this similar compositional bias or related sequence patterns contributes to or is responsible for driving assembly of said biomolecular condensate.
In an embodiment of the invention, the phase separation sensor's artificial client protein sequence is related to the native or target intrinsically disordered protein (IDP) sequence by reversal of its amino acid sequence. In accordance with this embodiment, the artificial client protein sequence is generated by reading the original, native or target IDP sequence in the non-natural C-terminal to N-terminal direction. This retains the amino acid composition but completely alters the sequence as presented per se.
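The reversal and shuffling operations described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the procedure used to generate the sequences of the invention: the function names and the toy input are hypothetical, and position-wise identity is used here as a simple stand-in for an alignment-based identity measure.

```python
import random
from collections import Counter


def reverse_sequence(seq: str) -> str:
    """Read a sequence in the C-terminal to N-terminal direction."""
    return seq[::-1]


def shuffle_sequence(seq: str, seed: int = 0) -> str:
    """Randomly reorder the residues of a sequence (seeded for repeatability)."""
    rng = random.Random(seed)
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def same_composition(a: str, b: str) -> bool:
    """True when two sequences contain exactly the same residue counts."""
    return Counter(a) == Counter(b)


def position_identity(a: str, b: str) -> float:
    """Fraction of positions carrying the same residue (equal lengths, no alignment)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical toy input, not a sequence from the invention.
toy = "ACDE"
reversed_toy = reverse_sequence(toy)
print(reversed_toy, same_composition(toy, reversed_toy))
```

Both operations leave the residue counts untouched, which is the property the embodiments rely on, while the order of residues along the chain changes.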
In another embodiment, the artificial client protein sequence is enriched in a limited number of amino acid types. In a further embodiment, the artificial client protein sequence is enriched in charged residues such as lysine, arginine, glutamate and aspartate. In view of the lack of sequence diversity in the artificial client protein, the sequence may be characterized by or may contain multiple short sequence repeat tracts, poly-single amino acid tracts, or sequence blocks of positive or negative charge. These repetitive motifs may then contribute to interactions with the biomolecular condensate target protein(s).
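As a rough illustration of how such features might be quantified, the sketch below computes the fraction of charged residues and locates poly-single amino acid tracts with a regular expression. The function names, the minimum tract length of four and the toy inputs are assumptions made for the example and are not taken from the invention.

```python
import re

# Charged residues named in the text: lysine, arginine, glutamate, aspartate.
CHARGED = set("KRED")


def charged_fraction(seq: str) -> float:
    """Fraction of residues that are K, R, E or D."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(residue in CHARGED for residue in seq) / len(seq)


def single_residue_tracts(seq: str, min_len: int = 4):
    """Return (start, residue, length) for each run of one repeated residue."""
    # (.) captures a residue; \1{n,} requires at least n further copies of it.
    pattern = re.compile(r"(.)\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), len(m.group(0))) for m in pattern.finditer(seq)]


# Hypothetical toy inputs for illustration.
print(charged_fraction("KKEEAA"))
print(single_residue_tracts("SGGGGGGGGCGGGGGV"))  # [(1, 'G', 8), (10, 'G', 5)]
```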
A main criterion and characteristic of the artificial client protein sequence in a phase separation sensor of the invention is intrinsic disorder. It is important to note that many proteins have multi-domain architecture, so that some domains are well-folded, and thus have defined/known secondary structure such as helices, sheets, etc., and some domains are intrinsically disordered regions (IDRs). IDRs within proteins containing other domains are often responsible for their overall phase separation behavior. The phase separation sensors of the invention target the properties of those IDRs. In the case of filaggrin (FLG) (and its paralogs), for example, the overall protein is greater than 4000 amino acid residues in length and of those only the first 100 amino acids are part of a folded domain (the so-called S100 domain, composed of two EF-hand motifs). In other proteins, the relative size of the folded or structured and disordered domains varies widely.
Intrinsically disordered proteins (IDPs) lack stable tertiary and/or secondary structures under physiological conditions in vitro. They are highly abundant in nature and their functional repertoire complements the functions of ordered proteins. IDPs are involved in regulation, signaling, and control, where binding to multiple partners and high-specificity/low-affinity interactions play a crucial role. Intrinsic disorder is a unique structural feature that enables IDPs to participate in both one-to-many and many-to-one signaling. Numerous IDPs are associated with human diseases, including cancer, cardiovascular disease, amyloidoses, neurodegenerative diseases, and diabetes. Overall, intriguing interconnections among intrinsic disorder, cell signaling, and human diseases suggest that protein conformational diseases may result not only from protein misfolding, but also from misidentification, missignaling, and unnatural or nonnative folding. IDPs, such as α-synuclein, tau protein, p53, and BRCA1, are attractive targets for drugs modulating protein-protein interactions. From these and other examples, novel strategies for drug discovery based on IDPs are of interest and are being developed (Uversky V N et al (2008) Ann Rev Biophysics 37:215-246).
The global, multi-level simplicity of IDPs/IDRs can be correlated with the character and peculiarities of their amino acid sequences, which are depleted in order-promoting residues (Trp, Cys, Ile, Val, Asn, and Leu) and enriched in disorder-promoting residues (Arg, Pro, Gln, Gly, Glu, Ser, Ala, and Lys) and commonly contain repeats (Radivojac P et al Biophys J. (2007) 92:1439-56, doi: 10.1529/biophysj.106.094045; Williams R M et al Pac Symp Biocomput. (2001) 2001:89-100; Romero P et al Proteins (2001) 42:38-48; Garner E et al Genome Inform Ser Workshop Genome Inform (1998) 9:201-13; Jorda J et al FEBS J (2010) 277:2673-82; Darling A L et al Molecules (2017) 22:27, doi: 10.3390/molecules22122027). Therefore, IDPs/IDRs are characterized by the reduced informational content of their amino acid sequences, and their amino acid alphabet is decreased in comparison with the alphabet utilized in the amino acid sequences of ordered domains and proteins. The behavior of an IDP as a highly frustrated system that does not possess a singular folded state is reflected in its free energy landscape, which is relatively flat and simple and is sensitive to different environmental changes that can modify the landscape in several different ways, lowering some energy minima while raising some energy barriers. This explains the conformational plasticity of an IDP/IDR, its extreme sensitivity to changes in the environment, its ability to interact with multiple different partners, and consequently to fold in different ways. This is then directly related to the remarkable multifunctionality of disordered proteins that are able to control, regulate, interact with, as well as be controlled and regulated by, a plethora of structurally unrelated partners (Uversky V N et al (2019) Front Phys 7(Article 10), doi:10.3389/fphy.2019.00010).
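The compositional bias described above can be illustrated with a short calculation. The sketch below is a simple residue count rather than a validated disorder predictor; it uses only the order-promoting and disorder-promoting residue lists given in the preceding paragraph, and the function name and toy input are hypothetical.

```python
# Residue classes exactly as listed in the text, in one-letter code.
ORDER_PROMOTING = set("WCIVNL")       # Trp, Cys, Ile, Val, Asn, Leu
DISORDER_PROMOTING = set("RPQGESAK")  # Arg, Pro, Gln, Gly, Glu, Ser, Ala, Lys


def composition_bias(seq: str) -> dict:
    """Return the fraction of order- and disorder-promoting residues."""
    seq = seq.upper()
    if not seq:
        return {"order": 0.0, "disorder": 0.0, "distinct_residues": 0}
    n = len(seq)
    return {
        "order": sum(r in ORDER_PROMOTING for r in seq) / n,
        "disorder": sum(r in DISORDER_PROMOTING for r in seq) / n,
        # Size of the amino acid alphabet actually used by the sequence.
        "distinct_residues": len(set(seq)),
    }


# Hypothetical toy input for illustration.
print(composition_bias("GGSSRRQQ"))
```

A low count of distinct residues together with a high disorder-promoting fraction corresponds to the reduced amino acid alphabet described in the text.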
Despite their lack of a stable structure, IDPs/IDRs are involved in a multitude of crucial biological functions related to regulation, recognition, signaling, and control, where binding to multiple partners and high-specificity/low-affinity interactions play a crucial role. Furthermore, intrinsic disorder is a unique structural feature that enables IDPs/IDRs to participate in both one-to-many and many-to-one signaling. Since they serve as general regulators of various cellular processes, IDPs/IDRs themselves are tightly controlled; however, when misexpressed, misprocessed, mismodified, or dysregulated, IDPs/IDRs are prone to engage in promiscuous, often unwanted interactions and, thus, are associated with the development of various pathological states. In fact, many human cancer-related proteins, as well as many proteins associated with neurodegeneration, diabetes, cardiovascular disease, amyloidosis, and genetic diseases, are either intrinsically disordered or contain long IDRs (Iakoucheva L M et al J Mol. Biol (2002) 323:573-584; Uversky VN Front Biosci (2014) 19:181-258; Uversky VN Front Biosci (2009) 14:5188-5238; Du Z et al Int J Mol Sci (2017) 18:10; Cheng Y et al Biochemistry (2006) 45:10448-10460; Uversky VN Curr Alzheimer Res (2008) 5:260-287; Midic U et al BMC Genomics (2009)10:1,S12). Intrinsically disordered proteins and their roles and relevance in chronic diseases are reviewed in Kulkarni P and Uversky V N (2019) Biomolecules 9,147, doi:10.3390/biom9040147.
As described in the examples, phase separation sensors were designed, produced and evaluated wherein the target component protein is a filaggrin family protein or paralog protein. In certain sensors, the artificial client protein sequence was derived from or based on a filaggrin protein sequence. Artificial client protein sequences were derived from or based on a human filaggrin protein sequence and on a mouse filaggrin protein sequence. Exemplary artificial client proteins based on filaggrin sequence or designed to target or associate with filaggrin protein-containing biomolecular condensates include those provided in any of SEQ ID NOs: 17-21. The sensors designed based on human filaggrin sequence were effective and active in targeting and associating with filaggrin protein in biomolecular condensates in human cells and in vivo in mice. Alternative artificial client protein sequences derived from or based on a filaggrin protein repeat component sequence, including from any of the filaggrin homologs and paralogs, are contemplated and provided in the invention. Reference is made to TABLE 1 and the sequences provided and referred to therein, including mouse and human filaggrin sequences, which provide alternative FLG and paralog sequences known and provided in the art.
The invention includes compositions of the phase separation sensors provided herein. The compositions include pharmaceutical compositions, optionally further comprising one or more vehicle, carrier or diluent. The present invention further contemplates therapeutic compositions or pharmaceutical compositions useful in practicing the methods of this invention, particularly in vivo or ex vivo and in mammals or humans. A subject therapeutic composition or pharmaceutical composition includes, in admixture, a pharmaceutically acceptable excipient (such as a carrier) and one or more phase separation sensor as described herein as an active ingredient.
The preparation of therapeutic compositions or pharmaceutical compositions which contain polypeptides, analogs or active fragments as active ingredients is well understood in the art. Such compositions may be prepared as injectables, either as liquid solutions or suspensions; however, solid forms suitable for solution in, or suspension in, liquid prior to injection can also be prepared. The preparation can also be emulsified. The active ingredient is often mixed with excipients which are pharmaceutically acceptable and compatible with the active ingredient. Suitable excipients are, for example, water, saline, dextrose, glycerol, ethanol, or the like and combinations thereof. In addition, if desired, the composition can contain minor amounts of auxiliary substances such as wetting or emulsifying agents or pH buffering agents, which enhance the effectiveness of the active ingredient.
One or more phase separation sensor can be formulated into the therapeutic composition or pharmaceutical composition as neutralized pharmaceutically acceptable salt forms. Pharmaceutically acceptable salts include the acid addition salts (formed with the free amino groups of the polypeptide or antibody molecule) and which are formed with inorganic acids such as, for example, hydrochloric or phosphoric acids, or such organic acids as acetic, oxalic, tartaric, mandelic, and the like. Salts formed from the free carboxyl groups can also be derived from inorganic bases such as, for example, sodium, potassium, ammonium, calcium, or ferric hydroxides, and such organic bases as isopropylamine, trimethylamine, 2-ethylamino ethanol, histidine, procaine, and the like.
The therapeutic or pharmaceutical phase separation sensor-containing compositions may be administered intravenously or intramuscularly in one embodiment, as by injection of a unit dose, for example. In another embodiment, they may be injected subcutaneously. In another embodiment, they may be administered topically through a disrupted skin barrier. Any suitable form of recognized administration may be utilized. The term “unit dose” when used in reference to a therapeutic composition of the present invention refers to physically discrete units suitable as unitary dosage for humans, each unit containing a predetermined quantity of active material calculated to produce the desired therapeutic effect in association with the required diluent, i.e., carrier or vehicle.
Another feature of this invention is the expression of the nucleic acids, including DNA sequences, encoding one or more phase separation sensor disclosed herein. As is well known in the art, nucleic acid sequences or DNA sequences may be expressed by operatively linking them to an expression control sequence in an appropriate expression vector and employing that expression vector to transform an appropriate unicellular host.
A wide variety of host/expression vector combinations may be employed in expressing the DNA sequences of this invention. Useful expression vectors, for example, may consist of segments of chromosomal, non-chromosomal and synthetic DNA sequences. Suitable vectors include derivatives of SV40 and known bacterial plasmids; phage DNAs, e.g., the numerous derivatives of phage lambda, and other phage DNA, e.g., M13 and filamentous single stranded phage DNA; yeast plasmids such as the 2μ plasmid or derivatives thereof; vectors useful in eukaryotic cells, such as vectors useful in insect or mammalian cells; vectors derived from combinations of plasmids and phage DNAs, such as plasmids that have been modified to employ phage DNA or other expression control sequences; and the like. Any of a wide variety of expression control sequences (sequences that control the expression of a DNA sequence operatively linked to them) may be used in these vectors to express the DNA sequences of this invention. A wide variety of unicellular host cells are also useful in expressing the DNA sequences of this invention. These hosts may include well known eukaryotic and prokaryotic hosts, such as strains of E. coli, Pseudomonas, Bacillus, Streptomyces, fungi such as yeasts, and animal cells, insect cells, and human cells and plant cells in tissue culture.
In selecting an expression control sequence, a variety of factors will normally be considered. These include, for example, the relative strength of the system, its controllability, and its compatibility with the particular nucleic acid or DNA sequence or gene to be expressed, particularly as regards potential secondary structures. Suitable unicellular hosts will be selected by consideration of, e.g., their compatibility with the chosen vector, their secretion characteristics, their ability to fold proteins correctly, and their fermentation requirements, as well as the toxicity to the host of the product encoded by the nucleic acid or DNA sequences to be expressed, and the ease of purification of the expression products. Considering these and other factors a person skilled in the art will be able to construct a variety of vector/expression control sequence/host combinations that will express the DNA sequences of this invention on fermentation or in large scale animal culture. The nucleic acids or DNA encoding the phase separation sensors hereof may be administered via one or more vector or DNA construct which is capable of expressing the sensor(s) in a target cell or tissue.
In additional embodiments, methods are provided herein based on the characteristics and capabilities of the phase separation sensors. Thus, methods are provided comprising administering, transfecting or transducing, or otherwise contacting a cell, tissue, sample etc with a phase separation sensor wherein the sensor is capable of targeting or associating with a biomolecular condensate and comprises at least two protein domains, wherein the first domain comprises one or more accessory protein and the second domain comprises an artificial client protein having intrinsic disorder and capable of engaging in ultra-weak phase separation-specific amino acid interactions with one or more component protein in the condensate.
In one such embodiment, a method is provided for targeting a biomolecular condensate in a cell or tissue comprising administering to the cell or tissue or otherwise expressing in the cell or tissue one or more phase separation sensor of the invention.
In an embodiment, a method is provided for targeting a biomolecular condensate in a cell comprising transfecting or transducing the cell with a vector comprising nucleic acid encoding a sensor of the invention or otherwise capable of expressing the sensor of the invention in a cell.
Biomolecular condensates refer to and include membraneless compartments in cells and are two- and three-dimensional compartments in eukaryotic cells that concentrate specific collections of molecules without an encapsulating lipid-based membrane. Biomolecular condensates may be cytoplasmic or nuclear in cell location. Biomolecular condensates include keratohyalin granule (KG), P granule, Germ granule, Lewy bodies, synaptic condensates, stress granule, P bodies, T cell signalosome, crystalline condensates of the lens fibers, Nucleoli, Paraspeckles, Histone Locus Bodies, Cajal Bodies, Heterochromatin and other cytoplasmic or nuclear condensates or membraneless organelles assembled through liquid-liquid phase separation.
In another embodiment, a method is provided for detecting or visualizing a biomolecular condensate in a cell or tissue comprising administering to the cell or tissue or otherwise expressing in the cell or tissue one or more sensor of the invention as provided herein. In one such method embodiment, the sensor comprises at least one accessory protein comprising a detectable or functional label or marker, or a protein capable of tagging the condensate with a detectable or functional label or marker, including for example by association with or localization in the condensate. In a method embodiment, the sensor comprises at least one accessory protein selected from a fluorescent protein, a protein that creates contrast suitable for electron microscopy, or a protein capable of tagging the condensate with a detectable or functional label or marker.
Another method embodiment of the invention is provided in a method for monitoring one or more biomolecular condensate(s) in a cell comprising administering to the cell or otherwise expressing in the cell or tissue one or more sensor described and provided herein wherein the sensor is capable of tagging or labeling the condensate, such as with a detectable or functional label or marker. In an embodiment, the sensor is capable of tagging or labeling a protein in the condensate via a chemical interaction or enzymatic reaction. In an embodiment, the sensor is capable of tagging or labeling a protein in the condensate via ultra-weak bonding or by association with or localization in the condensate. In an embodiment, the sensor is capable of tagging or revealing the condensate with a detectable or functional label or marker without altering the condensate or any condensate protein.
A further method embodiment provides a method for manipulating one or more biomolecular condensate(s) in a cell comprising administering to the cell or otherwise expressing in the cell or tissue one or more sensor described and provided herein wherein the sensor is capable of modifying, labeling, or altering a protein in the condensate. Thus, for example, when a cargo protein modifies protein(s) within condensates, the material properties of the condensate can be manipulated, including being altered or tuned. In one such method embodiment, covalent or noncovalent cross-linking of condensate components may alter the material properties of the condensate.
A kit for evaluation of one or more biomolecular condensate(s) in cells or tissues is provided in another distinct embodiment of the invention, wherein the kit comprises a phase separation sensor as described and provided herein, a nucleic acid encoding a sensor hereof, or a vector comprising a nucleic acid or otherwise capable of expressing one or more sensor hereof in a cell.
In alternative methods of the invention, the phase separation sensors provided herein may be utilized in monitoring phase separation dynamics. The sensors can monitor the formation of condensates and their disassembly, including in a cell, tissue or organ, including as demonstrated herein in skin.
Further methods embodiments include use and application of one or more phase separation sensor to evaluate or screen compounds, drugs or agents for their effect on a condensate. This is particularly relevant wherein the formation of a condensate, the size, the material properties or location of a condensate, or the component make up is altered in or associated with a disease or condition, or is involved in a cellular response in an animal, particularly in a human. In one such embodiment, the sensors are utilized in screening for drugs that promote assembly or disassembly of target condensates.
In accordance with the above, an assay system for screening potential drugs effective to modulate the activity of the target biomolecular condensate may be prepared. The phase separation sensor may be introduced into a test system, and the prospective drug may also be introduced into the resulting cell culture, and the culture thereafter examined to observe any changes in the biomolecular condensate in the cells or in an activity or function associated with the biomolecular condensate, due either to the addition of the prospective drug alone, or due to the effect of added quantities of the known phase separation sensor.
The invention may be better understood by reference to the following non-limiting Examples, which are provided as exemplary of the invention. The following examples are presented in order to more fully illustrate the preferred embodiments of the invention and should in no way be construed, however, as limiting the broad scope of the invention.
At the body surface, the skin's stratified squamous epithelium is uniquely challenged by environmental extremes. Through processes poorly understood, enucleated surface squames derive from transcriptionally-active keratinocytes displaying filaggrin-containing keratohyalin granules (KGs) of unknown structure/function. Here we show that filaggrin assembles KGs through liquid-liquid phase separation, whose dynamics govern terminal differentiation and are disrupted by human skin barrier disease-associated mutations that compromise its critical concentration for phase separation. By engineering sensitive, innocuous fluorescent probes to interrogate endogenous phase behavior in mice, we discovered that phase transitions during epidermal differentiation crowd cellular spaces with KGs whose coalescence is restricted by keratin bundles. Strikingly, natural environmental gradients then profoundly alter KG phase dynamics to drive squame formation. Our findings expose skin as a tissue driven by phase separation. Phase separation sensors reveal abundant liquid-like organelles that are at the crux of skin barrier formation.
Liquid-liquid phase separation of biopolymers has emerged as a major driving force for assembling membraneless biomolecular condensates (1-3), including nucleoli (4), receptor signaling complexes (3, 5), germline granules (1, 6) and stress granules (7). This focus on phase separation has also unraveled unexpected insights into a range of biological processes, including genomic organization (8-10), RNA processing (11, 12), mitosis (13, 14), cell-adhesion (15) and carbon dioxide fixation in plants (16). Despite remarkable progress, the study of cellular phase separation remains challenging (17, 18), often relying upon truncated protein mutants, reconstituted systems in non-physiological buffers and overexpression/knockin of tagged fusions (17, 19) that can alter a protein's phase separation behavior.
In mammalian epidermis, a self-renewing inner (basal) layer of progenitors fuels an upward flux of non-dividing keratinocytes that stratify to form the skin's surface barrier that excludes pathogens and retains body fluids (
Previously, in an unbiased proteome-wide in silico search for phase transition proteins, we identified a major constituent of KGs, filaggrin (FLG) (22), whose truncating mutations are intriguingly linked to human skin barrier disorders (23) (
Phase Separation Behavior of Filaggrin and its Paralogs in Normal and Disease States.
Filaggrin and its less-studied (often less-abundant) paralogs are intrinsically disordered repeat proteins with a low complexity (LC) sequence. Though their sequences are poorly conserved (24) (25, 26), mouse and human filaggrin and their paralogs share similar repeat architecture, LC biases and localization in the cell within KG-like structures (
Mouse Flg1 sequence (SEQ ID NO:1)—Translated from mm10 mouse Flg assembled and repaired CDS
The human filaggrin sequence (NP_002007.1) is provided below for reference (SEQ ID NO:56):
Like many proteins that drive phase separation, filaggrin family proteins across species exhibit a striking bias for arginine (over similarly charged lysine) to engage in aromatic-type interactions (22) (
To directly interrogate filaggrin and its disease-associated variants for phase separation behavior, we first engineered expression vectors driving 1 to 16 human filaggrin repeats (humans have up to 12), each tagged with a fluorescent protein (sfGFP or mRFP)±the non-repeat domains (
SDISKQLGFSQSQRYYYYEG
RYYYYEG
PGLCGHSSDISKQLGFSQSQRYYYYEG
Notably, humans with filaggrin early truncation mutations fail to generate KGs. Such mutations account for >80% of cases among northern Europeans (27). To quantitatively determine how disease-associated mutations alter the critical concentration for phase separation, we incorporated a self-cleavable [p2a] sequence (28) to express equimolar amounts of mRFP-FLG variants and H2B-GFP (as a proxy for variant concentration) (
Filaggrin and its paralogs belong to the S100-fused type protein family which feature two short ‘EF hand’ calcium-binding motifs (˜2% of the protein), N-terminal to the IDP domain. The S100 domain is known to dimerize (29), and when fused to filaggrin variants, it reduced the critical concentration for phase separation (
To further explore whether compromised phase separation might underlie disease-severity, we next performed fluorescence recovery after photobleaching (FRAP). As expected for a diffusive process, highly truncated, smaller FLG repeat variants exhibited more rapid recovery than WT-sized proteins (
Liquid-Like Behavior of Filaggrin Granules
Live cell imaging revealed that HaCATs harboring our engineered filaggrins underwent granule rearrangements and fusion events that are hallmarks of liquid-like droplets (
To further probe their material properties, we employed atomic-force microscopy (AFM). When pressure was applied with an AFM probe directly on top of filaggrin granules, they deformed, creating liquid-like streaming around the cell's nucleus (
Our photobleaching data suggested that the material properties of KGs may change as a function of filaggrin processing and disease-associated mutations. To test this hypothesis, we performed serial force-indentation measurements across the granule length in HaCATs harboring different filaggrin variants (
Engineering Phase Separation Sensors to Interrogate Endogenous KGs
While our tagged filaggrin variants assembled de novo into KG-like structures, it was critical to address whether endogenous KGs in skin assemble through phase separation of filaggrin and if so, how their putative liquid-like properties contributed to epidermal differentiation. To do so, we could not use direct filaggrin tagging to label endogenous KGs, as it alters its biophysical properties (
Thus, we sought to design novel clients that would permit probing the phase separation behavior of endogenous scaffold proteins as their concentration and processing change in living tissues. We aimed for soluble IDP clients that lack phase separation behavior of their own, but co-partition efficiently and innocuously into nascent phase-separated condensates by engaging in ultra-weak, phase-separation-specific (combinations of charge-charge, cation-pi, pi-pi, hydrogen-bonding and hydrophobic) interactions with the scaffold (
To engineer such ‘phase separation sensors’ for endogenous filaggrin, we exploited a) the non-pathogenic behavior of human filaggrin repeat mutants that possess His:Tyr mutations (
For live imaging, our sensors needed a fluorescent tag. Since surface charge of fusion proteins can affect IDP phase separation behavior (32), we first screened Tyr-high sensor variants fused to sfGFPs of varying net charges (33) for those that display high partition coefficients into KGs (
Due to FRAP's size-dependence, our studies in
Crowding of Liquid-Like KGs within Skin Cells in Tissue.
Our ultimate goal was to interrogate the dynamics of these liquid-liquid phase transitions in vivo in the skin epidermis. To this end, we employed our non-invasive in utero lentiviral delivery system to selectively, efficiently and stably transduce the single layer of embryonic E9.5 mouse skin epithelium with doxycycline-inducible transgenes encoding our sensors (
Once the skin barrier was fully mature (E18.5), doxycycline-fed embryos were subjected to live imaging and/or immunofluorescence microscopy. Sagittal confocal views revealed that bright sensor signal was confined to filaggrin-expressing granular layers, while planar views showed a robust array of sensor-labeled KGs in these cells (
The striking level of KG crowding seemed incompatible with liquid-like behavior. To gain further insights, we performed live imaging and monitored keratinocyte flux through the granular layers of skin (36). Early granular cells displayed only a few KGs, whose numbers appeared to increase through de novo granule formation (
Despite these liquid-like features, most existing KGs grew robustly without undergoing fusion (
Probing deeper, we noticed that granular cells exhibited substantial morphological changes as they transited through the granular layers and became increasingly crowded with KGs (
Stabilization of Liquid-Like Membraneless Organelles
Although the rarity of fusion events among densely packed KGs might simply reflect their apparent viscosity, it was also possible that additional facets of terminal differentiation might be contributing to this puzzling behavior. Notably, the granular layer also displays an abundant network of terminal differentiation-specific keratins 1/10 (K1/K10) filaments, prompting us to test whether they might be impeding KGs from fusing and allowing them to crowd the cytoplasm as stable organelles. When HaCATs were transduced with doxycycline-inducible human mRFP-K10 (TABLE 4), hK10 incorporated into the endogenous network of basal K5/K14 filaments (37). Upon co-transfection with sfGFP-FLG to drive KG formation, many of the mRFP-tagged keratin bundles encased KGs (
KHYSSSRSGGGGGGGGCGGGGGVSSLRISSSKGSLGGGFSSGGFSGGSFSR
GSSGGGCFGGSSGGYGGLGGFGGGSFRGSYGSSSFGGSYGGSFGGGSFGG
GSFGGGSFGGGGFGGGGFGGGFGGGFGGDGGLLSGNEKVTMQNLNDRLA
SYLDKVRALEESNYELEGKIKEWYEKHGNSHQGEPRDYSKYYKTIDDLKN
QILNLTTDNANILLQIDNARLAADDFRLKYENEVALRQSVEADINGLRRVL
DELTLTKADLEMQIESLTEELAYLKKNHEEEMKDLRNVSTGDVNVEMNA
APGVDLTQLLNNMRSQYEQLAEQNRKDAEAWFNEKSKELTTEIDNNIEQI
SSYKSEITELRRNVQALEIELQSQLALKQSLEASLAETEGRYCVQLSQIQAQ
ISALEEQLQQIRAETECQNTEYQQLLDIKIRLENEIQTYRSLLEGEGSGSSGG
GGRGGGSFGGGYGGGSSGGGSSGGGHGGGHGGSSGGGYGGGSSGGGSSG
GGYGGGSSSGGHGGSSSGGYGGGSSGGGGGGYGGGSSGGGSSSGGGYGG
GSSSGGHKSSSSGSVGESSSKGPRY
MSVRYSSSKHYSSSRSGGGGGGGGCGGGGGVSSLRISSSKGSLGGGFSSGG
FSGGSFSRGSSGGGCFGGSSGGYGGLGGFGGGSFRGSYGSSSFGGSYGGIF
GGGSFGGGSFGGGSFGGGGFGGGGFGGGFGGGFGGDGGLLSGNSPGVSK
MSVRYSSSKHYSSSRSGGGGGGGGCGGGGGVSSLRISSSKGSLGGGFSSGG
FSGGSFSRGSSGGGCFGGSSGGYGGLGGFGGGSFRGSYGSSSFGGSYGGIF
GGGSFGGGSFGGGSFGGGGFGGGGFGGGFGGGFGGDGGLLSGNSPGVSK
GGYGGGSSGGGSSGGGHGGGHGGSSGGGYGGGSSGGGSSGGGYGGGSSS
GGHGGSSSGGYGGGSSGGGGGGYGGGSSGGGSSSGGGYGGGSSSGGHKS
SSSGSVGESSSKGPRY
Keratins possess a central coiled-coil ‘rod’ domain that initiates heterodimer formation and forms the backbone of the 10 nm intermediate filament (39). Whereas K5/K14 keratins of proliferative progenitors have short amino- and carboxy-LC domains, the large LC domains of K1/K10 keratins (40) are thought to protrude along the outer surface of the filament and bundle into cable-like filaments.
Intrigued by the packing of K10-containing filaments around filaggrin granules, we next asked whether their unique features might facilitate interactions with KGs. Examining the behaviors of mCherry fused to one, both or neither hK10 LC domains (TABLE 4), we found that in the absence of KGs, each was diffuse in the cytoplasm of cultured keratinocytes (
To further explore this possibility, we transduced both our phase separation sensor and suprabasal-inducible mRFP-hK10 constructs into E9.5 embryos and performed live-imaging on E18.5 skin explants. While early granular cells displayed small, relatively sparse KGs surrounded by a well-defined network of K10-containing filaments, mid-granular cells exhibited a denser keratin network interwoven among larger, more abundant KGs that remained caged and hence unable to fuse (
Liquid Phase KG Dynamics, Enucleation and Environmental Sensitivity
We posited that progressive crowding by keratin-stabilized KGs might distort the nucleus and other organelles in a fashion that could contribute to their destruction at the critical granular to stratum corneum transition. If so, this could explain why nuclei are often aberrantly retained in the outer skin layers of patients who also lack KGs (41).
Consistent with this notion, KGs assembled de novo in HaCATs from wild-type repeat filaggrins prominently deformed nuclei, while KGs assembled from disease-associated FLG mutants instead wetted the nuclear surface without deformation (
Enucleation events were difficult to capture, as the process was remarkably rapid, occurring over 2 hours (
Probing further, we found that Flg knockdown in skin not only depleted KGs, but also delayed the nuclear degradation process (
Given the inherent environmental responsiveness of intrinsically disordered proteins (42, 43), we wondered whether the marked shift in KG dynamics late in terminal differentiation might be fueled by the environmental changes that naturally occur near or at the skin surface. In particular, while proliferative basal progenitors experience physiological pH (7.4), the skin surface is acidic (pH ~5.5) (44). Since filaggrin is rich in histidine, whose physiological pKa is ~6.1 (45), we posited that this natural difference in extracellular pH may also be reflected intracellularly and may in part trigger the changes in KG material properties that we had detected at the granular to stratum corneum transition.
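The reasoning above can be made concrete with a short calculation. The following is a minimal sketch, assuming that each histidine side chain titrates independently with the single pKa quoted in the text (about 6.1) and that the Henderson-Hasselbalch relation applies; local sequence context and condensate environment, which can shift pKa values, are ignored. The function name is illustrative.

```python
def protonated_fraction(ph: float, pka: float) -> float:
    """Fraction of a titratable group in its protonated form
    (Henderson-Hasselbalch relation for an independent site)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


HIS_PKA = 6.1  # histidine pKa as quoted in the text (assumed uniform)

# pH values taken from the text: basal progenitors vs. skin surface
for label, ph in [("basal progenitors", 7.4), ("skin surface", 5.5)]:
    frac = protonated_fraction(ph, HIS_PKA)
    print(f"{label}: pH {ph} -> {frac:.0%} of histidines protonated")
```

Under these assumptions, roughly 5% of histidines carry a positive charge at pH 7.4, rising to roughly 80% at pH 5.5, which illustrates why a histidine-rich protein could change its interaction behavior across this pH range.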
To detect intracellular pH shifts, we first transduced HaCATs with either mNectarine or SEpHluorin reporters, which rapidly lose fluorescence upon shifting from pH 7.4 to pH 6.3 (
Given the pH sensitivity of KGs, we then turned to interrogating this process in vivo. To do so, we introduced our pH reporters into mice along with either our phase separation sensor or H2BRFP and through live imaging, monitored the natural intracellular pH shifts that we surmised would occur as granular cells approached the acidic skin surface. Over time, as each granular cell progressed to the critical granular-to-corneum transition, it experienced a sudden shift in pH, as detected by our intracellular reporters (
Finally, we took skin explants from embryos transduced with phase-sensor, H2BRFP and either scrambled or filaggrin shRNAs, and performed live imaging immediately after shifting the extracellular pH in the medium. Accelerating the natural intracellular pH transition, granular cell KGs showed signs of disassembly, and chromatin compaction became pronounced (
Discussion
Our design and deployment of a new class of innocuous client protein provides a general strategy to interrogate endogenous liquid-liquid phase separation dynamics across biological systems in a non-disruptive manner. We envision that these in vivo phase separation sensors may be further functionalized to incorporate enzymes evolved for proximity proteomics (46, 47), potentially enabling, without perturbing endogenous scaffold proteins, the molecular and biophysical interrogation of endogenous liquid-liquid phase separation in organoids, tissues and living organisms.
We used this new strategy to illuminate—through the lens of phase separation—the process of skin barrier formation, which entails the appearance of enigmatic KGs in the granular layer and then their sudden disappearance as epidermal cells undergo a poorly-understood transition to the stratum corneum. These granules, long puzzling the field (48), have been viewed as inert, cytoplasmic aggregates of filaggrin, which then become cleaved into smaller fragments and amino acid derivatives that promote keratin filament bundling (21) and hydrate the stratum corneum (24). Despite decades of research, and mutations linked to atopic dermatitis (23), no clear function had been established for KGs, filaggrin or filaggrin paralogs that also accumulate as granular deposits in epithelial tissues (49, 50).
Through the engineering of filaggrins and filaggrin disease-associated variants and also our phase separation sensors, we have now shown that KGs are abundant, liquid-like membraneless organelles, which through their phase-separation-driven assembly and then disassembly, function to structure the cytoplasm and drive an environmentally sensitive program of terminal differentiation in the epidermis. By virtue of their newfound mechanical and pH-sensitive properties, KGs are ideally equipped to confer environmental-responsiveness to the rapid and adaptive process of skin barrier formation. The discovery that filaggrin-truncating mutations and loss of KGs are rooted in altered phase-separation dynamics begins to shed light on why associated skin barrier disorders are exacerbated by environmental extremes. These insights open the potential for targeting phase behavior to therapeutically treat disorders of the skin's barrier.
Liquid-phase condensates have typically been viewed as reaction centers where select components (clients) become enriched for processing or storage within cells (2). Analogously, KGs may store clients, possibly proteolytic enzymes and nucleases, that are released rapidly and in a timely, pH-dependent fashion to promote the self-destructive phase of forming the skin barrier. Additionally, squame formation likely exploits general biophysical consequences of KG assembly, as KGs interspersed by keratin filament bundles massively crowd the keratinocyte cytoplasm and physically distort adjacent organelles prior to the ensuing environmental stimuli that trigger KG disassembly. Overall, the remarkable environmentally-sensitive dynamics of liquid-like KGs, actionable by the skin's varied environmental exposures, expose the epidermis as a tissue driven by phase separation.
Materials and Methods
Sequence Analysis of Filaggrin and its Paralogs
Full proteomes were downloaded as FASTA files from UniProt (uniprot.org). For the human proteome, we used the (non-redundant) canonical proteome. For the analysis of protein domains known to drive liquid-liquid phase separation (
Synthesis of Repetitive DNAs Encoding Filaggrin and Filaggrin Variants
To assemble repetitive DNAs, we used a highly-efficient iterative plasmid-reconstruction approach (PRe-RDL), with minor modifications. We used a modified pET-24a(+) vector as published (53), but eliminated the terminal Tyr-stop-stop sequence to avoid altering the hydropathy of FLG sequences. Instead, the modified vector uses a terminal Gly-stop-stop-stop sequence. We refer to this modified vector as JMD2G. We purchased synthetic gblocks from IDT (Integrated DNA Technologies) encoding the eighth repeat in human FLG (repeat #8, here referred to as r8), sfGFP, mRFP1 and the S100 domain of human FLG (
Synthesis of Genes Related to Phase Separation Sensors
TABLE 3 includes the sequence information for all sensor domains reported in
Synthesis of Genes Encoding Human K10 and its Low Complexity Domains
Because of the large low complexity domains in human K10, we were unable to successfully amplify full-length KRT10 cDNA. Instead, we first PCR-amplified a fragment of KRT10 spanning the N-terminal LC domain and the complete central coiled-coil rod domain (forward primer: TAATCATCGATCGGATGGCTCTGTTCGATACAGCTCAAGCAAGCACTACTCTT (SEQ ID NO: 36), reverse primer: TAAGCAGGGGATCCCTCTCCTTCTAGCAGGCTGCGGTAGGTTTG (SEQ ID NO: 55)) using KRT10 cDNA (NM_000421.2) purchased from Origene. These primers added restriction sites for PvuI and BamHI at the N- and C-terminus, respectively, for seamless insertion into a pmax vector harboring an N-terminal mRFP sequence and the C-terminal LC domain. The C-terminal LC domain was synthesized as a gblock by IDT (by maximizing codon usage along the length of this highly repetitive low complexity sequence). Similarly, we also obtained a gblock encoding the N-terminal LC domain flanked by NheI and XmaI sites, which we inserted into our modified pmax vector for building a gene encoding a fusion to mCherry. This vector was further modified between BamHI and EcoRI sites to introduce the C-terminal LC domain and generate mCherry fusions harboring both K10 LC domains. These constructs are listed in TABLE 4.
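As a quick sanity check on the primer design described above, the recognition sites can be located directly in the primer sequences. This is a minimal sketch: the primer sequences are copied from the text, the recognition sequences for PvuI (CGATCG) and BamHI (GGATCC) are the standard ones, and the helper names are illustrative only.

```python
# Standard recognition sequences for the two enzymes named in the text
SITES = {"PvuI": "CGATCG", "BamHI": "GGATCC"}

# Primer sequences as given in the text (SEQ ID NO: 36 and SEQ ID NO: 55)
FORWARD = "TAATCATCGATCGGATGGCTCTGTTCGATACAGCTCAAGCAAGCACTACTCTT"
REVERSE = "TAAGCAGGGGATCCCTCTCCTTCTAGCAGGCTGCGGTAGGTTTG"


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def site_positions(primer: str, site: str) -> list[int]:
    """0-based start positions of every occurrence of `site` in `primer`."""
    return [i for i in range(len(primer) - len(site) + 1)
            if primer.startswith(site, i)]


for name, primer, enzyme in [("forward", FORWARD, "PvuI"),
                             ("reverse", REVERSE, "BamHI")]:
    site = SITES[enzyme]
    # Both sites are palindromic, so one strand is sufficient to search
    assert reverse_complement(site) == site
    print(name, enzyme, site_positions(primer, site))
```

Each site occurs once, a few bases in from the 5' end of its primer, leaving a short overhang upstream of the site.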
Characterization of Filaggrin-Like Proteins in
To drive efficient expression of the relevant repetitive proteins, we transfected the corresponding pmax plasmids into HaCATs, a commonly used immortalized human keratinocyte cell line (57). First, we routinely expanded HaCATs in low calcium (50 μM) epidermal cell culture media (58). For transfection, we seeded 1.5-2×10⁵ cells per well in glass-bottom 24-well plates (P24G-1.5-10-F, MatTek) using CnT-PR media (CELLnTEC, Switzerland) supplemented with 10% epidermal media. At this seeding density, cells cover the glass-bottom wells at a good density by 15-17 h post seeding. At this time, we typically transfected cells with 0.5 μg to 3.5 μg of each plasmid, using Lipofectamine 3000 (Invitrogen) and following the instructions of the manufacturer (at 1.5 μl L3000 per transfection reaction, including P3000 reagent). In a typical experiment with plasmids encoding FLG repeat variants, we scaled the amount of plasmid DNA to account for differences in gene size (e.g. 1.5 μg for a gene with 4 FLG repeats compared with 3.45 μg for a gene with 12 FLG repeats). However, as little as 0.5 μg was sufficient to induce expression of most FLG variants in TABLE 2 and we have conducted experiments in which we transfected the same amount of plasmid DNA regardless of gene size and saw the reported behaviors. We note that the amount of plasmid DNA ultimately controls transfection efficiency, that is, the number of cells that robustly express the plasmid of interest, but does not play a role in defining the properties of the resulting FLG granules. One day after transfection, we changed the media to pro-differentiation media (pre-warmed to 37° C.), CnT-PR-D (CELLnTEC, Switzerland) supplemented with 1.5 mM CaCl2, and proceeded to image cells 6-9 h later using a spinning-disc microscope equipped with a 40× oil objective. Live imaging was conducted with cells at 37° C. and under a controlled CO2 environment. In some specific cases and as indicated (e.g.
To calculate the phase separation propensity of FLG repeat proteins in
To calculate the density of FLG repeat within granules assembled from different FLG repeat variants (
To study protein dynamics within granules (
Characterization of Phase Separation Sensors in
The approach to studying the behavior of phase separation sensors in HaCATs (
Photobleaching experiments were also as described for experiments in
Atomic Force Microscopy (AFM) Measurements
To enable access of the AFM probe to filaggrin granules within cells, we seeded 1.5×10⁶ HaCATs into 50 mm glass-bottom dishes (Fluorodish, FD5040, WPI) using CnT-PR media supplemented with 10% epidermal media. At 15 h post seeding, we transfected HaCATs with a mixture of two pmax vectors using Lipofectamine 3000 (at 7.5 μl L3000 per transfection reaction, including P3000 reagent). One vector (at 1 μg per reaction) harbored a H2B-RFP gene and was common to all transfection reactions. The second vector (at 7.5 μg per reaction) encoded one of the following FLG variants: sfGFP-(r8)8, sfGFP-(r8)8-Tail and S100-sfGFP-(r8)8-Tail (TABLE 2). One day after transfection, we washed cultures with DPBS (pre-warmed to 37° C.) and added pro-differentiation media (pre-warmed to 37° C.), CnT-PR-D (CELLnTEC, Switzerland) supplemented with 1.5 mM CaCl2. Cells were transported (at 37° C.) soon after or up to 24 h later to the Molecular Cytology core facility of MSKCC for AFM measurements using a microscope stage at 37° C. AFM force measurements and manual deformations of sfGFP-tagged FLG granules were performed using an MFP-3D AFM (Asylum Research) combined with an Axio Scope inverted optical microscope (Zeiss). Silicon nitride probes with a 5 μm diameter spherical tip were used (Novascan). Cantilever spring constants were measured prior to sample analysis using the thermal fluctuation method, with nominal values of approximately 100 pN/nm. 5×5 μm force maps were acquired with 10 force points per axial dimension (0.5 μm spacing) atop sfGFP-tagged FLG granules identified using the bright-field and GFP optical images. Measurements were made using a cantilever deflection set point of 10 nN and scan rate of 1 Hz. Bright-field (AFM probe), GFP (FLG variant) and H2B-RFP (nuclei) images were acquired for each cell and granule measured to enable force map and optical image co-registration. 
Live-video bright-field images were also taken during force map acquisition to observe granule and cellular deformations. Force-indentation curves were analyzed using a modified Hertz model for the contact mechanics of spherical elastic bodies. The sample Poisson's ratio was 0.33 and a power law of 1.5 was used to model tip geometry. To observe granule displacement and flow following force application, the AFM tip was manually placed adjacent to sfGFP-tagged FLG granules using a micrometer. During live video-rate (14 frames/sec) image acquisition (bright-field and GFP), force was manually applied with the AFM probe in the absence of force set point feedback via micrometer manipulation.
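For orientation, the contact-mechanics step can be sketched as follows. The text specifies a modified Hertz model but does not detail the modifications, so this example uses the textbook Hertz relation for a rigid spherical indenter on an elastic half-space, F = (4/3)·E/(1 − ν²)·√R·δ^1.5, as a stand-in. The Poisson's ratio (0.33), the 1.5 power law and the tip radius (2.5 μm, from the 5 μm diameter) come from the text; contact-point detection and any substrate or finite-thickness corrections are omitted, and the function names are illustrative.

```python
import math

POISSON_RATIO = 0.33    # sample Poisson's ratio stated in the text
TIP_RADIUS_M = 2.5e-6   # 5 um diameter spherical tip -> 2.5 um radius


def hertz_force(indentation_m: float, youngs_modulus_pa: float,
                radius_m: float = TIP_RADIUS_M,
                nu: float = POISSON_RATIO) -> float:
    """Textbook Hertz force (N) for a spherical indenter; assumed model."""
    return ((4.0 / 3.0) * youngs_modulus_pa / (1.0 - nu ** 2)
            * math.sqrt(radius_m) * indentation_m ** 1.5)


def fit_youngs_modulus(indentations_m, forces_n,
                       radius_m: float = TIP_RADIUS_M,
                       nu: float = POISSON_RATIO) -> float:
    """Least-squares estimate of E (Pa) from a force-indentation curve.

    The model is linear in E (F = E * g(delta)), so the fit through the
    origin has the closed form E = sum(F * g) / sum(g * g).
    """
    g = [hertz_force(d, 1.0, radius_m, nu) for d in indentations_m]
    return sum(f * gi for f, gi in zip(forces_n, g)) / sum(gi * gi for gi in g)


# Synthetic, noise-free curve for a hypothetical 1 kPa sample
depths = [i * 50e-9 for i in range(1, 11)]        # 50 nm to 500 nm
forces = [hertz_force(d, 1000.0) for d in depths]
print(f"recovered E = {fit_youngs_modulus(depths, forces):.1f} Pa")
```

In a force map such as the one described above, a fit of this kind would be repeated for each of the 10 × 10 force points.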
Mice and Lentiviral Transduction
Mice were housed and cared for in an AAALAC-accredited facility, and all animal experiments were conducted in accordance with IACUC-approved protocols. We obtained the hIVL-rtTA FVB mouse line (ref. 27 of the main manuscript) from NIH as frozen embryos donated by Julie Segre at NHGRI. This line was genotyped as originally published.
For rapid generation of mice with genetically-modified skin, we used non-invasive, ultrasound-guided in utero lentiviral-mediated delivery of expression constructs and shRNAs, which selectively transduces single-layered surface ectoderm of living E9.5 mouse embryos as previously published (59). Lentiviral vectors with Scramble (non-targeting) shRNAs and constitutive expression (PGK-driven) of H2B-RFP were previously reported and documented to have no adverse effects over no infection or a mock vector having no shRNA (59). We further modified these PLKO-based lentiviral vectors to replace the PGK promoter with a TRE promoter sequence (TRE3G, Clontech). We assembled the corresponding TRE repeats from small oligos synthesized by IDT and cloned them into this modified PLKO vector. PGK- and TRE-based PLKO vectors were modified to include NheI and EcoRI sites downstream of the promoter. We shuffled sensor and mRFP-K10 genes from pmax vectors into PLKO-based vectors using these two restriction sites. Using these PLKO vectors we generated high titer viruses in 293FT cells as previously described (59). To induce expression of TRE-controlled genes in vivo, females fostering lentivirally-transduced embryos were fed with doxycycline starting 1 day after injection (i.e. at E10.5).
For knockdown of mouse filaggrin, we relied on our curated mouse Flg cDNA (largely based on the C57B6 mm10 genome) to identify hairpins with high intrinsic scores and no predicted off-targets using the GPP Web Portal (portals.broadinstitute.org/gpp/public). Notably, existing genome-wide shRNA libraries lack hairpins against filaggrin, likely due to the low quality of the current reference sequence and the inherent repetitiveness of most target sites. We modified our lentiviral vectors harboring H2BRFP to substitute their Scramble shRNA with two hairpins against (mouse) Flg (#01: with target sequence ATCAATCTCACAGCTATTATT (SEQ ID NO: 37) localized to the C-terminal domain, and #02: with target sequence CTCCGGATTCTACCCAGTATA (SEQ ID NO: 38) within the filaggrin repeats). We tested both in mouse skin and they efficiently depleted mFLG and its KGs (
To transduce human primary keratinocytes with lentiviral vectors harboring H2BRFP and phase separation sensors, we thawed frozen neonatal and adult human primary keratinocytes (i.e. two different human donors) that we purchased from Life Technologies. Expanded only for 1 passage prior to freezing, thawed cells were seeded into a 10 cm cell culture dish using Epilife media (ThermoFisher) supplemented with Human Keratinocyte Growth Supplement (ThermoFisher). At ~80% confluency, cells were detached using StemPro Accutase (ThermoFisher) and seeded onto 6-well plates at a density of 2×10⁵ or 4×10⁵ cells per well. After overnight incubation, we transduced these cultures by diluting the corresponding high titer lentiviruses into supplemented Epilife media and centrifuging the 6-well plates at 1100 g for 30 min at 25-29° C. We discarded the media and added fresh media as needed as the cells expanded. Upon confluency, we detached the transduced cells using StemPro Accutase and seeded them in glass-bottom 35 mm dishes (MatTek, P35G-1.5-20-C) at about half the original cell density using CnT-PR media supplemented with 10% supplemented Epilife. Once confluent, we switched these cultures to pro-differentiation media (same as for HaCATs) and incubated them for at least 4 to 8 days prior to live imaging. We note that a similar in vitro lentiviral transduction approach was used to generate HaCATs with doxycycline-inducible expression of mRFP-K10.
Live Imaging
For live imaging of mouse skin, we harvested head and back skin from E18.5 mouse embryos that were transduced in utero as explained above. We predominantly imaged head skin as our lentiviral approach provides very high coverage in the head, though at high titer we could also routinely use back skin. Once removed from the embryos, we gently scraped off the fat leaving the dermis intact, cut out 1 cm² pieces and placed them with the stratum corneum facing down on a glass-bottom 35 mm dish (MatTek, P35G-1.5-20-C) and on top of a 50 μl drop of phenol red-free, growth factor-reduced Matrigel (Corning). We pressed the tissue flat against the glass surface using a transparent porous membrane (Whatman Nuclepore Track-Etched Membranes, 13 mm diameter, 5 μm pores) and a 12 mm cover glass (Fisherbrand). Once flat, we removed the cover glass and allowed the Matrigel to solidify at 37° C. for 15 min. We then added 2 ml of warm (at 37° C.) CnT-Prime Airlift, Full Thickness Skin Airlift Medium (from CELLnTEC), typically supplemented with 2 μg/ml doxycycline (unless all genes were under constitutive promoters). We imaged these samples using a spinning-disc microscope equipped with a 40× oil objective and a live imaging chamber with constant supply of CO2 and maintained at 37° C. We imaged with up to two lasers (488 and 561 nm, at 5.2 mW) and with exposure times of 200 ms per laser. We obtained full z stacks of the suprabasal epidermis every 20 min (to limit phototoxicity) for about 16-20 h. For the pH-shift experiments in
Selection and Synthesis of pH Reporters
For the synthesis of genetically-encoded pH reporters that sensitively respond with a pKa near 6.5 (based on the known pH range of the extracellular skin pH gradient), we chose two previously published and well-characterized pH reporters: SEpHluorin (60) and mNectarine (61). We PCR-amplified genes encoding these proteins from Addgene plasmids (#58500 and #80151, respectively) while adding NheI and XmaI restriction sites for insertion into pMAX vectors downstream of a CMV promoter. We used these pMAX-based vectors for validation of the pH reporters using immortalized human keratinocytes (
Design and Synthesis of Conventional Client Proteins for KGs
In
ENLYFQS-
ENLYFQR-
Immunofluorescence of Fixed Cells and Tissues
To prepare HaCATs for immunostaining, we fixed cultures at 37° C. for 10 min using 4% paraformaldehyde in DPBS. Cultures were washed multiple times with DPBS and stored at 4° C. prior to immunostaining. To prepare murine skin for whole-mount immunostaining, we treated whole skin with dispase for 30 min at 37° C. to isolate the epidermis. We fixed the epidermis at 37° C. for 30 min in 4% paraformaldehyde. After subsequent washes in DPBS, we stored the tissue at 4° C. in DPBS prior to immunostaining. In all cases, we permeabilized the tissue with an antibody blocking buffer (0.3% Triton X-100, 2.4% gelatin (Sigma, 67765-1L), 1.2% BSA and 6% normal donkey serum in DPBS) for 3-4 hours prior to overnight incubation with primary antibodies. The following primary antibodies were used: chicken anti-GFP (1:2000, Abcam), rabbit anti-RPTN (1:200, Sigma HPA030483), rabbit anti-mFLG (1:1000, Fuchs Lab), rabbit anti-mFLG (1:1000, Abcam ab24584) and goat anti-hFLG (1:200, Santa Cruz, sc-25897). After washing with DPBS, we typically added species-specific secondary antibodies conjugated to RRX or AF647 and incubated the cultures and tissues for 4 h at room temperature. After washing with DAPI, the samples were mounted with ProLong Gold Antifade Mountant (Invitrogen) and cured overnight prior to imaging. For filaggrin immunostaining without secondary antibodies (i.e. direct detection) in mouse skin, we first conjugated anti-mFLG (Abcam) to AF647 using an Alexa Fluor™ 647 Antibody Labeling Kit (ThermoFisher), following the instructions of the manufacturer. We refer to this conjugated antibody as anti-FLG Ab-c(AF647). For immunostaining, we permeabilized the tissue in blocking buffer overnight and added anti-FLG Ab-c(AF647) (1:50) for 4 hours at room temperature. After washing with DAPI, we mounted the tissue as described above. Cultured cells and whole-mounted fixed tissues were imaged using a spinning-disc microscope equipped with a 40× oil objective. 
Images were analyzed using ImageJ and Imaris 8.3.1.
Skin Barrier Assay
To measure barrier quality, we obtained trans-epidermal water loss (TEWL) measurements using a Tewameter TM 300 (Courage+Khazaka electronic GmbH) on explanted neonatal back skin. Briefly, neonates were humanely sacrificed a few hours (<12 h) after birth and their back skin was harvested and immediately spread over a clean surface. In this format, we ensured that the TM-300 probe (~1 cm in diameter) covered a large percentage of the back skin surface. We collected four TEWL measurements per sample and consistently discarded the first measurements to limit our analysis to fully acclimatized skin. Each measurement was allowed to stabilize for 30-60 seconds. The values reported by the instrument were not further processed and corresponded to grams of lost water per hour per cm² of skin. We measured 2-3 animals in 3 independent experiments.
Statistical Analyses
Whenever we indicate statistical significance, these are cases where we reject, at a significance level of 0.05 (i.e., p-value<0.05), the null hypothesis that the difference in the mean values between two data sets is equal to zero. To perform this hypothesis testing we ran two-sample t-tests using OriginPro. In all cases we verified that the statistical differences did not depend on the assumption of equal variance between samples (Welch correction).
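The Welch-corrected comparison mentioned above can be sketched in a few lines. The analyses in this example were run in OriginPro; the Python below is only an illustrative equivalent of the unequal-variance test statistic and its Welch-Satterthwaite degrees of freedom, using the standard library. The function name and the toy data are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees
    of freedom (no assumption of equal variances)."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard errors of each sample mean (sample variance / n)
    se2_a = variance(sample_a) / na
    se2_b = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df


# Hypothetical toy data, for illustration only
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

Converting the statistic to a p-value requires the t-distribution; where SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and returns the p-value directly.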
Engineering of Clients that Function as Phase Separation Sensors
“Phase separation sensors” are a new class of clients optimally designed to interrogate dynamic liquid-liquid phase separation events in a way that does not perturb the process. As we define them, phase separation sensors do not bind specific domains on the scaffold protein. Rather, they engage in ultra-weak molecular interactions with key residues of the scaffold (in this case filaggrin). Consequently, only upon a liquid-liquid phase separation do we expect these proteins to become sufficiently concentrated to enable the sensors to appreciably interact with the scaffold. As our data show, these sensors can exhibit a uniquely high signal:noise ratio and participate innocuously without altering the phase separation process (
By contrast, conventional clients were initially defined as macromolecules that are recruited to a condensate by binding to free sites in its protein scaffold (30). The underlying assumption has been that clients bind to specific domains within the scaffold protein and engage in specific protein-protein interactions between a domain in the client and a domain in the scaffold protein. Such clients have been successful in directing cargo to test-tube and/or phase separated compartments in cell lines in vitro. However, these traditional clients that bind domains within scaffolds have caveats as probes for endogenous phase behavior. They may, for instance, bind to the scaffold prior to, and regardless of, the phase separation process. Moreover, as with fluorescently tagged scaffold proteins (
We illustrate these critical differences between phase separation sensors and conventional clients by providing a direct experimental comparison with a conventional client that we engineered to bind a small domain within a filaggrin-like protein (
We anticipate the development and utilization of new client-based technologies to study native phase separation processes. This will be important to move beyond protein-tagging and into complex biological systems like tissues and living organisms.
Implementation of Phase Separation Sensors to Study Phase Separation Dynamics in Skin
Genetically-encoded phase separation sensors shown here feature two domains: a sensing domain proper and a fluorescent reporter consisting of a fluorescent protein with suitable surface characteristics. The overall rationale is explained above in this example and is also further depicted in
Considering that a single filaggrin repeat does not drive phase separation in keratinocytes (
We also generated additional (distant) sensor variants that are smaller than a filaggrin repeat but with similar compositional biases (eFlg1, ieFlg1 and eFlg2). Specifically, while the original filaggrin repeat is 324 amino acid residues in length and lacks a clearly discernible internal repeat structure (though low complexity in nature), we engineered a new sensor domain (eFlg1) composed of 5 repeats of a minimal filaggrin repeat that is only 40 amino acid residues in length (GRDGSHSYQGDRSGHSHQRQGYHEQSDRAGHGDSGHRGYS) (SEQ ID NO: 44). This minimal repeat does not occur in filaggrin, but some short motifs do occur within r8 (RQGYH (SEQ ID NO: 45), DRAGHG (SEQ ID NO: 46), EQS (SEQ ID NO: 47), RDGS (SEQ ID NO: 48), DSGHRGYS (SEQ ID NO: 49)) and the overall design is modeled after a canonical UCST phase transition protein-polymer with low phase separation propensity (22). We then generated ieFlg1 by sequence-reversal of eFlg1. eFlg2 corresponds to a sensor domain that approximates the size of r8 by directly fusing a 4-mer of the eFlg1 repeat with a 4-mer of the ieFlg1 repeat (TABLE 3).
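The construction of these engineered domains can be illustrated with simple string operations. This is a minimal sketch, assuming the repeats are concatenated directly in tandem with no junction residues; the exact sequences, including any linkers, are those listed in TABLE 3, which this sketch does not reproduce. Variable and function names are illustrative.

```python
# Minimal 40-residue repeat (SEQ ID NO: 44) and the r8-derived motifs from the text
MINIMAL_REPEAT = "GRDGSHSYQGDRSGHSHQRQGYHEQSDRAGHGDSGHRGYS"
R8_MOTIFS = ["RQGYH", "DRAGHG", "EQS", "RDGS", "DSGHRGYS"]


def tandem(unit: str, copies: int) -> str:
    """Direct head-to-tail concatenation (assumed; no junction residues)."""
    return unit * copies


def reverse_sequence(seq: str) -> str:
    """Sequence reversal: same composition, opposite N-to-C order."""
    return seq[::-1]


eflg1 = tandem(MINIMAL_REPEAT, 5)             # 5 repeats of the minimal unit
ieflg1 = reverse_sequence(eflg1)              # sequence-reversed eFlg1
eflg2 = (tandem(MINIMAL_REPEAT, 4)            # 4-mer of the eFlg1 repeat ...
         + tandem(reverse_sequence(MINIMAL_REPEAT), 4))  # ... plus 4-mer of the ieFlg1 repeat

print(len(MINIMAL_REPEAT), len(eflg1), len(ieflg1), len(eflg2))
print(all(motif in MINIMAL_REPEAT for motif in R8_MOTIFS))
```

Under the tandem assumption, eFlg2 comes to 320 residues, close to the 324 residues of r8, and reversal leaves amino acid composition unchanged while altering the order of residues along the chain.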
Overall, we suggest that the use of naturally-occurring non-pathogenic mutations within specific domains of human phase-sensitive proteins followed by sequence-reversal constitutes a straightforward and general approach to the design of highly sensitive phase separation sensors that are tailored for specific biological systems. Other strategies are also feasible, as we also demonstrate here with the design of fully engineered sensor domains (e.g. ieFlg1 in Sensor B,
Phase separation sensors having multiple domains, particularly more than two domains, have been contemplated, designed, constructed and evaluated for activity. These particularly include sensors having an artificial client protein or molecule domain and additionally at least one domain providing an accessory protein or molecule. The engineering of phase separation sensors and exemplary sensor domains, including multi-domain designs, are depicted in
As described above in Example 1, two-domain sensors were designed, constructed and evaluated based on a two-domain structure, with the sensors having a first domain comprising a fluorescent protein or marker and a second domain comprising an IDP sensing domain or an artificial client protein sequence. A general two-domain sensor architecture is as follows:
Two-domain: [Fluorescent marker/Enzyme/Cargo]-[Optional Linker]-[IDP sensing domain]
The two-domain structure was utilized in Example 1, where exemplary two-domain sensors comprised a fluorescent marker which is exemplified by GFP, particularly a GFP with positively-charged amino acids exposed on the protein surface, such as exemplary +15GFP, linked to an IDP sensing domain, such as the artificial IDP sequence based on filaggrin sequence. The design and characteristics of the IDP sensing domain sequence(s) are described in Example 1. The Sensor A and Sensor B utilized in Example 1 are also described below. These sensors included a nuclear export signal (denoted NES) to direct the protein sensor to the cytoplasm of the cell when expressed. Each of Sensor A and Sensor B comprises a fluorescent marker protein (+15GFP was utilized), followed by a nuclear export signal (denoted NES). The NES sequence is indicated below in bold and underlined. The nuclear export signal sequence utilizes the NES sequence LELLEDLTL (SEQ ID NO: 57), which is an optimized export signal reported by Woerner A C et al (Science (2016) 351(6269):173-176). The reported LELLEDLTL was further expanded in sensors A and B to incorporate two additional N-terminal linker residues SG, to provide a full NES sequence of SGLELLEDLTL (SEQ ID NO: 58). A flexible linker sequence of 4 amino acids, particularly GSPG (SEQ ID NO: 59), was incorporated in the Sensor A and B constructs (double underlined in the sensor sequences below). Alternative or additional linker sequence GRSDGVPGSG (SEQ ID NO: 60) was also incorporated and utilized in some sensor designs and sequences (double underlined in the sensor sequences below). A suitable alternative optional linker would be about 2-10 residues in length and would either lack charged residues or be zwitterionic, with equal numbers of positively-charged and negatively-charged amino acid residues.
LEDLTLGSPGYLFSGSHGSTEGSRGRASEQYSQQH
LEDLTLGSPGSYGRHGSDGHGARDSQEHYGQRQHS
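The linker criterion stated above (about 2-10 residues, either uncharged or zwitterionic with balanced charges) can be expressed as a simple check. This is a minimal sketch under stated assumptions: Lys and Arg are counted as positively charged, Asp and Glu as negatively charged, histidine is treated as neutral, and the approximate 2-10 residue range is applied as a hard bound purely for illustration. The function name is hypothetical.

```python
POSITIVE = set("KR")   # assumed positively charged residues
NEGATIVE = set("DE")   # assumed negatively charged residues


def is_suitable_linker(seq: str, min_len: int = 2, max_len: int = 10) -> bool:
    """True if the linker is within the length range and is either
    uncharged or carries equal numbers of positive and negative residues."""
    n_pos = sum(res in POSITIVE for res in seq)
    n_neg = sum(res in NEGATIVE for res in seq)
    # n_pos == n_neg covers both cases: 0 == 0 (uncharged) and balanced charges
    return min_len <= len(seq) <= max_len and n_pos == n_neg


# Linkers named in the text
for linker in ["GSPG", "GRSDGVPGSG"]:
    print(linker, is_suitable_linker(linker))
```

Both linkers used in the sensor constructs satisfy the check: GSPG carries no charged residues, and GRSDGVPGSG carries one Arg and one Asp.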
In further studies, multi-domain sensors were designed, constructed and evaluated based on a three-domain structure (see for example
Three-domain: [Fluorescent marker]-[Optional linker]-[Enzyme/Cargo]-[Optional Linker]-[IDP sensing domain]
In an exemplary set of three-domain sensors, the Enzyme/Cargo is an enzyme and can, for example, be a peroxidase. BioID2 may be used as the enzyme domain. BioID2 biotinylates a target protein and uses endogenous or exogenously-added biotin to label a biomolecular condensate's components.
In a further example, the peroxidase Apex2, an engineered peroxidase enzyme developed by Ting and collaborators, was used as an accessory protein (Rhee H-W et al Science (2013) 339:1328-1331; Hung V et al Mol Cell (2014) 55:332-341). Apex2 does not function with regular native biotin (the form normally found in the body), but requires a chemically-modified biotin, biotin phenol (denoted BP), that is added prior to tissue processing/fixation. Addition of BP, the small molecule substrate for Apex2, results in the covalent biotinylation of endogenous proteins within 1-10 nm of Apex2 over a 1 minute time window in living cells. Hydrogen peroxide (H2O2) can optionally be added to accelerate the biotinylation reaction.
The full Apex2 sequence (including a C-terminal NES sequence) was derived from Addgene plasmid pcDNA3 APEX2-NES (#49386).
LEDLTLGRSDGVPGSGGKSYPTVSADYQDAVEKAK
GYLFSGSHGSTEGSRGRASEQYSQQHSGAQGHASV
LEDLTLGRSDGVPGSGGKSYPTVSADYQDAVEKAK
GSYGRHGSDGHGARDSQEHYGQRQHSHGSRDGQYS
The Apex2-SensorA was evaluated in skin cells in a set of experiments in line with those outlined in Example 1. In skin from mice genetically engineered to express the Apex2-SensorA sensor, low levels of biotinylation (above background) are observed when BP is added and H2O2 is omitted (not shown). Upon addition of both the substrate BP and H2O2 to enhance the reaction, biotinylated KG granule components are clearly visualized (
In addition to biotinylation, when provided with diaminobenzidine (DAB) instead of BP, Apex2-based sensors may be used to enable visualization of condensates via electron microscopy (Hung V et al Nat Protocol (2016) 11(3):456-475, doi.org/10.1038/nprot.2016.018).
In addition to Apex2 and BioID2, experts in the art can reasonably further modify our sensor designs to include newly-evolved and more efficient enzymes, including tagging and modifying enzymes, such as alternative ligases, peroxidases, etc. Examples include TurboID and miniTurboID (Branon T C et al Nature Biotechnology (2018) 36:880-887). TurboID and miniTurboID are engineered mutants of biotin ligase which provide enzyme-catalyzed proximity labeling in living cells with much greater efficiency than BioID.
Experts in the art will further recognize that the Apex2 domain in our phase separation sensors may be further modified to include other enzymes or proteins of interest, including so as to exploit the phase separation sensors as vehicles that deliver cargo to biomolecular condensates. Said cargo may include but is not limited to fluorescent proteins, proteases, nucleases, ligases, peroxidases, phosphatases, kinases and other proteins capable of modifying proteins and nucleic acids or showing a biological activity of interest.
Additional sensors in line with the tri-domain constructs above and using an alternative positively charged GFP sequence (+15GFP-Kv) are also provided. These are two-domain constructs as shown below. The fluorescent marker +15GFP-Kv is a variant of +15GFP engineered by us in which all (surface-exposed) Arg residues that contribute to +15GFP (i.e. not present in sfGFP) were replaced by Lys residues. +15GFP was engineered with eight X>R substitutions and five X>K substitutions, whereas in our +15GFP-Kv all 13 mutations are X>K substitutions.
LEDLTLGSPGYLFSGSHGSTEGSRGRASEQYSQQH
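The relationship between +15GFP and +15GFP-Kv described above (Arg residues introduced during supercharging are replaced by Lys, while any Arg already present in the parent is left unchanged) can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure actually used to design +15GFP-Kv. The function names are hypothetical, the sequences are assumed to be pre-aligned and of equal length, and the short toy sequences are invented for the example and are not GFP sequences.

```python
from typing import Dict


def arg_to_lys_variant(parent: str, supercharged: str) -> str:
    """Replace Arg with Lys only where the parent does not already have Arg."""
    if len(parent) != len(supercharged):
        raise ValueError("sequences must be aligned and of equal length")
    return "".join(
        "K" if (s == "R" and p != "R") else s
        for p, s in zip(parent, supercharged)
    )


def count_substitutions(parent: str, variant: str) -> Dict[str, int]:
    """Count substitutions relative to the parent, keyed by the new residue."""
    counts: Dict[str, int] = {}
    for p, v in zip(parent, variant):
        if p != v:
            counts[v] = counts.get(v, 0) + 1
    return counts


# Toy sequences for illustration only (hypothetical, not GFP).
toy_parent = "MSRGEELTDA"
toy_supercharged = "MSRGKRLTRA"   # two X>R and one X>K substitutions
toy_kv = arg_to_lys_variant(toy_parent, toy_supercharged)

print(toy_kv)                                         # MSRGKKLTKA
print(count_substitutions(toy_parent, toy_supercharged))
print(count_substitutions(toy_parent, toy_kv))
```

In the toy example the native Arg at position 3 is retained, and all three substitutions become X>K. This mirrors the text above, where the eight X>R and five X>K substitutions of +15GFP become 13 X>K substitutions in +15GFP-Kv.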
Enzyme-based labeling sensors such as Apex2-based sensors may be used to tag (for example with biotin using Apex2) all protein and RNA components present within biomolecular condensates. Quantitative approaches, however, will require control experiments in which tagging can also or alternatively be directed to components outside of biomolecular condensates. We engineered and validated an Apex2-based construct, Apex2-excluded, that lacks a phase separation sensing domain and is multimerized into a trimer (via its Foldon domain) to prevent its trafficking into biomolecular condensates.
LEDLTLGRSDGVPGSGGKSYPTVSADYQDAVEKAK
This invention may be embodied in other forms or carried out in other ways without departing from the spirit or essential characteristics thereof. The present disclosure is therefore to be considered in all aspects and embodiments illustrative and not restrictive, the scope of the invention being indicated by the appended Claims, and all changes which come within the meaning and range of equivalency are intended to be embraced therein.
Various references are cited throughout this Specification, each of which is incorporated herein by reference in its entirety.
This invention was made with government support under R01-AR27883 awarded by the NIH. The government has certain rights in the invention.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/US2021/014684 | 1/22/2021 | WO |

Number | Date | Country
---|---|---
62964706 | Jan 2020 | US