The present disclosure relates to protein biogenesis and in particular to translocon-associated biogenesis features (TABFs) and related methods, systems and products.
Correct control of protein biogenesis has been a challenge in the field of biological molecule analysis, especially for proteins expressed through the co-translational translocation pathway.
Whether for fundamental biology studies, medical applications or drug design, several methods are commonly used for the prediction of protein biogenesis, in particular for proteins, such as membrane proteins, that are intricately involved with a hydrophobic or membrane micro-environment.
In particular, to date, there are several prediction software tools to determine a protein's structure, function, and ability to fold and insert properly into a membrane micro-environment.
However, achievement of accurate prediction and control of protein biogenesis is still challenging.
Provided herein are methods allowing in several embodiments prediction and/or control of translocon-associated protein biogenesis through control of translocon associated biogenesis features (TABFs) for proteins expressed in homologous or heterologous biological system in vitro or in vivo, and related, computer-based methods and systems as well as recombinant proteins.
According to a first aspect, a computerized trajectory-based method to represent translocon-associated protein trajectories is provided, comprising: i) representing, by a processor, amino and/or nucleic acids corresponding to a protein and an associated translocon as a plurality of coarse grain particles; ii) representing, by a processor, confinement and driving force effects of an active protein inserter; iii) representing, by a processor, interactions between the coarse grain particles; iv) calculating, by a processor, evolution of a chain of the coarse grain particles corresponding to the protein in the translocon as a function of time; v) based on steps i)-iv), building, by a processor, translocon-associated protein trajectories; and vi) providing, by a processor, spatial representation of the translocon-associated protein trajectories to a user.
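Steps i)-v) of the first aspect can be illustrated with a deliberately simplified sketch. The disclosure at this point does not specify an integrator, a force field or any parameter values, so everything below is an assumption made for illustration: the chain is reduced to one coordinate along the channel axis, consecutive beads are joined by harmonic bonds, the protein inserter is modeled as a wall at x = 0 (confinement) that appends a new bead at fixed intervals (driving force), and time evolution uses an overdamped Langevin (Brownian dynamics) update. The names `bond_forces`, `brownian_step` and `simulate` are hypothetical.

```python
import math
import random


def bond_forces(x, k=10.0, r0=1.0):
    """Harmonic bond forces between consecutive beads on a 1D axis."""
    f = [0.0] * len(x)
    for i in range(len(x) - 1):
        d = x[i + 1] - x[i]
        stretch = abs(d) - r0
        # Overlapping beads (d == 0) are pushed apart with the older bead
        # moving away from the wall.
        direction = 1.0 if d > 0 else -1.0
        f[i] += k * stretch * direction
        f[i + 1] -= k * stretch * direction
    return f


def brownian_step(x, dt, diffusion=1.0, kT=1.0, rng=None):
    """One overdamped Langevin update; rng=None switches the noise off."""
    f = bond_forces(x)
    sigma = math.sqrt(2.0 * diffusion * dt)
    new_x = []
    for xi, fi in zip(x, f):
        noise = rng.gauss(0.0, sigma) if rng is not None else 0.0
        xi_new = xi + (diffusion / kT) * fi * dt + noise
        new_x.append(max(xi_new, 0.0))  # assumed confinement: wall at x = 0
    return new_x


def simulate(n_beads, steps_per_bead, dt=0.001, seed=0):
    """Grow the chain bead by bead and record every frame as a trajectory."""
    rng = random.Random(seed)
    x = []
    trajectory = []
    for _ in range(n_beads):
        x = x + [0.0]  # assumed driving force: inserter adds a bead at the wall
        for _ in range(steps_per_bead):
            x = brownian_step(x, dt, rng=rng)
            trajectory.append(list(x))
    return trajectory


trajectory = simulate(n_beads=4, steps_per_bead=50)
```

Each frame of `trajectory` is a list of bead positions, which is the kind of object a spatial representation in step vi) could be rendered from. A realistic model would need three-dimensional coordinates, non-bonded interactions and an explicit translocon geometry.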
According to a second aspect, a computer-based method to provide a protein or protein sequence with a desired translocon-associated biogenesis feature is provided, comprising: i) establishing a desired translocon-associated biogenesis feature of a protein sequence, the desired translocon-associated biogenesis feature selected from a) desired protein topology, b) desired partitioning between protein integration and protein secretion and c) desired protein expression level; ii) providing the protein or protein sequence with an initial set of translocon-associated biogenesis feature determinants; iii) representing by a computer one or more initial trajectories of the protein or protein sequence with the initial set of translocon-associated biogenesis feature determinants, the one or more trajectories determining an initial translocon-associated biogenesis feature of the protein or protein sequence with the initial translocon-associated biogenesis feature determinants; iv) comparing the initial translocon-associated biogenesis feature of the protein or protein sequence with the desired translocon-associated biogenesis feature of the protein or protein sequence; v) if the initial translocon-associated biogenesis feature of the protein or protein sequence is different from the desired translocon-associated biogenesis feature of the protein or protein sequence, modifying the initial set of translocon-associated biogenesis feature determinants, thus providing the protein or protein sequence with a modified set of translocon-associated biogenesis feature determinants; vi) representing by a computer one or more modified trajectories of the protein or protein sequence with the modified set of translocon-associated biogenesis feature determinants, the one or more modified trajectories determining a modified translocon-associated biogenesis feature of the protein or protein sequence with the modified translocon-associated biogenesis feature determinants; and vii) repeating steps iv)-vi) with the modified translocon-associated biogenesis feature in place of the initial translocon-associated biogenesis feature until the desired translocon-associated biogenesis feature is obtained, thus obtaining a set of translocon-associated biogenesis feature determinants suitable to be used for production of the protein or protein sequence with the desired translocon-associated biogenesis feature.
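The compare, modify and re-simulate cycle of the second aspect can be sketched as a simple loop. This is a minimal sketch under stated assumptions: `predict_feature` stands in for the trajectory-based determination of a feature, `modify` stands in for whatever rule changes the determinants, and `max_rounds` is an added safeguard against non-termination. None of these names comes from the disclosure, and the toy predictor below carries no biological meaning.

```python
def refine_determinants(initial, desired, predict_feature, modify, max_rounds=50):
    """Iterate until the predicted feature matches the desired one."""
    determinants = initial
    feature = predict_feature(determinants)        # step iii)
    rounds = 0
    while feature != desired and rounds < max_rounds:  # steps iv) and vii)
        determinants = modify(determinants, feature, desired)  # step v)
        feature = predict_feature(determinants)    # step vi)
        rounds += 1
    return determinants, feature, feature == desired


# Toy stand-ins (hypothetical): the "determinant" is a single integer and the
# predicted feature flips from "A" to "B" once it reaches 3.
def toy_predict(value):
    return "B" if value >= 3 else "A"


def toy_modify(value, current_feature, desired_feature):
    return value + 1


result = refine_determinants(0, "B", toy_predict, toy_modify)
```

The third element of the returned tuple reports whether the desired feature was actually reached, so a caller can distinguish convergence from hitting the round limit.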
According to a third aspect, a computer-based method of screening proteins or protein sequences to provide proteins or protein sequences with a desired translocon-associated biogenesis feature is provided, comprising: i) establishing a desired translocon-associated biogenesis feature of a protein or protein sequence, the desired translocon-associated biogenesis feature selected from a) desired protein topology, b) desired partitioning between protein integration and protein secretion and c) desired protein expression level; ii) providing proteins or protein sequences, each having an associated set of translocon-associated biogenesis feature determinants; iii) for each protein or protein sequence, representing by a computer a trajectory of the protein or protein sequence, the trajectory determining a translocon-associated biogenesis feature of the protein or protein sequence; iv) for each protein or protein sequence, comparing the translocon-associated biogenesis feature of the protein or protein sequence with the desired translocon-associated biogenesis feature of the protein or protein sequence; and v) screening proteins or protein sequences for which the desired translocon-associated biogenesis feature has been determined in step iii) from proteins or protein sequences for which the desired translocon-associated biogenesis feature has not been determined in step iii).
According to a fourth aspect, a computer-based method of screening translocon-associated biogenesis feature determinants to provide proteins or protein sequences with a desired translocon-associated biogenesis feature is provided, comprising: i) establishing a desired translocon-associated biogenesis feature of a protein or protein sequence, the desired translocon-associated biogenesis feature selected from a) desired protein topology, b) desired partitioning between protein integration and protein secretion and c) desired protein expression level; ii) providing a protein or protein sequence, and multiple sets of associated translocon-associated biogenesis feature determinants; iii) for each set of translocon-associated biogenesis feature determinants, representing by a computer a trajectory of the protein or protein sequence, the trajectory determining a translocon-associated biogenesis feature of the protein or protein sequence; iv) for each set of translocon-associated biogenesis feature determinants, comparing the translocon-associated biogenesis feature of the protein or protein sequence with the desired translocon-associated biogenesis feature of the protein or protein sequence; and v) screening sets of translocon-associated biogenesis feature determinants for which the desired translocon-associated biogenesis feature has been determined in step iii) from sets of translocon-associated biogenesis feature determinants for which the desired translocon-associated biogenesis feature has not been determined in step iii).
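The screening step v) shared by the third and fourth aspects amounts to partitioning candidates by whether their determined feature matches the desired one. A minimal sketch follows, assuming a caller-supplied `predict_feature` callable that stands in for the trajectory-based determination; the candidates can be sequences (third aspect) or sets of determinants (fourth aspect). The function name `screen` and the toy predictor are hypothetical.

```python
def screen(candidates, desired, predict_feature):
    """Split candidates into those that show the desired feature and the rest."""
    hits, misses = [], []
    for candidate in candidates:
        if predict_feature(candidate) == desired:  # steps iii) and iv)
            hits.append(candidate)
        else:
            misses.append(candidate)
    return hits, misses                            # step v)


# Toy predictor (hypothetical, for illustration only): a candidate is called
# "integrated" when it contains at least three leucines.
def toy_predict(sequence):
    return "integrated" if sequence.count("L") >= 3 else "secreted"


hits, misses = screen(["LLLA", "KKKA", "ALLLL"], "integrated", toy_predict)
```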
According to a fifth aspect, a computer-based method for identifying translocon-associated biogenesis feature determinants of a given protein sequence is provided, comprising: i) providing a protein sequence with an associated set of translocon-associated biogenesis feature determinants; ii) establishing one or more desired translocon-associated biogenesis features of the protein sequence, the desired translocon-associated biogenesis features selected from a) desired protein topology, b) desired partitioning between protein integration and protein secretion and c) desired protein expression level; iii) providing one or more modified versions of the protein sequence by changing the translocon-associated biogenesis feature determinants associated with the protein sequence; iv) for each modified version of the protein sequence, representing by a computer a trajectory of the modified version of the protein sequence, the trajectory determining a translocon-associated biogenesis feature of the modified version of the protein sequence; and v) if the translocon-associated biogenesis feature of the modified version of the protein sequence matches one of the desired translocon-associated biogenesis features of the protein sequence, identifying the translocon-associated biogenesis feature determinants leading to the desired translocon-associated biogenesis features of the protein sequence.
According to a sixth aspect, a computer-based protein sequence identification method is provided, comprising: i) providing a set of constraints on translocon-associated biogenesis feature determinants to be associated with a protein sequence; ii) providing a plurality of protein sequences, each having translocon-associated biogenesis feature determinants matching the set of constraints; iii) establishing a desired translocon-associated biogenesis feature of a protein sequence, the desired translocon-associated biogenesis feature selected from a) desired protein topology, b) desired partitioning between protein integration and protein secretion and c) desired protein expression level; iv) for each protein sequence, representing by a computer a trajectory of the protein sequence, the trajectory determining a translocon-associated biogenesis feature of the protein sequence; and v) identifying the protein sequence leading to the desired translocon-associated biogenesis feature.
According to a seventh aspect, a computer-based method for identifying correlations in a set of protein sequences is provided, comprising: i) providing a set of protein sequences, each protein sequence being associated with translocon-associated biogenesis feature determinants; ii) for each protein sequence, representing by a computer a trajectory of the protein sequence, the trajectory determining a translocon-associated biogenesis feature of the protein sequence, the translocon-associated biogenesis feature being a protein topology, a partitioning between protein integration and protein secretion, or a protein expression level; and iii) for each protein sequence, comparing the translocon-associated biogenesis feature with the translocon-associated biogenesis feature determinants leading to the translocon-associated biogenesis feature to determine possible correlations between translocon-associated biogenesis features and translocon-associated biogenesis feature determinants.
According to an eighth aspect, a computer-based method for identifying correlations between experimental data and computer-generated data in a protein sequence is provided, comprising: i) providing a protein sequence; ii) representing by a computer a plurality of trajectories of the protein sequence, each trajectory being determined in accordance with a distinct set of computer-executed parameters, each trajectory determining a translocon-associated biogenesis feature of the protein sequence, the translocon-associated biogenesis feature being a protein topology, a partitioning between protein integration and protein secretion, or a protein expression level; iii) correlating the translocon-associated biogenesis features determined in step ii) with experimentally obtained translocon-associated biogenesis features; and iv) identifying which of the translocon-associated biogenesis features determined in step ii) best correlate with the experimentally obtained translocon-associated biogenesis features.
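Steps iii) and iv) of the eighth aspect can be sketched for the case where the feature is numeric (for example a fraction or an expression level measured under several conditions). The disclosure does not name a correlation measure; the Pearson coefficient used here is an assumption, as are the function names and the sample numbers, which are made up for illustration and are not experimental data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("correlation is undefined for constant data")
    return cov / (norm_x * norm_y)


def best_parameter_set(predicted_by_params, experimental):
    """Score each parameter set against experiment and return the best one."""
    scores = {
        name: pearson(predicted, experimental)
        for name, predicted in predicted_by_params.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Made-up illustrative values, one entry per condition.
experimental = [0.1, 0.5, 0.9]
predicted = {"set_a": [0.2, 0.6, 1.0], "set_b": [0.9, 0.5, 0.1]}
best, scores = best_parameter_set(predicted, experimental)
```

A rank-based measure could be substituted for `pearson` without changing `best_parameter_set` if only the ordering of conditions is expected to agree.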
According to a ninth aspect, a computer-based method for determining which modifications of a protein sequence do not substantially affect a translocon-associated biogenesis feature of the protein sequence is provided, comprising: i) providing a protein sequence; ii) representing by a computer a trajectory of the protein sequence, the trajectory determining a translocon-associated biogenesis feature of the protein sequence, the translocon-associated biogenesis feature being a protein topology, a partitioning between protein integration and protein secretion, or a protein expression level; iii) providing a plurality of modified versions of the protein sequence; iv) for each modified version, representing by a computer a trajectory of the modified version of the protein sequence, the trajectory determining a translocon-associated biogenesis feature of the modified version of the protein sequence; v) comparing the translocon-associated biogenesis feature of the protein sequence with the translocon-associated biogenesis features of the modified versions of the protein sequence; and vi) based on the comparison, determining which modifications of the protein sequence do not substantially affect the translocon-associated biogenesis feature of the protein sequence.
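The comparison in steps v) and vi) of the ninth aspect can be sketched for a numeric feature. What counts as "substantially" affecting a feature is not quantified in this passage, so the `tolerance` threshold below is an assumption, and the variant labels and values are made up for illustration.

```python
def tolerated_modifications(reference_feature, variant_features, tolerance):
    """Return the modifications whose feature stays within the tolerance."""
    return [
        name
        for name, feature in variant_features.items()
        if abs(feature - reference_feature) <= tolerance
    ]


# Made-up example: the feature is a fraction between 0 and 1.
variants = {"variant_1": 0.78, "variant_2": 0.40, "variant_3": 0.85}
neutral = tolerated_modifications(0.80, variants, tolerance=0.1)
```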
Further aspects of the present disclosure are provided in the specification, claims and drawings of the present application.
The methods and systems and related engineered proteins herein described allow in some embodiments predicting and/or refining biogenesis of membrane proteins, and in particular integral membrane protein insertion into cell membranes.
The methods and systems and related engineered proteins herein described allow in some embodiments predicting and/or refining protein translocation across cell membranes.
The methods and systems and related engineered proteins herein described allow in some embodiments increasing expression of integral membrane proteins or other proteins expressed through a co-translational translocation pathway.
The methods and systems and related engineered proteins herein described allow in some embodiments predicting and/or refining biogenesis of secretory proteins that are translocated via the Sec cotranslational pathway, as well as integral membrane proteins that are integrated via the Sec pathway.
The methods and systems and related engineered proteins herein described allow in some embodiments predicting and/or refining biogenesis of proteins that undergo post-translocation integration/secretion via the Sec translocon.
The methods and systems and related engineered proteins herein described can be used in connection with applications wherein control of translocon-associated protein biogenesis in in vivo or in vitro biological systems is desired. Exemplary applications comprise laboratory applications, fundamental biological studies, diagnostics and medical applications, agriculture, food, biotechnology and pharmaceutical industries, as well as academic laboratories and other applications related to proteins (such as eukaryotic or bacterial secretory or membrane proteins and in particular integral membrane proteins), which are identifiable by a skilled person.
The details of one or more embodiments of the present disclosure are set forth in the description below. Other features, objects, and advantages will be apparent from the description and from the claims.
Provided herein are methods allowing in several embodiments prediction and/or control of protein biogenesis, and related, computer-based models, methods and systems as well as recombinant proteins.
The term “protein” as used herein indicates a polypeptide with secondary, tertiary, and possibly quaternary structure. The protein's secondary, tertiary, and quaternary structure can occur on a variety of length scales (tenths of Å to nm) and time scales (ns to s), so that in various instances the secondary, tertiary and possibly quaternary structures are dynamic and not perfectly rigid.
The term “polypeptide” as used herein indicates a polymer composed of two or more amino acid monomers and/or analogs thereof wherein the portion formed by the alpha carbon, the amine group and the carboxyl group of the amino acids in the polymer forms the backbone of the polymer. As used herein the term “amino acid”, “amino acid monomer”, or “amino acid residue” refers to any of the naturally occurring amino acids, any non-naturally occurring amino acids, and any artificial amino acids, including both D and L optical isomers of all amino acid subsets. In particular, amino acid refers to organic compounds composed of amine (—NH2) and carboxylic acid (—COOH), and a side-chain specific to each amino acid connected to an alpha carbon. Different amino acids have different side chains and have distinctive characteristics, such as charge, polarity, aromaticity, reduction potential, hydrophobicity, and pKa. Amino acids can be covalently linked to form a polymer through peptide bonds by reactions between the amine group of a first amino acid and the carboxylic acid group of a second amino acid.
The term “polypeptide” includes amino acid polymers of any length including full length proteins, as well as analogs and fragments thereof. The polypeptide provides the primary structure of a protein wherein the term “primary structure” of a protein refers to the sequence of amino acids in the polypeptide chain covalently linked to form the polypeptide polymer. A protein “sequence” indicates the order of the amino acids that form the primary structure. Covalent bonds between amino acids within the primary structure can include peptide bonds or disulfide bonds. Polypeptides in the sense of the present disclosure are usually composed of a linear chain of amino acid residues covalently linked by peptide bonds. The two ends of the linear polypeptide chain encompassing the terminal residues and the adjacent segment are referred to as the carboxyl terminus (C-terminus) and the amino terminus (N-terminus) based on the nature of the free group on each extremity. Unless otherwise indicated, counting of residues in a polypeptide is performed from the N-terminal end (NH2 group), which is the end where the amino group is not involved in a peptide bond, to the C-terminal end (—COOH group), which is the end where a COOH group is not involved in a peptide bond. A C-terminal end of a polypeptide can be comprised within a “tail” of the protein, which indicates a segment at the C-terminus of the protein. The term “segment” or “domain” as related to the protein indicates any continuous part of a protein sequence, from a single amino acid up to the full protein, associated with an identifiable structure within the protein. An “identifiable structure” in the sense of the disclosure indicates a spatial arrangement of the primary structure or portions thereof which can be detected by techniques such as crystallography, hydrophobicity analysis or additional techniques known by a skilled person. In some instances a protein segment can comprise one or more secondary structures of the protein.
The “secondary structure” of a protein refers to local sub-structures with a repeating geometry identifiable within crystal structure of the protein, circular dichroism or by additional techniques identifiable by a skilled person. In some instances, a secondary structure of a protein can be identified by the patterns of hydrogen bonds between backbone amino and carboxyl groups. Secondary structures can also be defined based on a regular, repeating, geometry, being constrained to approximate values of the dihedral angles ψ and φ of the amino acids in the secondary structure unit on the Ramachandran plot. Two main types of secondary structure are the alpha helix and the beta strand or beta sheets as will be identifiable by a skilled person. Both the alpha helix and the beta sheet represent a way of establishing non-covalent hydrogen bonds between constituents of the peptide backbone. Secondary structure formation can be promoted by formation of hydrogen bonds between backbone atoms. Amino acids that can minimize formation of a secondary structure by destabilizing the structure of the hydrogen bonding interactions are referred to as secondary structure breakers. Amino acids that can promote formation of a secondary structure by stabilizing formation of hydrogen bonding interactions are referred to as structure makers.
Several sequential secondary structures may form a “supersecondary unit” or “structural motif.” A “supersecondary unit” or “structural motif” indicates a segment of the protein that forms an identifiable three-dimensional structure formed by adjacent secondary structure elements optionally linked by unstructured protein regions. In structural motifs the secondary structures are typically arranged with a same orientation with respect to one another. In particular, some structural motifs (e.g. zinc fingers, a Greek key or a helix-turn-helix) are conserved in different proteins as will be understood by a skilled person.
The “tertiary structure” of a protein refers to the three-dimensional structure of a protein, stabilized by non-covalent interactions among non-adjacent segments of the protein and optionally by one or more additional compounds or ions interacting through covalent or non-covalent interactions with one or more segments of the protein. Exemplary interactions stabilizing the three-dimensional structure of the protein comprise non-specific hydrophobic interactions, burial of hydrophobic residues from water, specific tertiary interactions, such as salt bridges, hydrogen bonds, the tight packing of side chains, chelation and disulfide bonds, and additional interactions identifiable by a skilled person. Exemplary covalent interactions among compounds or ions and segments of the protein comprise N-linked glycosylation, cytochrome c haem attachment and additional interactions identifiable by a skilled person. In some instances, multiple proteins can form a protein complex, also called a multimer, with one or more identifiable three-dimensional structures stabilized by non-transitory interactions between the multiple proteins. The three-dimensional structure of the protein complex is also called the “quaternary structure” of the complex. Accordingly, the quaternary structure can be stabilized by some of the same types of non-covalent and covalent interactions as the tertiary structure as will be understood by a skilled person. Multimers made up of identical subunits are referred to with a prefix of “homo-” (e.g. a homotetramer) and those made up of different subunits are referred to with a prefix of “hetero-”, for example, a heterotetramer, such as the two alpha and two beta chains of hemoglobin. A “cofactor” indicates any molecule that forms non-transitory covalent or non-covalent interactions with a protein in vitro or in vivo.
“Non-transitory interactions” as used herein indicates interactions between proteins or related segments that are detectable by laboratory techniques such as immunoprecipitation, crosslinking, Förster Resonance Energy Transfer (FRET), crystallography, Nuclear Magnetic Resonance (NMR) and additional techniques identifiable by a skilled person.
Proteins can be identified by x-ray crystallography, purification and direct sequencing, immunoprecipitation, and a variety of other methods as understood by a person skilled in the art. Proteins can be provided in vitro or in vivo by several methods identifiable by a skilled person.
Embodiments herein described relate to biogenesis of a protein. “Protein biogenesis” indicates a multistep biological pathway leading to the protein synthesis in a biological system, wherein the biological system indicates any system wherein protein expression is performed in connection with a biological membrane or a biomimetic environment. The biological membrane indicates an enclosing or separating membrane that acts as a selectively permeable barrier within a biological cell. Exemplary biological membranes comprise the endoplasmic reticulum or mitochondrial membranes of eukaryotes and the inner membrane of bacteria. The biomimetic environment indicates an amphipathic lipid bilayer or any other amphipathic lipid environment suitable to accommodate segments of a protein. Exemplary biomimetic environments comprise micelles, a lipid cubic phase, a proteoliposome and additional biomimetic environments identifiable by a skilled person. Exemplary biological systems comprise a eukaryotic cell, a prokaryotic cell, an archaeal cell, or a cell-free system. A “cell-free” system indicates an in vitro system that contains the basic components required for the multistep biological pathway to take place outside a cellular environment. In some embodiments of biological systems of the disclosure, protein synthesis can occur via a process called translation. During translation, mRNA is read by ribosomes to generate a protein polypeptide chain. This process is performed by an array of components within the cell including ribosomes and transfer RNA (tRNA), which serves as an adaptor by binding amino acids on one end and interacting with mRNA at the other end; the latter pairing between the tRNA and mRNA ensures that an amino acid corresponding to a codon in the mRNA is added to the chain. The term “ribosome” as used herein refers to a minute particle consisting of RNA and associated proteins found in large numbers in the cytoplasm of living cells.
Ribosomes bind messenger RNA and transfer RNA to synthesize polypeptides and proteins. Examples include the 80S ribosome in eukaryotes. Translation usually proceeds in an N- to C-terminal direction with additional amino acids added by the ribosome to the C-terminus as determined by the mRNA sequence which encodes the primary structure of the protein. Once translated, a protein's primary structure can be modified by modifications of the polypeptide chain and/or amino acid monomers identified as post-translational modifications. Post-translational modifications can affect topogenesis and folding of a protein. Exemplary post-translational modifications comprise protease digestion (e.g. secreted proteins containing signal sequences which can be cleaved), attachment of functional groups (such as acetate, phosphate, various lipids and carbohydrates), changes in the chemical nature of an amino acid (e.g. citrullination), formation of intrapolypeptide bonds (e.g. formation of disulfide bridges) and additional modifications of the covalent bonds in the polypeptide chain and/or amino acid residues not performed by the ribosome. The term “protein folding” as used herein indicates the creation of secondary, tertiary, and quaternary structure during and after translation. Protein folding is driven by a wide array of forces such as the non-specific hydrophobic interactions and the burial of hydrophobic residues from water and specific tertiary interactions, such as salt bridges, hydrogen bonds, the tight packing of side chains, and disulfide bonds. Protein folding occurs by creating non-covalent interactions that increase the stability of the protein. The term “topogenesis” refers to the establishment of the topology of a protein. The term “topology” refers to the orientation of segments with regard to the membrane such that a given segment is either on the same side of the membrane as the inserter, on the opposite side of the membrane from the inserter, or within the interior of the membrane.
Topology can be determined by x-ray crystallography, NMR, FRET, crosslinking studies, and additional techniques identifiable by a skilled person. Folding of the protein can result in one or more tertiary structures depending on the primary structure, the post-translational modifications, and the biological environment where the folding occurs. Typically a specific tertiary structure is associated with one or more biological functions of the protein detectable with techniques identifiable by a skilled person. Exemplary functions comprise catalysis, binding of one or more ligands, transport across the membrane and additional functions identifiable by a skilled person. Typically a protein expressed in a biological system that provides the native environment of the protein folds to form a native structure associated with one or more tertiary structures and a topology which characterizes a functionality of the protein in the native environment. Accordingly, a native structure when formed gives the protein the ability to perform one or more required functions in the native environment. An example would be an enzyme that, when folded to a tertiary structure and/or a topology that differs from those of the native structure, has a diminished capacity to perform its catalytic activity.
The term “express” as used herein in reference to “protein expression” indicates the production of a protein in a biological system with one or more defined topologies associated with a stabilized tertiary and/or quaternary structure of the protein. “Expression level” indicates the amount of an expressed protein that achieves a defined topology. Expression of a protein is typically associated with translocation of the protein to an appropriate destination in the cell or outside of the cell. For proteins that are membrane proteins or secretory proteins, translocation typically comprises movement of the protein with respect to the biological membrane and includes passage of the protein or segments thereof through and/or into the membrane.
The term “co-translational translocation pathway” as used herein relates to a process where translocation of a protein across the membrane or integration into the membrane begins while the protein is still being synthesized on the ribosome, through interaction of the nascent protein with a translocon. The term “post-translational translocation pathway” as used herein relates to a process where translocation of a protein across the membrane or integration into the membrane begins after the protein has been synthesized (e.g. by a ribosome), through interaction of the protein with a translocon with the assistance of a SecA protein. The term “translocation” is intended to mean any translocon-associated process, such as any process mediated by the translocon, and in particular changes in the position of at least one protein segment relative to a membrane due to interaction of the protein or segments thereof with the translocon.
The term “translocon”, as used herein, indicates a protein complex that forms a channel in the biological membrane through which insertion of membrane proteins and translocation of secreted proteins occur. Translocons comprise an internal core pore structure and lateral gate helices. The “lateral gate” and “lateral gate helices” indicate the area of the translocon which opens to allow interaction between the core of the translocon and lipids in the membrane, facilitating transfer of protein segments into the membrane. A translocation process results in the protein crossing a hydrophobic lipid bilayer all or in part. Accordingly, a translocon can be used to integrate nascent proteins into the membrane itself (membrane proteins) by passing segments of the protein (e.g. of a nascent chain in a co-translational translocation pathway), typically comprising secondary structures such as alpha helices, through the lateral gate into the membrane. Eukaryotes can also have translocons associated with the chloroplast and the mitochondria. In prokaryotes, a translocon transports polypeptides across the plasma membrane or integrates membrane proteins. Known translocons include the heterotrimeric Sec61 protein complex in eukaryotes or SecYEG protein complex in bacteria. The major structural components of the translocon are transmembrane helices (TM), alpha helices that lie in the membrane or at least partially cross the membrane. TMs are composed mainly of hydrophobic residues. “Transmembrane segments” or “TMS” indicates segments that are primarily composed of transmembrane helices. Transmembrane helices can be connected by structured or unstructured segments herein called loops. A “non-TM segment” indicates a segment of a membrane protein that is not part of a transmembrane helix. The translocon pore can open to allow passage of material across the membrane.
The lateral gate of the translocon can also open to allow material to pass laterally from the interior of the channel into the membrane.
A “protein inserter” is defined as any molecular machine, and in particular any protein or protein complex (possibly comprising a nucleic acid moiety), that interacts with the protein to be translocated and provides confinement of the protein to be translocated on one side of the translocon while also providing a driving force for the movement of the protein to be translocated through the translocon. This driving force comprises the creation of additional amino acids or portions thereof (e.g. side chains or alpha carbon) associated with the protein to be translocated and/or forces of restraint between the C-terminus of the protein to be translocated and/or mechanical forces of pushing the protein to be translocated towards the translocon. In particular, inclusion of a protein inserter in a biological system increases the rate of insertion of a polypeptide into the translocon in a processive fashion, which is detectable by measuring a rate of translocation for a protein undergoing translocation or of a portion thereof, as compared to the rate in the same system without the protein inserter. Exemplary protein inserters are the ribosome (for co-translational translocation or membrane integration) or the SecA motor (for post-translational translocation or membrane integration).
The position of the protein inserter can be used as a reference point to identify the sides of the membrane comprising the translocon: the side comprising the protein inserter is identified as the “protein-inserter side of the membrane”, while the opposite side is called the “trans-protein-inserter side of the membrane”. Accordingly, a protein is inserted into the translocon by a protein inserter that is positioned at one end of the translocon. Since the translocon spans the membrane, the non-membrane region of space is thus divided into a protein-inserter side, which is on the same side of the membrane as the protein inserter, and a trans-protein-inserter side, which is on the opposite side of the membrane from the protein inserter. An exemplary protein-inserter side is the cytosolic side of an ER membrane, which has a luminal side as the trans-protein-inserter side.
In the case of a protein inserter formed by a ribosome translocating a protein through a translocon in the ER membrane, insertion occurs from the cytosolic space, so that in eukaryotic cells the protein-inserter side is also indicated as the “cytosolic side of the membrane” and the trans-protein-inserter side is also indicated as the “luminal side of the membrane”.
Proteins that are transported across membranes through a translocon comprise proteins that contain one or more membrane spanning helices such as integral membrane proteins, proteins residing in the ER, periplasmic proteins, and extracellular proteins (e.g. secretory proteins).
Proteins that are transported across membranes through a translocon are targeted to the translocon by a signal sequence, which can be cleavable or non-cleavable during or following translocation of the protein. Examples of proteins having a cleavable signal sequence are secretory proteins and type I membrane proteins, as will be understood by a skilled person. A cleavable signal sequence typically comprises a hydrophobic stretch of 7-15 predominantly apolar residues; a type I membrane protein is then anchored in the membrane by a subsequent stop-transfer sequence, a segment of about 20 hydrophobic residues that halts the further translocation of the polypeptide and acts as a transmembrane anchor. Examples of membrane proteins having a non-cleavable signal sequence (signal anchor sequence) comprise type II and type III membrane proteins. A “signal anchor sequence” as used herein is generally longer than a cleaved signal (about 18-25 mostly apolar amino acids), since it spans the lipid bilayer as a transmembrane helix. A signal anchor sequence lacks a signal peptidase cleavage site and can be positioned internally within the polypeptide chain. However, like a cleaved signal, a signal anchor sequence induces the translocation of its C-terminal end across the membrane.
The term “secretory proteins” as used herein refers to proteins that are targeted to the translocon, pass through the translocon and result in a stabilized tertiary structure having no segment inserted into the membrane. In some instances, targeting of secretory proteins or other proteins (e.g. integral membrane proteins) to the translocon is performed by a signal sequence. “Signal sequence” or “signal peptide” indicates a protein segment that causes the protein to be targeted to the translocon. Examples of secretory proteins include collagen and insulin. Secretory proteins may be transiently attached to the membrane due to the integration of the signal sequence into the membrane. These proteins are distinct from membrane proteins. Cleavage often occurs between the signal sequence and the remainder of the protein (“cleavable signal sequence”). In some instances, however, the signal sequence is not cleaved and anchors the protein to the membrane (“signal anchor sequence”).
The term “membrane protein”, “integral membrane protein”, “IMP” or “transmembrane protein” as used in the present disclosure indicates a protein including at least one transmembrane domain (TMD or TM), which indicates any protein segment which is thermodynamically stable in a membrane, as will be understood by a skilled person. In particular, in integral membrane proteins a TMD is typically formed by a single transmembrane alpha helix. The translocon facilitates the insertion of alpha helical transmembrane proteins by movement of TMs through the lateral gate. “Alpha helical membrane protein” indicates a membrane protein with at least one alpha helix TM.
Three types of membrane proteins, Type I, Type II, and Type III, can be distinguished based on topology and on the type of signal sequence presented at the N-terminus or C-terminus of the protein. The term “Type I” as used herein refers to membrane proteins that are initially targeted to the ER by an N-terminal, cleavable signal sequence. Examples of Type I proteins include Glycophorin and LDL receptor. The term “Type II” as used herein refers to membrane proteins wherein a “signal-anchor sequence” is responsible for both insertion and anchoring. Examples of Type II proteins include Transferrin receptor and galactosyl transferase. The term “Type III” as used herein refers to membrane proteins which translocate their N-terminal end across the membrane. Examples of Type III proteins include Synaptotagmin, neuregulin, and cytochromes P-450.
Both membrane proteins and secretory proteins in some embodiments can be chimaeras, fusion proteins, wild type proteins and non-naturally occurring or engineered proteins. “Chimaera” or “chimera” indicates a protein or protein sequence produced by swapping segments between multiple protein sequences having different but related protein primary structures. A “fusion protein” indicates a protein or protein sequence produced by attaching multiple domains or proteins in sequence. “Wild-type” in reference to a protein from a specific species refers to the protein with the same primary structure as that found in nature in that species. A “non-naturally occurring” or “engineered” protein refers to a protein with a primary structure differing from a wild type protein by at least one amino acid residue.
As the protein to be translocated (e.g. the nascent chain) is inserted into the translocon, segments of the protein to be translocated (e.g. nascent chain) undergo either integration, retention or secretion. “Integration” indicates the partitioning of a protein segment into the membrane during translocon associated biogenesis. “Secretion” indicates the passage of a segment from the protein-inserter side of the membrane to the trans-protein-inserter side of the membrane during translocon associated biogenesis. “Retention” indicates the preservation of a protein segment on the protein-inserter side of the membrane during translocon associated biogenesis. The “degree” to which a particular segment undergoes integration, retention, or translocation refers to the proportion of trajectories which end with that segment being integrated, retained, or translocated.
The term “Translocon Associated Biogenesis Feature” or “TABF” refers to features of a protein that are due to the protein's interaction with a translocon and are detectable from the three dimensional structure of a protein undergoing or having completed translocation. Exemplary TABFs comprise the topology of the translocated protein, the frequency of any segment of the protein residing within the membrane, and the levels of protein expression.
In experimental settings, the TABFs topology of the translocated protein and frequency of any segment of the protein residing within the membrane can be detected by topological mapping methods including tagging of the translocated protein with a label such as a fluorescent protein or a catalytic domain, substituted cysteine accessibility method, site specific label detection, deuterium exchange mass spectrometry, oxidative labeling and additional techniques identifiable by a skilled person based on the specific translocated protein observed.
In experimental settings, the TABF levels of protein expression can be detected by measurement of activity of the protein, measurement of the amount of the protein translocated (e.g. by quantitative mass spectrometry, or by isolation of the protein and measurement of its concentration), tagging of the translocated protein with a label such as a fluorescent protein or a catalytic domain, observation of results from polyacrylamide gel electrophoresis (PAGE) or chromatographic techniques (e.g. liquid chromatography, gas chromatography and additional techniques identifiable by a skilled person), and affinity techniques such as Western blot, immunoprecipitation and additional techniques identifiable by a skilled person.
The term “TABF determinants” indicates any feature of the protein primary structure or the biological environment that can have an effect on TABFs, as can be detected in experimental settings and/or modeling settings by comparing the TABFs at issue for a protein at issue before and after a modification of one or more of the TABF determinants.
TABF determinants can be classified into three types: TABF determinants associated to the primary structure of non-TM segments (non-TM segments TABF determinant), TABF determinants associated to the primary structure of TM segments (TMs) (TM segments TABF determinant), and TABF determinants associated with the biological system where the protein expression occurs and that do not depend on the protein primary structure (biosystem TABF determinants).
In particular, non-TM segments TABF determinants and TM segments TABF determinants (herein also indicated as sequence related or primary structure related TABF determinants) comprise features of the non-TM segments and TM segments respectively, and related residues, which can affect one or more TABFs as will be understood by a skilled person.
Exemplary non-TM segments TABF determinants and TM segments TABF determinant comprise:
Non-TM segment TABF determinants further comprise
TM segment TABF determinants and non-TM segment TABF determinants also comprise residue attributes of one or more residues of a non-TM segment. The term “residue attributes” refers to properties of an amino acid that distinguish the amino acid from other amino acids. Exemplary residue attributes that constitute TABF determinants comprise an amino acid charge, amino acid polarity, amino acid hydrophobicity (e.g., water/membrane transfer free energy), amino acid hydrogen bonding capability, amino acid aromaticity, amino acid size/excluded-volume, amino acid reduction potential, amino acid covalent bonding capability, and chelation ability or ligand binding ability.
In particular, with respect to the “charge” of an amino acid residue, all amino acids can carry a charge on their carboxyl and amino groups. In addition, their side chain is either neutral or carries a positive or a negative charge. Arginine (R) and lysine (K) have a side chain with an amino group that under physiological conditions can be positively charged. Glutamate (E) and aspartate (D) have side chains with a carboxyl group that is negatively charged under physiological conditions. Histidine (H) has a secondary amine in its ring with a pKa of 6.5. Hence, histidines may also carry a positive charge, but because they are not charged as often as R, K, E, and D, H is not always classified or treated as a charged residue. Non-naturally occurring amino acids can be charged based on the one or more moieties presented on the backbone alpha carbon, amine and/or carboxyl group.
As to the residue attribute “polarity” of an amino acid residue, in addition to charged side chains, amino acids can also have side chains that are neutral but polar. Serine (S), Threonine (T), and Tyrosine (Y) contain -OH as a functional group. Due to the oxygen's high electronegativity, shared electrons are shifted towards the oxygen of the alcohol group, thus producing a polar moment. The amino acids glutamine (Q) and asparagine (N) are also polar due to a terminal amide group. Non-naturally occurring amino acids can have a polarity based on the one or more moieties presented on the backbone alpha carbon, amine and/or carboxyl group.
The residue attribute “hydrophobicity” of an amino acid refers to the physical value that can be related to the attraction of the amino acid residue to a non-aqueous solvent, which is measurable by methods such as octanol/water partition (octanol scale) values (Wimley et al., 1996) and the free energy contribution of replacing a residue in a transmembrane segment with the residue whose hydrophobicity is being measured (interface scale) (Hessa et al., 2005) (2) (Hessa et al., 2007) (3). Exemplary hydrophobicity values are ΔG 0.50±0.12 in the octanol scale or ΔG 0.17±0.06 in the interface scale (for naturally occurring Alanine) as well as ΔG −2.09±0.11 in the octanol scale and/or ΔG −1.85±0.06 in the interface scale (for naturally occurring Tryptophan). The interior of the membrane is hydrophobic, so hydrophobic residues are more stable in the membrane than hydrophilic residues. Hydrophobicity generally decreases with increasing polarity and charge.
Residue attribute “hydrogen bonding” relates to the presence or absence in an amino acid residue of a hydrogen attached to an electronegative atom such as Nitrogen (N), Oxygen (O) and Fluorine (F), and/or an atom that has a free electron pair, such as nitrogen or oxygen. Typical energies for hydrogen bonds range between 4 and 13 kJ/mol. Intramolecular hydrogen bonds of a polypeptide's backbone carboxy group with the backbone's amide group can provide stability to secondary structure elements.
Residue attribute “aromaticity” refers to the presence or absence in the amino acid of a conjugated ring of unsaturated bonds, lone pairs, or empty orbitals that exhibits a stabilization stronger than would be expected from the stabilization of conjugation alone. The following naturally occurring amino acids contain aromatic side chains: tyrosine (Y), tryptophan (W), phenylalanine (F), and histidine (H).
The residue attribute “excluded volume” is a measurement of the size of a molecule. In case of amino acids excluded volume refers to the volume that is inaccessible by water molecules as will be understood by a skilled person.
Residue attribute “reduction potential” is the measure of the ability of a chemical species to acquire electrons and therefore to be reduced, which can be measured by the tendency of the amino acid residue to acquire electrons as compared to a reference electrode (e.g. a standard hydrogen electrode).
Residue attribute “covalent bonding capability” indicates the capability to engage in covalent bonds, which is measurable by NMR or mass spectrometry indicating formation or non-formation of a covalent bond with respect to a reference amino acid. Among naturally occurring amino acids, cysteine provides an example of an amino acid residue having a covalent bonding capability, in particular through the -SH group of the cysteine side chain. It can form disulfide bonds with other cysteines and it can be exploited for crosslinking with other molecules that also contain a -SH group. Other side chains, such as for example the primary amine in lysine or the carboxy groups in glutamate and aspartate, can also be used to form covalent bonds with non-amino acids. This is often exploited in crosslinking experiments. Posttranslational modifications also utilize the covalent bonding capabilities of various amino acids, e.g., asparagine in the case of N-linked glycosylation.
Residue attribute “chelation ability” or “ligand binding ability” refers to an amino acid ability to reversibly bind ligands or ions. Atoms with free electron pairs have the potential to coordinate ions. The free electron pair of the secondary amide in histidine for example can be employed for this purpose, alone or in combination with additional amino acid residues having chelation ability (e.g. ability to chelate iron and nickel).
TABF determinants associated with the biological system where the protein expression occurs and that do not depend on the nascent protein sequence comprise:
In accordance with embodiments of methods and systems herein described, modification of TABF determinants resulting in modification of TABF for one or more proteins can be determined based on a model providing simulated trajectories of a protein with a sequence simulated with coarse-grain particles in a system comprising the protein, the translocon and a protein inserter, where the driving forces of the protein inserter can also be represented by creation of additional CG beads associated with the nascent protein. Forces of restraint between the C-terminus of the nascent protein and/or mechanical forces of pushing the protein to be translocated (e.g. a nascent protein) towards the translocon can also be taken into account.
The term “coarse-grain particles”, “CG particles”, or “CG beads”, as used herein, indicates entities in the simulation method that have coordinates, interactions, and properties. CG particles can be time-evolved in the simulation approach and are used to describe the protein inserter, the nascent protein, and the translocon. Each CG particle represents at least one atom, and can be used to represent groups of atoms if loss of detail is required to make the simulations tractable, as is also done in the example section where one CG particle represents three amino acids. CG particles have a bead type; the bead type of a particle determines the interactions it has in the context of the simulated system.
A “simulation trajectory” as used herein refers to the exact coordinates of the CG particles that are time evolved in the simulation method over the course of the simulation. For each CG particle, the simulation trajectory describes the exact location in space at every point in time for the simulated event. For CG beads that can correspond to more than one discrete state (such as the open/closed states of the CG beads for the translocon lateral gate helices), the trajectory additionally describes the discrete state of the CG beads at every point in time for the simulated event. The trajectory data completely specifies the nascent chain TABF. The simulation data produced by the computational methods described herein can be assessed to determine the TABF for a given amino acid sequence. The coordinates of the protein undergoing or having completed translocation can be obtained from the simulated trajectory, during and after the translocation event, at any point in time. The TABFs are fully specified by the coordinates of the protein undergoing or having completed translocation (e.g. a nascent protein in a co-translational translocation pathway). In the case of integration for any segment of the protein, the coordinates of all particles in that segment of the protein will be inside the lipid region, as shown for example in
The fraction of simulations for a given sequence that exhibit desired TABFs serves as a metric for the propensity of the simulated conditions to lead to the desired TABFs. A desired TABF in the sense of the disclosure indicates a TABF that meets a set of specified requirements or criteria.
The TABF after the translocon-associated biogenesis has been completed can be assessed by considering the coordinates of the protein sequence at a point in time where translocon associated biogenesis has fully completed in the simulation, for example 30 seconds after the last amino acid has exited the ribosomal exit tunnel, as was done in the specific example concerning TatC, or after the nascent protein particles are all at a minimum distance away from the translocon, for example 16 nm as was done in the simulations discussed in connection with the description of
From the simulation trajectories, the TABF protein expression levels are determined by assessing the fraction of independent translocon-associated protein trajectories that exhibit the topology in which the protein should express. For example, in the case of TatC and YidC, this is the topology in
In embodiments herein described methods and systems can be used to predict and/or control TABF for a protein expressed in homologous or heterologous systems. A heterologous expression is defined as an expression of a protein from one species in an expression system of another species. A homologous expression is defined as an expression of a protein from one species in an expression system of the same species.
In the present disclosure, a computer-based method is described, which uses a coarse-grained (CG) simulation method that enables simulation of the translocon and its associated macromolecular components on timescales beyond the scope of previously employed methodologies. In particular, according to such method, ribosomal translation and membrane integration of nascent proteins can be simulated.
Reference will now be made to a two-dimensional (2D) embodiment of the computer-based method according to the disclosure.
A “kinetic pathway” is a notion that can be used to analyze ensembles or sets of simulated trajectories. Suppose that the space associated with all possible configurations/positions for the nascent protein is divided into a collection of subspaces (which can be called “macrostates”). Then, any given trajectory will pass through a series of these macrostates. This series of macrostates followed by the trajectory is the “kinetic pathway” that the trajectory has followed. Depending on how the macrostates are chosen, different information about the trajectories will be extracted from analyzing the kinetic pathways, such as whether the protein underwent a flipping transition, or whether it first passed into the membrane interior before undergoing secretion across the membrane.
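By way of illustration only, the extraction of a kinetic pathway from a trajectory can be sketched as follows. The sketch assumes, purely for the example, that the macrostates are defined from a single coordinate z (in nm) along the membrane normal, with positive z on the cytosolic side and a membrane half-thickness of 2 nm; the function names, labels and thresholds are hypothetical and are not taken from the disclosure.

```python
from itertools import groupby


def assign_macrostate(z, half_thickness=2.0):
    """Map one coordinate z (nm) to a macrostate label.

    Hypothetical partition: z > +h is cytosol, z < -h is lumen,
    and anything in between is the membrane interior.
    """
    if z > half_thickness:
        return "cytosol"
    if z < -half_thickness:
        return "lumen"
    return "membrane"


def kinetic_pathway(z_series, half_thickness=2.0):
    """Return the ordered series of macrostates visited by a trajectory.

    Consecutive frames in the same macrostate are collapsed into one entry,
    so the result records only the transitions between macrostates.
    """
    labels = (assign_macrostate(z, half_thickness) for z in z_series)
    return [label for label, _ in groupby(labels)]


# A segment that starts in the cytosol, enters the membrane, then is secreted.
pathway = kinetic_pathway([5.0, 3.1, 1.0, 0.2, -1.5, -2.5, -4.0])
print(pathway)
```

A different choice of macrostates (for example, one that also records segment orientation) would expose different events, such as flipping transitions, from the same trajectory data.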
As shown in
A simplification employed in some embodiments of the simulation method is the projection of the nascent protein dynamics onto the plane that passes along the translocon channel axis and between the helices of the LG as shown in
Coarse-grained representations reduce the cost of calculations and assist in making tractable the minute-timescale trajectories for protein translocation and membrane integration.
Parameterization of the simulation method utilizes molecular dynamics (MD) simulations and transferable experimental data. Free energy calculations and direct MD simulations determine the energetics and timescales of LG opening, including the dependence of the LG energetics on the nascent-protein amino acid sequence; microsecond-timescale all-atom simulations and experimental measurements determine the diffusive timescale for the CG representation of the nascent protein. Experimental amino acid water/membrane transfer free energies determine the solvation energetics of the CG nascent protein residues.
The method employed in accordance with the embodiments of the present disclosure is also described in the paper Long-Timescale Dynamics and Regulation of Sec-Facilitated Protein Translocation, B. Zhang and T. F. Miller, Cell Reports 2, 927-937 and S1-S24, Oct. 25, 2012, which paper is incorporated herein by reference in its entirety.
The person skilled in the art will understand that the method employed in accordance with the embodiments of the present disclosure has some limitations. In addition to enforcing planar constraints on the motion of the nascent protein, the method provides a coarsened representation for nascent-protein, translocon, and membrane bilayer that includes only simple aspects of electrostatic and hydrophobic driving forces. Potentially important details of residue specific interactions are thus neglected. Backbone interactions along the nascent protein chain are also neglected, such that effects due to the onset of nascent protein secondary structure are ignored, and effects due to translocon conformational changes other than LG motion are not explicitly included. Moreover, the possible roles of membrane-bound chaperones or oligomerization of the translocon channel are not considered. In principle, the simulation method can be modified to incorporate greater accuracy and detail, as well as additional complexity and computational expense. The embodiment described in detail below provides a minimalist description of Sec-facilitated protein translocation and membrane integration.
The protein nascent chain is represented as a freely jointed chain of particles or beads, where each bead represents, according to a specific embodiment of the disclosure, 3 amino acids and has a diameter of 8 Å, the typical Kuhn length for polypeptide chains (Hanke et al., 2010) (4) (Staple et al., 2008) (5). A similar representation is used to describe the Sec translocon, the hydrophobic membrane interior and confinement effects due to the translating ribosome.
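By way of illustration only, the mapping of an amino acid sequence onto CG beads of three residues each can be sketched as follows. The rule used here to assign a bead charge (summing +1 for R and K and −1 for D and E within each bead) is an assumption made for the example and is not stated in the disclosure; the function and field names are hypothetical.

```python
# Side chains treated as charged under physiological conditions (R, K, D, E).
POSITIVE = set("RK")
NEGATIVE = set("DE")


def coarse_grain(sequence, residues_per_bead=3):
    """Group a one-letter amino acid sequence into CG beads.

    Each bead holds up to `residues_per_bead` consecutive residues. The net
    bead charge is an illustrative assumption: the sum of +1/-1 side-chain
    charges of the residues in the bead.
    """
    beads = []
    for start in range(0, len(sequence), residues_per_bead):
        chunk = sequence[start:start + residues_per_bead]
        charge = sum((aa in POSITIVE) - (aa in NEGATIVE) for aa in chunk)
        beads.append({"residues": chunk, "charge": charge})
    return beads


beads = coarse_grain("MKRLDEAG")
for bead in beads:
    print(bead["residues"], bead["charge"])
```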
The beads are constrained to the plane that lies normal to the lipid bilayer membrane and that bisects the translocon channel interior and the LG helices. CG beads corresponding to the residues of the translating nascent chain evolve subject to overdamped Brownian dynamics, whereas beads representing the Sec translocon (101, 102) and the docked ribosome (100) are fixed with respect to the membrane bilayer. To explicitly incorporate the conformational gating of the translocon LG helices, beads representing the LG helices (102) undergo stochastic transitions between closed-state interactions, which occlude the passage of the nascent chain from the Sec channel to the membrane interior, and open-state interactions, for which the steric barrier to membrane integration is removed. Structural features of the channel and ribosomal confinement are obtained from crystallographic and electron microscopy studies. Exemplary positions for the translocon and ribosome beads are shown in the table of
A general mode of operation of the computer-based method of the present disclosure is shown in the diagram of
In particular, a set of TABF determinants (210) and initial coordinates (220) are input to the computerized system (230) according to the disclosure. Forces of the CG particles (beads) are calculated (240) based on interaction potentials. By way of example, bond interactions, non-bonded interactions, electrostatic interactions and solvation energies are calculated. The protein CG particle positions based on the forces as calculated in (240) are then updated (250), and any stochastic transitions between states of the CG beads (e.g., opening and closing of the lateral gate) are performed, thus providing a trajectory as a function of time. The coordinates of the entire system at any point in time along a trajectory are represented by a matrix of numbers that specify the spatial positions and the state (i.e., LG open or closed) of each CG bead. A full trajectory is thus represented as a time-ordered series of these matrices of numbers. By way of example, such time evolution of the protein can be simulated using Brownian dynamics.
The system can further simulate lateral gate opening, corresponding to an opening appearing between the translocon interior and the membrane interior, through a stochastic simulation (260). A determination is then made (270) if the simulated protein trajectory is complete. By way of example, the protein trajectory is completed i) following completion of translation/insertion of the full protein sequence (i.e. when the C-terminus of the protein is released from the protein inserter (e.g. the ribosome) at the end of insertion) or ii) prior to completion of translation/insertion of the full protein sequence (i.e. when the C-terminus of the nascent protein is still attached to the protein inserter).
If the simulation is not complete (NO output of step 270), ribosomal translation is performed if required (280), and steps (240-270) are performed again. Once the simulation is complete (YES output of step 270), the translocon associated protein trajectory is generated (290), either by way of graphical representation or as a set of spatial representations as a function of time.
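By way of illustration only, the loop formed by steps (240) to (280) can be sketched as follows. This is a minimal sketch under stated assumptions and not the implementation of the disclosure: a harmonic bond stands in for the interaction potentials, the lateral gate is a single two-state variable that flips with probability rate×dt per step, a new bead is created every fixed number of steps, and the trajectory ends a fixed number of steps after the last bead is created. All names and parameter values are hypothetical and in reduced units.

```python
import math
import random


def simulate_trajectory(n_beads_total=5, steps_per_bead=200, extra_steps=200,
                        dt=1e-3, diffusion=1.0, kT=1.0, k_bond=10.0, r0=1.0,
                        k_open=5.0, k_close=5.0, seed=0):
    """Return a list of frames: (bead coordinates, lateral-gate-open flag)."""
    rng = random.Random(seed)
    beads = [[0.0, 0.0]]            # initial coordinates (220)
    gate_open = False
    trajectory = []
    step = 0
    steps_after_release = 0
    noise = math.sqrt(2.0 * diffusion * dt)
    while True:
        # (240) Forces from a stand-in harmonic bond between neighbours.
        forces = [[0.0, 0.0] for _ in beads]
        for i in range(len(beads) - 1):
            dx = beads[i + 1][0] - beads[i][0]
            dy = beads[i + 1][1] - beads[i][1]
            r = math.hypot(dx, dy)
            if r > 0.0:
                f = k_bond * (r - r0)   # positive when the bond is stretched
                forces[i][0] += f * dx / r
                forces[i][1] += f * dy / r
                forces[i + 1][0] -= f * dx / r
                forces[i + 1][1] -= f * dy / r
        # (250) Overdamped Brownian dynamics update of the bead positions.
        for bead, force in zip(beads, forces):
            for axis in (0, 1):
                drift = (diffusion / kT) * force[axis] * dt
                bead[axis] += drift + noise * rng.gauss(0.0, 1.0)
        # (260) Stochastic opening/closing of the lateral gate.
        rate = k_close if gate_open else k_open
        if rng.random() < rate * dt:
            gate_open = not gate_open
        trajectory.append(([tuple(b) for b in beads], gate_open))
        step += 1
        # (270) Completion test: fixed time after the full chain exists.
        if len(beads) == n_beads_total:
            steps_after_release += 1
            if steps_after_release >= extra_steps:
                break
        # (280) Translation: create a new bead next to the last one.
        elif step % steps_per_bead == 0:
            last = beads[-1]
            beads.append([last[0], last[1] + r0])
    return trajectory


frames = simulate_trajectory(n_beads_total=3, steps_per_bead=10, extra_steps=5)
print(len(frames), len(frames[-1][0]), frames[-1][1])
```

Each frame pairs the spatial positions with the discrete gate state, mirroring the description of a trajectory as a time-ordered series of matrices specifying positions and states.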
A TABF (300) is determined from the protein trajectory (290). In particular, TABF (300) is determined from the final configuration of the protein (i.e. the spatial distribution of the CG particles) at the end of the simulated trajectory. From this final configuration of the protein, the following can be determined:
(1) The partitioning between protein integration and protein secretion, i.e. whether any given segment of the protein was “secreted” (i.e. passed across the membrane to the lumenal side), was “retained” (i.e. remains on the cytosolic side of the membrane), or was “integrated” (i.e. occupies the spatial region of the membrane).
(2) The topology of the protein, i.e. whether any integrated segment of the protein exhibits (a) a “Type II topology” in which its N-terminal end is in the cytosolic side of the membrane and its C-terminal end is in the lumenal side of the membrane, (b) a “Type III topology” in which its C-terminal end is in the cytosolic side of the membrane and its N-terminal end is in the lumenal side of the membrane, or (c) alternative topologies in which both the N-terminal and C-terminal ends of the segment are on the same side of the membrane.
(3) The expression level of the protein, i.e. the percentage of a given protein that is successfully expressed. In particular, the amount of successful protein expression corresponds to the fraction of simulated trajectories for a given protein that lead to final configurations in which the protein exhibits a “desired” topology, i.e. in which the membrane-integrated segments of the protein traverse the membrane with the intended combination of Type II vs. Type III topologies. The criteria for reaching the desired topology can be applied to all or any subset of the integrated segments. For example, one might require that all of the membrane-spanning protein segments match specified topologies in order for the configuration to qualify as having the desired topology. Alternatively, one might only require that one or a subset of the membrane-spanning segments match specified topologies.
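By way of illustration only, determinations (1) to (3) can be sketched from final bead coordinates as follows. The sketch assumes a single coordinate z (in nm) along the membrane normal with positive z on the cytosolic side, a membrane half-thickness of 2 nm, and that the orientation of an integrated segment is read from the positions of its N-terminal and C-terminal flanking beads; these conventions and all names are hypothetical.

```python
def side(z, half_thickness=2.0):
    """Region occupied by a bead at height z (assumed: z > 0 is cytosolic)."""
    if z > half_thickness:
        return "cytosol"
    if z < -half_thickness:
        return "lumen"
    return "membrane"


def classify_segment(z_beads, half_thickness=2.0):
    """(1) Integrated, secreted or retained, from all beads of a segment."""
    sides = {side(z, half_thickness) for z in z_beads}
    if sides == {"membrane"}:
        return "integrated"
    if sides == {"lumen"}:
        return "secreted"
    if sides == {"cytosol"}:
        return "retained"
    return "undetermined"


def segment_topology(z_n_flank, z_c_flank, half_thickness=2.0):
    """(2) Topology of an integrated segment from its two flanking beads."""
    n_side = side(z_n_flank, half_thickness)
    c_side = side(z_c_flank, half_thickness)
    if n_side == "cytosol" and c_side == "lumen":
        return "Type II"
    if n_side == "lumen" and c_side == "cytosol":
        return "Type III"
    return "other"


def expression_level(final_topologies, desired):
    """(3) Fraction of trajectories whose final topology equals `desired`."""
    matches = sum(1 for topology in final_topologies if topology == desired)
    return matches / len(final_topologies)


print(classify_segment([0.5, -0.3, 1.0]))
print(segment_topology(4.0, -4.0))
print(expression_level(
    [("Type II",), ("Type III",), ("Type II",), ("Type II",)],
    desired=("Type II",),
))
```

Here each trajectory is summarized by a tuple of per-segment topologies, so requiring a match on all segments or only on a subset amounts to choosing which entries of the tuple are compared.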
Reference will now be made to some specific embodiments of the methods for evaluating the CG particle potentials, the CG particle time evolution, the lateral gate helices time evolution and the ribosomal translation.
1. CG Particle Interaction Potentials
It should be noted that for each type of interaction potential described there are alternative descriptions possible, as will be apparent to a skilled person. The equations provide a specific example of an embodiment of the claimed simulation method.
CG particles that share a covalent bond are connected using a finite extension nonlinear elastic (FENE) potential (Kremer and Grest 1990) (6);
where κ=7ε/σ², R0=2σ and ε=0.833kBT. The parameters should be adjusted depending on the extent to which the nascent chain is coarse grained; the parameters provided here are suitable for CG particles representing 3-4 amino acids. R0 corresponds to the maximum length the covalent bond can reach, and κ is a force constant.
In general, covalent linkages between atoms or amino-acid residues would correspond to additional linkages between the corresponding CG beads. Additionally, the person skilled in the art will understand that the use of a FENE potential is one example of a potential, which could also include a harmonic potential, quartic potential, Morse potential, rigid-linkages, or many other functional forms that prescribe some energetic penalties upon deviation from particular inter-bead distances.
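The bonded potential is not reproduced in the present text. As a minimal sketch, the standard FENE form from the polymer-simulation literature, U(r) = −(1/2)·κ·R0²·ln(1 − (r/R0)²), is assumed below; whether this matches the disclosed equation term by term is an assumption. The parameter values are those quoted above, with energies expressed in units of kBT and lengths in units of σ.

```python
import math

EPSILON = 0.833                   # in units of kBT (value quoted in the text)
SIGMA = 1.0                       # reduced length unit
KAPPA = 7.0 * EPSILON / SIGMA**2  # force constant, kappa = 7*eps/sigma^2
R0 = 2.0 * SIGMA                  # maximum bond extension


def fene_energy(r, kappa=KAPPA, r0=R0):
    """Standard FENE bond energy (assumed form); infinite at or beyond r0."""
    if r < 0.0 or r >= r0:
        return float("inf")
    return -0.5 * kappa * r0**2 * math.log(1.0 - (r / r0) ** 2)


for r in (0.5, 1.0, 1.5, 1.9):
    print(f"r = {r:.1f} sigma  U = {fene_energy(r):.3f} kBT")
```

For small extensions the energy reduces to the harmonic form (1/2)·κ·r², and it diverges as r approaches R0, which is what prevents the bond from exceeding its maximum length.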
Short-range, pairwise, non-bonded interactions between pairs of CG particles are described by the Lennard-Jones potential,
with εcr the value of the Lennard-Jones potential at the right cut-off radius, rcr. Values for ε, rcl, and rcr depend on the particles involved in the interaction (
Electrostatic interactions are simulated using the Debye-Hückel potential,
where the Debye length κ=1.4σ, and qi is the charge of bead i. The Debye length can be modified depending on the electrostatic screening in the simulated system, e.g. upon changes in the salt concentration; a value of κ=1.4σ corresponds to electrostatic screening under physiological conditions. When a pair of CG particles does not have strong repulsion, as occurs between the nascent protein and the lateral gate when the lateral gate helices are in the open configuration, the Debye-Hückel potential is capped from below to avoid the singularity that would otherwise occur, such that
Also in this case, electrostatic interactions and non-bonded interactions need not be described only using Debye-Hückel or Lennard-Jones potentials. A great variety of other pairwise potentials that incorporate the associated energy scales and length scales could also be employed.
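As a hedged illustration of the screened electrostatic interaction and of capping it from below, a generic Debye-Hückel (Yukawa) form is sketched here. The energy prefactor and the cap value are not recoverable from the present text; `prefactor` and `floor` are therefore hypothetical parameters introduced only for this sketch.

```python
import math


def debye_huckel(r, qi, qj, debye_length=1.4, prefactor=1.0, floor=None):
    """Screened Coulomb energy between two charged beads.

    r and debye_length are in units of sigma (1.4 sigma is the value
    quoted in the text). `prefactor` (energy scale) and `floor` (lower
    cap) are hypothetical parameters, not values from the disclosure.
    """
    if r <= 0.0:
        raise ValueError("separation must be positive")
    u = prefactor * qi * qj * math.exp(-r / debye_length) / r
    if floor is not None:
        u = max(u, floor)  # cap from below: removes the attractive singularity
    return u


# Opposite charges approaching each other: the uncapped energy diverges
# to minus infinity, whereas the capped energy stays at the floor value.
for r in (2.0, 0.5, 0.01):
    print(r, debye_huckel(r, 1, -1), debye_huckel(r, 1, -1, floor=-5.0))
```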
Solvation energetics for each CG bead are described using the position-dependent potential energy function
where g is the water-lipid transfer free energy of the CG bead, b=0.25σ is the switching length scale, and ϕ and ψ indicate the borders of the membrane region. All beads feel a membrane potential with ϕx=−2.0σ, ψx=2.0σ, ϕy=−1.5σ and ψy=1.5σ; hydrophilic beads feel an additional core-membrane potential with ϕx=−1.0σ, ψx=1.0σ, ϕy=−2.5σ and ψy=2.5σ. Additional position-dependent potentials acting on specific beads can be implemented, as was done in specific cases presented here; where such additional potentials are used, they are described.
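The solvation potential itself is not reproduced in the present text. Purely as an assumption, a smooth one-dimensional switching function of the kind commonly built from hyperbolic tangents is sketched below to show how borders ϕ and ψ and a switching length b can define a region. The tanh form, and the way the x and y factors are combined in the disclosed potential, are not confirmed by the text.

```python
import math

B = 0.25  # switching length in units of sigma (value quoted in the text)


def switch(coord, phi, psi, b=B):
    """Assumed smooth indicator: ~1 for phi < coord < psi, ~0 outside."""
    rise = 1.0 + math.tanh((coord - phi) / b)
    fall = 1.0 - math.tanh((coord - psi) / b)
    return 0.25 * rise * fall


# Borders quoted in the text for the membrane potential along x.
PHI_X, PSI_X = -2.0, 2.0
for x in (-3.0, -2.0, 0.0, 2.0, 3.0):
    print(f"x = {x:+.1f} sigma  S = {switch(x, PHI_X, PSI_X):.4f}")
```

Multiplying such an indicator by the transfer free energy g gives an energy that changes smoothly over a distance of order b at each border, rather than discontinuously.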
2. CG Particle Time Evolution
In the specific examples the time evolution of the nascent chain is simulated using overdamped Brownian dynamics with a first order Euler integrator (Stoer and Bulirsch, 2002) (9),
where xi(t) is a single Cartesian degree of freedom for nascent chain bead i at time t, U(x(t)) is the potential energy function for the full system, β=1/kBT, D is the diffusion coefficient, and η is a random number drawn from a Gaussian distribution with zero mean and unit variance. In the specific embodiments of the method described herein, D=768.0 nm²/s; this value of the diffusion coefficient agrees with atomistic simulations and available experimental data, but the simulation method is robust to variations in this parameter. In the 2-dimensional embodiment of the method each nascent chain CG particle is time-evolved in 2 dimensions, while in the 3-dimensional embodiment of the method each nascent chain particle is time-evolved in 3 dimensions. Although the example described in the present paragraph was found suitable for obtaining translocon-associated protein trajectories, the person skilled in the art will understand that a different description of the dynamics is possible.
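The integrator equation is not reproduced in the present text. A minimal sketch of the standard first-order Euler update for overdamped Brownian dynamics, x(t+Δt) = x(t) − βD·(∂U/∂x)·Δt + sqrt(2DΔt)·η, is given below using the symbols defined above. The function name, the time step and the harmonic test potential are assumptions made for illustration.

```python
import math
import random


def brownian_step(x, grad_u, diffusion, beta, dt, eta):
    """One Euler step of overdamped Brownian dynamics.

    x, grad_u and eta are equal-length lists holding, for each Cartesian
    degree of freedom, the position, dU/dx and a unit-variance Gaussian
    random number. Returns the updated positions.
    """
    scale = math.sqrt(2.0 * diffusion * dt)
    return [
        xi - beta * diffusion * gi * dt + scale * ei
        for xi, gi, ei in zip(x, grad_u, eta)
    ]


# Illustration: one bead in a harmonic well U = 0.5 * k * x^2 (hypothetical).
rng = random.Random(0)
k, beta, diffusion, dt = 1.0, 1.0, 1.0, 1e-3
x = [2.0]
for _ in range(5000):
    eta = [rng.gauss(0.0, 1.0) for _ in x]
    x = brownian_step(x, [k * xi for xi in x], diffusion, beta, dt, eta)
print(x)
```

In a 2-dimensional or 3-dimensional embodiment the same update is applied independently to each Cartesian coordinate of each bead.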
3. Lateral Gate Helices Time Evolution
Conformational gating of the translocon lateral gate helices corresponds to the lateral gate helices moving out of the plane of confinement for the CG beads in the 2-dimensional embodiment of the simulation method, allowing the nascent chain to pass into the membrane bilayer. In the 3-dimensional embodiment of the simulation method lateral gate opening corresponds to an opening appearing between the translocon interior and the membrane interior. The rate of stochastic LG opening and closing is dependent on the sequence of the nascent protein CG particles that occupy the translocon channel;
and,
where the timescale for lateral gate opening and closing events, τLG=500 ns, and ΔGtot is the free energy cost associated with LG opening. The free energy cost for LG opening is given by
where ΔE is the difference in interaction energy between the nascent chain beads and the translocon in the open conformation compared to the interaction between the nascent chain beads and the translocon in the closed conformation, ΔGTF is the water-lipid transfer free energy for nascent chain beads inside the channel, ΔGempty=16ε is the free energy cost for opening the LG in the empty channel, and Xempty is the fraction of the channel that does not contain nascent chain particles. The value for ΔGempty depends on the translocon that is included in the simulation system; a value of 16ε was used here to simulate the translocon for E. coli and S. cerevisiae. It should be noted that other more general hydrophobic-hydrophilic transfer free energies can be taken into account, where ΔGTF represents a contribution to the free energy of the LG opening that depends on the position and attributes of the CG beads, such as charge, hydrophobicity and size. Additionally, it should also be noted that while in some embodiments the sum in the above equation is extended over only the CG beads within the channel, other embodiments can take into account contributions of CG beads at other positions as well.
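The rate expressions for lateral gate opening and closing are not reproduced in the present text. As a sketch under stated assumptions, one functional form that is consistent with the description (an attempt timescale τLG and a free energy cost ΔGtot) is shown below; it is chosen so that the ratio of opening to closing rates equals exp(−βΔGtot). The specific form is an assumption and may differ from the disclosed equations.

```python
import math

TAU_LG = 500e-9  # attempt timescale in seconds (value quoted in the text)


def lg_rates(dg_tot, beta=1.0, tau=TAU_LG):
    """Assumed opening and closing rates (1/s) for the lateral gate.

    dg_tot is the free energy cost of opening, in units of 1/beta.
    The pair satisfies k_open / k_close = exp(-beta * dg_tot) and
    k_open + k_close = 1 / tau.
    """
    k_open = 1.0 / (tau * (1.0 + math.exp(beta * dg_tot)))
    k_close = 1.0 / (tau * (1.0 + math.exp(-beta * dg_tot)))
    return k_open, k_close


for dg in (-4.0, 0.0, 4.0):
    k_open, k_close = lg_rates(dg)
    print(f"dG_tot = {dg:+.1f} kBT  k_open = {k_open:.3e}  k_close = {k_close:.3e}")
```

With this form, a hydrophobic segment in the channel (lower ΔGtot) makes opening attempts succeed more often, while an empty channel with a large ΔGtot remains mostly closed.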
4. Ribosomal Translation
Where applicable, ribosomal translation can be simulated by adding beads to the nascent chain sequentially, starting from the N-terminal end of the nascent chain amino acid sequence and continuing to the C-terminal end of the nascent chain amino acid sequence. Translation can be stalled at any point along the nascent chain sequence, or can be continued until the entire nascent chain sequence has been translated, at which point the nascent chain is released from the ribosome. After release the ribosome can be kept present, or can be allowed to dissociate. Dissociation of the ribosome is modeled by removing interactions between the nascent chain CG particles and the CG particles describing the ribosome. Beads that have yet to be translated are not time evolved and do not interact with other beads in the system. Various rates of translation have been tested using the simulation method, in a range from 6 res/s to 24 res/s. The rate of translation is system dependent, and can be chosen to be distinct for each CG bead type.
Dissociation of the protein-inserter can be modeled by eliminating interactions associated with the protein-inserter CG beads. In the embodiment of the method where the ribosome is utilized as protein-inserter, ribosomal translation proceeds at a pace of approximately 10-20 amino acid residues per second (res/s) (Bilgin et al., 1992) (10) (Boehlke and Friesen, 1975) (11), although this rate can be reduced approximately 4-fold upon addition of cycloheximide (Abou Elela and Nazar, 1997) (12) (Goder and Spiess, 2003) (13). Exemplary ribosomal translation rates in the range of 6-24 res/s (2-8 beads/s) have been considered in the embodiments of the present disclosure. Other protein-inserters and different translation rates can be simulated using the same embodiment of the method by introducing minimal changes in the parameters of the method, as will be understood by a skilled person.
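The conversion between residue-based and bead-based translation rates can be sketched as follows. The sketch assumes three residues per CG bead, which is the ratio implied by the quoted ranges (6-24 res/s corresponding to 2-8 beads/s), and a constant translation rate; the function name is hypothetical.

```python
def bead_addition_times(n_beads, rate_res_per_s, residues_per_bead=3):
    """Times (s) at which successive beads are added to the nascent chain.

    Assumes a constant translation rate and a fixed number of residues
    per CG bead; beads are added from the N terminus to the C terminus.
    """
    if rate_res_per_s <= 0 or residues_per_bead <= 0:
        raise ValueError("rate and residues_per_bead must be positive")
    interval = residues_per_bead / rate_res_per_s  # seconds per bead
    return [k * interval for k in range(1, n_beads + 1)]


# A 30-bead (90-residue) chain at the slow and fast ends of the tested range.
for rate in (6, 24):
    times = bead_addition_times(30, rate)
    print(f"{rate} res/s: one bead every {times[0]:.3f} s, done at {times[-1]:.2f} s")
```

Stalling translation at a chosen point corresponds to truncating the returned list, and a bead-type-dependent rate could be modeled by replacing the constant interval with a per-bead interval.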
The binding immunoglobulin protein (BiP) is an essential component of the eukaryotic Sec translocon (Brodsky et al., 1995) (14). However, explicit inclusion of BiP binding within the simulation method gives rise to only modest effects in the calculated results for protein translation and membrane integration. Unless otherwise stated, explicit BiP binding has not been included in the simulations performed by the inventors.
The simulation methods described herein can be utilized for simulations in 2-dimensions, or in 3-dimensions. Time evolution of the nascent chain, interactions between CG particles, and dynamics of the lateral gate helices are not affected by changes in the dimensionality of the method. The coordinates of the protein inserter and the translocon are altered in order to describe the appropriate geometry in 3-dimensions. Reference can be made, for example, to
Particles in the simulation methods can represent single atoms, with the corresponding atomistic interaction potentials, or multiple atoms, with corresponding CG interaction potentials as described herein.
In particular, for implementations of the simulation method in which each CG bead corresponds to one or more amino acid residues, then the primary sequence of the nascent protein is reflected in the connectivity of the CG beads into a linear chain. For implementations in which the CG beads correspond to collections of atoms that are smaller than a single amino acid (such as individual atoms, or such as backbone moieties and side-chain moieties), then a branched connectivity of the chain of CG beads can be employed.
The connectivity of the CG beads should reflect the connectivity of the covalent bonds in the underlying protein sequence. For example, where the CG beads correspond to either backbone or side-chain moieties of the nascent protein, the nascent protein should be represented as a linear chain of CG beads associated with the backbone moieties, with each backbone bead connected to an additional CG bead that is associated with the corresponding side-chain moiety. As another example, where the CG beads correspond to individual atoms, the nascent protein should be represented as a chain of CG beads with linkages that correspond to the connectivity of the covalent bonds in the physical protein.
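The backbone and side-chain connectivity described above can be illustrated with a short sketch that builds the list of bonded bead pairs. The bead indexing convention (backbone beads first, then side-chain beads) is an assumption made for this example.

```python
def backbone_sidechain_bonds(n_residues):
    """Bonded pairs for a chain with one backbone and one side-chain bead
    per residue.

    Backbone beads are indexed 0..n-1 and the side-chain bead of residue
    i is indexed n+i (a hypothetical convention for this sketch).
    """
    backbone = [(i, i + 1) for i in range(n_residues - 1)]          # linear chain
    side_chains = [(i, n_residues + i) for i in range(n_residues)]  # branches
    return backbone + side_chains


print(backbone_sidechain_bonds(3))
# [(0, 1), (1, 2), (0, 3), (1, 4), (2, 5)]
```

Each pair in the list would then be assigned a bonded potential, for example the FENE potential described above.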
Based on the teachings of the present disclosure, the person skilled in the art can also envision versions of the simulation method in which some segments of the nascent protein are modeled at a more coarsened level (with multiple amino-acid residues per CG bead, for example) whereas other segments of the nascent protein are modeled at a less coarsened level (with only a single amino-acid residue per CG bead, for example).
Some embodiments of the present disclosure can describe implementations for which the constituents of the solvent and membrane environments are represented as spatial fields (as in Equations 5 and 6), as opposed to representing those constituents in terms of CG beads. However, the person skilled in the art will understand that this need not be the case. In particular, the simulation method can be implemented using CG beads so that the constituents of the spatial regions corresponding to “protein-inserter side” and the “trans-protein-inserter side” of the membrane are described in terms of CG beads (e.g., a CG bead for each water molecule or a set of water molecules, a CG bead for each solvated ion or a solvated ion and a set of water molecules, or one CG bead for each atom, thus providing full resolution at the atomic scale). Similarly, further embodiments of the present disclosure can use CG beads to describe the constituents of the spatial regions corresponding to the membrane (e.g., a CG bead for each lipid molecule and each other membrane constituent (such as cholesterol molecules), a CG bead for the head-group of each lipid molecule and other CG beads for all or part of the tail-groups of the lipid molecules, or one CG bead for each atom, thus providing full resolution at the atomic scale).
Additionally, embodiments are provided in the present disclosure where the translocon and the protein inserter (such as the ribosome) are described in terms of CG beads that represent groups of amino-acid residues or groups of nucleic-acid residues (while the description of BiP and other lumenal factors is even more coarse in nature). However, as above, finer-resolution implementations of the simulation method could be straightforwardly employed, by having the CG beads for the translocon and ribosome correspond to smaller groups of atoms, including (i) individual amino-acid residues or nucleic-acid groups, (ii) groups of atoms that correspond to subsets of the amino-acid or the nucleic acids, or (iii) individual atoms in the translocon, protein inserter, and the other co-factors. As above, the connectivity of the CG beads would reflect the connectivity of the atoms in the associated physical biomolecules.
Each simulation yields a simulation trajectory with coordinates of the simulated system. The coordinates fully specify the resulting TABF. Multiple independent trajectories can be simulated for the simulated system from an ensemble of initial conditions to provide statistical distributions of the possible resulting TABF of the simulated system. Lumenal co-factors (such as BiP) can be explicitly included in the simulation methods and their effect on TABF can be assessed by analysis of the resulting trajectories.
The simulation method described in the above paragraphs enables simulation of the translocon and its associated molecular components on timescales beyond the scope of previously employed methodologies. The simulation method explicitly describes the configurational dynamics of the nascent protein chain, conformational gating in the Sec translocon, and the slow dynamics of ribosomal translation.
In accordance with the present disclosure, this method can be used to perform minute-timescale CG trajectories to investigate the role of the Sec translocon in governing both stop-transfer efficiency (i.e. the propensity of transmembrane domains (TMD) to undergo integration into the cell membrane versus secretion across the membrane) and integral membrane protein topogenesis (i.e. the propensity of TMD to undergo membrane integration in the (Ncyt/Cexo) orientation versus the (Nexo/Ccyt) orientation).
Computer-based simulations performed with the computerized method according to the present disclosure can provide a direct probe of the mechanisms, kinetics and regulation of Sec-facilitated protein translocation and membrane integration. In particular, analysis of the full ensemble of nonequilibrium CG trajectories reveals the molecular basis for experimentally observed trends in integral membrane protein topogenesis and TMD stop-transfer efficiency; it demonstrates the role of competing kinetic pathways and slow conformational dynamics in Sec-facilitated protein targeting; and it provides experimentally testable predictions regarding the long-timescale dynamics of the Sec translocon.
5. Direct Simulation of Cotranslational Protein Integration
Signal peptide (SP) orientation is a determining factor in integral membrane protein topogenesis. The orientation of N-terminal signals helps to establish the topology of multidomain integral membrane proteins and to dictate whether N-terminal or C-terminal domains undergo translocation across the membrane. Biochemical studies have established the dependence of SP orientation upon a range of factors, including SP flanking charges, SP hydrophobicity, protein mature domain length (MDL), and the ribosomal translation rate. In accordance with the present disclosure, the simulation method described above is employed to directly simulate co-translational protein integration and to determine the molecular mechanisms that give rise to these experimentally observed relations.
In particular, the process in which co-translational integration of a signal anchor protein yields either the type II (Ncyt/Cexo) or type III (Nexo/Ccyt) orientation of the uncleaved SP domain is considered.
Integration of proteins that vary with respect to both SP sequence and MDL is considered. Three different kinds of SP are being considered: an SP composed of a canonical sequence of CG beads (RL4E), an SP composed of a sequence in which the positive charge on the N-terminal group is eliminated (QL4E), or an SP composed of a sequence with enhanced SP hydrophobicity (RL6E).
To model the hydropathy profile of the engineered protein H1ΔLeu22, proteins that include a hydrophilic mature domain with a hydrophobic patch near the SP are being considered. Specifically, the protein mature domain is modeled using the Q5LQn sequence of CG beads, such that the total peptide length ranges from 30 to 80 beads (90-240 residues [res]). The sensitivity of protein topology to hydrophobic patches on the mature domain is exemplified in
CG trajectories of the above described method are continued until the protein nascent chain reaches either type II or type III integration. Depending upon the rate of ribosomal translation and the MDL, each CG trajectory thus ranges from 2 to 20 s of simulation time; the corresponding CPU time required to perform each trajectory is approximately 0.2-10 hrs. Each data point in
6. Competition Between Kinetic Pathways Governs Topogenesis
Inspection of the ensemble of CG trajectories reveals multiple kinetic pathways by which the protein nascent chain achieves type II or type III integration (
To analyze the flow of trajectories among these competing mechanisms, the CG trajectories are categorized according to the chronology with which they pass through the states a-g in
Differences between the RL4E and QL4E data sets in
Comparison of the data for the RL4E and RL6E sequences in
Differences between the RL6E and RL6E-slow data sets in
These differences are remarkable since they involve no change in the interactions of the system. The shifts in SP topology (
The decrease in type III integration upon slowing translation arises from the important role of the flipping transition from state c to state f, which enables the nascent chain to reach the more thermodynamically favorable configurations associated with the Ncyt/Cexo SP orientation.
The final trend left to explain in
With increasing MDL (
7. Loop Versus Flipping Mechanisms
Observation of competing pathways for type II integration is an unexpected and significant feature of the CG simulations described in the present application. The observed coexistence of the loop and flipping mechanisms in the CG simulations according to embodiments of the present disclosure helps to reconcile experimental findings in prior literature, and it provides a basis for understanding the competing influences of SP hydrophobicity, SP charge distribution, MDL, and ribosomal translation rate in regulating Sec-facilitated type II and type III protein integration.
In assessing the role of the type II flipping mechanism in physiological systems, it is noted that many naturally occurring proteins exhibit longer N-terminal domains and less hydrophobic SP than the protein sequences considered in the present disclosure. Attention is drawn to
8. Additional Validation and Prediction for Protein Topogenesis
8A. Hydrophobic Patches in the Mature Domain
It should be emphasized that the SP flipping transition gives rise to a slow timescale in type II membrane integration that leads to characteristic trends in protein topogenesis, and that, in the simulation method, the hydrophobic patches in the mature domain play a significant role in facilitating this flipping transition.
8B—Charged-Residue Mutations on the Translocon
8C—Charged-Residue Mutations on the Nascent-Protein Mature Domain (Multispanning Protein Example)
One of the most remarkable recent experimental results on protein topogenesis is that distant C-terminal residues can control the overall topology of a multispanning integral membrane protein. In
Membrane integration of Proteins 1 and 2 is directly simulated using the same membrane topogenesis protocol described above. The CG trajectories are terminated when all of the following criteria are met: (i) ribosomal translation is completed, (ii) all three TMDs span the membrane (see
8D—Positive Versus Negative N-Terminal Changes on the Nascent Protein
The results in
As is seen in panel
9. Regulation of Stop-Transfer Efficiency
In addition to facilitating the translocation of proteins across the phospholipid membrane, the Sec translocon plays a key role in determining whether nascent protein chains become laterally integrated into the membrane. Strong correlations between the hydrophobicity of a TMD and its stop-transfer efficiency have led to the suggestion of an effective two-state partitioning of the TMD between the membrane interior and a more aqueous region. However, models for this process based purely on the thermodynamic partitioning of the TMD do not account for the experimentally observed dependence of stop-transfer efficiency on the length of the protein nascent chain, nor would such models anticipate any change in TMD partitioning upon slowing ribosomal translation. Furthermore, recent theoretical and experimental work points out that the observed correlations between stop-transfer efficiency and substrate hydrophobicity can also be explained in terms of a kinetic competition between the secretion and integration pathways under the substrate-controlled conformational gating of the translocon.
To further elucidate the mechanism of Sec-facilitated regulation of protein translocation and membrane integration, the simulation method according to the present disclosure has been employed to directly simulate cotranslational stop-transfer regulation and to analyze the role of competing kinetic and energetic effects, as detailed in the following paragraphs.
10. Direct Simulation of Co-Translational TMD Partitioning
Following recent experimental studies, the cotranslational partitioning of a stop-transfer TMD (i.e., the H-domain) is considered, where the protein nascent chain topology is established by an N-terminal anchor domain. Stop-transfer efficiency is defined as the fraction of translated proteins that undergo H-domain membrane integration, rather than translocation.
The translated protein sequence is comprised of three components, including the N-terminal anchor domain, the H-domain, and the C-terminal tail domain. In all simulations, the N-terminal anchor domain includes 44 type-Q CG beads that link the H-domain to an anchor TM that is fixed in the Ncyt/Cexo orientation (
Stop-transfer efficiency is studied as a function of the hydrophobicity of the H-domain, the C-terminal tail length (CTL), and the ribosomal translation rate. For the purposes of an embodiment of the present disclosure, CTL has been considered in the exemplary range of 5-45 beads (15-135 residues), and water-membrane transfer free energies for the H-domain have been considered in the exemplary range of ΔG/kBT=[−5,5], where ΔG corresponds to the sum over the individual transfer free energies of the CG beads in the H-domain.
CG trajectories are initialized with the H-domain occupying the ribosome-translocon junction, prior to translation of the C-terminal domain (
In
$P_I(\Delta G) = \left(1 + \exp\left[-\beta\alpha\,\Delta G + \gamma\right]\right)^{-1}$, [Equation 111]
where α=−0.80, γ=0.29 and β=(kBT)⁻¹ is the reciprocal temperature, see also
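For illustration, the sigmoidal expression above can be evaluated directly with the quoted fit parameters. The sketch takes ΔG in units of kBT, so that βΔG is passed as a single number; reading P_I as the integrated fraction (stop-transfer efficiency) is an interpretation based on the surrounding discussion.

```python
import math

ALPHA = -0.80  # fit parameter quoted in the text
GAMMA = 0.29   # fit parameter quoted in the text


def integration_probability(dg_over_kbt, alpha=ALPHA, gamma=GAMMA):
    """Sigmoid P_I = 1 / (1 + exp(-alpha * beta * dG + gamma))."""
    return 1.0 / (1.0 + math.exp(-alpha * dg_over_kbt + gamma))


# Exemplary range of H-domain transfer free energies, in units of kBT.
for dg in (-5, -2, 0, 2, 5):
    print(f"dG = {dg:+d} kBT  P_I = {integration_probability(dg):.3f}")
```

With these parameters the curve decreases monotonically with ΔG, so that more hydrophobic H-domains (more negative ΔG) have a larger P_I.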
Panels B1-B4 of
In each case, the 95% certainty threshold for the sigmoidal fit is also indicated. The cases shown in panels B1-B3 of
11. Origin of Hydrophobicity Dependence in TMD Partitioning
In addition to the dominant pathways depicted in
The results in
It should be noted that a mechanism involving local equilibration of the H-domain between the translocon and membrane interiors is consistent with the interpretation of recent experimental studies of stop-transfer efficiency. However, the analysis presented in this disclosure additionally reconciles the roles of both kinetic and thermodynamic effects in governing stop-transfer efficiency, and provides a basis for understanding the lateral shifting of the sigmoidal curves both in
12. Kinetic and CTL Effects in TMD Partitioning
The direction of the lateral shifts of the curves in panels B1-B4 of
13. Additional Validation and Prediction for Stop-Transfer Efficiency
13A—Hydrophobic Patches in the C-Terminal Domain
13B—Charged-Residue Mutations Flanking the H-Domain
Experimental studies have also found that charged residues flanking the nascent-protein H-domain affect stop-transfer efficiency.
As is seen in
13C—Dependence of Protein Translocation Time on Nascent Protein Hydrophobicity
Previous stop-transfer experiments have concluded that hydrophobic nascent-protein segments exhibit stalling, or pausing, in the translocon channel. Protein translocation modeling has also led to the prediction that hydrophobic segments retard translocation due to lateral partitioning.
For hydrophilic and amphiphilic H-domain sequences (ΔG>−2kBT), the simulation method predicts a relatively weak dependence of the protein translocation time on the H-domain hydrophobicity (
14. Model Parametrization and Validation
14A. CG Bead Transfer Free Energies and Charges
Transfer free energy (FE) values for bead-types R, E, L, Q, V, P used in the embodiments of the present application (see Table of
14B—Translocon Geometry and Charges
The positions of the CG beads that model the Sec translocon (see Table of
14C—Ribosome Geometry
Confinement effects due to the ribosome (or any protein inserter) are explicitly included in the simulation method (see
In particular, because the protein inserter is modeled as an enclosure of CG beads (as well as a point at which newly translated CG beads of the nascent protein appear), the simulation method predicts physical effects that result from the nature of this enclosure. Such effects are referred to as “confinement effects.” For example, only a limited number of CG beads of the nascent protein can fit in the ribosomal enclosure on the cytosolic side of the membrane.
Electron microscopy (EM) structures of the ribosome in complex with the translocon reveal a large lateral opening above the cytosolic cup of the translocon, which is about 20 Å wide. The simulation method likewise includes a ribosomal enclosure that is of comparable size with respect to the volume occupied by nascent chain residues in the CG representation. Near the translocon LG, the ribosomal enclosure is partially open to the cytosol, as is seen in the EM structures. This opening prevents steric hindrance of membrane integration in the simulation method and enables access of the protein nascent chain to the cytosolic exterior of the membrane. The description of this geometry in the 2-dimensional embodiment of the simulation method is provided in
14D—Timescale for LG Opening
The opening and closing of the translocon LG is modeled stochastically with rates defined in Equations in the text above. In these expressions, the parameter τLG corresponds to the timescale for attempting LG opening or closing events. As in classical rate theory, this attempt timescale is related to the timescale required for the system to transiently pass between the open and closed configurations for the LG, which has been observed in previous MD simulations of translocon/peptide substrate/membrane systems. In a previous work by the inventors, it was shown that spontaneous translocon LG closing in the presence of a peptide substrate occurs on the timescale of approximately 300-500 ns. To explore the robustness of the simulation method to this parameter, the dependence of the type II integration has been calculated as a function of MDL for the RL6E SP sequence (see
14E—FE for LG Opening
In the simulation method according to the present disclosure, a simple relationship between LG energetics and substrate hydrophobicity is being used, as described in Equation 10 and as also described in pages S3-S4 of the paper Long-Timescale Dynamics and Regulation of Sec-Facilitated Protein Translocation, B. Zhang and T. F. Miller, Cell Reports 2, 927-937 and S1-S24, Oct. 25, 2012, incorporated herein by reference in its entirety.
14F—CG Bead Diffusion Coefficient
The diffusion coefficient D for the CG beads of the protein nascent chain is parameterized to reproduce the experimentally observed timescale for protein diffusion across the Sec translocon. Specifically, the inventors consider the measurements by Rapoport and colleagues of posttranslational translocation times for the 165-residue pre-pro-α factor (ppαF) (Matlack et al., 1999) (15). In these experiments, the protein substrate is initially bound to the Sec translocon in proteoliposomes; translocation is initiated via addition of adenosine triphosphate (ATP) and binding immunoglobulin protein (BiP), and the fraction of translocated protein is monitored as a function of time.
A description of the modeling of this experiment is provided at pages S5-S6 of the paper Long-Timescale Dynamics and Regulation of Sec-Facilitated Protein Translocation, B. Zhang and T. F. Miller, Cell Reports 2, 927-937 and S1-S24, Oct. 25, 2012, incorporated herein by reference in its entirety.
15. Protocols
15A—Trajectory Initialization and Termination (Protein Topogenesis Simulations)
Ribosomal translation is directly modeled in the CG simulations via growth of the nascent chain at the ribosome exit channel (shown in
Upon completion of protein translation, the C terminus of the inserted protein detaches from the ribosome exit channel, and the small subunit of the ribosome releases from the cytosolic mouth of the translocon. Experimentally observed leakage of small molecules across the translocon following this ribosomal release suggests that the ribosome no longer seals the cytosolic mouth of the translocon. Ribosomal release is thus modeled by eliminating interactions associated with the ribosome CG beads.
Membrane integration trajectories are terminated after full translation of the protein mature domain, either when the SP integrates into the membrane in the type III orientation and diffuses to a distance of 16 nm from the translocon (state d,
As shown in
15B—Trajectory Initialization and Termination (Stop-Transfer Simulations)
As in the topogenesis simulations, ribosomal translation is modeled via addition of peptide residues to the nascent chain at the ribosomal exit channel (shown in
Unbinding of the ribosome at the end of translation is modeled as in the topogenesis simulations. Upon completion of translation, the constraint on the C terminus of the protein nascent chain is removed and interactions between the CG beads of the ribosome and protein nascent chain are eliminated.
Each CG trajectory is terminated after full translation of the protein C-terminal domain, either when the H-domain integrates into the membrane and diffuses a distance of 16 nm from the translocon (state f,
15C—Definition of State c in
State c includes protein nascent chain configurations for which (i) the SP adopts the Nexo/Ccyt orientation, (ii) all the hydrophobic beads in the SP occupy the membrane interior (
15D—Definition of States in
For the purposes of quantitatively defining the states in
State b includes configurations for which the center-of-mass of the H-domain occupies the translocon region, while none of the three X-type beads occupies the membrane region. State c* includes configurations for which all three of the X-type beads occupy the membrane region, at least one of the other CG beads in the H-domain occupies the translocon region, and the translocon LG is in the open state.
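The two state definitions in the preceding paragraph can be expressed as a simple classifier over per-configuration flags. The function and argument names are hypothetical, and the region-membership tests that would produce these flags from bead coordinates are not shown.

```python
def classify_state(com_in_translocon, x_beads_in_membrane,
                   other_beads_in_translocon, lg_open):
    """Return "b", "c*", or None for a single configuration.

    com_in_translocon: True if the H-domain center of mass is in the
        translocon region.
    x_beads_in_membrane: one boolean per X-type bead (three in the text).
    other_beads_in_translocon: one boolean per remaining H-domain bead.
    lg_open: True if the lateral gate is in the open state.
    """
    n_in_membrane = sum(x_beads_in_membrane)
    if com_in_translocon and n_in_membrane == 0:
        return "b"
    if (n_in_membrane == len(x_beads_in_membrane)
            and any(other_beads_in_translocon) and lg_open):
        return "c*"
    return None  # configuration belongs to neither of the two states


print(classify_state(True, [False, False, False], [True, True], False))
print(classify_state(False, [True, True, True], [True, False], True))
```

Applying such a classifier to each stored frame of a trajectory yields the chronology of states through which that trajectory passes.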
15E—Equilibrium Rate Calculations
The thermal rate constants reported in
This protocol is repeated for the different values of ΔG reported in
The computational method of the present disclosure can be modified to improve the accuracy of the TABFs as compared to any given experimental data. The model can be changed to better match a set of constraints by changing parameters of the model such as the temperature, pressure, pH, electromagnetic fields and additional parameters affecting the TABF in the model. For example, growth temperature can have an effect on experimental TABFs for a protein expressed in a biological pathway. Therefore, changing the temperature parameter in the model can lead to model-derived TABF values that better match TABF values from experiments performed with different growth temperatures. As an additional example, the model used in the experimental section as applied to IMP TatC can be modified by changing the number of amino acids represented by each bead to increase resolution, by modifying the protein inserter from ribosome to SecA to simulate post-translational translocation instead of co-translational translocation, by changing from a two dimensional projection to a three dimensional model to obtain a more representative model of the biological system, or by changing the physical properties of the membrane environment to simulate the physical properties of various biological membranes. In general, modifications of the model can be performed to improve matching of the model-derived TABF values with any set (e.g. predefined or experimentally derived) of TABF values. Modifications of the model can also be performed to change the computational difficulty of the TABF calculations (e.g. to simplify the calculations and/or to increase their speed). Exemplary modifications affecting computational difficulty comprise changes of the model that result in a decrease of the amount of time required to calculate TABFs for any protein, such as the use of a two dimensional projection instead of a three dimensional model and/or the use of CG beads with less than full atomistic resolution.
Resolution of the model can also be increased or decreased to adjust the required time to determine TABFs or change the number of trajectories measured, which can also affect the accuracy of the TABFs derived from the model as compared to experimentally observed TABFs.
The present disclosure is also directed to a software package encompassing the features of the model in accordance with the present disclosure. Such a software package is useful in case a stand-alone product for performing the TABF modeling as such is desired. Such a software package can allow for a number of different inputs and outputs given a set of constraints on either. By way of example, a software program can be realized in accordance with the teachings of the present disclosure, so that the program provides the TABF given the model as applied in the TatC examples of the present disclosure when given a protein primary structure. Additionally, a computer could be provided that contains such a software package.
In a further embodiment of the present disclosure, shown in
In particular, given a protein sequence or a set of protein sequences and a desired set of resulting TABFs, modifications to the TABF determinants for the simulation or the protein can be performed to obtain the desired TABFs. This can be performed by changing the primary sequence of the protein to affect the TABF determinants for that protein. This can be done in a number of ways, including, but not limited to, randomly changing the sequence of the protein, changing the sequence of the protein to incorporate features seen in homologs, or using the TABF from the original sequence to find how the TABF differs from the desired TABF and performing guided changes that rectify these particular TABF flaws. In each case the protein could be modified in ways such as by inserting, deleting, or changing individual natural or artificial amino acids, segments, and whole domains and proteins, adding post-translational modifications such as glycosylation, and/or adding covalent bonds between amino acids such as by inserting cysteines. For example, given the Mycobacterial TatC, which experimentally expresses poorly, the model would show that integration of the final TM is aberrant. Charged amino acids could be inserted into the protein sequence in a guided fashion until the desired TABF is returned from the model. This could also be performed by examining the distribution of charged residues after the final TM in TatC homologs and modifying the MtTatC to add charged residues to the tail of the protein, based on the presence of charged residues on the tail of the other TatC proteins.
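The guided insertion of charged residues described above can be sketched as a simple loop. This is a minimal illustration under stated assumptions and not the disclosed model: `toy_integration_score` is a hypothetical stand-in for a TABF returned by the simulation, and the tail length, the choice of lysine, and the linear dependence on tail charge are all invented for the example.

```python
# Hypothetical sketch: append charged residues until a stand-in TABF reaches a target.

POSITIVE = set("KR")
NEGATIVE = set("DE")


def tail_charge(sequence, tail_length=10):
    """Net charge of the last `tail_length` residues (K/R = +1, D/E = -1)."""
    tail = sequence[-tail_length:]
    plus = sum(1 for aa in tail if aa in POSITIVE)
    minus = sum(1 for aa in tail if aa in NEGATIVE)
    return plus - minus


def toy_integration_score(sequence):
    """Stand-in for a model-derived TABF; NOT the disclosed simulation.

    Assumes, for illustration only, that the score rises with tail charge.
    """
    return min(1.0, max(0.0, 0.2 + 0.2 * tail_charge(sequence)))


def add_charges_until_target(sequence, target,
                             score=toy_integration_score, max_insertions=10):
    """Append lysines one at a time until `score` reaches `target`."""
    modified = sequence
    for _ in range(max_insertions):
        if score(modified) >= target:
            break
        modified += "K"
    return modified
```

In practice the `score` argument would be replaced by a call into the trajectory simulation, and the candidate insertions could be drawn from positions where homologs carry charged residues.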
With reference to the predictions of the above embodiment, physical products (e.g. synthesized protein, plasmids and additional products identifiable by a skilled person.), see also related box in
In particular, the output of the above situation would be an ideal protein sequence or sequences. Protocols exist for deriving any desired physical products from such sequences. The protein sequence could be used to create any number of nucleotide sequences that code for the sequence. The protein sequence or sequences can be used to generate a number of physical products including an mRNA strand coding for the sequence, a DNA strand coding for the sequence, a plasmid containing a gene for the sequence, or the protein in a purified form. Short polynucleotides can be synthesized using techniques such as solid-phase oligonucleotide synthesis. Large polynucleotides could be created by connecting several short polynucleotides. Nucleotide sequences can be inserted into an expression vector or any other vector using standard cloning procedures, such as using a restriction enzyme to create complementary nucleotide overlaps between the vector and the inserted fragment, followed by using a DNA ligase enzyme to covalently link the vector to the inserted fragment. Proteins can be provided by inserting a nucleotide sequence into an expression vector, expressing the protein in a suitable expression organism (such as E. coli strains BL21 Gold DE3 and Rosetta pLysS), and recovering and purifying the protein. An example would be that given the MtTatC sequence, the modifications of TABF determinants (e.g. values of each TABF determinant or a combination of values for a set of TABF determinants) that lead to the desired TABF can be inserted into an expression plasmid, such as pBAD containing the gene with the modification.
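The step of deriving a coding nucleotide sequence from a protein sequence can be sketched as a reverse translation. The sketch below picks one valid codon of the standard genetic code per amino acid; the particular codon chosen for each residue is an illustrative assumption, and synonymous codons could be substituted to suit the expression host.

```python
# One codon per amino acid from the standard genetic code.
# The specific choice among synonymous codons is illustrative only.
CODON_FOR = {
    "A": "GCG", "C": "TGC", "D": "GAT", "E": "GAA", "F": "TTT",
    "G": "GGC", "H": "CAT", "I": "ATT", "K": "AAA", "L": "CTG",
    "M": "ATG", "N": "AAC", "P": "CCG", "Q": "CAG", "R": "CGT",
    "S": "AGC", "T": "ACC", "V": "GTG", "W": "TGG", "Y": "TAT",
}
STOP_CODON = "TAA"


def reverse_translate(protein, add_stop=True):
    """Return one DNA coding sequence for `protein` (one-letter codes)."""
    codons = []
    for residue in protein.upper():
        if residue not in CODON_FOR:
            raise ValueError(f"unsupported residue: {residue!r}")
        codons.append(CODON_FOR[residue])
    if add_stop:
        codons.append(STOP_CODON)
    return "".join(codons)


def to_mrna(dna):
    """Transcribe a DNA coding strand into the corresponding mRNA sequence."""
    return dna.replace("T", "U")
```

Because the genetic code is degenerate, this returns only one of many nucleotide sequences that code for the same protein.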
According to a further embodiment of the present disclosure, protein sequences can be screened from a class of proteins or a set of candidates, to identify those with desired TABF, as shown in
In particular, given a set of protein sequences, the above described simulation method can be applied to all of the proteins in the set to find the protein sequence or sequences that produce TABFs that most closely match the desired TABF constraints. The set of proteins could be provided or could be found by searching for proteins that match a given a set of required traits. Additionally, if desired, the set of TABF determinants associated to the primary structure or primary structures that most closely match the desired TABF constraints can be provided, as also shown in
Physical products (e.g. synthesized protein, plasmids, and additional products identifiable by a skilled person.) can also be provided based on the predictions of the above embodiment, as also shown in a box of
In particular, the output of the above situation would be an ideal protein sequence or sequences. Protocols exist for deriving any desired physical products from such sequences. The protein sequence could be used to create any number of nucleotide sequences that code for the sequence. The protein sequence or sequences can be used to generate a number of physical products including an mRNA strand coding for the sequence, a DNA strand coding for the sequence, a plasmid containing a gene for the sequence, or the protein in a purified form. Short polynucleotides can be synthesized using techniques such as solid-phase oligonucleotide synthesis. Large polynucleotides could be created by connecting several short polynucleotides. Nucleotide sequences can be inserted into an expression vector or any other vector using standard cloning procedures, such as using a restriction enzyme to create complementary nucleotide overlaps between the vector and the inserted fragment, followed by using a DNA ligase enzyme to covalently link the vector to the inserted fragment. Proteins can be provided by inserting a nucleotide sequence into an expression vector, expressing the protein in a suitable expression organism (such as E. coli strains BL21 Gold DE3 and Rosetta pLysS), and recovering and purifying the protein. An example can be that given the need for any well-expressed TatC, once a TatC that expresses well, such as AaTatC, is found, it can be placed in an expression plasmid.
In accordance with yet another embodiment of the disclosure, protein sequences can be screened from a class of candidate expression systems and a given protein sequence or set of protein sequences, to identify the TABF determinant or determinants (e.g. an expression system or a sequence) that leads to desired TABF, as shown in
By way of example, given a protein or a set of proteins and the desired TABFs, the method can be adjusted to simulate different candidate expression systems such as different expression hosts or expression conditions such as temperature. This can be done by changing physical constants associated with the model, changing the effect of the membrane to simulate different membranes among species, or modifying the inserter. An example would be to find the best organism for expressing AaTatC. The membrane environments for a variety of organisms could be replicated using the model. The membrane environment model variant that provides the desired TABF would inform which organism would be useful for expressing AaTatC.
In addition, physical products (e.g. synthesized protein, plasmids, and additional products identifiable by a skilled person.) can be provided based on the predictions of the above embodiment, as also shown in a box in
In particular, the output of the above situation would be an ideal protein sequence or sequences. Protocols exist for deriving any desired physical products from such sequences. The protein sequence could be used to create any number of nucleotide sequences that code for the sequence. The protein sequence or sequences can be used to generate a number of physical products including an mRNA strand coding for the sequence, a DNA strand coding for the sequence, a plasmid containing a gene for the sequence, or the protein in a purified form. Short polynucleotides can be synthesized using techniques such as solid-phase oligonucleotide synthesis. Large polynucleotides could be created by connecting several short polynucleotides. Nucleotide sequences can be inserted into an expression vector or any other vector using standard cloning procedures, such as using a restriction enzyme to create complementary nucleotide overlaps between the vector and the inserted fragment, followed by using a DNA ligase enzyme to covalently link the vector to the inserted fragment. Proteins can be provided by inserting a nucleotide sequence into an expression vector, expressing the protein in a suitable expression organism (such as E. coli strains BL21 Gold DE3 and Rosetta pLysS), and recovering and purifying the protein. An example can be that given the AaTatC primary structure, AaTatC can be expressed in the organism corresponding to the membrane environment model that best fits the TABF, and then recovered and purified.
In accordance with a further embodiment of the disclosure, for a given protein sequence, constraints on the TABF determinants can be identified, that will ensure that TABF remain within targeted ranges.
In particular, given a protein primary structure or a set of protein primary structures and a range of desired TABFs, the protein primary structures can be modified to change their TABF determinants prior to applying the model to the sequences, in order to determine to what extent a variety of TABF determinants can be modified while still maintaining favorable TABFs, or which modifications cause the greatest change in TABFs. An example would be that, provided the AaTatC primary structure and a desire to know how TABF determinants affect TABFs for AaTatC, the protein can be modified with a variety of changes such as adding or removing charged amino acids or increasing or decreasing TM length, which can have varying effects on the TABF. The changes in TABF determinants resulting from the modifications that have the desired effect on AaTatC TABF, whether small or large or otherwise, can then be identified.
In another embodiment of the disclosure, shown in
In particular, given a set of constraints on TABF determinants related to given protein sequences and a desired TABF, a variety of protein candidates such as representatives from each integral membrane protein pFam can be run through the model to find those that best meet the desired TABF. Additionally the model can be modulated to determine the expression system, natural or otherwise, and expression conditions that best meet the desired TABFs over all the proteins tested.
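The selection of candidates that best meet a desired TABF can be sketched as a ranking by distance between predicted and desired values. This is a minimal sketch: the dictionary layout, the TABF names, and the use of a Euclidean distance are assumptions made for illustration, and the predicted values would come from the simulation rather than being supplied by hand.

```python
import math


def tabf_distance(predicted, desired):
    """Euclidean distance over the TABF names listed in `desired`."""
    return math.sqrt(sum((predicted[name] - value) ** 2
                         for name, value in desired.items()))


def rank_candidates(predicted_by_candidate, desired):
    """Return candidate names ordered from best to worst match.

    `predicted_by_candidate` maps a candidate name (a protein, or an
    expression-system variant of the model) to a dict of TABF values.
    """
    return sorted(
        predicted_by_candidate,
        key=lambda name: tabf_distance(predicted_by_candidate[name], desired),
    )
```

The same ranking applies whether the candidates are different protein sequences run through one model or one protein run through model variants representing different expression systems.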
In a further embodiment, for a given protein sequence, new TABF determinants can be discovered from analysis of the predicted TABF levels, as shown in
In particular, given a set of protein sequences, the resulting TABFs and the TABF determinants derived from these sequences could be analyzed to determine which TABF determinants correlate with each TABF. An example would be that given a set of TatC proteins, the resulting TABFs could be used to determine that charge on the C-terminal tail correlates the most with TABFs. Therefore, charge would be identified as the TABF determinant that is most helpful for determining TABFs for TatCs.
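One way to carry out the correlation analysis described above is to compute a Pearson correlation coefficient between each candidate determinant and the TABF across the protein set. The sketch below assumes that each determinant has already been reduced to one number per protein; the determinant names and the choice of Pearson correlation are illustrative assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def most_correlated_determinant(determinants, tabf_values):
    """Name of the determinant whose |correlation| with the TABF is largest.

    `determinants` maps a determinant name to its per-protein values.
    """
    return max(determinants,
               key=lambda name: abs(pearson(determinants[name], tabf_values)))
```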
In yet another embodiment, given existing TABF experimental data, the simulation models can be used to provide explanations and interpretations for these data, as shown in
In particular, given a protein sequence or set of protein sequences and TABF experimental data, the protein sequences could be run through the model after modifying the attributes of the model such as the inserter, the rate of translocation, and the membrane attributes. After running the sequences through the model with different modifications, those modifications that cause the TABFs to most closely resemble the TABFs observed from the experimental data could be identified and used to provide explanations and interpretations of the experimental data. An example can be that given experimental data about the expression levels for a set of proteins, it could be determined that the modification to the model that leads to TABFs correlating best with the experimental data is slowing down the rate at which amino acids are translocated by the inserter. If the mRNA sequences for the set of proteins are all enriched in rare codons, which are thought to slow translation by the ribosome, the presence of rare codons could be given as an explanation for the TABFs observed experimentally. In a further embodiment, for a given protein sequence and TABFs, modifications that do not affect TABF can be identified, as shown in
In particular, given a protein sequence or a set of protein sequences, modifications to the protein sequence that can affect TABF can be performed to determine which modifications do not substantially affect the TABF. By way of example, a modification does not substantially affect a TABF when the difference between the TABF value before the modification and the TABF value after modification is not above a set threshold (e.g. a threshold considered indicative for the specific biological system where the translocation occurs).
By way of example, given the MtTatC sequence and a desire to include cysteine residues so that cross-linking with other proteins can be achieved, sequences with various cysteine insertions can be tested to determine which modifications least affect the TABF.
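The threshold criterion above can be expressed directly in code. In this minimal sketch the modification labels and TABF values are hypothetical inputs; in practice the values would be produced by running each modified sequence through the model.

```python
def neutral_modifications(baseline_tabf, modified_tabfs, threshold):
    """Return modifications whose TABF change does not exceed `threshold`.

    `modified_tabfs` maps a modification label to the TABF value obtained
    after that modification. Results are ordered by increasing change, so
    the least disruptive modification comes first.
    """
    changes = {name: abs(value - baseline_tabf)
               for name, value in modified_tabfs.items()}
    kept = [name for name, change in changes.items() if change <= threshold]
    return sorted(kept, key=changes.get)
```

For the cysteine example, each label would identify one insertion site, and the first entry of the returned list would be the insertion that least affects the TABF.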
TABF determinants can be modified by changing the model itself or by changing the primary structure of the protein modeled. The TABF determinants that affect the model can be adjusted in a number of ways including changing values of single TABF determinants such as a specific temperature, changing the characteristics of the membrane, changing the dynamics of the lateral gate opening, changing the inserter, modifying the inserter, increasing or decreasing translation rate, or explicitly modeling translocation cofactors. All of these modifications can be performed in parallel in corresponding experimental methods. For example, changing the translation rate can be performed experimentally by adding cycloheximide to the biological system where the translocation occurs. The temperature of the biological system can be changed by inducing expression of a protein in an organism at different temperatures of the related growth media. TABF determinants that affect the primary structure can be modified by changing the sequence of the protein modeled. Changes in the sequence of a protein can be performed by inserting, deleting, or modifying segments of the sequence, e.g. in a laboratory setting by modifying the sequence using any of a number of cloning or PCR-based methods suitable for use by a skilled person. Therefore, many of the changes to TABF determinants performed in the model can be emulated experimentally. This allows the information derived from the simulations to be directly applied to a physical experiment, so that the TABFs observed using the model can be achieved experimentally.
The processor (15) is a hardware device for executing software, more particularly, software stored in memory (20). The processor (15) can be any commercially available processor or a custom-built device. Examples of suitable commercially available microprocessors include processors manufactured by companies such as Intel, AMD, and Motorola.
The memory (20) can include any type of one or more volatile memory elements (e.g., random access memory (RAM, such as DRAM, SRAM, SDRAM, etc.)) and nonvolatile memory elements (e.g., ROM, hard drive, tape, CDROM, etc.). The memory elements may incorporate electronic, magnetic, optical, and/or other types of storage technology. It must be understood that the memory (20) can be implemented as a single device or as a number of devices arranged in a distributed structure, wherein various memory components are situated remote from one another, but each accessible, directly or indirectly, by the processor (15).
The software in memory (20) may include one or more separate programs, each of which comprises an ordered listing of executable instructions for implementing logical functions. In the example of
Executable program (30) is a source program, executable program (object code), script, or any other entity comprising a set of instructions to be executed in order to perform a functionality. When a source program, then the program may be translated via a compiler, assembler, interpreter, or the like, and may or may not also be included within the memory (20), so as to operate properly in connection with the OS (25).
The I/O devices (40) may include input devices, for example but not limited to, a keyboard, mouse, scanner, microphone, etc. Furthermore, the I/O devices (40) may also include output devices, for example but not limited to, a printer and/or a display. Finally, the I/O devices (40) may further include devices that communicate both inputs and outputs, for instance but not limited to, a modulator/demodulator (modem; for accessing another device, system, or network), a radio frequency (RF) or other transceiver, a telephonic interface, a bridge, a router, etc.
If the computer system (10) is a PC, workstation, or the like, the software in the memory (20) may further include a basic input output system (BIOS) (omitted for simplicity). The BIOS is a set of essential software routines that initialize and test hardware at startup, start the OS (25), and support the transfer of data among the hardware devices. The BIOS is stored in ROM so that the BIOS can be executed when the computer system (10) is activated.
When the computer system (10) is in operation, the processor (15) is configured to execute software stored within the memory (20), to communicate data to and from the memory (20), and to generally control operations of the computer system (10) pursuant to the software. The executable program (30) and the OS (25), in whole or in part, but typically the latter, are read by the processor (15), perhaps buffered within the processor (15), and then executed.
When the various embodiments described herein are implemented in software, it should be noted that the software can be stored on any computer readable storage medium for use by, or in connection with, any computer related system or method. In the context of this document, a computer readable storage medium is an electronic, magnetic, optical, or other physical device or means that can contain or store a computer program for use by, or in connection with, a computer related system or method.
The various embodiments described herein can be embodied in any computer-readable storage medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. In the context of this document, a “computer-readable storage medium” can be any non-transitory tangible means that can store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. The computer readable storage medium can be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: a portable computer diskette, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM, EEPROM, or Flash memory), and an optical disk such as a DVD or a CD.
In an alternative embodiment, where the various embodiments described herein are implemented in hardware, the hardware can be implemented with any one, or a combination, of the following technologies, which are each well known in the art: a discrete logic circuit(s) having logic gates for implementing logic functions upon data signals, an application specific integrated circuit (ASIC) having appropriate combinational logic gates, a programmable gate array(s) (PGA), a field programmable gate array (FPGA), etc.
The translocon associated protein trajectory simulations in accordance with the present disclosure have been performed on a variety of systems, including iMac® desktop machines, the Paso® system at Caltech, and the Hopper® and Carver® systems at the National Energy Research Supercomputing Center (NERSC).
In particular, Paso® is a cluster of 64 rack-mounted compute nodes with dual, quad-core 2.6 GHz Xeon Intel processors and 12 GB of memory per node. The nodes are connected via Gigabit Ethernet® and Infiniband®, with 9.6 TB of disk space. Hopper® is a peta-flop system, a Cray XE6®, with a peak performance of 1.28 Petaflops/sec, 153,216 compute cores for running scientific applications, 217 Terabytes of memory, and 2 Petabytes of online disk storage. Carver® is a liquid-cooled IBM iDataPlex system, having 1202 compute nodes (9,984 processor cores). This represents a theoretical peak performance of 106.5 Teraflops/sec. The above node count includes hardware that is dedicated to various strategic projects and experimental testbeds (e.g., Hadoop). As such, not all 1202 nodes will be available to all users at all times. All nodes are interconnected by 4×QDR InfiniBand® technology, providing 32 Gb/s of point-to-point bandwidth for high-performance message passing and I/O.
The translocon-associated protein trajectories can be, for example, generated using the code “TAPTgenerator” that is written in the FORTRAN90® programming language and that was fully written by the inventors. The “TAPTgenerator” code is comprised of a series of subroutines that evaluate the forces among the CG beads, evolve the positions of CG beads among subsequent timesteps of the translocon associated protein trajectory, describe the initialization and growth of the nascent chain from the exit channel of the protein inserter, and describe the opening/closing of the translocon lateral gate at each timestep in the trajectory.
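The subroutine that evolves bead positions between timesteps can be illustrated with a single update step. The integration scheme is not specified in this passage, so the sketch below assumes an overdamped Langevin (Brownian dynamics) update in two dimensions; the function name, argument names and units are hypothetical, and the forces are taken as already evaluated by a separate routine.

```python
import math
import random


def brownian_step(positions, forces, diffusion, kT, dt, rng=random):
    """Advance 2D bead positions by one overdamped Langevin timestep.

    Assumed update rule (not taken from the disclosure):
        x(t + dt) = x(t) + (D / kT) * F * dt + sqrt(2 * D * dt) * xi
    where xi is a standard normal random number drawn per coordinate.
    `positions` and `forces` are lists of (x, y) tuples, one per CG bead.
    """
    drift_scale = diffusion * dt / kT
    noise_scale = math.sqrt(2.0 * diffusion * dt)
    new_positions = []
    for (x, y), (fx, fy) in zip(positions, forces):
        new_positions.append((
            x + drift_scale * fx + noise_scale * rng.gauss(0.0, 1.0),
            y + drift_scale * fy + noise_scale * rng.gauss(0.0, 1.0),
        ))
    return new_positions
```

A full trajectory would repeat this step while also adding beads at the inserter exit channel and updating the lateral gate state, as described above.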
An additional software component that can be used as part of the present disclosure is a script written in the Python® programming language that performs the following analysis of the generated translocon associated protein trajectories: (1) Identification of the transmembrane segments of integral membrane proteins from the position of the nascent protein along the trajectory; (2) Identification of the soluble loops of integral membrane proteins from the position of the nascent protein along the trajectory; soluble loops are categorized as being positioned on the cytosolic side of the membrane, the lumenal side of the membrane, or in the interior of the membrane; (3) Identification of whether and at what times a transmembrane segment underwent topology flipping (from Type II to Type III orientation) during the course of the generated trajectory; (4) Identification of whether a given protein segment underwent “integration” vs. “secretion” vs. “retention”.
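The loop categorization in item (2) can be sketched as a comparison of bead coordinates with the membrane boundaries. The coordinate convention used here (a single coordinate along the membrane normal, negative on the cytosolic side and positive on the lumenal side, with the membrane centered at zero) and the use of the mean bead position are assumptions of this sketch rather than details of the analysis script.

```python
def classify_loop(normal_coords, half_thickness):
    """Categorize a loop from its beads' coordinates along the membrane normal.

    Assumed convention: the membrane spans [-half_thickness, +half_thickness],
    negative values lie on the cytosolic side, positive on the lumenal side.
    """
    if not normal_coords:
        raise ValueError("loop must contain at least one bead")
    mean_position = sum(normal_coords) / len(normal_coords)
    if mean_position < -half_thickness:
        return "cytosolic"
    if mean_position > half_thickness:
        return "lumenal"
    return "membrane interior"
```

Applying the same function at successive frames of a trajectory would give the loop's location as a function of time.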
A further software component that can be used as part of the present disclosure is a script written in the Python® programming language that translates a specific amino acid sequence into a sequence of CG beads. The CG beads so obtained are then simulated in the translocon associated protein trajectories.
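The translation of an amino acid sequence into CG beads can be sketched as follows. The grouping of consecutive residues into beads follows the description, with the number of residues per bead left as a parameter; the dictionary layout and the restriction to a net charge property (K/R = +1, D/E = -1) are simplifying assumptions, and a full implementation would also assign the transfer free energy used for the solvent interactions.

```python
# Formal charges assumed for illustration; all other residues count as neutral.
RESIDUE_CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def sequence_to_beads(sequence, residues_per_bead=3):
    """Group a one-letter amino acid sequence into coarse-grained beads.

    Returns a list of dicts, one per bead, holding the residues it
    represents and their net charge. The final bead may hold fewer
    residues if the sequence length is not a multiple of the bead size.
    """
    if residues_per_bead < 1:
        raise ValueError("residues_per_bead must be at least 1")
    seq = sequence.upper()
    beads = []
    for start in range(0, len(seq), residues_per_bead):
        chunk = seq[start:start + residues_per_bead]
        charge = sum(RESIDUE_CHARGE.get(aa, 0) for aa in chunk)
        beads.append({"residues": chunk, "charge": charge})
    return beads
```

Changing `residues_per_bead` corresponds to changing the resolution of the model.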
In some embodiments, the model herein described can be used in methods and systems to provide a protein expressed through a co-translational translocation pathway with a set targeting and/or topology. In those embodiments computer generated trajectories can be obtained for one or more candidate proteins and the related targeting and/or topologies relative to the translocon-nascent protein-ribosome system can be determined based on the kinetic pathways defined by the trajectories. The determined at least one targeting and/or topology for each of the one or more candidate proteins can be compared with the set targeting and/or topology to select a candidate protein among the one or more candidate proteins having the set targeting and/or topology. At least one targeting and/or topogenic determinant of the translocon-nascent protein-ribosome system associated to the set targeting and/or topology for the protein can then be selected based on the comparing to produce the protein with the selected targeting and/or topology. In particular, the protein can be produced by expressing the selected candidate through a co-translational translocation pathway with the selected at least one targeting and/or topogenic determinant of the translocon-nascent protein-ribosome system.
Further effects and characteristics of the present disclosure will become more apparent hereinafter from the following detailed disclosure by way of illustration only with reference to an experimental section.
The methods and system herein described are further illustrated in the following examples, which are provided by way of illustration and are not intended to be limiting.
In particular, in the following examples a further description of the methods and systems of the present disclosure and related engineered protein is provided with reference to TatC protein, i.e. a membrane protein expressed through a cotranslational translocation pathway. A person skilled in the art will appreciate the applicability of the features described in detail for TatC to other membrane proteins and to other proteins expressed by way of a cotranslational translocation pathway. In particular, a skilled person reading the present disclosure will appreciate that TatC is only one exemplary protein expressed through a cotranslational translocation pathway and that proteins expressed through a cotranslational translocation pathway can include other membrane proteins as well as additional proteins, such as proteins that are secretory, membrane-bound, or reside in the endoplasmic reticulum (ER), Golgi or endosomes.
In the following examples, exemplary uses of the model herein described are provided with reference to TatC, a component of the bacterial twin-arginine translocase (6), and related chimeras. In particular, correct targeting of TatC and the related correlation with degree of expression were simulated and verified experimentally according to exemplary steps of methods and systems of embodiments herein described.
Accordingly, the protocols and procedures utilized in the following examples provide exemplary procedures for the method for controlling targeting herein described, which is further described and demonstrated with reference to exemplary embodiments where the protein is provided by TatC and related chimeras.
The protocols and procedures utilized in the following examples also provide exemplary procedures for a method to provide a protein expressed through a co-translational translocation pathway with a set targeting and/or topology, wherein the protein is provided by TatC, and the set targeting and/or topology is provided by a correct integration and folding within the cell membrane and wherein the TatC chimeras can provide related candidate proteins in accordance with embodiments herein described, as will be apparent to a skilled person.
The protocols and procedures utilized in the following examples further provide exemplary procedures for a method to engineer a protein expressed through a co-translational translocation pathway to obtain an engineered protein with a set targeting and/or topology wherein the protein is provided by TatC and the set targeting and/or topology is provided by a correct integration and folding within the cell membrane.
The following examples further provide exemplary engineered proteins expressed through a co-translational translocation pathway with a set targeting and/or topology, wherein the protein is provided by TatC and the engineered proteins can be provided by the TatC chimeras.
In general, the PIPE cloning protocol was used (1). In short, TatC homologs wild type and chimeras were PCR amplified with the following primers:
AaTatc_PIPE-for (5′-GGTGAAAACCTGTACTTCCAGAGCATGCCACTGACCGAACACC-3′) (SEQ ID. NO: 12)
AaTatc_PIPE-rev (5′-TGGTCCCTGAAACAAGACTTCCAAAGCCTTCTGAATCTCCTTCTTTTTGC-3′) (SEQ ID. NO: 13)
MtTatC_PIPE-for (5′-GGTGAAAACCTGTACTTCCAGAGCTCTCTCGTAGACCACCTCAC-3′) (SEQ ID. NO: 14)
MtTatC_PIPE-rev (5′-TGGTCCCTGAAACAAGACTTCCAAGTGAACACGCGCGATCTG-3′) (SEQ ID. NO: 15).
The pETKat vectors were PCR amplified with vector PIPE-for (5′-TTGGAAGTCTTGTTTCAGGGACCA-3′) (SEQ ID. NO: 16) and vector PIPE-rev (5′-GCTCTGGAAGTACAGGTTTTCACC-3′) (SEQ ID. NO: 17). 1-2 μl of insert and vector were combined on ice and 50 μl NovaBlue (Invitrogen) competent cells were added. PIPE cloning compatible vectors were generated based on a pET-33 vector (Novagen) containing an N-terminal 9×His-tag (pETKatN9) or a C-terminal GFP and 8×His-tag (pETKatGFP). For better cloning efficacy a suicide cassette was also included, derived from pDest53 (Invitrogen). The TEV and 3C protease recognition sequences as PIPE cloning sites (vector maps in
The wild type M. tuberculosis and A. aeolicus TatC genes were synthesized by primer extension as applied in DNAWorks http://helixweb.nih.gov/dnaworks/ (2). TMD prediction was performed with HMMTMM2.0 (3). For TMD swaps, topology prediction as well as conserved flanking residues were taken into consideration (
Constructs were transformed into BL21 Gold cells (Agilent technologies) and transferred onto LB-Kan plates. The next morning colonies were combined into a 5 ml 2×YT medium. After determination of OD600 values, 50 ml 2×YT cultures were inoculated to a starting OD600 of 0.0 l. Cultures were grown in an orbital shaker at 37° C. until they reached an OD600 of approximately 0.2. The temperature of the orbital shaker was then reduced to 16° C. Upon reaching an OD600 of 0.4, IPTG was added to final concentration of 1 mM to induce expression. Cultures were grown over night and 500 μl of each culture was harvested and centrifuged. The pellet was washed 3 times with 2 ml PBS, before re-suspending in 2 ml of PBS and dispensing 200 μl of each into a 96 well plate. In addition, a 2× dilution of the sample buffer in PBS was performed and 200 μl of this was plated into the 96-well plate.
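The volume of starter culture needed to inoculate a fresh culture to a chosen starting OD600 follows from the dilution relation C1·V1 = C2·V2. The helper below is a small sketch of that arithmetic; it assumes OD600 scales linearly with cell density over the range used, and the function and argument names are hypothetical.

```python
def inoculum_volume_ml(starter_od600, target_od600, final_volume_ml):
    """Volume of starter culture (ml) giving `target_od600` in `final_volume_ml`.

    Uses C1 * V1 = C2 * V2, assuming OD600 is proportional to cell density.
    """
    if starter_od600 <= 0 or target_od600 <= 0 or final_volume_ml <= 0:
        raise ValueError("all inputs must be positive")
    if starter_od600 < target_od600:
        raise ValueError("starter culture is more dilute than the target")
    return target_od600 * final_volume_ml / starter_od600
```

For instance, a starter culture measured at an OD600 of 2.0 would require 0.25 ml to bring a 50 ml culture to a starting OD600 of 0.01.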
GFP expression per cell was quantified using a MACSQuantl0 Analyzer (Miltenyi, Auburn, Calif.). This flow cytometry measures forward scattering, side scattering and total fluorescence at 488 nm. Both scattering plots give indication of cell size (
To investigate whether the TatC chimera proteins were folded correctly, protein expression and purification was performed as previously described using a N-terminal His-tag variant (Ramasamy 2013) (18). In short, the procedure described above was scaled up to four 11 cultures of selected constructs. Cells were harvested and 10 g of wet cell mass was resuspended in 100 ml buffer A (300 mM NaCl, 10% glycerol, and 50 mM Tris pH=7.5). After homogenization the cells were lysed in a microfluidizer. The remainder of the lysate was centrifuged in a JA-17 fixed angle rotor (Beckman-Coulter) at 11,000 rpm for 30 minutes. The supernatants were then subjected to 30 minutes of centrifugation at 204526.3 g. Next the pellet was resuspended in 50 ml buffer B (Buffer A+1% DDM and 30 mM imidazole) and incubated at 4° C. under gentle shaking for 1 h. The membrane extract was obtained by a final centrifugation run with conditions identical to those described above. The supernatants containing the solubilized IMPs were mixed with 0.5 ml of NiNTA (Qiagen), that had been equilibrated with buffer B, and incubated at 4° C. under gentle shaking for 1 h. NiNTA was then isolated by 5 min centrifugation at 700 g, and resuspended in 20 ml of Buffer C (Buffer A+30 mM imidazole+0.03% DDM) for removal of unbound protein. NiNTA was isolated again as described above and the IMPs were eluted by resuspending the NiNTA in 5 ml buffer D (Buffer A+300 mM imidazole+0.03% DDM). After a final centrifugation step (5 minutes at 700 g) the supernatants were concentrated to a final volume of 0.5 ml using Amicon Ultra-4 (Millipore) concentrator with a 30 kDa cutoff membrane. The concentrated sample was then injected onto a 30 ml Superdex 200 column (GE Healthcare).
Statistical analysis was performed with GraphPad Prism 6. An unpaired, two-tailed Student's t-test was employed to compare two groups. A p-value equal to or lower than 0.05 was deemed statistically significant. For analysis of differences in expression among the different tail/linker MtTatC constructs, a one-way ANOVA was employed, followed by Dunnett's test, with all constructs compared to MtTatC.
Modeling of integral membrane protein (IMP) integration in the current example was performed using a 2-dimensional embodiment of the simulation method for the direct simulation of co-translational protein translocation and membrane integration. Ribosomal translation and membrane integration of nascent proteins are thus simulated on the minute timescale, enabling direct comparison between theory and experiment.
Here, the method was applied to verify the effect of IMP sequence on the membrane integration of various TatC and YidC homologues. The simulation method is employed with only minor modifications from the method described in the initial part of the present description and in the paper Long-Timescale Dynamics and Regulation of Sec-Facilitated Protein Translocation, B. Zhang and T. F. Miller, Cell Reports 2, 927-937 and S1-S24, Oct. 25, 2012, which paper is incorporated herein by reference in its entirety, all of which modifications are specified below.
As described above and in such paper, the simulation method explicitly describes the configurational dynamics of the nascent-protein chain, conformational gating in the Sec translocon, and the slow dynamics of ribosomal translation. The nascent chain is represented as a freely jointed chain of beads, where each bead represents 3 amino acids and has a diameter of 8 Å, the typical Kuhn length for polypeptide chains. Bonding interactions between neighboring beads are described using the finite extension nonlinear elastic (FENE) potential (Equation 1), short-ranged nonbonding interactions are modeled using the Lennard-Jones potential (Equation 2), electrostatic interactions are modeled using the Debye-Hückel potential (Equations 3-4), and solvent interactions are described using a position-dependent potential based on the water-membrane transfer free energy for each CG bead (Equations 5-6). All parameters are as described in the specification, unless otherwise stated.
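By way of illustration only, the three pairwise energy terms named above can be sketched in their common textbook forms. This is a minimal sketch, not a restatement of Equations 1-4 of the disclosure, which may use different cutoffs, prefactors and parameters. The function names and all parameter values other than the 8 Å bead diameter are hypothetical.

```python
import math


def fene_bond(r, k, r_max):
    """Textbook FENE bond energy; diverges as r approaches r_max."""
    if r >= r_max:
        return math.inf
    return -0.5 * k * r_max ** 2 * math.log(1.0 - (r / r_max) ** 2)


def lennard_jones(r, epsilon, sigma):
    """Textbook 12-6 Lennard-Jones energy (no cutoff applied here)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def debye_huckel(r, q_i, q_j, bjerrum_length, debye_length, kT=1.0):
    """Screened Coulomb energy between two charged beads."""
    return kT * bjerrum_length * q_i * q_j * math.exp(-r / debye_length) / r


if __name__ == "__main__":
    sigma = 8.0  # bead diameter in angstroms, taken from the text
    # Remaining parameter values are hypothetical placeholders.
    print(fene_bond(8.0, k=1.0, r_max=16.0))
    print(lennard_jones(9.0, epsilon=1.0, sigma=sigma))
    print(debye_huckel(10.0, q_i=2, q_j=-1, bjerrum_length=7.0, debye_length=14.0))
```

In this form the Lennard-Jones minimum lies at r = 2^(1/6)·sigma with depth epsilon, which is a convenient sanity check when substituting other parameter sets.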
The time evolution of the nascent protein is modeled using overdamped Brownian dynamics (Equation 7), with the CG beads confined to a two-dimensional plane that runs along the axis of the translocon channel and between the two helices of the lateral gate (LG). Conformational gating of the translocon LG is modeled with the LG helices moving out of the plane of confinement for the CG beads, allowing the nascent chain to pass into the membrane bilayer. The rate of stochastic LG opening and closing depends on the sequence of the nascent protein CG beads that occupy the translocon channel (Equations 8-10). Ribosomal translation is directly simulated via growth of the nascent protein at the ribosome exit channel. Throughout translation, the C-terminus of the nascent protein is held fixed, and new beads are sequentially added at a rate of 24 residues per second. Upon completion of translation, the C-terminus is released from the ribosome, and the ribosome remains bound to the translocon. It has been confirmed by the inventors that the results herein are robust with respect to changes in the rate of ribosomal translation.
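As an illustration of the dynamics described above, a single overdamped Brownian dynamics update in two dimensions can be sketched with a standard Euler-Maruyama step. The sketch assumes a single scalar diffusion coefficient for all beads and takes forces as given; the function names, the integrator choice and the parameter values are assumptions for illustration and are not taken from Equation 7. The helper `seconds_per_bead` only restates the arithmetic implied by the text: at 24 residues per second and 3 residues per bead, one bead is added every 0.125 seconds.

```python
import math
import random


def brownian_step(positions, forces, diffusion, kT, dt, rng):
    """Advance 2D bead positions by one overdamped Euler-Maruyama step."""
    noise_scale = math.sqrt(2.0 * diffusion * dt)  # random displacement scale
    drift = diffusion * dt / kT                    # deterministic mobility factor
    new_positions = []
    for (x, y), (fx, fy) in zip(positions, forces):
        new_positions.append((
            x + drift * fx + noise_scale * rng.gauss(0.0, 1.0),
            y + drift * fy + noise_scale * rng.gauss(0.0, 1.0),
        ))
    return new_positions


def seconds_per_bead(residues_per_second=24.0, residues_per_bead=3):
    """Time between bead additions implied by the translation rate."""
    return residues_per_bead / residues_per_second


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    beads = [(0.0, 0.0), (8.0, 0.0)]
    forces = [(0.0, 0.0), (0.0, 0.0)]
    # Hypothetical diffusion coefficient and time step.
    print(brownian_step(beads, forces, diffusion=1.0, kT=1.0, dt=1e-3, rng=rng))
    print(seconds_per_bead())
```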
In the procedure reported in the present example, amino-acid sequences for the TatC homologs are mapped onto sequences of CG beads as follows. Each consecutive trio of amino acid residues in the nascent protein sequence is mapped to an associated CG bead. The water-membrane transfer free energy for each CG bead is taken to be the sum of the contributions from the individual amino acids; these values are taken from the experimental water-octanol transfer free energies for single residues. The charge for each CG bead is taken to be the sum of the contributions from the individual amino acids. As in the above-mentioned paper, positively charged residues (Arginine and Lysine) were modeled with a +2 charge to capture significant effects on topology due to changes in the nascent protein. Histidine residues were modeled with only a +1 charge to account for the partial protonation of these residues, and negatively charged residues (Glutamate and Aspartate) were modeled with a charge of −1. For the results in
For the results in
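The bead mapping and charge assignment described in this example can be sketched as follows. The charge values (+2 for Arg and Lys, +1 for His, −1 for Asp and Glu) are those stated in the text. The function name `map_to_beads`, the handling of a trailing group of fewer than three residues as a shorter final bead, and the caller-supplied `transfer_free_energy` table are assumptions for illustration; the zero-valued table in the usage lines is a placeholder, not experimental water-octanol data.

```python
# Per-residue model charges as stated in the text; all other residues are neutral.
CHARGE = {"R": 2, "K": 2, "H": 1, "D": -1, "E": -1}


def map_to_beads(sequence, transfer_free_energy, residues_per_bead=3):
    """Group consecutive residues into CG beads and sum per-residue properties.

    transfer_free_energy: dict mapping one-letter residue codes to a
    water-membrane transfer free energy (supplied by the caller).
    """
    beads = []
    for start in range(0, len(sequence), residues_per_bead):
        trio = sequence[start:start + residues_per_bead]
        beads.append({
            "residues": trio,
            "charge": sum(CHARGE.get(aa, 0) for aa in trio),
            "transfer_free_energy": sum(transfer_free_energy[aa] for aa in trio),
        })
    return beads


if __name__ == "__main__":
    # Placeholder scale (all zeros); substitute a real per-residue scale.
    placeholder_scale = {aa: 0.0 for aa in "ACDEFGHIKLMNPQRSTVWY"}
    for bead in map_to_beads("MKRHDE", placeholder_scale):
        print(bead)
```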
To determine whether a given trajectory leads to integration of the TatC homolog in the correct multispanning topology, the following criteria were used. The topology of a nascent protein configuration is determined by the location of the soluble loops that connect the TMDs. A collective variable λi was defined for each loop, with i=1 corresponding to the loop that precedes TMD 1 in the TatC sequence (i.e., the N-terminal sequence) and i=7 corresponding to the loop that follows TMD 6 (i.e., the C-tail). If loop i is in the cytosol, then λi=1; if loop i is in the periplasm, then λi=−1; otherwise, λi=0. The multi-spanning TatC topology corresponds to configurations for which λi=1 for i=1, 3, 5 and 7 and for which λi=−1 for i=2, 4, and 6. A given trajectory is determined to have reached correct IMP integration if a topology with the loops in the right orientation is sampled during a time window of 2.5 seconds taken 25 seconds after the end of translation. A time window was used to reduce noise due to loops temporarily entering the lipid membrane, and the 25-second delay was found sufficient to allow the nascent protein to finish the integration/translocation of TMD 6.
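The topology criterion above can be expressed compactly in code. The following is a minimal sketch under two stated assumptions: the 2.5-second window is taken to begin exactly 25 seconds after the end of translation, and a trajectory is supplied as a sequence of (time, λ-values) frames. The names `is_correct_topology` and `reached_correct_integration` are hypothetical.

```python
# Target values of lambda_1..lambda_7: odd loops cytosolic (+1), even loops periplasmic (-1).
TARGET_TOPOLOGY = (1, -1, 1, -1, 1, -1, 1)


def is_correct_topology(lambdas):
    """True if all seven loop variables match the multispanning TatC topology."""
    return tuple(lambdas) == TARGET_TOPOLOGY


def reached_correct_integration(frames, translation_end, delay=25.0, window=2.5):
    """True if any frame inside the sampling window has the correct topology.

    frames: iterable of (time_in_seconds, lambdas) pairs.
    The window spans [translation_end + delay, translation_end + delay + window].
    """
    start = translation_end + delay
    stop = start + window
    return any(
        start <= t <= stop and is_correct_topology(lam)
        for t, lam in frames
    )


if __name__ == "__main__":
    frames = [
        (34.0, (1, -1, 1, -1, 1, 0, 1)),   # loop 6 still in the membrane
        (36.0, (1, -1, 1, -1, 1, -1, 1)),  # correct topology inside the window
    ]
    print(reached_correct_integration(frames, translation_end=10.0))
```

Requiring only one matching frame inside the window, rather than a match at a single instant, mirrors the stated purpose of the window: a loop that dips briefly into the membrane does not by itself disqualify the trajectory.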
The simulations revealed that only the integration of the final TMD was affected by sequence modifications in the C-terminal loop (
To determine the TatC insertion efficiency, simulations were conducted using the CG model of Example 6.
1200 independent CG trajectories were calculated for Aquifex aeolicus TatC (Aa), Mycobacterium tuberculosis TatC (Mt), Bordetella parapertussis (Bp), Campylobacter jejuni (Cj), Deinococcus radiodurans (Dr), Staphylococcus aureus (Sa), Vibrio cholerae (Vc), and Wolinella succinogenes (Ws), both with and without the replacement of the native tail with the Aa tail.
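For a set of independent trajectories, the insertion efficiency can be estimated as the fraction that end in the correct topology. The sketch below also reports a binomial standard error, which is an added illustration and not a quantity stated in the disclosure; the function name `fraction_correct` and the example counts are hypothetical.

```python
import math


def fraction_correct(outcomes):
    """Return (fraction, binomial standard error) for per-trajectory booleans."""
    outcomes = list(outcomes)
    n = len(outcomes)
    if n == 0:
        raise ValueError("at least one trajectory outcome is required")
    p = sum(1 for ok in outcomes if ok) / n
    standard_error = math.sqrt(p * (1.0 - p) / n)
    return p, standard_error


if __name__ == "__main__":
    # Hypothetical outcome counts for 1200 trajectories of one construct.
    outcomes = [True] * 300 + [False] * 900
    print(fraction_correct(outcomes))
```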
The results for the fraction of trajectories ending in the correct topology calculated for Aquifex aeolicus TatC (Aa), Mycobacterium tuberculosis TatC (Mt), and Mycobacterium tuberculosis TatC with an Aquifex aeolicus tail are shown in
For TatCs from Mycobacterium tuberculosis (Mt), Bordetella parapertussis (Bp), Escherichia coli (Ec), Campylobacter jejuni (Cj), Deinococcus radiodurans (Dr), Staphylococcus aureus (Sa), Vibrio cholerae (Vc), and Wolinella succinogenes (Ws) with and without the replacement of the tail with the Aa-tail the fraction of simulations exhibiting the desired TABF, correct insertion of the TMDs in the topology shown in
To identify the property of the Aa-tail sequence that acts as a TABF determinant in TatC, further simulations were performed where the C-terminal charges were adjusted, leaving all other TABF determinants unchanged. This is similar to the embodiment shown schematically in
In order to verify that the computational simulations can reasonably predict protein expression, TatCs from Aquifex aeolicus (Aa), Mycobacterium tuberculosis (Mt), Bordetella parapertussis (Bp), Campylobacter jejuni (Cj), Deinococcus radiodurans (Dr), Staphylococcus aureus (Sa), Vibrio cholerae (Vc), Escherichia coli (Ec), and Wolinella succinogenes (Ws), with and without the native tail replaced with the Aa tail, were cloned into the pETKatGFP expression constructs preceding the GFP domain. These constructs express the cloned gene with a TEV cleavage site attached to the N-terminus and a 3C cleavage site followed by an eGFP molecule and a 6×Histidine tag attached to the C-terminus.
In general, the PIPE cloning protocol was used (1). In short, TatC homologs wild type and chimeras were PCR amplified with the following primers: AaTatc_PIPE-for (SEQ ID. NO: 12); AaTatc_PIPE-rev (SEQ ID. NO: 13); MtTatC_PIPE-for (SEQ ID. NO: 14); and MtTatC_PIPE-rev (SEQ ID. NO: 15).
The pETKat vectors were PCR amplified with vector PIPE-for (SEQ ID. NO: 16) and vector PIPE-rev (SEQ ID. NO: 17). 1-2 μl of insert and vector were combined on ice and 50 μl NovaBlue (Invitrogen) competent cells were added.
PIPE cloning compatible vectors.
The expression constructs used are based on a pET-33 vector (Novagen) containing an N-terminal 9×His-tag (pETKatN9) or a C-terminal GFP and 8×His-tag (pETKatGFP). For better cloning efficiency, a suicide cassette derived from pDest53 (Invitrogen) was also included. TEV and 3C protease recognition sequences were chosen as PIPE cloning sites (vector maps in
A map of pETKatGFP, a PIPE cloning vector based on pET33, can be seen in
A map of pETKatN9 is shown in
The wild type M. tuberculosis and A. aeolicus TatC genes were synthesized by primer extension as implemented in DNAWorks (http://helixweb.nih.gov/dnaworks/). TMD prediction was performed with TMHMM 2.0. For TMD swaps, topology prediction as well as conserved flanking residues were taken into consideration (
Sequencing results indicated successful integration of the TatC homologs into their respective vectors.
In order to verify the ability of the computational simulations to predict protein expression, TatCs from Aquifex aeolicus (Aa), Mycobacterium tuberculosis (Mt), Bordetella parapertussis (Bp), Campylobacter jejuni (Cj), Deinococcus radiodurans (Dr), Staphylococcus aureus (Sa), Vibrio cholerae (Vc), Escherichia coli (Ec), and Wolinella succinogenes (Ws) and their Aquifex aeolicus tail swaps were expressed from the pETKatGFP constructs, and the Aquifex aeolicus TatC and soluble GFP were expressed from independent pETKatN9 constructs.
Constructs were transformed into BL21 Gold cells (Agilent Technologies) and transferred onto LB-Kan plates. The next morning, colonies were combined into 5 ml of 2×YT medium.
After determination of OD600 values, 50 ml 2×YT cultures were inoculated to a starting OD600 of 0.01. Cultures were grown in an orbital shaker at 37° C. until they reached an OD600 of approximately 0.2. The temperature of the orbital shaker was then reduced to 16° C. Upon reaching an OD600 of 0.4, IPTG was added to a final concentration of 1 mM to induce expression of the fusion proteins. Cultures were grown overnight, and 500 μl of each culture was harvested and centrifuged. The supernatant was discarded and the pellet was washed 3 times with 2 ml PBS before being resuspended in 2 ml of PBS, and 200 μl of each sample was dispensed into a 96-well plate.
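The inoculation step above amounts to a simple dilution calculation. As a sketch, reading the starting OD600 as 0.01, assuming that OD600 scales linearly with cell density over the range used, and taking the 50 ml as the final culture volume, the starter volume to add follows from C1·V1 = C2·V2. The function name and the example starter OD600 are hypothetical.

```python
def inoculum_volume_ml(starter_od600, target_od600=0.01, culture_volume_ml=50.0):
    """Volume of starter culture needed to reach the target starting OD600."""
    if starter_od600 <= target_od600:
        raise ValueError("starter culture must be denser than the target OD600")
    return target_od600 * culture_volume_ml / starter_od600


if __name__ == "__main__":
    # Hypothetical starter culture measured at OD600 = 2.0.
    print(inoculum_volume_ml(2.0))  # volume in ml
```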
TatC-GFP fusion protein expression per cell was quantified using a MACSQuant10 Analyzer (Miltenyi, Auburn, Calif.) flow cytometer. This flow cytometer measures forward scattering, side scattering and fluorescence at 488 nm of particles passing through the detector. Both scattering plots give an indication of cell size (
Expression tests comparing various TatC homologs with their native tail and with their tails replaced with the Aa tails are shown in
As shown in the expression tests, various tail combinations and variants of TatC result in dramatically different expression levels.
Simulation results from the procedure of Example 7, and expression results from the experiments of Example 9 were compared.
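One way such a comparison could be set up is to normalize each construct's value to a reference construct and then compute a correlation between the simulated and measured series. The sketch below uses the Pearson coefficient as one possible measure of agreement; the disclosure does not specify a particular statistic, and the function names and all numerical values shown are hypothetical placeholders rather than results.

```python
import math


def normalize_to_reference(values, reference):
    """Divide every construct's value by the value of the reference construct."""
    ref = values[reference]
    return {name: value / ref for name, value in values.items()}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Placeholder numbers for illustration only; not measured or simulated data.
    simulated = normalize_to_reference({"Aa": 0.50, "Mt": 0.20, "Mt(Aa C-tail)": 0.40}, "Aa")
    measured = normalize_to_reference({"Aa": 900.0, "Mt": 300.0, "Mt(Aa C-tail)": 700.0}, "Aa")
    names = sorted(simulated)
    print(pearson_r([simulated[n] for n in names], [measured[n] for n in names]))
```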
The fraction of AaTatC, MtTatC, and Mt(Aa C-tail) simulation trajectories that yield the correct membrane topology, normalized with respect to the AaTatC wild type as shown in
The results show that simulations of the integration correlate well with the actual expression levels. In addition, the simulations provide an explanation for the experimentally observed expression levels (see, by way of example, the embodiment of
The examples set forth above are provided to give those of ordinary skill in the art a complete disclosure and description of how to make and use the embodiments of the materials, compositions, systems and methods of the disclosure, and are not intended to limit the scope of what the inventors regard as their disclosure.
All patents and publications mentioned in the specification are indicative of the levels of skill of those skilled in the art to which the disclosure pertains.
The entire disclosure of each document cited (including patents, patent applications, journal articles, abstracts, laboratory manuals, books, or other disclosures) in the Background, Summary, Detailed Description, Examples and List of References is hereby incorporated herein by reference. All references cited in this disclosure are incorporated by reference to the same extent as if each reference had been incorporated by reference in its entirety individually. However, if any inconsistency arises between a cited reference and the present disclosure, the present disclosure takes precedence. Further, the computer readable form of the sequence listing of the ASCII text file P1471-US-Sequence-Listing_ST25 being filed concurrently with the present paper is incorporated herein by reference in its entirety.
The terms and expressions which have been employed herein are used as terms of description and not of limitation, and there is no intention in the use of such terms and expressions of excluding any equivalents of the features shown and described or portions thereof, but it is recognized that various modifications are possible within the scope of the disclosure claimed. Thus, it should be understood that although the disclosure has been specifically disclosed by embodiments, exemplary embodiments and optional features, modification and variation of the concepts herein disclosed can be resorted to by those skilled in the art, and that such modifications and variations are considered to be within the scope of this disclosure as defined by the appended claims.
It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting. As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the content clearly dictates otherwise. The term “plurality” includes two or more referents unless the content clearly dictates otherwise. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the disclosure pertains.
When a Markush group or other grouping is used herein, all individual members of the group and all combinations and possible subcombinations of the group are intended to be individually included in the disclosure. Every combination of components or materials described or exemplified herein can be used to practice the disclosure, unless otherwise stated. One of ordinary skill in the art will appreciate that methods, device elements, and materials other than those specifically exemplified may be employed in the practice of the disclosure without resort to undue experimentation. All art-known functional equivalents, of any such methods, device elements, and materials are intended to be included in this disclosure. Whenever a range is given in the specification, for example, a temperature range, a frequency range, a time range, or a composition range, all intermediate ranges and all subranges, as well as, all individual values included in the ranges given are intended to be included in the disclosure. Any one or more individual members of a range or group disclosed herein may be excluded from a claim of this disclosure. The disclosure illustratively described herein suitably may be practiced in the absence of any element or elements, limitation or limitations which is not specifically disclosed herein.
A number of embodiments of the disclosure have been described. The specific embodiments provided herein are examples of useful embodiments of the invention and it will be apparent to one skilled in the art that the disclosure can be carried out using a large number of variations of the devices, device components, and method steps set forth in the present description. As will be obvious to one of skill in the art, methods and devices useful for the present methods may include a large number of optional composition and processing elements and steps.
In particular, it will be understood that various modifications may be made without departing from the spirit and scope of the present disclosure. Accordingly, other embodiments are within the scope of the following claims.
The present application claims priority to U.S. Provisional Application No. 61/833,250 entitled “Method and Software for the Prediction and Refinement of Integral Membrane Protein Insertion into Cell Membranes and Protein Translocation Across Cell Membranes” filed Jun. 10, 2013, and to U.S. Provisional Application No. 61/872,103 entitled “Computational Algorithm for Designing Enhanced Expression of Integral Membrane Proteins” filed Aug. 30, 2013, each of which is incorporated herein by reference in its entirety. The present application is also related to U.S. patent application Ser. No. 14/301,069, filed on even date herewith, entitled “Translocon-Associated Biogenesis Features and Related Methods, Systems and Products”, which is incorporated herein by reference in its entirety.
This invention was made with government support under N00014-10-1-0884 awarded by the Office of Naval Research and 5DP1GM105385 awarded by the National Institutes of Health. The government has certain rights in the invention.
Entry |
---|
Goder et al. FEBS Letters 504 (2001) 87-93. |
Rychkova et al. PNAS, vol. 107, No. 41, 2010, pp. 17598-17603. |
Lindahl et al. Current Opinion in Structural Biology 2008, 18:425-431. |
Kamerlin et al. Annu. Rev. Phys. Chem. 2011. 62:41-64. |
Abou Elela S, Nazar RN: Role of the 5.8S rRNA in ribosome translocation. Nucleic Acids Res. 1997, 25:1788-1794. |
Andersson, H., Bakker, E., and von Heijne, G. (1992). Different positively charged amino acids have similar effects on the topology of a polytopic transmembrane protein in Escherichia coli. J. Biol. Chem. 267, 1491-1495. |
Bader, M., Lundin, C., Kim, H., Nilsson, I., and von Heijne, G. (2008). Contribution of positively charged flanking residues to the insertion of transmembrane helices into the endoplasmic reticulum. Proc. Natl. Acad. Sci. USA 105, 4127-4132. |
Becker, T., Bhushan, S., Jarasch, A., Armache, J.-P., Funes, S., Jossinet, F., Gumbart, J., Mielke, T., Berninghausen, O., Schulten, K., et al. (2009). Structure of monomeric yeast and mammalian Sec61 complexes interacting with the translating ribosome. Science 326, 1369-1373. |
Beckmann, R., Spahn, C.M., Eswar, N., Helmers, J., Penczek, P.A., Sali, A., Frank, J., and Blobel, G. (2001). Architecture of the protein-conducting channel associated with the translating 80S ribosome. Cell 107, 361-372. |
Beltzer, J.P., Fiedler, K., Fuhrer, C., Geffen, I., Handschin, C., Wessels, H.P., and Spiess, M. (1991). Charged residues are major determinants of the transmembrane orientation of a signal-anchor sequence. J. Biol. Chem. 266, 973-978. |
Berendsen, H.J.C., Postma, J.P.M., Vangunsteren, W.F., Dinola, A., and Haak, J.R. (1984). Molecular-dynamics with coupling to an external bath. J. Chem. Phys. 81, 3684-3690. |
Bieker, K.L., and Silhavy, T.J. (1990). PrlA (SecY) and PrlG (SecE) interact directly and function sequentially during protein translocation in E. coli. Cell 61, 833-842. |
Bilgin, N., Claesens, F., Pahverk, H., and Ehrenberg, M. (1992). Kinetic properties of Escherichia coli ribosomes with altered forms of S12. J. Mol. Biol. 224, 1011-1027. |
Boehlke, K.W., and Friesen, J.D. (1975). Cellular content of ribonucleic acid and protein in Saccharomyces cerevisiae as a function of exponential growth rate: calculation of the apparent peptide chain elongation rate. J. Bacteriol. 121, 429-433. |
Bogdanov, M., Xie, J., Heacock, P., and Dowhan, W. (2008). To flip or not to flip: lipid-protein charge interactions are a determinant of final membrane protein topology. J. Cell Biol. 182, 925-935. |
Bogsch et al., J. Biol Chem. 273, 18003 (1998). |
Bonardi, F., Halza, E., Walko, M., Du Plessis, F., Nouwen, N., Feringa, B.L., and Driessen, A.J. (2011). Probing the SecYEG translocation pore size with preproteins conjugated with sizable rigid spherical molecules. Proc. Natl. Acad. Sci. USA 108, 7775-7780. |
Bondar, A.N., del Val, C., Freites, J.A., Tobias, D.J., and White, S.H. (2010). Dynamics of SecY translocons with translocation-defective mutations. Structure 18, 847-857. |
Brodsky, J.L., Goeckeler, J., and Schekman, R. (1995). BiP and Sec63p are required for both co- and posttranslational protein translocation into the yeast endoplasmic reticulum. Proc. Natl. Acad. Sci. USA 92, 9643-9646. |
Buchete NV, Hummer G: Coarse master equations for peptide folding dynamics. J Phys Chem B. 2008, 112:6057-6069. |
Chandler, D. (1986). Roles of classical dynamics and quantum dynamics on activated processes occurring in liquids. J. Stat. Phys. 42, 49-67. |
Chandler, D. (2005). Interfaces and the driving force of hydrophobic assembly. Nature 437, 640-647. |
Chauwin, J.F., Oster, G., and Glick, B.S. (1998). Strong precursor-pore interactions constrain models for mitochondrial protein import. Biophys. J. 74, 1732-1743. |
Cheng, Z., and Gilmore, R. (2006). Slow translocon gating causes cytosolic exposure of transmembrane and lumenal domains during membrane protein integration. Nat. Struct. Mol. Biol. 13, 930-936. |
Chuang, J., Kantor, Y., and Kardar, M. (2002). Anomalous dynamics of translocation. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 65, 011802. |
Crowley, K.S., Liao, S., Worrell, V.E., Reinhart, G.D., and Johnson, A.E. (1994). Secretory proteins move through the endoplasmic reticulum membrane via an aqueous, gated pore. Cell 78, 461-471. |
Crowley, K.S., Reinhart, G.D., and Johnson, A.E. (1993). The signal sequence moves through a ribosomal tunnel into a noncytoplasmic aqueous environment at the ER membrane early in translocation. Cell 73, 1101-1115. |
Denzer, A.J., Nabholz, C.E., and Spiess, M. (1995). Transmembrane orientation of signal-anchor proteins is affected by the folding state but not the size of the N-terminal domain. EMBO J. 14, 6311-6317. |
Devaraneni, P.K., Conti, B., Matsumura, Y., Yang, Z., Johnson, A.E., and Skach, W.R. (2011). Stepwise insertion and inversion of a type II signal anchor sequence in the ribosome-Sec61 translocon complex. Cell 146, 134-147. |
Dill, K.A., Bromberg, S., Yue, K., Fiebig, K.M., Yee, D.P., Thomas, P.D., and Chan, H.S. (1995). Principles of protein folding—a perspective from simple exact models. Protein Sci. 4, 561-602. |
Do, H., Falcone, D., Lin, J., Andrews, D.W., and Johnson, A.E. (1996). The cotranslational integration of membrane proteins into the phospholipid bilayer is a multistep process. Cell 85, 369-378. |
Dowhan, W., and Bogdanov, M. (2009). Lipid-dependent membrane protein topogenesis. Annu. Rev. Biochem. 78, 515-540. |
Drew, D. E., von Heijne, G., Nordlund, P. & de Gier, J. W. (2001). Green fluorescent protein as an indicator to monitor membrane protein overexpression in Escherichia coli. FEBS Lett. 507, 220-224. |
Duong, F., and Wickner, W. (1998). Sec-dependent membrane protein biogenesis: SecYEG, preprotein hydrophobicity and translocation kinetics control the stop-transfer function. EMBO J. 17, 696-705. |
Egea, P.F., and Stroud, R.M. (2011). Lateral opening of a translocon upon entry of protein suggests the mechanism of insertion into membranes. Proc. Natl. Acad. Sci. USA 107, 17182-17187. |
Elston, T.C. (2000). Models of post-translational protein translocation. Biophys. J. 79, 2235-2251. |
Erlandson, K.J., Miller, S.B., Nam, Y., Osborne, A.R., Zimmer, J., and Rapoport, T.A. (2008). A role for the two-helix finger of the SecA ATPase in protein translocation. Nature 455, 984-987. |
Frauenfeld, J., Gumbart, J., Sluis, E.O., Funes, S., Gartmann, M., Beatrix, B., Mielke, T., Berninghausen, O., Becker, T., Schulten, K., and Beckmann, R. (2011). Cryo-EM structure of the ribosome-SecYE complex in the membrane environment. Nat. Struct. Mol. Biol. 18, 614-621. |
Fujita, H., Kida, Y., Hagiwara, M., Morimoto, F., and Sakaguchi, M. (2010). Positive charges of translocating polypeptide chain retrieve an upstream marginal hydrophobic segment from the endoplasmic reticulum lumen to the translocon. Mol. Biol. Cell 21, 2045-2056. |
Garrison, J.L., Kunkel, E.J., Hegde, R.S., and Taunton, J. (2005). A substrate-specific inhibitor of protein translocation into the endoplasmic reticulum. Nature 436, 285-289. |
Go, N., and Taketomi, H. (1978). Respective roles of short- and long-range interactions in protein folding. Proc. Natl. Acad. Sci. USA 75, 559-563. |
Gobas, F.A., Lahittete, J.M., Garofalo, G., Shiu, W.Y., and Mackay, D. (1988). A novel method for measuring membrane-water partition coefficients of hydrophobic organic chemicals: comparison with 1-octanol-water partitioning. J. Pharm. Sci. 77, 265-272. |
Goder V, Spiess M: Molecular mechanism of signal sequence orientation in the endoplasmic reticulum. EMBO J. 2003, 22:3645-3653. |
Goder, V., and Spiess, M. (2001). Topogenesis of membrane proteins: determinants and dynamics. FEBS Lett. 504, 87-93. |
Goder, V., Junne, T., and Spiess, M. (2004). Sec61p contributes to signal sequence orientation according to the positive-inside rule. Mol. Biol. Cell 15, 1470-1478. |
Goldschmidt, D.R. Cooper, Z.S. Derewenda, D. Eisenberg, Protein Sci. 16, 1569 (2007). |
Gumbart, J., and Schulten, K. (2006). Molecular dynamics studies of the archaeal translocon. Biophys. J. 90, 2356-2367. |
Gumbart, J., and Schulten, K. (2007). Structural determinants of lateral gate opening in the protein translocon. Biochemistry 46, 11147-11157. |
Gumbart, J., Chipot, C., and Schulten, K. (2011). Free-energy cost for translocon-assisted insertion of membrane proteins. Proc. Natl. Acad. Sci. USA 108, 3596-3601. |
Haider, S., Hall, B.A., and Sansom, M.S. (2006). Simulations of a protein translocation pore: SecY. Biochemistry 45, 13018-13024. |
Hanke, A. Serr, H. J. Kreuzer, R. R. Netz, Stretching single polypeptides: The effect of rotational constraints in the backbone. EPL, 92 (2010). |
Harley, C.A., Holt, J.A., Turner, R., and Tipper, D.J. (1998). Transmembrane protein insertion orientation in yeast depends on the charge difference across transmembrane segments, their total hydrophobicity, and its distribution. J. Biol. Chem. 273, 24963-24971. |
Hedin, L.E., Ojemalm, K., Bernsel, A., Hennerdal, A., Illergård, K., Enquist, K., Kauko, A., Cristobal, S., von Heijne, G., Lerch-Bader, M., et al. (2010). Membrane insertion of marginally hydrophobic transmembrane helices depends on sequence context. J. Mol. Biol. 396, 221-229. |
Heinrich, S.U., Mothes, W., Brunner, J., and Rapoport, T.A. (2000). The Sec61p complex mediates the integration of a membrane protein by allowing lipid partitioning of the transmembrane domain. Cell 102, 233-244. |
Heritage, D., and Wonderlin, W.F. (2001). Translocon pores in the endoplasmic reticulum are permeable to a neutral, polar molecule. J. Biol. Chem. 276, 22655-22662. |
Hessa T, White SH, von Heijne G: Membrane insertion of a potassium-channel voltage sensor. Science. 2005, 307:1427. |
Hessa, T., Kim, H., Bihlmaier, K., Lundin, C., Boekel, J., Andersson, H., Nilsson, I., White, S.H., and von Heijne, G. (2005). Recognition of transmembrane helices by the endoplasmic reticulum translocon. Nature 433, 377-381. |
Hessa, T., Meindl-Beinker, N.M., Bernsel, A., Kim, H., Sato, Y., Lerch-Bader, M., Nilsson, I., White, S.H., and von Heijne, G. (2007). Molecular code for transmembrane-helix recognition by the Sec61 translocon. Nature 450, 1026-1030. |
Hessa, T., Monné, M., and von Heijne, G. (2003). Stop-transfer efficiency of marginally hydrophobic segments depends on the length of the carboxyterminal tail. EMBO Rep. 4, 178-183. |
Higy, M., Gander, S., and Spiess, M. (2005). Probing the environment of signal-anchor sequences during topogenesis in the endoplasmic reticulum. Biochemistry 44, 2039-2047. |
Hikita, C., and Mizushima, S. (1992). Effects of total hydrophobicity and length of the hydrophobic domain of a signal peptide on in vitro translocation efficiency. J. Biol. Chem. 267, 4882-4888. |
Hizlan, D., Robson, A., Whitehouse, S., Gold, V.A., Vonck, J., Mills, D., Kühlbrandt, W., and Collinson, I. (2012). Structure of the SecY complex unlocked by a preprotein mimic. Cell Rep. 1, 21-28. |
Hoover, J. Lubkowski, DNAWorks: an automated method for designing oligonucleotides for PCR-based gene synthesis. Nucleic Acids Res. 30, e43. |
Hummer, G. (2004). From transition paths to transition states and rate coefficients. J. Chem. Phys. 120, 516-523. |
Huopaniemi, I., Luo, K., Ala-Nissila, T., and Ying, S.-C. (2006). Langevin dynamics simulations of polymer translocation through nanopores. J. Chem. Phys. 125, 124901. |
Jungnickel, B., and Rapoport, T.A. (1995). A posttargeting signal sequence recognition event in the endoplasmic reticulum membrane. Cell 82, 261-270. |
Junne, T., Kocik, L., and Spiess, M. (2010). The hydrophobic core of the Sec61 translocon defines the hydrophobicity threshold for membrane integration. Mol. Biol. Cell 21, 1662-1670. |
Junne, T., Schwede, T., Goder, V., and Spiess, M. (2007). Mutations in the Sec61p channel affecting signal sequence recognition and membrane protein topology. J. Biol. Chem. 282, 33201-33209. |
Kida, Y., Morimoto, F., Mihara, K., and Sakaguchi, M. (2006). Function of positive charges following signal-anchor sequences during translocation of the N-terminal domain. J. Biol. Chem. 281, 1152-1158. |
Kim, S.J., Mitra, D., Salerno, J.R., and Hegde, R.S. (2002). Signal sequences control gating of the protein translocation channel in a substrate-specific manner. Dev. Cell 2, 207-217. |
Klauda, J.B., Venable, R.M., Freites, J.A., O'Connor, J.W., Tobias, D.J., Mondragon-Ramirez, C., Vorobyov, I., MacKerell, A.D., Jr., and Pastor, R.W. (2010). Update of the CHARMM all-atom additive force field for lipids: validation on six lipid types. J. Phys. Chem. B 114, 7830-7843. |
Klock, S. A. Lesley, The Polymerase Incomplete Primer Extension (PIPE) method applied to high-throughput cloning and site-directed mutagenesis. Methods Mol. Biol. 498, 91 (2009). |
Krautler, V., Van Gunsteren, W.F., and Hunenberger, P.H. (2001). A fast SHAKE: Algorithm to solve distance constraint equations for small molecules in molecular dynamics simulations. J. Comput. Chem. 22, 501-508. |
Kremer K, Grest GS: Dynamics of entangled linear polymer melts: A molecular-dynamics simulation. J Chem Phys. 1990, 92:5057-5086. |
Krogh, B. Larsson, G. von Heijne, E. L. Sonnhammer, Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 305, 567 (2001). |
Kyte, J., and Doolittle, R.F. (1982). A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157, 105-132. |
Lewinson, A. T. Lee, D.C. Rees, J. Mol. Biol. 377, 62 (2008). |
Li, M.S., and Cieplak, M. (1999). Folding in two-dimensional off-lattice models of proteins. Phys. Rev. E Stat. Phys. Plasmas Fluids Relat. Interdiscip. Topics 59, 970-976. |
Liebermeister, W., Rapoport, T.A., and Heinrich, R. (2001). Ratcheting in posttranslational protein translocation: a mathematical model. J. Mol. Biol. 305, 643-656. |
Luo, K., Ala-Nissila, T., Ying, S.-C., and Bhattacharya, A. (2007). Influence of polymer-pore interactions on translocation. Phys. Rev. Lett. 99, 148102. |
Luo, K., Ala-Nissila, T., Ying, S.-C., and Bhattacharya, A. (2008). Sequence dependence of DNA translocation through a nanopore. Phys. Rev. Lett. 100, 058101. |
MacKerell, A.D., Bashford, D., Bellott, M., Dunbrack, R.L., Evanseck, J.D., Field, M.J., Fischer, S., Gao, J., Guo, H., Ha, S., et al. (1998). All-atomempirical potential for molecular modeling and dynamics studies of proteins. J. Phys. Chem. B 102, 3586-3616. |
Maifeld, S.V., MacKinnon, A.L., Garrison, J.L., Sharma, A., Kunkel, E.J., Hegde, R.S., and Taunton, J. (2011). Secretory protein profiling reveals TNF-α inactivation by selective and promiscuous Sec61 modulators. Chem. Biol. 18, 1082-1088. |
Matlack, K.E., Misselwitz, B., Plath, K., and Rapoport, T.A. (1999). BiP acts as a molecular ratchet during posttranslational transport of prepro-alpha factor across the ER membrane. Cell 97, 553-564. |
Meindl-Beinker, N.M., Lundin, C., Nilsson, I., White, S.H., and von Heijne, G. (2006). Asn- and Asp-mediated interactions between transmembrane helices during translocon-mediated membrane protein assembly. EMBO Rep. 7, 1111-1116. |
Moon, C.P., and Fleming, K.G. (2011). Side-chain hydrophobicity scale derived from transmembrane protein folding into lipid bilayers. Proc. Natl. Acad. Sci. USA 108, 10174-10177. |
Muthukumar, M. (1999). Polymer translocation through a hole. J. Chem. Phys. 111, 10371-10374. |
Nilsson, I., and von Heijne, G. (1990). Fine-tuning the topology of a polytopic membrane protein: role of positively and negatively charged amino acids. Cell 62, 1135-1141. |
Ojemalm, K., Higuchi, T., Jiang, Y., Langel, U., Nilsson, I., White, S.H., Suga, H., and von Heijne, G. (2011). Apolar surface area determines the efficiency of translocon-mediated membrane-protein integration into the endoplasmic reticulum. Proc. Natl. Acad. Sci. USA 108, E359-E364. |
Panja, D., Barkema, G.T., and Ball, R.C. (2007). Anomalous dynamics of unbiased polymer translocation through a narrow pore. J. Phys. Condens. Matter 19, 432202-432202. |
Park, E., and Rapoport, T.A. (2011). Preserving the membrane barrier for small molecules during bacterial protein translocation. Nature 473, 239-242. |
Parks, G.D., and Lamb, R.A. (1991). Topology of eukaryotic type II membrane proteins: importance of N-terminal positively charged residues flanking the hydrophobic domain. Cell 64, 777-787. |
Plath, K., Mothes, W., Wilkinson, B.M., Stirling, C.J., and Rapoport, T.A. (1998). Signal sequence recognition in posttranslational protein transport across the yeast ER membrane. Cell 94, 795-807. |
Ramadurai, S., Holt, A., Krasnikov, V., van den Bogaart, G., Killian, J.A., and Poolman, B. (2009). Lateral diffusion of membrane proteins. J. Am. Chem. Soc. 131, 12650-12656. |
Ramasamy S, Abrol R, Suloway CJ, Clemons Jr WM: The glove-like structure of the conserved membrane protein TatC provides insight into signal sequence recognition in twin-arginine translocation. Structure. 2013, 21:777-788. |
Rapoport, Nature 450, 663 (2007). |
Rapoport, T.A., Goder, V., Heinrich, S.U., and Matlack, K.E. (2004). Membrane-protein integration and the role of the translocation channel. Trends Cell Biol. 14, 568-575. |
Rollauer et al. Nature 492, 210 (2012). |
Rutkowski, D.T., Lingappa, V.R., and Hegde, R.S. (2001). Substrate-specific regulation of the ribosome-translocon junction by N-terminal signal sequences. Proc. Natl. Acad. Sci. USA 98, 7823-7828. |
Rychkova, A., Vicatos, S., and Warshel, A. (2010). On the energetics of translocon-assisted insertion of charged transmembrane helices into membranes. Proc. Natl. Acad. Sci. USA 107, 17598-17603. |
Sääf, A., Wallin, E., and von Heijne, G. (1998). Stop-transfer function of pseudo-random amino acid segments during translocation across prokaryotic and eukaryotic membranes. Eur. J. Biochem. 251, 821-829. |
Sanders, S.L., Whitfield, K.M., Vogel, J.P., Rose, M.D., and Schekman, R.W. (1992). Sec61p and BiP directly facilitate polypeptide translocation into the ER. Cell 69, 353-365. |
Schlegel et al. Microb. Biotechnol. 3, 403 (2010). |
Scott, L. Kummer, D. Tremmel, A. Pluckthun, Current opinion in chemical biology 17, 427 (2013). |
Seiser, R.M., and Nicchitta, C.V. (2000). The fate of membrane-bound ribosomes following the termination of protein synthesis. J. Biol. Chem. 275, 33820-33827. |
Seppälä, S., Slusky, J.S., Lloris-Garcerá, P., Rapp, M., and von Heijne, G. (2010). Control of membrane protein topology by a single C-terminal residue. Science 328, 1698-1700. |
Shan, Y., Klepeis, J.L., Eastwood, M.P., Dror, R.O., and Shaw, D.E. (2005). Gaussian split Ewald: A fast Ewald mesh method for molecular simulation. J. Chem. Phys. 122, 54101-54113. |
Shaw, A.S., Rottier, P.J., and Rose, J.K. (1988). Evidence for the loop model of signal-sequence insertion into the endoplasmic reticulum. Proc. Natl. Acad. Sci. USA 85, 7592-7596. |
Shaw, D.E., Maragakis, P., Lindorff-Larsen, K., Piana, S., Dror, R.O., Eastwood, M.P., Bank, J.A., Jumper, J.M., Salmon, J.K., Shan, Y., and Wriggers, W. (2010). Atomic-level characterization of the structural dynamics of proteins. Science 330, 341-346. |
Simon, S.M., Peskin, C.S., and Oster, G.F. (1992). What drives the translocation of proteins? Proc. Natl. Acad. Sci. USA 89, 3770-3774. |
Smith, M.A., Clemons, W.M., Jr., DeMars, C.J., and Flower, A.M. (2005). Modeling the effects of prl mutations on the Escherichia coli SecY complex. J. Bacteriol. 187, 6454-6465. |
Sonoda, Y., et al., (2010) Tricks of the trade used to accelerate high-resolution structure determination of membrane proteins. FEBS letters 584: 2539-2547. |
Sriraman, S., Kevrekidis, I.G., and Hummer, G. (2005). Coarse master equation from Bayesian analysis of replica molecular dynamics simulations. J. Phys. Chem. B 109, 6479-6484. |
Staple, S. H. Payne, A. L. C. Reddin, H. K. Kreuzer, Model for Stretching and Unfolding the Giant Multidomain Muscle Protein Using Single-Molecule Force Spectroscopy. Phys. Rev. Lett. 101 (2008). |
Sung, W., and Park, P.J. (1996). Polymer translocation through a pore in a membrane. Phys. Rev. Lett. 77, 783-786. |
Tian, P., and Andricioaei, I. (2006). Size, motion, and function of the SecY translocon revealed by molecular dynamics simulations with virtual probes. Biophys. J. 90, 2718-2730. |
Tsukazaki, T., Mori, H., Fukai, S., Ishitani, R., Mori, T., Dohmae, N., Perederina, A., Sugita, Y., Vassylyev, D.G., Ito, K., and Nureki, O. (2008). Conformational transition of Sec machinery inferred from bacterial SecYE structures. Nature 455, 988-991. |
Tuckerman, M., Berne, B.J., and Martyna, G.J. (1992). Reversible multiple time scale molecular dynamics. J. Chem. Phys. 97, 1990-2001. |
Tyedemers, J., Lerner, M., Wiedmann, M., Volkmer, J., and Zimmermann, R. (2003). Polypeptide-binding proteins mediate completion of co-translational protein translocation into the mammalian endoplasmic reticulum. EMBO Rep. 4, 505-510. |
Vaes, W.H., Ramos, E.U., Verhaar, H.J., Cramer, C.J., and Hermens, J.L. (1998). Understanding and estimating membrane/water partition coefficients: approaches to derive quantitative structure property relationships. Chem. Res. Toxicol. 11, 847-854. |
Van den Berg, B., Clemons, W.M., Jr., Collinson, I., Modis, Y., Hartmann, E., Harrison, S.C., and Rapoport, T.A. (2004). X-ray structure of a protein-conducting channel. Nature 427, 36-44. |
von Heijne, G. (1986). The distribution of positively charged residues in bacterial inner membrane proteins correlates with the trans-membrane topology. EMBO J. 5, 3021-3027. |
von Heijne, G. (1989). Control of topology and mode of assembly of a polytopic membrane protein by positively charged residues. Nature 341, 456-458. |
Wagner, M. L. Bader, D. Drew, J. W. de Gier, Trends in Biotechnology 24, 364 (2006). |
Wahlberg, J.M., and Spiess, M. (1997). Multiple determinants direct the orientation of signal-anchor proteins: the topogenic role of the hydrophobic signal domain. J. Cell Biol. 137, 555-562. |
Weeks, J.D., Chandler, D., and Andersen, H.C. (1971). Role of repulsive forces in determining the equilibrium structure of simple liquids. J. Chem. Phys. 54, 5237-5247. |
Wei, D., Yang, W., Jin, X., and Liao, Q. (2007). Unforced translocation of a polymer chain through a nanopore: the solvent effect. J. Chem. Phys. 126, 204901. |
Wimley, W.C., Creamer, T.P., and White, S.H. (1996). Solvation energies of amino acid side chains and backbone in a family of host-guest pentapeptides. Biochemistry 35, 5109-5124. |
Yang, R. Thomson, P. McNeil, R. M. Esnouf, Bioinformatics 21, 3369 (2005). |
Zhang, B., and Miller, T.F., 3rd. (2010). Hydrophobically stabilized open state for the lateral gate of the Sec translocon. Proc. Natl. Acad. Sci. USA 107, 5399-5404. |
Zhang B, Miller TF, 3rd: Long-timescale dynamics and regulation of Sec-facilitated protein translocation. Cell Rep. 2012, 2:927-937. |
Zhang, T. F. Miller, 3rd, Direct simulation of early-stage Sec-facilitated protein translocation. J. Am. Chem. Soc. 134, 13700 (2012). |
Zimmer, J., and Rapoport, T.A. (2009). Conformational flexibility and peptide interaction of the translocation ATPase SecA. J. Mol. Biol. 394, 606-612. |
Zimmer, J., Nam, Y., and Rapoport, T.A. (2008). Structure of a complex of the ATPase SecA and the protein-translocation channel. Nature 455, 936-943. |
Restriction Requirement for U.S. Appl. No. 14/301,070, filed Jun. 10, 2014 on behalf of Thomas F. Miller III. Dated Sep. 12, 2017. 6 pages. |
Number | Date | Country | |
---|---|---|---|
61833250 | Jun 2013 | US | |
61872103 | Aug 2013 | US |