A computer readable form of the Sequence Listing “P55846US01_SequenceListing.txt” (5,671 bytes), submitted via EFS-WEB and created on May 9, 2018, is herein incorporated by reference.
The invention relates to prediction of misfolded protein epitopes, more precisely unfolding-specific protein epitopes. Unfolding-specific epitopes can arise when a protein has lost at least some of its structure. Misfolded proteins may present such epitopes, while properly folded proteins will not. Particular embodiments provide methods for predicting misfolded protein epitopes which comprise: conducting molecular-dynamics-based simulations that impose a collective coordinate bias (e.g. a globally imposed collective coordinate bias) on a protein (or peptide-aggregate) to force the protein (or peptide-aggregate) to unfold; and then predicting unfolded protein epitopes based on detection of unfolded regions within the partially unstructured proteins (or peptide aggregates) resulting from the simulations.
Exemplary embodiments are illustrated in referenced figures of the drawings. It is intended that the embodiments and figures disclosed herein are to be considered illustrative rather than restrictive.
Throughout the following description, specific details are set forth in order to provide a more thorough understanding to persons skilled in the art. However, well known elements may not have been shown or described in detail to avoid unnecessarily obscuring the disclosure. Accordingly, the description and drawings are to be regarded in an illustrative, rather than a restrictive, sense.
Aspects of this disclosure provide methods and systems for prediction of misfolded protein epitopes. Proteins, or peptide aggregates, typically exhibit so-called native structure or fibril structure respectively. This disclosure refers to both native structure and fibril structure as the “native structure” when it is clear from the context. Typically, the native structure of a protein is stabilized by interactions (referred to as contacts) between various parts of the protein. Particular embodiments provide methods for predicting unfolding-specific protein epitopes which comprise conducting molecular-dynamics-based simulations which impose a collective coordinate bias on a protein (or peptide-aggregate) to force the protein or peptide-aggregate to unfold. In this disclosure and the accompanying claims, unless the context dictates otherwise, a collective coordinate (or collective variable) corresponding to a protein or peptide aggregate is a variable that is based on a plurality of parameters/variables of a molecular-dynamics based model corresponding to the protein or peptide aggregate. The collective coordinate may be global to the protein or peptide aggregate under consideration. In this disclosure and the accompanying claims, unless the context dictates otherwise, a global collective coordinate (or for brevity a global coordinate) refers to a collective coordinate that depends on the parameters/variables associated with the atoms of a model (e.g. a molecular-dynamics based model) corresponding to at least a substantial portion of the protein or peptide aggregate without selection, weighting or the like of the parameters/variables corresponding to any sub-portion of the substantial portion of the protein or peptide aggregate based on geometrical/spatial criteria associated with the atoms, the location(s) of the atoms in the primary sequence, the secondary structure of particular atoms or the like. 
The substantial portion of the protein or peptide aggregate may comprise all of the protein or peptide aggregate or all but the boundary structure, as meant to apply to appropriate boundary conditions (e.g. edge residues or edge peptide chains) of the protein or peptide aggregate. A non-limiting example of a global collective coordinate would involve the root mean squared deviation (RMSD) in the positions of all the alpha-Carbon atoms in a protein structure relative to the corresponding positions in the native structure. Two non-limiting examples of collective coordinates that are local rather than global would be the following: 1) the RMSD in the positions of all the alpha-Carbon atoms that are only within the hydrophobic core of the protein, 2) the RMSD of only the alpha-carbons that are in the turn regions of the secondary structure. Both of these examples have additional restrictive conditions on the selection of the atoms that have taken into account a priori information about select parts or subsets of the native or fibril structure, whereas the global coordinate above does not utilize any a priori biased weighting on sub-portions of the native structure.
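By way of non-limiting illustration only, the distinction between a global and a local collective coordinate may be sketched as follows. The sketch assumes that the two structures have already been superimposed and that coordinates are supplied as simple lists of (x, y, z) tuples; the function name rmsd, the toy coordinates and the index list core_idx are hypothetical and are not taken from any particular embodiment.

```python
import math

def rmsd(coords, ref, atom_idx=None):
    """Root mean squared deviation between two pre-aligned structures.

    coords, ref: lists of (x, y, z) tuples for the same atoms in the same order.
    atom_idx:    optional list of atom indices; None means "all atoms" (global).
    """
    if atom_idx is None:
        atom_idx = range(len(ref))
    sq = [sum((a - b) ** 2 for a, b in zip(coords[i], ref[i])) for i in atom_idx]
    return math.sqrt(sum(sq) / len(sq))

# Toy alpha-carbon trace (hypothetical): only the last atom has moved by 2 Angstrom.
native_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
current_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 2.0, 0.0)]

# Global coordinate: every alpha-carbon contributes, with no a priori selection.
global_rmsd = rmsd(current_ca, native_ca)

# Local coordinate: only a pre-selected subset (e.g. a presumed core) contributes.
core_idx = [1, 2]  # hypothetical selection
local_rmsd = rmsd(current_ca, native_ca, core_idx)
```

In this toy case the global coordinate registers the displacement while the local coordinate, restricted to the pre-selected subset, does not.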
After imposing the collective coordinate bias which forces the protein or peptide aggregate to unfold, methods according to some aspects of the invention comprise predicting unfolded protein epitopes based on detection of unfolded regions of the partially unstructured (i.e. not natively structured or fibrillary structured) protein or peptide aggregate which result from the simulations. In some embodiments, a globally applied collective coordinate bias forces the protein or peptide aggregate to have fewer or different contacts than in the native structure, while allowing the protein to adopt its own misfolded (non-native) structure in response to the globally applied collective coordinate bias or, if no non-native contacts are adopted by the disrupted protein system, to unfold in some regions preferred by the energy function of the protein.
Some aspects of this disclosure provide computer-based systems and methods for identifying one or more epitopes unique to a protein or set of proteins or peptide chains exhibiting partial local unfolding from a native structure or aggregated structure. As is understood, aggregated structures (also referred to as peptide aggregates or fibrils) comprise pluralities (e.g. 3, 5, 10, 100 or 1000) of peptide chains, including possibly proteins, which aggregate (e.g. at relatively high concentrations). While the individual peptide chains that form an aggregate structure may or may not have their own native structures, the aggregated structure typically has one or more “native” fibril structures which may depend on peptide chains involved, the conditions under which the peptide chains aggregate and possibly on stochastic factors, such as, by way of non-limiting example, random conformations of individual peptide chains. In this disclosure and the accompanying claims, unless the context dictates otherwise, proteins, peptide-aggregates, fibrils and aggregated structures may be referred to herein as proteins and the native structures of proteins, peptide-aggregates, fibrils and/or aggregate structures may be referred to herein as native structures, without loss of generality.
In accordance with some aspects and embodiments of the invention, methods are provided wherein a molecular dynamics-based or Monte-Carlo sampling-based model of a protein is induced to partially disorder from its native structure by biasing (e.g. increasing, decreasing or otherwise varying or manipulating) an externally applied (target) collective coordinate. In some aspects or embodiments, the collective coordinate is a global collective coordinate. In some aspects or embodiments, the collective coordinate is indicative of (e.g. correlated with, a function of, capable of quantifying, capable of ordering or otherwise indicative of) a degree of similarity to the native structure and/or a degree of deviation from the native structure. Non-limiting examples of global collective coordinates include variables based on: a number of stabilizing interactions (contacts) between heavy (non-hydrogen) atoms of the protein (or peptide aggregate) of any particular protein structure from among the contacts in the native structure; a number of stabilizing interactions (contacts) between hydrogen atoms in any particular protein structure from among the contacts between hydrogen atoms in the native structure; distances between all heavy atoms of a particular protein structure relative to the distances between the heavy atoms in the native structure; the root-mean-square structural deviation (RMSD) of a particular protein structure relative to its native structure, as defined through the position of the alpha carbon atoms; the RMSD of a particular protein structure relative to its native structure, as defined through the position of the heavy atoms; the total solvent accessible surface area (SASA) of a particular protein structure relative to that of its native structure; the number of backbone hydrogen bonds in a particular protein structure from among the number of backbone hydrogen bonds in the native structure of the protein; combinations of the foregoing; and/or the like.
Some aspects and embodiments of the invention involve biasing an externally applied (target) collective coordinate and forcing the molecular dynamics-based model of the protein to reorganize its structure to conform to the biased target collective coordinate. Forcing the molecular dynamics-based model to reorganize its structure to conform to the biased target collective coordinate may be accomplished, for example, by forcing the molecular dynamics-based model to minimize a cost function (also referred to as a biasing potential function), where the cost function may depend on a difference between the actual collective coordinate (determined from the molecular dynamics-based model) and the biased target collective coordinate. Forcing the molecular dynamics-based model to reorganize its structure to conform to the biased target collective coordinate may be referred to as applying or imposing a biasing potential or applying or imposing a collective coordinate bias.
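By way of non-limiting illustration only, one cost function that depends on the difference between the actual and target collective coordinates is a harmonic (spring-like) potential. The harmonic form, the function names and the numerical values below are assumptions made for the purpose of the sketch; other functional forms may be used.

```python
def bias_energy(q_actual, q_target, k):
    """Harmonic biasing potential V = (k / 2) * (Q - Q_target)^2 (assumed form)."""
    return 0.5 * k * (q_actual - q_target) ** 2

def bias_force_on_q(q_actual, q_target, k):
    """Generalized force -dV/dQ, which drives Q toward Q_target."""
    return -k * (q_actual - q_target)

# Hypothetical values: the model currently retains 90% of its native contacts,
# while the externally applied target asks for 70%.
energy = bias_energy(0.9, 0.7, k=100.0)
force = bias_force_on_q(0.9, 0.7, k=100.0)  # negative: Q is pushed downward
```

A larger value of k makes the model track the target more rigidly, at the cost of larger forces on the atoms.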
Where the applied biasing potential is based on a global collective coordinate, the protein typically does not lose its native structure homogeneously, but instead will lose its native structure (i.e. unfold and possibly misfold) in specific region(s) that are thermodynamically the most prone to disorder. Such region(s) may correspond to those region(s) having relatively weak free energy of stabilization compared to other regions of the protein. The region(s) that disorder upon application of the global biasing potential may comprise misfolding-specific or unfolding-specific epitopes—i.e. epitopes present only in the absence of native structure (e.g. present in the unfolded or misfolded structure, but not present in the native structure) for those region(s).
Aspects of the invention involve the application of a collective coordinate bias to structural models of proteins, which transforms the structural protein models to exhibit partially unfolded structures that are different from their native structures. The transformation based on collective coordinate bias may be applied globally to at least a substantial portion of the protein model in such a way that the bias and corresponding transformation are impartial as to where, within the substantial portion of the protein model, unfolding occurs. The transformed (partially unfolded) structural protein model may then be analyzed to detect indicia of localized unfolding and to identify candidate epitopes, where the candidate epitopes exhibit indicia of localized unfolding.
Aspects of this disclosure provide systems and methods for predicting misfolding-specific, or additionally or alternatively, oligomer-specific, epitopes for a variety of amyloidogenic, neurodegenerative diseases including Alzheimer's disease, ALS, transthyretin amyloid polyneuropathy, as well as partially unfolded, cancer cell-specific epitopes including cell surface receptors such as epidermal growth factor receptors (EGFR), death receptors, and cluster of differentiation proteins. Specific and non-limiting example epitopes predicted in accordance with the systems and methods disclosed herein in aged or disrupted Aβ fibril include, without limitation: residues 13-18 or sequence HHQKLV (SEQ ID NO: 1); residues 6-9 or sequence HDSG (SEQ ID NO: 2); residues 13-16 or sequence HHQK (SEQ ID NO: 3); residues 15-18 or QKLV (SEQ ID NO: 4); residues 21-24 or AEDV (SEQ ID NO: 5); and residues 37-40 (specifically in Aβ42) or GGVV (SEQ ID NO: 6). Antibodies will target these epitopes based on both their sequence identity and their conformation. Segments of primary sequence that have unfolded from the native structure or fibril are conformationally distinct from corresponding segments in the context of the native structure or fibril. Antibodies targeting such regions will not be raised to the native structure or fibril, but will be raised to peptide scaffolds of the foregoing primary sequences that mimic the unfolded structural ensemble. Antibodies that bind to unfolding-specific epitopes (i.e. that are selected based on the criterion that they unfolded from the fibril upon external perturbation) will not bind to the epitope in the context of the native structural conformation, but will only bind to the epitope when it is unstructured. If antibodies are raised to a cyclic peptide, then they may also be selective against the unfolded, monomeric form of the peptide chain, for example selective against monomeric Aβ42.
Some misfolded proteins implicated in both neurodegenerative and systemic amyloid-related diseases appear to exhibit fibrils with a significant degree of native structure, including, by way of non-limiting example, transthyretin, β2-microglobulin, and superoxide dismutase. Such exhibition of fibrils with a significant degree of properly folded, putative-native structure suggests that local, rather than global protein unfolding may play a significant role in these diseases.
Other neurodegenerative diseases appear to involve the aggregation of intrinsically disordered peptides, such as Aβ peptide in Alzheimer's disease, and α-synuclein in Parkinson's disease. However, plaques (i.e. collections of fibrils) predominantly comprising Aβ peptide and neurofibrillary tangles predominantly comprising τ-protein occur with advanced age in most individuals, without any presentation of dementia. On the other hand, intracerebral injection of mice with dilute brain extracts containing Aβ seeds has been observed to induce the phenotypic symptoms of Alzheimer's disease, including plaque deposition and cerebral Aβ angiopathy. Such evidence points to the toxicity of heterogeneous sera of Aβ that may contain oligomers of various size and polymorphic structure, but to the relatively inert function of large fibrils acting by themselves. These findings are consistent with those in prion biology wherein oligomers of prion protein rather than fibrils have been found to be most infectious. Large fibrils may then play a protective role by sequestering Aβ peptide.
In the presence of Aβ monomers, however, fibrils can act as nucleation substrates for oligomeric growth and spread. This “secondary nucleation” process has been found by kinetic studies using S-radiolabelled peptides to be the dominant source of toxic oligomeric species, more so than direct nucleation between Aβ monomers or fibril fragmentation. Together the above evidence suggests that fibrils may present interaction sites that have the propensity to catalyze oligomerization, but that this may be strain-specific, and may only occur when selective fibril surface not present in normal patients is exposed and thus able to have aberrant interactions with the monomer (i.e. is presented to the monomer). Environmental challenges such as low pH, osmolytes present during inflammation, or oxidative damage may induce disruption in fibrils that can lead to exposure of more weakly-stable regions. There is an interest, then, in predicting these weakly-stable regions and using such predictions to rationally design therapeutics that could target them.
In the context of cancer, there are several lines of evidence that mutation or deletion-induced misfolding of proteins can play a role, either by destabilizing proteins involved in pro-apoptotic pathways, or by altering the function of cell-surface proteins such as growth factors so that they are constitutively active. The presence of molecular crowding, low pH, and reactive oxygen species all contribute to an anomalous environment that will destabilize protein structure, rendering proteins in neoplastic cells prone to more frequent structural disruption.
Misfolded proteins in the context of a neoplasm may present cancer cell-selective antigenic targets; antibodies directed against these targets, rather than against the native protein, may avoid unwanted side effects due to unintentional targeting of folded protein(s) in healthy tissue. Native antibody therapies to EGFR, for example, may antagonize EGF signaling in healthy tissue: the majority (45-100%) of patients receiving EGFR inhibitors develop a papulopustular rash, a smaller fraction develop paronychia and mucositis, and a small number develop severe reactions with life-threatening superinfection of skin lesions. An ideal antibody-based antineoplastic may avoid these adverse reactions by selectively antagonizing EGFR signaling in tumor tissue while sparing EGFR in normal tissue.
In the context of Alzheimer's disease, the above evidence motivates the general desire for prediction of locally-disordered regions of Aβ fibril that may act as “hot-spots” for secondary nucleation, or recruitment sites of Aβ monomer. Regions likely to be disrupted in the fibril may also be good candidates for passively exposed regions in toxic, oligomeric species. As well, the fact that natively-folded proteins may retain a significant degree of native structure when aggregating motivates the prediction of regions in the natively folded structure that are prone to disorder and to thereby lose their native structure, and may act as candidate regions for intermolecular non-native interactions. In the context of cancer, the disruptive influence of the anomalous environment in neoplastic cells provides motivation to predict locally-disordered regions of proteins dysregulated in cancer, which may act as cancer-cell specific targets for small molecule or antibody therapies.
Aspects of this disclosure provide computer-based systems and methods to predict contiguous protein regions (epitopes) that are prone to disorder. Specific example epitope predictions based on partially-disrupted Aβ fibrils are described in more detail below.
Force fields parameterized quantum-mechanically (e.g. using a molecular dynamics model (also known as a molecular dynamics engine), such as, by way of non-limiting example, CHARMM (Chemistry at HARvard Macromolecular Mechanics, http://www.charmm.org/) and/or the like) are now sufficiently accurate to reproduce experimental folded protein structures de novo (i.e. to fold proteins). The force-fields used to fold proteins that are parameterized by quantum chemical computer representations tend to be the most accurate near or around the proteins' respective native structures. Some embodiments of the invention apply the techniques described herein within such contexts (i.e. near or around the native structure) or in relation to partial structural perturbations from this native structure (e.g. the native structure with thermal motion). Hence the known force-fields used in the molecular dynamics models and used in such embodiments are being applied within their range of validity.
Aspects of this disclosure characterize local unfolding events, in which a protein region deviates structurally from its native structure. Aspects of the invention impose a challenge (based on some anomalous environmental cue) to a molecular dynamics based model of a structured protein, such that, in response, the protein begins to unfold or misfold. To effect such techniques, aspects of this disclosure employ a technique referred to herein as collective coordinate biasing, which involves biasing (e.g. increasing, decreasing or otherwise varying or manipulating) an externally applied (target) collective coordinate to apply a corresponding biasing potential to the molecular dynamics based protein model. Once the protein begins to unfold, methods according to some aspects of the invention comprise predicting unfolded protein epitopes based on detection of unfolded regions of the partially unstructured protein.
In the illustrated embodiment, block 20 comprises obtaining a structural model 22 of the protein to be subjected to the method (e.g. a protein which may be implicated or otherwise considered to be associated with a particular disease). Structural model 22 may comprise a computer representation of the subject protein suitable for use with the molecular dynamics engine which performs block 30 (discussed in more detail below). Structural model 22 and its associated computer representation may specify (in a suitable manner) the physical coordinates (e.g. the x, y and z physical locations) of the nuclei of the atoms in the protein under consideration. Unless the context dictates otherwise, in this disclosure and the accompanying claims, the term structure when applied to a protein (e.g. a protein under consideration in method 10) should be understood to correspond to the physical coordinates (e.g. the x, y and z physical locations) of some or all of the nuclei of the atoms in the protein and/or to some computer representation of such physical coordinates. Structural model 22 obtained as a part of the block 20 modelling parameter inputs may provide, dictate or express the “native” structure for the protein under consideration, which may be subject to a collective coordinate bias by the simulation performed in block 25 to provide updated structure models 32, as described in more detail below. Structural model 22 may comprise an experimentally-determined set of nuclear coordinates or may be determined computationally. In some embodiments, structural model 22 may be obtained from the protein data bank (PDB, such as that available at www.rcsb.org). In some embodiments, structural model 22 obtained as a part of the block 20 modelling parameter inputs may comprise a computer-based representation of a properly folded native protein structure, or it may comprise a computer-based representation of a misfolded and aggregated fibril structure. 
Structural model 22 may comprise a single protein chain or a plurality of peptide chains which may form aggregated structures (e.g. fibrils). As discussed above, for the sake of brevity, proteins and aggregated structures subjected to method 10 may be referred to in this disclosure and the accompanying claims as a protein or proteins, without loss of generality.
Block 20 also involves obtaining computer representations of the atomic force fields 24 associated with the protein under consideration. Such atomic force fields 24 may be configured for use with the form of the computer representation of structural model 22 and/or the molecular dynamics engine which performs block 30. Force fields 24 may comprise parameterized force field models, such as those provided by CHARMM or similar force field models, such as OPLS (Optimization Potentials for Liquid Simulations), GROMOS (www.gromos.net) and/or the like, which are usable by a corresponding molecular dynamics engine to simulate the structure of a protein. In some embodiments, structural model 22 and atomic force fields 24 may be integrated.
In the illustrated embodiment, block 20 also comprises obtaining collective coordinate and/or simulation parameters 26 which describe how an externally applied target collective coordinate will be biased (e.g. increased, decreased or otherwise varied or manipulated) during the block 25 simulation loop described in more detail below. For example, such collective coordinate bias parameters 26 may specify the rate of change of the target collective coordinate, the amplitude of the change of the target collective coordinate, the maximum and/or minimum value of the target collective coordinate, other parameters of the biasing potential function, such as, by way of non-limiting example, the rigidity (or “spring-constant”) k of the potential function described below and/or the like. Parameters 26 may additionally or alternatively include other simulation parameters of the simulation to be performed in block 25, such as, by way of non-limiting example, the duration of the simulation, the time step discretization of the simulation, and/or the like. In some embodiments, the simulation may force a protein to unfold using metadynamics, which involves penalizing conformations similar to those that have already been explored; see, for example, Bonomi et al. PLUMED: A portable plugin for free-energy calculations with molecular dynamics, Computer Physics Communications 180 (2009) 1961-1972, which is hereby incorporated herein by reference. In some such embodiments, parameters of the metadynamics may be part of simulation parameters 26.
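By way of non-limiting illustration only, parameters such as the rate of change and the minimum value of the target collective coordinate may be combined into a simple schedule for the target. The linear ramp, the function name target_q and its arguments are assumptions made for the purpose of the sketch and are not prescribed by parameters 26.

```python
def target_q(t, q_start, q_end, ramp_time):
    """Linearly ramp the target collective coordinate from q_start to q_end.

    t:         elapsed simulation time (any consistent unit)
    ramp_time: time over which the ramp is applied; the target is held at
               q_end afterwards (hypothetical schedule).
    """
    if ramp_time <= 0:
        raise ValueError("ramp_time must be positive")
    frac = min(max(t / ramp_time, 0.0), 1.0)  # clamp to [0, 1]
    return q_start + frac * (q_end - q_start)

# Hypothetical schedule: lower the target from fully native (1.0) to 0.7,
# then hold it there while the partially unfolded model equilibrates.
schedule = [target_q(t, 1.0, 0.7, ramp_time=100.0) for t in (0.0, 50.0, 100.0, 200.0)]
```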
After having obtained the modelling parameter inputs in block 20, method 10 proceeds to a simulation loop 25 comprising, in the illustrated embodiment, blocks 30 and 40. In some embodiments, the block 50 analysis step shown in
As will be discussed in more detail below, the loop 25 simulation comprises applying a collective coordinate bias to the protein under consideration and observing the protein over a series of time steps. A global collective coordinate may comprise any suitable function of the atomic positions (e.g. the physical coordinates of the nuclei) and/or the energies which, when biased, applies a globally destabilizing influence to the protein under consideration, thereby inducing loss of native structure. Non-limiting examples of global collective coordinates have been described above.
Updated structural model(s) 32 (also referred to as conformation(s) 32) may refer to the transformed structure(s) of the computer representation of a protein under consideration after one or more iterations of loop 25. In some embodiments, a new conformation 32 is generated in each iteration (e.g. for each time step) of loop 25, in which case conformation(s) 32 shown in
The collective coordinate biasing methods used in method 10 and loop 25 or otherwise described herein may be used to demand (at least approximate to within an acceptable threshold) particular levels of global unfolding from a candidate protein without specifying how or where (within the protein structure) that unfolding is to occur. For example, the collective coordinate may be a global collective coordinate, such that when used to bias the protein under consideration, the global collective coordinate merely requires that the protein achieve global unfolding to track a target collective coordinate while allowing the protein to adopt any local unfolding to achieve the global target. By demanding that a protein is, say, 30% unfolded (and thus 70% folded), method 10 may be used to analyze and draw results from an equilibrium protein structure constrained to be 30% partially-disordered. Where the collective coordinate bias is global (e.g. towards a structure with 30% disorder), the global collective coordinate bias does not specify where or how the protein may become locally disordered to satisfy the 30% disorder constraint. The region(s) of disorder may be adopted by the protein based on the protein's internal energy function or force field (i.e. based on the computer based model representation of the protein) and the requirement that the protein satisfy the collective coordinate bias constraint. As described in more detail below, the localized regions or “hot-spots” of the protein that are prone to becoming disordered (e.g. as may be determined from local unfolding indicia 54 in the illustrated embodiment) may be analyzed in block 50 to provide the method 10 candidate epitope predictions 52. These method 10 candidate epitopes 52 may then serve as antigenic targets, to which therapeutic agents may be designed.
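By way of non-limiting illustration only, one conceivable way of locating contiguous “hot-spots” is to scan a per-residue measure of retained native structure for runs of residues that fall below a threshold. The per-residue measure, the threshold, the minimum run length and the function name candidate_regions are all hypothetical and are assumed here solely for the purpose of the sketch.

```python
def candidate_regions(residue_q, threshold=0.5, min_len=4):
    """Return (first, last) index pairs of contiguous runs with residue_q < threshold.

    residue_q: per-residue fraction of native contacts retained (hypothetical indicium)
    min_len:   shortest run, in residues, that is reported as a candidate region
    """
    regions, start = [], None
    for idx, q in enumerate(residue_q):
        if q < threshold:
            if start is None:
                start = idx            # a disordered run begins here
        else:
            if start is not None and idx - start >= min_len:
                regions.append((start, idx - 1))
            start = None               # the run (if any) has ended
    # Close a run that extends to the end of the chain.
    if start is not None and len(residue_q) - start >= min_len:
        regions.append((start, len(residue_q) - 1))
    return regions

# Hypothetical per-residue values for a 13-residue chain.
profile = [0.9, 0.9, 0.2, 0.3, 0.1, 0.4, 0.95, 0.3, 0.9, 0.2, 0.2, 0.2, 0.2]
regions = candidate_regions(profile)
```

The isolated low value at index 7 is not reported because it does not form a run of at least min_len residues.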
Candidate epitope predictions 52 based on method 10 may be as accurate as the input force fields 24 and computer-based model representations 22 used for the loop 25 simulation. As mentioned above, distributed computing or custom supercomputers can now accurately fold proteins using these force-fields, which supports the accuracy of the force field models 24 and computer-based model representations 22 used as the block 20 inputs to method 10.
An input computer-based structural model 22 (e.g. as obtained from a PDB in block 20) may comprise a set of three dimensional coordinates for all atoms of the protein. Where the input computer-based structural model 22 is a native structural model, it defines a set of native contacts (also referred to herein as initial contacts). A set of initial contacts may be defined to include all (or a set of) pairs of heavy (other than hydrogen) atoms in the native structure model 22 having nuclei which are within a threshold distance (e.g. 4.8 Å or some other suitable distance) of each other. A typical PDB native structure 22 for a protein with primary sequence of length on the order of 100 amino acids may have about 2000 initial contacts. In some embodiments, the number of contacts may represent a global collective coordinate used in method 10. In such embodiments, the number of initial contacts may represent the initial value of the actual collective coordinate of the protein under consideration (prior to any iterations of simulation loop 25).
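By way of non-limiting illustration only, a set of initial contacts may be enumerated from heavy-atom coordinates as sketched below. The input layout (a list of (residue index, (x, y, z)) entries), the function name initial_contacts and the optional min_sep filter on residue separation are assumptions made for the purpose of the sketch; the 4.8 Å threshold is the example value mentioned above.

```python
import math
from itertools import combinations

def initial_contacts(atoms, cutoff=4.8, min_sep=0):
    """List index pairs (i, j) of heavy atoms whose nuclei lie within cutoff.

    atoms:   list of (residue_index, (x, y, z)) for heavy atoms only
    cutoff:  threshold distance in Angstrom
    min_sep: hypothetical filter that skips pairs whose residues are fewer
             than min_sep positions apart in the primary sequence
    """
    contacts = []
    for (i, (res_i, xyz_i)), (j, (res_j, xyz_j)) in combinations(enumerate(atoms), 2):
        if abs(res_i - res_j) < min_sep:
            continue
        if math.dist(xyz_i, xyz_j) <= cutoff:
            contacts.append((i, j))
    return contacts

# Toy coordinates (hypothetical), in Angstrom.
atoms = [(0, (0.0, 0.0, 0.0)), (0, (1.5, 0.0, 0.0)),
         (3, (4.0, 0.0, 0.0)), (7, (20.0, 0.0, 0.0))]
all_pairs = initial_contacts(atoms)
nonlocal_pairs = initial_contacts(atoms, min_sep=3)
```

The all-pairs loop scales quadratically with the number of atoms; for large structures a neighbour-list or cell-list search would ordinarily be preferred.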
In some embodiments, rather than using a strictly native structure in loop 25, an input protein structure 22 may be equilibrated using an optional equilibrating process 23 (shown in
In some embodiments, for proteins comprising multiple chains (e.g. aggregated structures), method 10 (
In some embodiments, method 10 uses a set of contacts (or a representation of the set of contacts) as a basis for the collective coordinate used to force a protein to unfold during the loop 25 simulation. More specifically, in some embodiments, the collective coordinate used for biasing a protein comprises the number of contacts from among a set of initial contacts. An exemplary embodiment using a representation of the set of contacts as a collective coordinate is described below, without loss of generality; the collective coordinate may have other forms. A representation of the initial set of contacts for the loop 25 simulation may be generated from input (e.g. native) structure model 22 of the protein under consideration obtained in block 20 and/or from an equilibrated version of the protein under consideration obtained as output of the block 23 equilibration process. A representation of the number of contacts from among the initial set of contacts (and the corresponding collective coordinate output 34 or actual value of the collective coordinate) at any later time step may be determined from the updated structural model 32 in a similar manner. For each heavy atom pair (indexed by ij) in the protein structure under consideration, method 10 may comprise the use of a native contact function Qij(r). In some embodiments, the contact function Qij(r) may comprise a function of the atom pair ij and the distance rij between the atoms of the pair ij. In one particular embodiment, the contact function Qij(r) has the form:
where rij is the distance between the nuclei of atoms i and j in the protein under consideration. The other equation (1) parameters r0, n and m may be suitably selected constants. In some embodiments, m>n. In one particular embodiment, r0=4.8 Å (Angstrom), n=6, and m=12.
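By way of non-limiting illustration only, one functional form that is consistent with the parameters r0, n and m and with the condition m>n is the rational switching function commonly used for contact-type collective variables. The expression below is offered as an assumption for the purpose of illustration and is not asserted to be the exact form of equation (1).

```latex
% Assumed rational switching form, with m > n
Q_{ij}(r_{ij}) = \frac{1 - \left( r_{ij}/r_0 \right)^{n}}{1 - \left( r_{ij}/r_0 \right)^{m}}

% Special case n = 6, m = 12, where the expression simplifies exactly
Q_{ij}(r_{ij}) = \frac{1}{1 + \left( r_{ij}/r_0 \right)^{6}}
```

Under this assumed form, Qij approaches unity as rij approaches zero, approaches zero for rij much larger than r0, and takes the limiting value n/m at rij=r0.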
There are many functions that have a similar functional form and/or functional characteristics as that of equation (1) shown in
Some embodiments may use a continuous contact function (e.g. the equation (1) contact function) to weight contacts (rather than, for example, a Heaviside or discrete step function), because, as explained in more detail below, it may be desirable to apply a biasing potential as a function of Qij during the loop 25 simulation, where such a potential is implemented as a force (e.g. the derivative of the potential) on individual atom positions. Thus, in some embodiments, it is desirable for Qij to be a differentiable function of r with a well-defined derivative. In some embodiments, a discrete function, such as a Heaviside step function or a multiple-step variation on a step function, may be used to describe native contacts. Such a formulation may be amenable to discrete molecular dynamics (DMD) simulation protocols, which generally use step-wise potential functions for inter-atomic interactions.
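By way of non-limiting illustration only, the role of differentiability may be sketched with a continuous contact function and its analytic derivative. The rational functional form used below is an assumption (it is one form consistent with the parameters r0, n and m), and the function names are hypothetical.

```python
def contact(r, r0=4.8, n=6, m=12):
    """Assumed rational contact function (1 - x^n) / (1 - x^m), with x = r / r0."""
    x = r / r0
    if abs(x - 1.0) < 1e-8:
        return n / m                      # limiting value at r = r0
    return (1.0 - x ** n) / (1.0 - x ** m)

def contact_derivative(r, r0=4.8, n=6, m=12):
    """Analytic dQ/dr of the assumed contact function (quotient rule)."""
    x = r / r0
    if abs(x - 1.0) < 1e-8:
        return n * (n - m) / (2.0 * m) / r0   # limiting slope at r = r0
    num, den = 1.0 - x ** n, 1.0 - x ** m
    dnum, dden = -n * x ** (n - 1), -m * x ** (m - 1)
    return (dnum * den - num * dden) / den ** 2 / r0

# Compare the analytic derivative with a central finite difference at r = 4.0.
h = 1e-6
numeric = (contact(4.0 + h) - contact(4.0 - h)) / (2.0 * h)
analytic = contact_derivative(4.0)
```

With a biasing potential V that depends on the sum of such contact values, the chain rule gives the force on each atom pair as the product of -dV/dQ and dQij/dr directed along the interatomic vector, which is why a well-defined derivative is needed; a step function would provide no such derivative.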
An actual collective coordinate Q (e.g. collective coordinate output 34 in method 10) for any structure characterized by the set of pairwise distances between heavy atoms (non-hydrogen atoms) {rij} may then be characterized by the equation:
where, in equation (2), Qij is given by equation (1) and the sum Σinitial is over the pairs of atoms in the input (e.g. native) structure model 22 or from the native structure 22 itself. “Initial” in the above equation indicates that the sum is only over those contacts present in the initial native structure (typically a PDB model of the properly folded structure or the fibril structure). In the embodiment described in equation (2) above, the quantity in the denominator of equation (2) is the thermal average of the Qij values in the input (e.g. native) structure model 22 or the equilibrated structure, and the quantity in the equation (2) numerator is the sum of Qij in an arbitrary structure (e.g. of the updated structural model 32 obtained in each iteration of the block 25 loop). The brackets < . . . > in the denominator indicate the equilibrium (thermal) average of the native state, i.e. the average over thermally-occupied structures obtained when running a molecular dynamics simulation starting from the native PDB structure. The quantity Q in equation (2) is typically a number between zero and unity.
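The ratio described above (sum of contact weights in an arbitrary structure, divided by the thermal average of the same sum in the native state) may be sketched as follows. This is a minimal sketch only: the smooth weight 1/(1+(r/r0)^6) is an assumed stand-in for equation (1), the list of initial contact pairs is taken as given, and all function names are hypothetical.

```python
import numpy as np

def pair_distances(coords, pairs):
    """Distances for an (n_atoms, 3) coordinate array and (n_pairs, 2) index array."""
    diff = coords[pairs[:, 0]] - coords[pairs[:, 1]]
    return np.linalg.norm(diff, axis=1)

def contact_sum(coords, pairs, r0=4.8):
    """Sum of smooth contact weights over the initial contact pairs."""
    # ASSUMPTION: weight 1/(1+(r/r0)^6) stands in for the equation (1) function.
    x = pair_distances(coords, pairs) / r0
    return float(np.sum(1.0 / (1.0 + x**6)))

def collective_q(coords, pairs, native_frames, r0=4.8):
    """Q = (contact sum of current structure) / (average contact sum of native frames)."""
    native_avg = np.mean([contact_sum(f, pairs, r0) for f in native_frames])
    return contact_sum(coords, pairs, r0) / native_avg

# Toy usage: four atoms on a line, 3 Angstrom apart, with three initial contacts.
native = np.array([[0.0, 0, 0], [3.0, 0, 0], [6.0, 0, 0], [9.0, 0, 0]])
pairs = np.array([[0, 1], [1, 2], [2, 3]])
q_native = collective_q(native, pairs, [native])           # equals 1 by construction
q_stretched = collective_q(2.0 * native, pairs, [native])  # below 1 when contacts stretch
```

In practice the native frames would be snapshots from the equilibration run, so that Q for any single native snapshot fluctuates around unity.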
Other metrics (e.g. metrics other than equation (2) and/or metrics based on criteria other than contacts) are additionally or alternatively possible to characterize the degree of disorder from a native structure and, consequently, may be used as collective coordinates (e.g. global collective coordinates) in some embodiments. These metrics may comprise, for example, the root mean squared deviation (RMSD) of an updated structure model 32 relative to the native structure model 22, the radius of gyration of an updated structure model 32 relative to the radius of gyration of the native structure 22, the number of backbone hydrogen bonds in the updated structure model 32 from among the backbone hydrogen bonds in the native structure 22, the total solvent-accessible surface area (SASA) of the updated structure model 32 relative to the SASA of the native structure 22, the structural overlap function described by C. J. Camacho and D. Thirumalai. Kinetics and thermodynamics of folding in model proteins. Proc. Natl. Acad. Sci. USA, 90(13):6369-6372, 1 Jul. 1993 (which is hereby incorporated herein by reference), the generalized Euclidean distance from the native structure described by A. Das, B. K. Sin, A. R. Mohazab, and S. S. Plotkin, Unfolded protein ensembles, folding trajectories, and refolding rate prediction. J. Chem. Phys., 139(12):121925, 2013 (which is hereby incorporated herein by reference), functions of one or more of these parameters, and/or the like. In some embodiments, each of these collective coordinates used within a biasing simulation (e.g. simulation loop 25) may be expressed as a scalar Q. For the sake of brevity, this description refers to the use of a single collective coordinate. However, unless the context dictates otherwise, references to a collective coordinate should be understood to include the possibility of combinations of multiple collective coordinates.
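Two of the alternative metrics listed above, the radius of gyration and the RMSD relative to the native structure, may be sketched as scalar functions of atomic coordinates as follows. This is an illustrative sketch only; the function names are hypothetical, and the use of optimal rigid-body superposition (the Kabsch algorithm) before computing the RMSD is an assumption that the text does not specify.

```python
import numpy as np

def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration of an (n_atoms, 3) coordinate array."""
    coords = np.asarray(coords, dtype=float)
    w = np.ones(len(coords)) if masses is None else np.asarray(masses, dtype=float)
    center = np.average(coords, axis=0, weights=w)
    sq_dist = ((coords - center) ** 2).sum(axis=1)
    return float(np.sqrt(np.average(sq_dist, weights=w)))

def rmsd_after_superposition(model, reference):
    """RMSD of model relative to reference after optimal rotation and translation."""
    P = np.asarray(model, dtype=float)
    Q = np.asarray(reference, dtype=float)
    P = P - P.mean(axis=0)                      # remove translation
    Q = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)           # SVD of the covariance matrix
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T     # proper rotation (no reflection)
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

Either quantity, or its ratio to the corresponding native value, could then play the role of the scalar Q in a biasing simulation.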
In some embodiments, loop 25 of method 10 comprises asserting a bias potential over a series of time steps as a time-dependent potential of the form:
V(Q,t) = ½ k (Q − Qc(t))² (3)
where Qc(t) is a target collective coordinate which may be user-specified and which may be part of collective coordinate/simulation parameters 26 and where Q is the actual collective coordinate of the updated structural model at any given time step. It can be observed that the equation (3) potential function has the appearance of the potential energy function of a spring, where the parameter k is similar to a spring constant. It can also be observed that for k>0, the equation (3) potential function increases where the actual collective coordinate Q differs from the target collective coordinate Qc(t). The loop 25 simulation may comprise minimizing the potential function (e.g. minimizing equation (3)) to ensure that the actual collective coordinate Q tracks the target collective coordinate Qc(t). In some embodiments, potential functions having other forms which penalize differences between the actual collective coordinate Q and the target collective coordinate Qc(t) may be used in addition to or in the alternative to equation (3). Equation (3) and other potential functions having similar characteristics may be used for any of the collective coordinates described herein.
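The spring-like character of equation (3) may be illustrated with the following minimal Python sketch, which evaluates the bias energy and, by the chain rule, the corresponding force on each atomic coordinate. The function names and the array `dq_dx` (the derivative of Q with respect to each coordinate, assumed to be supplied by the contact function in use) are hypothetical.

```python
import numpy as np

def bias_energy(q, q_target, k):
    """Equation (3): V = 1/2 * k * (Q - Qc)^2."""
    return 0.5 * k * (q - q_target) ** 2

def bias_dv_dq(q, q_target, k):
    """Derivative of equation (3) with respect to Q."""
    return k * (q - q_target)

def bias_forces(q, q_target, k, dq_dx):
    """Force on each coordinate: F = -(dV/dQ) * (dQ/dx)."""
    return -bias_dv_dq(q, q_target, k) * np.asarray(dq_dx, dtype=float)

# Example using values quoted in this description: k = 6e4 kJ/mol and Q - Qc = 0.02.
example_energy = bias_energy(0.73, 0.71, 6.0e4)   # approximately 12 kJ/mol
```

When Q exceeds Qc, the sign of the force is such that coordinates move in the direction that lowers Q, which is how the bias drives the loss of native contacts.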
In some embodiments, the target collective coordinate Qc(t) may comprise a function of time, which starts at the value of Q for the input (e.g. native) structure (which may typically be unity or close to unity), and decreases with time. In some embodiments, Qc(t) may decrease linearly at a rate which may be specified by collective coordinate/simulation parameter(s) 26 to some suitable level. In general, the characteristics of the target collective coordinate Qc(t) may be specified or otherwise configured according to collective coordinate/simulation parameter(s) 26. An exemplary unfolding trajectory of the target collective coordinate Qc(t) as a function of time, and the actual collective coordinate Q of a protein under consideration (e.g. collective coordinate output 34 for each time step) as a function of time, are shown in
The potential V(Q,t) in equation (3) may be implemented (in block 30 of loop 25) by adding this potential to the total energy of the protein under consideration. The protein will try to minimize its free energy, but it will take time to do so; this is one reason for the lag between the target collective coordinate Qc(t) 102 and the actual collective coordinate Q(t) 34 of the protein exhibited in
If the rate of decrease of the target collective coordinate Qc 102 is too rapid, the values of the actual collective coordinate Q 34 characterizing the protein under consideration may deviate substantially from the value of the target collective coordinate Qc 102, and the perturbation on the protein due to V(Q,t) will induce a highly non-equilibrium unfolding process. Some embodiments attempt to maintain a quasi-equilibrium (adiabatic) process as the protein unfolds. The rate of decrease for the target collective coordinate Qc(t) 102 may, in some embodiments, be determined by a condition that the actual collective coordinate Q 34 is not too far different from the target Qc 102. Such a slow (adiabatic) perturbation yields an unfolding process that is governed primarily by the interactions within the protein under consideration, rather than the response to perturbing forces that may be much larger than the stabilizing forces inherent in the protein. In the
There is some freedom in setting the value of the constant k in equation (3). In some embodiments, this value k may be set in a range of 2×10⁴ to 1×10⁵ kJ/mol, depending on the rate at which the target collective coordinate Qc is changing. In some embodiments, this value k may be set in a range of 4×10⁴ to 8×10⁴ kJ/mol. In one exemplary embodiment, k is set to be k=6×10⁴ kJ/mol, which provides small deviation of the actual collective coordinate Q 34 from the target collective coordinate Qc 102 (yielding a value of Q−Qc of approximately 0.02), when Qc was changing at a rate of about 0.4 per 15 nanoseconds (see
For a given protein under consideration, some embodiments involve performing the method 10 simulation a number of times (or at least loop 25 a number of times), where each biasing simulation is independent. This is illustrated in
In some embodiments, the fraction f is selected to be greater than 0.8. In some embodiments, the fraction f is selected to be greater than 0.85. In one particular example embodiment, the fraction f is selected to be f=0.87, which would correspond to either 7 of 8 simulations displaying an epitope, 8 of 9 simulations displaying an epitope, or 9 of 10 simulations displaying an epitope, and so on. The number of independent simulations may typically be greater than or equal to 8, although this is not necessary.
When the protein under consideration comprises an aggregated fibril structure, such as the Aβ fibrils described below, a region may be considered to be an epitope if, in a given simulation, the region exhibits one or more indicia of unfolding (e.g. is exposed) in any of the monomers (any of the peptide chains), and such an epitope is found to be exposed reliably in a fraction g of the simulations. In some embodiments, the fraction g is selected to be greater than 0.8. In some embodiments, the fraction g is selected to be greater than 0.85. In one particular example embodiment, the fraction g is selected to be g=0.87.
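The correspondence between the fraction f=0.87 and the run counts mentioned above may be checked with the following short Python sketch. The function name is hypothetical, and treating the criterion as "at least f" (rather than strictly greater than f) is an assumption made so that 7 of 8 runs qualifies; exact rational arithmetic is used to avoid floating-point rounding at the threshold.

```python
from fractions import Fraction

def meets_fraction(n_hits, n_runs, f=Fraction(87, 100)):
    """True when the share of runs displaying the epitope is at least f."""
    return Fraction(n_hits, n_runs) >= f

# Smallest number of qualifying runs for 8, 9 and 10 independent simulations.
minimum_hits = {
    n_runs: min(h for h in range(n_runs + 1) if meets_fraction(h, n_runs))
    for n_runs in (8, 9, 10)
}
```

Under this assumption, `minimum_hits` maps 8, 9 and 10 runs to 7, 8 and 9 qualifying runs respectively, so at most one run may fail to display the epitope.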
If the block 45 inquiry is negative, method 10 proceeds to block 50. Block 50 comprises analyzing the simulation results of the block 25 simulations (e.g. each iteration or run through simulation loop 25) in effort to identify candidate epitopes. In the
For the
For a given chain segment (here residues 23 to 28), each chain (i.e. each column of
As discussed above, ΔSASA for a given simulation represents only one local unfolding indicia 54 (
As mentioned above, Aβ peptide tends to aggregate in several different polymorphic forms. Polymorphism exists for both the fibril form and the ensemble of oligomeric structures.
A number of the example results described herein represent results for a number of Aβ fibril strains, each with its own morphology: a three-fold symmetric structure of 9 Aβ-40 peptides (or monomers) (PDB entry 2M4J), a two-fold symmetric structure of 12 Aβ-40 monomers (PDB entry 2LMN), a single-chain, parallel in-register structure of 12 Aβ-42 monomers (PDB entry 2MXU; disordered N-terminal residues 1-10 have been added to this structural model), and a three-fold symmetric structure of 18 Aβ-40 monomers (PDB entry 2LMP; disordered N-terminal residues 1-8 have been added to this structural model). Two additional computational assays were performed, one on structure 2LMN by adding disordered residues 1-8 at the N-terminus (these are missing from the PDB structure), and one assay on structure 2MXU by constraining the top and bottom monomers along the fibril to remain in their structured conformation, and allowing the middle 10 monomers to disorder. Simulations were performed for each initial structure (using loop 25 of method 10 and the CHARMM force-field parameters described in: K. Vanommeslaeghe, E. Hatcher, C. Acharya, S. Kundu, S. Zhong, J. Shim, E. Darian, O. Guvench, P. Lopes, I. Vorobyov, and A. D. Mackerell. CHARMM general force field: A force field for drug-like molecules compatible with the CHARMM all-atom additive biological force fields. Journal of Computational Chemistry, 31(4):671-690, 2010; and P. Bjelkmar, P. Larsson, M. A. Cuendet, B. Hess, and E. Lindahl. Implementation of the CHARMM force field in GROMACS: analysis of protein stability effects from correlation maps, virtual interaction sites, and water models. J. Chem. Theo. Comp., 6:459-466, 2010, both of which are hereby incorporated herein by reference), with TIP3P water. The simulations included a concentration of 0.1 M NaCl. Each system was equilibrated for 5 ns, during which time Q was measured to provide an initial value of Qc(t=0).
Unless otherwise indicated, the center of the biasing potential was moved to 0.6 of its original value over a time period of 15 ns, during which time the amount of structure initially present reduced systematically as described above, to about 60% of the original value. For one set of initial epitope predictions, the inventor analyzed structures corresponding to about 71% of the initial structure Q(t=0)—e.g. a collective coordinate Q corresponding to about 0.71 of the initial collective coordinate. As discussed above, the proteins under consideration were constrained to have 71% of the initial structure for a time window of typically about 100 ns.
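The movement of the center of the biasing potential described above may be sketched as a simple schedule for the target collective coordinate. The sketch below assumes a linear ramp from the initial value to 0.6 of that value over 15 ns, followed by holding the target constant; the hold after the ramp and the function names are assumptions for illustration.

```python
def target_q(t_ns, q0, final_fraction=0.6, ramp_ns=15.0):
    """Target Qc(t): linear ramp from q0 to final_fraction*q0, then held constant."""
    if t_ns <= 0.0:
        return q0
    if t_ns >= ramp_ns:
        return final_fraction * q0
    return q0 * (1.0 - (1.0 - final_fraction) * t_ns / ramp_ns)

def time_to_reach(fraction, final_fraction=0.6, ramp_ns=15.0):
    """Time (ns) at which the linear ramp crosses a given fraction of q0."""
    return ramp_ns * (1.0 - fraction) / (1.0 - final_fraction)

# Under these assumptions the ramp passes 0.71 of the initial value at about 10.9 ns.
t_71 = time_to_reach(0.71)
```

A ramp of this kind changes Qc by 0.4 of its initial value per 15 ns, which matches the rate quoted elsewhere in this description for the exemplary value of k.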
For each protein under consideration, 9 or 10 (or some other suitable number of) independent runs may be performed, with each independent run comprising random seeding of the thermostat random number generator of the molecular dynamics engine. Performing 9 or 10 (or some other suitable number of) independent runs gives some assurance that any predicted epitopes are genuine and not a rare or random occurrence. As discussed above, some embodiments comprise identifying an epitope as a potential candidate epitope if any chain exposes the epitope in a fraction f (e.g. f>0.87) of all the runs. After biasing and simulating the evolution of the protein under consideration (block 30), some embodiments comprise analyzing the results by ascertaining the extent to which each residue has unfolded by comparing the change in SASA (or other suitable measure of unfolding as discussed herein), from the initial structure to the structures in the ensemble after biasing. In embodiments which use SASA, some such embodiments may use the side chain surface area for every residue except glycine residues; for glycine, some embodiments may use the total residue surface area (which amounts to the backbone surface area for glycine).
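The per-residue rule described above (side chain area for every residue except glycine, total residue area for glycine) may be sketched as follows. The sketch assumes that per-residue side chain and total areas have already been computed by a separate SASA tool and, for the biased ensemble, averaged over snapshots; the function names and the three-letter residue code "GLY" are illustrative assumptions.

```python
import numpy as np

def residue_exposure(resnames, sidechain_area, total_area):
    """Per-residue area: total area for glycine, side chain area otherwise."""
    is_gly = np.array([name == "GLY" for name in resnames])
    return np.where(is_gly,
                    np.asarray(total_area, dtype=float),
                    np.asarray(sidechain_area, dtype=float))

def delta_sasa(resnames, side_init, total_init, side_biased, total_biased):
    """Change in exposure from the initial structure to the biased ensemble."""
    before = residue_exposure(resnames, side_init, total_init)
    after = residue_exposure(resnames, side_biased, total_biased)
    return after - before

# Toy usage with made-up areas (square Angstrom) for three residues.
names = ["ALA", "GLY", "LYS"]
d = delta_sasa(names, [10.0, 0.0, 50.0], [30.0, 20.0, 80.0],
               [25.0, 0.0, 45.0], [40.0, 35.0, 90.0])
```

In this toy example the glycine entry uses the total areas (35 minus 20), while the other residues use the side chain areas.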
One difference between the Aβ structures contained in the protein databank (PDB) (http://www.rcsb.org) and the empirical systems that the inventor has examined is that the PDB structures do not necessarily comprise all the residues of the chain; this is because some residues are disordered in the empirically determined system and so reliable coordinates cannot be deposited as part of the PDB structure. The structures corresponding to PDB ID 2LMN and PDB ID 2LMP contain only residues 9-40 for each monomer and are missing the N-terminal region consisting of residues 1-8, and the structure corresponding to PDB 2MXU contains only residues 11-42 for each monomer and is missing N-terminal residues 1-10. PDB 2M4J contains all 40 residues for each monomer. For PDB structures with missing N-terminal regions, some embodiments may comprise making final epitope predictions from systems where the disordered N-terminal region is explicitly added in to the PDB structures. The presence of a disordered N-terminal tail can be a potentially important effect because there is a polymeric entropy cost to tether a disordered terminal region to the rest of the structure, due to the steric non-crossing entropy of the polymer with the rest of the ordered protein or fibril. For this reason the predictions, specifically for N-terminal regions of 2LMN using the model with the N-terminus absent, are likely somewhat overemphasized.
Method 102 commences in block 105 which involves determining local unfolding indicia 54 for each residue in the current run and current chain. As discussed above, local unfolding indicia may be determined on the basis of updated structural models 32 determined in the block 25 simulation loop. In the particular case of method 102 of the
Method 102 then proceeds to block 120 where the residue index of the current peptide chain is parsed into a number of groups with each group having a number of residues equal to the current window size. It will be appreciated that for a given chain (having a given residue index), where the block 110 window size is larger, the number of block 120 groups will be lower and vice versa. Method 102 then proceeds to block 130 which initializes (first iteration) or increments (subsequent iterations) a group index counter. The group index counter may also be referred to as a window position or window position index.
Method 102 then proceeds to block 140 which involves an inquiry into whether the current group has a ΔSASA>0 for all residues in the group. If the block 140 inquiry is positive, method 102 proceeds to block 150, where a positive result is recorded for the current group, before ending up in block 170. In some embodiments, block 150 may comprise recording the ΔSASAs of the residues belonging to the current group and/or an accumulated sum of the ΔSASAs of the residues belonging to the current group, although this is not necessary, since this information is available from block 105. If the block 140 inquiry is negative, method 102 proceeds to block 160, where a negative result is recorded for the current group, before ending up in block 170. Block 170 involves an inquiry into whether the current group is the last group in the current chain. If the current group is not the last group, then method 102 loops back to block 130, where the group index is incremented for another iteration. If the block 170 inquiry is positive, then method 102 proceeds to block 180 which involves an inquiry into whether the current window size is the maximum window size to be considered. In some embodiments, the maximum window size is set to 12 residues. In some embodiments, this maximum window size may be 10 residues. If the current window size is not the maximum window size, then method 102 loops back to block 110, where the window size is incremented for another iteration.
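The nested loops of blocks 110 to 180 may be sketched, for a single run and a single chain, as follows. The sketch assumes overlapping windows that advance one residue at a time, a zero-based window start index, and a minimum window size of 3; these choices and the function name are assumptions for illustration and are not specified above.

```python
def scan_windows(delta_sasa, min_window=3, max_window=12):
    """Return {(start, size): bool}, True when every residue in the window has dSASA > 0."""
    results = {}
    n = len(delta_sasa)
    # Outer loop: window size (blocks 110 and 180).
    for size in range(min_window, min(max_window, n) + 1):
        # Inner loop: window position / group index (blocks 130 and 170).
        for start in range(0, n - size + 1):
            window = delta_sasa[start:start + size]
            # Block 140 inquiry; blocks 150/160 record the result.
            results[(start, size)] = all(value > 0 for value in window)
    return results

# Toy usage: residues 1-3 (zero-based) are exposed, the rest are not.
hits = scan_windows([-1.0, 2.0, 3.0, 4.0, -1.0, 1.0], min_window=2, max_window=3)
```

Under the sliding-window assumption a chain of N residues yields N−w+1 groups for window size w, so larger windows give fewer groups, consistent with the description of block 120.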
If the current window size is the maximum window size, then method 102 concludes and moves on to method 202 of
At the conclusion of the execution of method 102 for each run and for each peptide chain, method 100 may proceed to portion 202 of method 100 shown in
Method 202 commences in block 210 which comprises initializing (in the first iteration) and incrementing (in other iterations) a group index. The block 210 group index may refer to one of the residue groups for which data is obtained in method 102. Method 202 then proceeds to block 220 which involves initializing (in a first iteration) and incrementing (in subsequent iterations) a run index. The block 220 run index may refer to a particular one of the independent runs. Method 202 then proceeds to block 230 which involves an inquiry into whether there is at least one chain, for the current run and the current group, which has a ΔSASA>0 for all of the residues in the current group. This block 230 inquiry is equivalent to inquiring as to whether there is at least one chain, for the current run and the current group, which has a positive result recorded in block 150 (
Block 250 involves an inquiry into whether the current run is the last run. If not, then method 202 loops back to block 220 where the run index is incremented prior to another iteration. If the block 250 inquiry is positive, then method 202 proceeds to block 260 which involves an inquiry into whether the current residue group is indicated to be a potential candidate epitope in a sufficient fraction f, g of the independent runs. This fraction f, g may be a configurable parameter. As discussed elsewhere herein, some embodiments may consider a group of residues of a protein to be potential candidate epitope predictions 52 where at least a significant fraction f of the independent simulations showed one or more indicia of unfolding of the group upon biasing (e.g. a ΔSASA>0 for all of the residues in the group). As discussed above, where the protein under consideration is an aggregate structure, a group of residues may be considered to be a potential candidate epitope if, in a given simulation, the group of residues exhibits one or more indicia of unfolding in any of the peptide chains (e.g. a ΔSASA>0 for all of the residues in the group), and such an epitope is found to be exposed reliably in a fraction g of the simulations. If the block 260 inquiry is negative, then method 202 proceeds to block 280 which involves an inquiry into whether the current group is the last group. If the block 280 inquiry is also negative, then method 202 loops back to block 210, where the group index is incremented prior to another iteration of method 202. If the block 260 inquiry is positive, then the current group may be considered to be a potential candidate epitope and method 202 proceeds to block 270.
Block 270 involves generating a data structure comprising data (accumulated local unfolding indicia 272) of the type which is shown in the
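The consensus test of blocks 230 to 260 may be sketched in vectorized form as follows. The input is assumed to be a boolean array holding the positive/negative results recorded for every run, chain and residue group; the array layout, the function name and the "at least" form of the threshold are assumptions for illustration.

```python
import numpy as np

def consensus_groups(positive, fraction=0.87):
    """positive: bool array of shape (n_runs, n_chains, n_groups).

    A group counts for a run when at least one chain is positive (block 230);
    it is a potential candidate epitope when the share of such runs is at
    least `fraction` (block 260).
    """
    positive = np.asarray(positive, dtype=bool)
    run_hit = positive.any(axis=1)        # shape (n_runs, n_groups)
    share = run_hit.mean(axis=0)          # fraction of runs per group
    return share >= fraction

# Toy usage: 8 runs, 2 chains, 2 groups.
toy = np.zeros((8, 2, 2), dtype=bool)
toy[:7, 0, 0] = True      # group 0 exposed on chain 0 in 7 of 8 runs
toy[:6, 1, 1] = True      # group 1 exposed on chain 1 in 6 of 8 runs
candidates = consensus_groups(toy)
```

For a single-chain protein the chain axis has length one, and the same function applies with the fraction f in place of g.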
Eventually, method 202 proceeds to block 280 (either via the block 260 NO branch or via block 270). When the block 280 inquiry is positive, then method 202 is completed.
As discussed above, the data structures generated by method 202 may be represented in the form of fireplots, such as the exemplary fireplots shown in
At the conclusion of the execution of method 202 (
Method 302 commences in block 310 which involves initializing a window size to be a maximum window size (in a first iteration) and then decrementing the window size in subsequent iterations. In some embodiments, the maximum window size is set at 12 residues in length, meaning that candidate epitopes predicted by method 302 will have a maximum possible length of 12 residues. In some embodiments, the maximum window size is set at 10 residues in length. If it is expected or discovered that a candidate epitope may be longer than 10 or 12 residues, then the maximum window size may be set to a larger number, as appropriate. Initializing the window size to be the maximum window size at the outset of method 302 effectively means that method 302 is starting its search at the top of the Y-axis of the
After initializing the residue index in block 320, method 302 proceeds to block 330 which involves an inquiry into whether the accumulated local unfolding indicia 272 is greater than zero for the current residue index and current window size. In particular embodiments where the local unfolding indicia 54 is ΔSASA, the block 330 inquiry may involve an inquiry into whether the accumulated ΔSASA is greater than zero for the current residue index and current window size. A positive block 330 inquiry corresponds to the existence of a rectangle in a particular row (window size) and column (residue index) of the
If the block 330 inquiry is positive, then method 302 proceeds to block 340, where the group of residues underlying the block 330 “hit” is identified and recorded as a candidate epitope 52 predicted by method 10 (
Method 302 then proceeds to block 350 which involves removal, from further consideration, of the candidate epitope 52 recorded in block 340 and all sub-epitopes that lie within the candidate epitope 52 recorded in block 340. In the case of the illustrated
Block 350 also comprises removing the sub-epitopes that lie within the block 340 candidate epitope 52. In the case of the
After removal of the candidate epitope 52 and sub-epitopes in block 350, method 302 proceeds to block 360 which involves an inquiry as to whether the current residue index is the last residue index (e.g. the last residue in a row of the
It will be appreciated from the above that method 302 involves scanning row by row down from the top of the
Method 302 continues scanning plot (C) of
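The top-down scan of method 302 (largest window first, recording each hit and then removing the hit and the sub-epitopes lying within it) may be sketched as follows. The input is assumed to be a mapping from (window start, window size) to the accumulated local unfolding indicia; indexing windows by their zero-based start residue, and the function name, are assumptions for illustration.

```python
def select_epitopes(accumulated, min_window, max_window, n_residues):
    """Return the (start, size) windows kept as candidate epitopes."""
    # Windows with accumulated indicia > 0 are the available "hits" (block 330).
    active = {key for key, value in accumulated.items() if value > 0}
    epitopes = []
    # Block 310: start at the maximum window size and decrement.
    for size in range(max_window, min_window - 1, -1):
        # Blocks 320/360: scan residue positions along the current row.
        for start in range(0, n_residues - size + 1):
            if (start, size) not in active:
                continue
            epitopes.append((start, size))        # block 340
            end = start + size
            # Block 350: drop the hit and every window lying entirely within it.
            active = {(s, w) for (s, w) in active
                      if not (s >= start and s + w <= end)}
    return epitopes

# Toy usage: one 5-residue hit with its sub-windows, plus a separate 3-residue hit.
toy = {(3, 5): 2.0, (3, 4): 1.5, (4, 4): 1.0,
       (3, 3): 0.9, (4, 3): 0.8, (5, 3): 0.7, (10, 3): 0.4}
picked = select_epitopes(toy, min_window=3, max_window=5, n_residues=15)
```

In the toy example only the 5-residue window and the separate 3-residue window survive, because the shorter windows nested inside the longer hit are removed before the scan reaches their rows.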
Epitope predictions corresponding to the fireplots of
Table 2 shows predicted epitopes for a number of other structures considered by the inventor.
length 5 centered at residue 8 is predicted (HDSGY; SEQ ID NO: 10). Thermal fluctuations may reduce or increase the size of epitope by the order of one residue at future times. Likewise, after 16 ns of equilibration (
The candidate epitopes 52 predicted by the methods described herein may be plotted for a variety of fibril models experimentally considered, and an emergent trend observed, see
For the single full-length structure considered herein, 2M4J, an N-terminal region emerges as an epitope prediction, roughly between residues 5-10. High-affinity polyclonal antibodies have been raised to the region consisting of residues 5-11, and these antibodies have also been observed to bind plaques and reduce neuritic pathology, see Frederique Bard, Robin Barbour, Catherine Cannon, Robert Carretto, Michael Fox, Dora Games, Teresa Guido, Kathleen Hoenow, Kang Hu, Kelly Johnson-Wood, Karen Khan, Dora Kholodenko, Celeste Lee, Mike Lee, Ruth Motter, Minh Nguyen, Amanda Reed, Dale Schenk, Pearl Tang, Nicki Vasquez, Peter Seubert, and Ted Yednock. Epitope and isotype specificities of antibodies to β-amyloid peptide for protection against Alzheimer's disease-like neuropathology. Proc. Natl. Acad. Sci. USA, 100(4):2023-2028, 2003.
A novel consensus-based epitope emerges from
In addition to or in the alternative to using SASA as a measure of disorder or exposure (local unfolding indicia 54), some embodiments may comprise considering the loss of contacts (from among the contacts in the native structure 22) as a local unfolding indicia 54. In this approach, the biasing simulations may be the same but the block 50 analysis may be slightly different. Instead of evaluating candidate epitopes by requiring that an epitope show increased ΔSASA in each residue for at least one chain in each simulation, such embodiments may comprise evaluating candidate epitopes by requiring that an epitope show a decrease in contacts (from among the contacts in the native structure 22) for each residue for at least one chain in each simulation. In practice, some embodiments may comprise setting a threshold, so that each residue not only has to decrease the number of contacts (from among the contacts in the native structure 22), but the change must be larger than some value, typically 0.5-1 contacts/atom.
The inventor has examined the potential effects of selecting subsets of the full number of residues, and not adding an N-terminal region. In some embodiments, the simulation parameters by default assign one proton unit of positive charge to the N-terminal residue; however, charge-charge repulsion might enhance disorder in the N-terminal region.
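The contact-loss criterion described above may be sketched per residue as follows. The sketch assumes that the number of native contacts made by each residue, in the native structure and in the biased ensemble, has already been tallied, and that the loss is normalized by the number of atoms in the residue; the function names and the default threshold of 0.5 contacts/atom (the lower end of the range given above) are illustrative assumptions.

```python
import numpy as np

def lost_contacts_per_atom(native_contacts, biased_contacts, atoms_per_residue):
    """Decrease in native contacts for each residue, per atom of that residue."""
    native = np.asarray(native_contacts, dtype=float)
    biased = np.asarray(biased_contacts, dtype=float)
    atoms = np.asarray(atoms_per_residue, dtype=float)
    return (native - biased) / atoms

def unfolded_by_contacts(native_contacts, biased_contacts, atoms_per_residue,
                         threshold=0.5):
    """True for residues whose contact loss per atom exceeds the threshold."""
    loss = lost_contacts_per_atom(native_contacts, biased_contacts, atoms_per_residue)
    return loss > threshold

# Toy usage with made-up contact counts for three residues.
flags = unfolded_by_contacts([12.0, 8.0, 20.0], [4.0, 7.0, 19.0], [8, 5, 10])
```

A group of residues would then be recorded as positive when every residue in the group is flagged, in place of the ΔSASA>0 test.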
Residues 25 to 29 in structure 2M4J form a turn between the two β sheets in the original structure. This region becomes exposed by breaking contacts with the N-terminal regions of the adjacent chains (
The Aβ42 structure 2MXU is a fibril that is 12 monomers long, which allows for an examination of the differences between end monomers and those in the middle. The residues 1 to 10 missing from the PDB structure have been reconstructed and added. The inventor has found that the end monomers of the 2MXU structure are much more prone to disorder than those in the middle, which can be seen in
A snapshot of the disordered structure superimposed on the initial structure for PDB 2LMN is shown in
The methods described herein may be applied to single-chain proteins. In one example experiment, the methods described herein were applied to a system constituting superoxide dismutase 1 (SOD1) lacking metals, but containing a disulfide bond between cysteines 57 and 146. The protein was biased on the global coordinate corresponding to the number of total contacts, and the target collective coordinate was reduced to a value of Qc=0.65. The protein was then held at Qc=0.65 and subsequently equilibrated for 90 ns. Snapshots were recorded every 20 ps, and the ΔSASA for each residue was measured in this ensemble of 4500 configurations. The procedure described in
Input may be obtained by computer 502 via any of its input mechanisms, including, without limitation, by any input device 508, from accessible memory 518, from network 516 or by any other suitable input mechanism. The outputs may be output from computer 502 via any of its output mechanisms, including, without limitation, by any output device 512, to accessible memory 518, to network 516 or to any other suitable output mechanism. As discussed above,
The methods described herein may be implemented by computers comprising one or more processors and/or by one or more suitable processors, which may, in some embodiments, comprise components of suitable computer systems. By way of non-limiting example, such processors could comprise part of a computer-based automated epitope prediction system. In general, such processors may comprise any suitable processor, such as, for example, a suitably configured computer, microprocessor, microcontroller, digital signal processor, field-programmable gate array (FPGA), other type of programmable logic device, pluralities of the foregoing, combinations of the foregoing, and/or the like. Such a processor may have access to software which may be stored in computer-readable memory accessible to the processor and/or in computer-readable memory that is integral to the processor. The processor may be configured to read and execute such software instructions and, when executed by the processor, such software may cause the processor to implement some of the functionalities described herein.
Certain implementations of the invention comprise computer processors which execute software instructions which cause the processors to implement a controller and/or perform a method of the invention. For example, one or more processors in a computer system may implement data processing steps in the controllers and/or methods described herein by executing software instructions retrieved from a program memory accessible to the processors. The invention may also be provided in the form of a program product. The program product may comprise any medium which carries a set of computer-readable signals comprising instructions which, when executed by a data processor, cause the data processor to implement a controller and/or execute a method of the invention. Program products according to the invention may be in any of a wide variety of forms. The program product may comprise, for example, physical (non-transitory) media such as magnetic data storage media including floppy diskettes, hard disk drives, optical data storage media including CD ROMs, DVDs, electronic data storage media including ROMs, flash RAM, or the like. The instructions may be present on the program product in encrypted and/or compressed formats.
Where a component (e.g. a software module, controller, processor, assembly, device, component, circuit, etc.) is referred to above, unless otherwise indicated, reference to that component (including a reference to a “means”) should be interpreted as including as equivalents of that component any component which performs the function of the described component (i.e., that is functionally equivalent), including components which are not structurally equivalent to the disclosed structure which performs the function in the illustrated exemplary embodiments of the invention.
Unless the context clearly requires otherwise, throughout the description and the
Embodiments of the invention may be implemented using specifically designed hardware, configurable hardware, programmable data processors configured by the provision of software (which may optionally comprise “firmware”) capable of executing on the data processors, special purpose computers or data processors that are specifically programmed, configured, or constructed to perform one or more steps in a method as explained in detail herein and/or combinations of two or more of these. Examples of specifically designed hardware are: logic circuits, application-specific integrated circuits (“ASICs”), large scale integrated circuits (“LSIs”), very large scale integrated circuits (“VLSIs”), and the like. Examples of configurable hardware are: one or more programmable logic devices such as programmable array logic (“PALs”), programmable logic arrays (“PLAs”), and field programmable gate arrays (“FPGAs”). Examples of programmable data processors are: microprocessors, digital signal processors (“DSPs”), embedded processors, graphics processors, math co-processors, general purpose computers, server computers, cloud computers, mainframe computers, computer workstations, and the like. For example, one or more data processors in a computer system for a device may implement methods as described herein by executing software instructions in a program memory accessible to the processors.
Processing may be centralized or distributed. Where processing is distributed, information including software and/or data may be kept centrally or distributed. Such information may be exchanged between different functional units by way of a communications network, such as a Local Area Network (LAN), Wide Area Network (WAN), or the Internet, wired or wireless data links, electromagnetic signals, or other data communication channel.
For example, while processes or blocks are presented in a given order, alternative examples may perform routines having steps, or employ systems having blocks, in a different order, and some processes or blocks may be deleted, moved, added, subdivided, combined, and/or modified to provide alternatives or subcombinations. Each of these processes or blocks may be implemented in a variety of different ways. Also, while processes or blocks are at times shown as being performed in series, these processes or blocks may instead be performed in parallel, or may be performed at different times.
In addition, while elements are at times shown as being performed sequentially, they may instead be performed simultaneously or in different sequences. It is therefore intended that the following claims are interpreted to include all such variations as are within their intended scope.
Embodiments of the invention may also be provided in the form of a program product. The program product may comprise any non-transitory medium which carries a set of computer-readable instructions which, when executed by a data processor, cause the data processor to execute a method of the invention. Program products according to the invention may be in any of a wide variety of forms. The program product may comprise, for example, non-transitory media such as magnetic data storage media including floppy diskettes and hard disk drives; optical data storage media including CD ROMs and DVDs; electronic data storage media including ROMs, flash RAM, EPROMs, and hardwired or preprogrammed chips (e.g., EEPROM semiconductor chips); nanotechnology memory; or the like. The computer-readable signals on the program product may optionally be compressed or encrypted.
In some embodiments, the invention may be implemented in software. For greater clarity, “software” includes any instructions executed on a processor, and may include (but is not limited to) firmware, resident software, microcode, and the like. Both processing hardware and software may be centralized or distributed (or a combination thereof), in whole or in part, as known to those skilled in the art. For example, software and other modules may be accessible via local memory, via a network, via a browser or other application in a distributed computing context, or via other means suitable for the purposes described above.
Where a record, field, entry, and/or other element of a database is referred to above, unless otherwise indicated, such reference should be interpreted as including a plurality of records, fields, entries, and/or other elements, as appropriate. Such reference should also be interpreted as including a portion of one or more records, fields, entries, and/or other elements, as appropriate. For example, a plurality of “physical” records in a database (i.e. records encoded in the database's structure) may be regarded as one “logical” record for the purpose of the description above and the claims below, even if the plurality of physical records includes information which is excluded from the logical record.
Specific examples of systems, methods and apparatus have been described herein for purposes of illustration. These are only examples. The technology provided herein can be applied to systems other than the example systems described above. Many alterations, modifications, additions, omissions, and permutations are possible within the practice of this invention. This invention includes variations on described embodiments that would be apparent to the skilled addressee, including variations obtained by: replacing features, elements and/or acts with equivalent features, elements and/or acts; mixing and matching of features, elements and/or acts from different embodiments; combining features, elements and/or acts from embodiments as described herein with features, elements and/or acts of other technology; and/or omitting features, elements and/or acts from described embodiments.
While a number of exemplary aspects and embodiments have been discussed above, those of skill in the art will recognize certain modifications, permutations, additions and sub-combinations thereof. For example:
Pseudocode illustrating details of steps and methods of particular non-limiting example embodiments is described below:
% After all data files from all runs are read in for input, ΔSASA is a 3D rectangular matrix of size (Nrun×Nchain×Nres)
% Define a new matrix DSASAwindowed, for the fireplots which consists of ΔSASA values for each window position wp, window size ws.
DSASAwindowed(wp,ws)=0 for all wp, ws % where wp is the window position, 1<wp<Nres (in the for loop, values are assigned for a subset of these positions), and ws is the window size, 1<ws<wsmax, with wsmax defined below.
% size of DSASAwindowed is Nres×wsmax; loops below don't run from 1:Nres; elements outside the for loop below are never changed from zero.
% Guess a max window size wsmax; typically about 12 amino acids/residues. The max window size will have 0 “hits” in it. I.e. zero successes as defined below. This only means we are ending with a window size that is above the peaks in the fireplots that are produced.
Set fmin=the minimum fraction for success. % This is taken to allow some runs by chance to stochastically not exhibit localized unfolding. Since we typically implement Nruns=10 runs, we have taken this to be 0.9, meaning at least 9 out of 10 runs must result in a localized unfolding “hit”, a localized unfolding “hit” being increased SASA exposure for all residues in the window.
% Build “fireplot” data structure
% Input to the loop below is ΔSASA (res,run,chain), an array of size (Nres×Nrun×Nchain)
for window size ws=1:wsmax % i.e. increment until the window size is wsmax; wsmax can either be the total chain length Nres, or can be a window size that is expected to be larger than any of the contiguous strands that show an increase in surface area; in practice wsmax might be set to 12
% Use Fireplot data structure to predict candidate epitopes
% Input (from above) would be DSASAwindowedTotal(wp,ws). i.e. the data in the fireplots.
for ws=wsmax−1:3 % decrease the window size from its maximum value (e.g. wsmax=11 in
% For
% we repeat the ablation process until and including epitope lengths of 3.
% what results is a set of epitope predictions, of length 3 and higher.
% These epitopes are given in Table 1 and
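The windowing and selection steps outlined in the pseudocode above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used to produce the reported epitopes: it treats a single chain, takes the window position wp as the 0-based start of the window, stores the windowed ΔSASA averaged over the qualifying runs as the fireplot value, and reads “ablation” as skipping any shorter window that lies entirely inside an already selected epitope. The function names build_fireplot and predict_epitopes are hypothetical.

```python
from typing import List, Tuple

Fireplot = List[List[float]]  # fire[ws][wp]; row 0 is unused padding


def build_fireplot(dsasa: List[List[float]], ws_max: int,
                   f_min: float = 0.9) -> Fireplot:
    """Build the windowed data from dsasa[run][res] (assumed single-chain layout).

    A window starting at residue wp with length ws is a "hit" when at least
    f_min of the runs show dsasa > 0 for every residue in the window.
    Hits store the windowed dsasa averaged over the qualifying runs
    (an assumed score); all other entries stay 0.
    """
    n_run, n_res = len(dsasa), len(dsasa[0])
    fire = [[0.0] * n_res for _ in range(ws_max + 1)]
    for ws in range(1, ws_max + 1):
        for wp in range(n_res - ws + 1):
            ok = [run for run in dsasa if all(x > 0 for x in run[wp:wp + ws])]
            # Small tolerance guards against rounding in f_min * n_run.
            if ok and len(ok) + 1e-9 >= f_min * n_run:
                fire[ws][wp] = sum(sum(run[wp:wp + ws]) for run in ok) / len(ok)
    return fire


def predict_epitopes(fire: Fireplot, ws_max: int,
                     min_len: int = 3) -> List[Tuple[int, int]]:
    """Return (start, length) pairs, scanning from ws_max - 1 down to min_len.

    Assumed ablation rule: a hit window is skipped when every residue in it
    is already covered by a previously selected (longer) epitope.
    """
    n_res = len(fire[1])
    covered = [False] * n_res
    epitopes = []
    for ws in range(ws_max - 1, min_len - 1, -1):
        for wp in range(n_res - ws + 1):
            if fire[ws][wp] > 0 and not all(covered[wp:wp + ws]):
                epitopes.append((wp, ws))
                for i in range(wp, wp + ws):
                    covered[i] = True
    return epitopes


if __name__ == "__main__":
    # Toy data: 10 runs, 8 residues; residues 2..5 gain exposure in every run.
    runs = [[-1.0, -1.0, 2.0, 2.0, 2.0, 2.0, -1.0, -1.0] for _ in range(10)]
    runs[9][3] = -1.0  # one run fails; 9 of 10 still meets f_min = 0.9
    print(predict_epitopes(build_fireplot(runs, ws_max=6), ws_max=6))  # [(2, 4)]
```

In this toy input a single four-residue stretch is reported, and the three-residue windows inside it are suppressed by the assumed ablation rule. If a second run also failed at the same residue, the 0.9 threshold would no longer be met and no epitope would be returned.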
This application is a national phase entry of PCT/CA2016/051306, filed Nov. 9, 2016, which claims priority from U.S. Provisional patent application Ser. Nos. 62/253,044 filed Nov. 9, 2015; 62/289,893 filed Feb. 1, 2016; 62/309,765 filed Mar. 17, 2016; 62/331,925 filed May 4, 2016; 62/352,346 filed Jun. 20, 2016; 62/363,566 filed Jul. 18, 2016; 62/365,634 filed Jul. 22, 2016; and 62/393,615 filed Sep. 12, 2016; each of these applications being incorporated herein in their entirety by reference.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CA2016/051306 | 11/9/2016 | WO | 00 |
Number | Date | Country |
---|---|---|
62393615 | Sep 2016 | US |
62365634 | Jul 2016 | US |
62363566 | Jul 2016 | US |
62352346 | Jun 2016 | US |
62331925 | May 2016 | US |
62309765 | Mar 2016 | US |
62289893 | Feb 2016 | US |
62253044 | Nov 2015 | US |