The present invention relates to protein crystal structures and their use in identifying protein binding partners and in protein structure determination. In particular, it relates to the crystal structure of a corticotropin-releasing factor receptor 1 (CRF1R) and uses thereof.
The listing or discussion of an apparently prior-published document in this specification should not necessarily be taken as an acknowledgement that the document is part of the state of the art or is common general knowledge.
G protein-coupled receptors (GPCRs) are integral membrane proteins mediating the signalling of a diverse set of ligands including neurotransmitters and metabolites. In humans, there are approximately 370 non-sensory receptors, representing the site of action for ~30% of clinically used drugs. Activation of the receptor results in a conformational change propagated to the intracellular surface where the receptor interacts with heterotrimeric G proteins to regulate signalling to ion channels and enzyme pathways. GPCRs can also signal independently of G proteins through β-arrestin and are known to exist as dimers.
GPCRs can be classified into three classes (A, B and C) based on sequence similarity (1,2). Class B GPCRs include receptors for peptides such as secretin, glucagon, glucagon-like peptide, calcitonin and parathyroid peptide hormone and have been studied as drug targets in the treatment of various diseases, including diabetes, osteoporosis, depression and anxiety. They feature an N-terminal extracellular domain (ECD) involved in peptide binding and a transmembrane domain (TMD), containing seven transmembrane helices, involved in signal transduction. Recently determined structures of Class A receptors have greatly advanced our understanding of the function of GPCRs at a molecular level (3). However, structural information on Class B receptors is currently limited to the ECD (4-9) and no structure of a Class B TMD, the main target for small-molecule drugs (10), has been determined to date.
The inventors have now solved the crystal structure of the TMD of the human CRF1R (11), a Class B GPCR essential for the stress-induced activation of the hypothalamic-pituitary-adrenal axis (12, 13), in complex with the non-peptide antagonist CP-376395 (14). The structure reveals significant differences to those of Class A receptors. The extracellular half of the receptor assumes a very open conformation, presumably to allow binding of the large ECD-peptide complex. Furthermore, in contrast to Class A GPCRs where the ligand-binding sites are located close to the extracellular boundaries of the receptors, in CRF1R the antagonist binds in a hydrophobic pocket located deep in the cytoplasmic half of the receptor. This structure provides new insight into the architecture of Class B GPCRs and should aid in the design of novel therapeutics.
The coordinates of the CRF1R can be utilised and manipulated in many different ways with wide ranging applications including the fitting of binding partners, homology modelling and structure solution, analysis of ligand interactions and drug discovery.
Accordingly, a first aspect of the invention provides a method of predicting a three dimensional structural representation of a target protein of unknown structure, or part thereof, comprising:
By a ‘three dimensional structural representation’ we include a computer generated representation or a physical representation. Typically, in all aspects of the invention which feature a structural representation, the representation is computer generated. Computer representations can be generated or displayed by commercially available software programs. Examples of software programs include but are not limited to QUANTA (Accelrys COPYRIGHT, 2001, 2002), O (Jones et al., Acta Crystallogr. A47, pp. 110-119 (1991)), RIBBONS (Carson, J. Appl. Crystallogr., 24, pp. 958-961 (1991)) and PyMol (The PyMol Molecular Graphics System, Schrödinger, LLC), which are incorporated herein by reference. Examples of representations include any of a wire-frame model, a chicken-wire model, a ball-and-stick model, a space-filling model, a stick model, a ribbon model, a snake model, an arrow and cylinder model, an electron density map or a molecular surface model. Certain software programs may also imbue these three dimensional representations with physico-chemical attributes which are known from the chemical composition of the molecule, such as residue charge, hydrophobicity, torsional and rotational degrees of freedom for the residue or segment, etc. Examples of software programs for calculating chemical energies are described below.
Typically, the coordinates of the CRF1R structure used in the invention are those listed in Table A or Table B or Table C, preferably those listed in Table C. However, it is appreciated that it is not necessary to have recourse to the original coordinates listed in Table A or Table B or Table C, and that any equivalent geometric representation derived from or obtained by reference to the original coordinates may be used.
Thus, for the avoidance of doubt, by ‘the coordinates of the CRF1R structure listed in Table A or Table B or Table C’, we include any equivalent representation wherein the original coordinates have been reparameterised in some way. For example, the coordinates in Table A or Table B or Table C may undergo any mathematical transformation known in the art, such as a geometric transformation, and the resulting transformed coordinates can be used. For example, the coordinates of Table A or Table B or Table C may be transposed to a different origin and/or axes or may be rotated about an axis. Furthermore, it is possible to use the coordinates to calculate the psi and phi backbone torsion angles (as displayed on a Ramachandran plot) and the chi sidechain torsion angles for each residue in the protein. These angles together with the corresponding bond lengths, enable the construction of a geometric representation of the protein which may be used based on the parameters of psi, phi and chi angles and bond lengths. Thus, while the coordinates used are typically those in Table A or Table B or Table C, the inventors recognise that any equivalent geometric representation of the CRF1R structure, based on the coordinates listed in Table A or Table B or Table C, may be used.
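By way of illustration only, the equivalence of transformed coordinates can be sketched in a few lines of Python. The fragment below rotates a handful of made-up points about the z axis, translates them, and compares one interatomic distance and one torsion angle before and after. The coordinates, the function names (`rigid_transform`, `dihedral`) and the choice of rotation axis are assumptions made for the example and are not taken from Table A, Table B or Table C.

```python
import math

def rigid_transform(coords, angle_deg, shift):
    """Rotate points about the z axis by angle_deg, then translate by shift."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    return [(c * x - s * y + shift[0],
             s * x + c * y + shift[1],
             z + shift[2]) for x, y, z in coords]

def _sub(a, b):
    return tuple(ai - bi for ai, bi in zip(a, b))

def _dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees defined by four consecutive points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(v / norm for v in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * u for u in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * u for u in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))

# Hypothetical four-atom fragment (not real CRF1R coordinates).
atoms = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
moved = rigid_transform(atoms, 37.0, (10.0, -4.0, 2.5))

torsion_before = dihedral(*atoms)
torsion_after = dihedral(*moved)
distance_before = math.dist(atoms[0], atoms[3])
distance_after = math.dist(moved[0], moved[3])
```

In this sketch the torsion angle and the distance agree before and after the transformation to within floating-point error, which is the sense in which the transformed set is an equivalent geometric representation.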
Additionally, it is appreciated that changing the number and/or positions of the ligand molecule of the Tables does not generally affect the usefulness of the coordinates in the aspects of the invention. Thus, it is also within the scope of the invention if the number and/or positions of ligand molecules of the coordinates of Table A or Table B or Table C is varied.
It will be appreciated that in all aspects of the invention which utilise the coordinates of the CRF1R, it is not necessary to utilise all the coordinates of Table A or Table B or Table C, but merely a portion of them, e.g. a set of coordinates representing atoms of particular interest in relation to a particular use. Such a portion of coordinates is referred to herein as ‘selected coordinates’.
By ‘selected coordinates’, we include at least 5, 10 or 20 non-hydrogen protein atoms of the Table A or Table B or Table C structure, more preferably at least 50, 100, 200, 300, 400, 500, 600, 700, 800 or 900 atoms and even more preferably at least 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500, 2600, 2700, 2800, 2900, 3000, 3100, 3200, 3300 or 3400 non-hydrogen atoms. Preferably the selected coordinates pertain to at least 5, 10, 20 or 30 different amino acid residues (i.e. at least one atom from 5, 10, 20 or 30 different residues may be present), more preferably at least 40, 50, 60, 70, 80 or 90 residues, and even more preferably at least 100, 150, 200, 250, 300 or 350 residues. Optionally, the selected coordinates may include one or more ligand atoms as set out in Table A or Table B or Table C. Alternatively, the selected coordinates may exclude one or more atoms of the ligand. Similarly, optionally, the selected coordinates may include one or more T4 lysozyme atoms as set out in Table A or Table B or Table C. Alternatively, the selected coordinates may exclude one or more T4 lysozyme atoms. Thus, it will be appreciated that the selected coordinates may include one or more ligand atoms and, optionally one or more T4 lysozyme atoms.
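A minimal sketch of how such a selection might be assembled and checked is given below. It assumes that each atom is available as a simple record holding an atom name, element, residue name and residue number; the `Atom` record, the function names and the example values are hypothetical and do not reproduce entries of the Tables.

```python
from collections import namedtuple

# Hypothetical atom record; field names are assumptions for this sketch.
Atom = namedtuple("Atom", "name element res_name res_num x y z")

def select_coordinates(atoms, residue_numbers):
    """Keep non-hydrogen atoms that belong to the requested residues."""
    wanted = set(residue_numbers)
    return [a for a in atoms if a.element != "H" and a.res_num in wanted]

def meets_minimum(selection, min_atoms=5, min_residues=5):
    """Check a selection against minimum atom and residue counts."""
    residues = {a.res_num for a in selection}
    return len(selection) >= min_atoms and len(residues) >= min_residues

# Placeholder coordinates, not values from Table A, B or C.
atoms = [
    Atom("N", "N", "HIS", 155, 0.0, 0.0, 0.0),
    Atom("CA", "C", "HIS", 155, 1.5, 0.0, 0.0),
    Atom("HA", "H", "HIS", 155, 1.8, 1.0, 0.0),
    Atom("N", "N", "GLU", 209, 4.0, 1.0, 0.0),
    Atom("CA", "C", "GLU", 209, 5.4, 1.2, 0.0),
    Atom("CA", "C", "GLY", 235, 9.0, 3.0, 1.0),
]
selected = select_coordinates(atoms, [155, 209])
```

Here `selected` holds the four non-hydrogen atoms of residues 155 and 209, and `meets_minimum(selected, 4, 2)` reports whether that selection reaches a chosen pair of thresholds.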
In one example, the selected coordinates may comprise atoms of one or more amino acid residues that contribute to the main chain or side chain atoms of a binding region of the CRF1R.
For example, amino acid residues contributing to a small organic molecule binding pocket include amino acid residues Leu 158, Phe 162, His 199, Asn 202, Phe 203, Phe 204, Trp 205, Met 206, Phe 207, Gly 208, Glu 209, Gly 210, Cys 211, Leu 213, His 214, Met 276, Val 279, Leu 280, Leu 281, Ile 282, Asn 283, Phe 284, Ile 285, Phe 286, Leu 287, Phe 288, Ile 290, Ala 312, Ala 315, Thr 316, Leu 317, Leu 319, Leu 320, Pro 321, Leu 323, Gly 324, Ile 325, Tyr 327, Gln 355, Val 359 and Phe 362 according to the numbering of the CRF1R sequence as set out in
In a further example, the selected coordinates may be from amino acid residues contributing to the peptide orthosteric binding site including amino acid residues Ala119, Asn123, His127, Ser130, Phe162, Arg165, Asn166, Thr168, Thr169, Val172, Gln173, Thr175, Met176, His181, Val191, Thr192, Tyr195, Asn196, His199, Asn202, Phe203, Lys257, Ala260, Lys262, Tyr272, Gln273, Met276, Leu323, Thr326, Tyr327, Ala330, Phe331, Asn333, Asp337, Arg341, Phe344, Ile345, Asn348, Glu352, Ser353 and Gln355 according to the numbering of the CRF1R sequence as set out in
In a further example, the selected coordinates may comprise atoms of one or more amino acids involved in activation. For example, biochemical data suggest that the interaction of His 155 (2.50 Class B residue) and Glu 209 (3.50 Class B residue) plays an essential role in activation (26-28). In the present structure, these two side chains are within hydrogen bonding distance (3.1 Å), forming a potentially important functional micro-switch. Thus, the selected coordinates may comprise atoms of one or both of amino acid residues His 155 (2.50) and Glu 209 (3.50) according to the numbering of the CRF1R as set out in
The reference residues for CRF1R are:
In a further example, the selected coordinates may comprise atoms of amino acid residues belonging to the GWGxP motif found in TM4. Thus, the selected coordinates may comprise atoms of one or more of amino acid residues Gly 235, Trp 236, Gly 237 and Pro 239, according to the numbering of the CRF1R as set out in
It is appreciated that the selected coordinates may comprise any atoms of particular interest including atoms mentioned in any one or more of the above examples, or as listed in Example 1 below.
It is appreciated that the selected coordinates may correspond to atoms from a particular structural region (e.g. helix and/or loop) of the CRF1R. By the helices and loop regions of the CRF1R we mean the following:
However, it will be appreciated that there are different criteria for which residues are considered to be in a helical conformation depending on phi and psi angles. Moreover, when comparing the CRF1R to other structures, some residues may be missing in one or other of the structures and some residues may be considered helical in one structure but not the other. Further, the loop regions may be defined as amino acid structures that join alpha helices (as above) or may be defined as amino acid structures that are predicted to be outside of the membrane. Therefore the limits above are not to be construed as absolute, but rather may vary according to the criteria used. For the purposes of the comparisons set out below, we have used the definitions of helices and loops noted in Table 4 in Example 1.
Preferably, the selected coordinates include at least 2% or 5% C-α atoms, and more preferably at least 10% C-α atoms. Alternatively or additionally, the selected coordinates include at least 10% and more preferably at least 20% or 30% backbone atoms selected from any combination of the nitrogen, C-α, carbonyl C and carbonyl oxygen atoms.
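The proportions referred to above can be checked with a very small calculation. The following sketch assumes that the selection is available as a list of atom names for protein residues, using the common names N, CA, C and O for the backbone atoms; the function name and the example list are hypothetical.

```python
# Backbone atom names under common protein naming (an assumption of this sketch).
BACKBONE_NAMES = {"N", "CA", "C", "O"}

def atom_fractions(atom_names):
    """Return (C-alpha fraction, backbone fraction) for a list of atom names."""
    total = len(atom_names)
    if total == 0:
        return 0.0, 0.0
    ca = sum(1 for name in atom_names if name == "CA")
    backbone = sum(1 for name in atom_names if name in BACKBONE_NAMES)
    return ca / total, backbone / total

# Hypothetical selection of ten protein atoms.
names = ["N", "CA", "C", "O", "CB", "CG", "CD1", "CD2", "CA", "CB"]
ca_fraction, backbone_fraction = atom_fractions(names)
```

For this example two of the ten atoms are C-α atoms and five are backbone atoms, so the selection would satisfy the 10% C-α and 30% backbone preferences.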
It is appreciated that the coordinates of the CRF1R used in the invention may be optionally varied and a subset of the coordinates or the varied coordinates may be selected (and constitute selected coordinates). Indeed, such variation may be necessary in various aspects of the invention, for example in the modelling of protein structures and in the fitting of various binding partners to the CRF1R structure.
Protein structure variability and similarity is routinely expressed and measured by the root mean square deviation (rmsd), which measures the difference in positioning in space between two sets of atoms. The rmsd measures distance between equivalent atoms after their optimal superposition. The rmsd can be calculated over any sets of selected atoms including all atoms, over residue backbone atoms (i.e. the nitrogen-carbon-carbon backbone atoms of the protein amino acid residues), side chain atoms only or over C-α atoms only.
The least-squares algorithms used to calculate rmsd are well known in the art and include those described by Rossmann and Argos (J Biol Chem (1975) 250:7525), Kabsch (Acta Cryst (1976) A32:922; Acta Cryst (1978) A34:827-828), Hendrickson (Acta Cryst (1979) A35:158), McLachlan (J Mol Biol (1979) 128:49) and Kearsley (Acta Cryst (1989) A45:208). Both algorithms based on iteration in which one molecule is moved relative to the other, such as that described by Ferro and Hermans (Acta Cryst (1977) A33:345-347), and algorithms which locate the best fit directly (e.g. Kabsch's methods) may be used. Methods of comparing protein structures are also discussed in Methods in Enzymology, vol 115: 397-420.
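As a hedged illustration of a direct (non-iterative) fit, the sketch below computes the rmsd after optimal superposition using a quaternion formulation (Horn's closed-form solution, which is related to, but not identical with, the methods cited above). It is not the implementation used in any of the named programs. The function names, the Jacobi eigenvalue helper and the test points are assumptions for the example, and the two coordinate sets are assumed to list equivalent atoms in the same order.

```python
import math

def _centre(points):
    """Shift a point set so that its centroid lies at the origin."""
    n = len(points)
    centroid = [sum(p[k] for p in points) / n for k in range(3)]
    return [[p[k] - centroid[k] for k in range(3)] for p in points]

def _largest_eigenvalue(matrix, sweeps=50):
    """Largest eigenvalue of a symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-24:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                d = a[q][q] - a[p][p]
                sign = 1.0 if d >= 0 else -1.0
                theta = 0.5 * math.atan2(sign * 2.0 * a[p][q], sign * d)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return max(a[i][i] for i in range(n))

def superposed_rmsd(mobile, reference):
    """Rmsd between equivalent atoms after optimal rigid-body superposition."""
    if not mobile or len(mobile) != len(reference):
        raise ValueError("need two non-empty sets of equal size")
    a, b = _centre(mobile), _centre(reference)
    # 3x3 correlation sums between the two centred sets.
    s = [[sum(p[i] * q[j] for p, q in zip(a, b)) for j in range(3)]
         for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    n_matrix = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    lam = _largest_eigenvalue(n_matrix)
    ga = sum(v * v for p in a for v in p)
    gb = sum(v * v for p in b for v in p)
    return math.sqrt(max(0.0, (ga + gb - 2.0 * lam) / len(a)))

# Hypothetical points: the second set is the first rotated by 90 degrees about z
# and translated, so the fitted rmsd should be numerically zero.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
mobile = [(5.0, -2.0, 1.0), (5.0, -1.0, 1.0), (3.0, -2.0, 1.0), (5.0, -2.0, 4.0)]
rmsd_rigid = superposed_rmsd(mobile, reference)
rmsd_stretched = superposed_rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (2, 0, 0)])
```

The same function can be applied to backbone atoms only, to C-α atoms only, or to any other set of selected coordinates, provided the two lists are paired atom for atom.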
Typically, rmsd values are calculated using coordinate fitting computer programs and any suitable computer program known in the art may be used, for example MNYFIT (part of a collection of programs called COMPOSER, Sutcliffe et al (1987) Protein Eng 1:377-384). Other programs also include LSQMAN (Kleywegt & Jones (1994) A super position, CCP4/ESF-EACBM, Newsletter on Protein Crystallography, 31: 9-14), LSQKAB (Collaborative Computational Project 4. The CCP4 Suite: Programs for Protein Crystallography, Acta Cryst (1994) D50:760-763), QUANTA (Jones et al, Acta Cryst (1991) A47:110-119 and commercially available from Accelrys, San Diego, Calif.), Insight (Commercially available from Accelrys, San Diego, Calif.), Sybyl® (commercially available from Tripos, Inc., St Louis) and O (Jones et al., Acta Cryst (1991) A47:110-119).
In, for example, the programs LSQKAB and O, the user can define the residues in the two proteins that are to be paired for the purpose of the calculation. Alternatively, the pairing of residues can be determined by generating a sequence alignment of the two proteins as is well known in the art. The atomic coordinates can then be superimposed according to this alignment and an rmsd value calculated. The program Sequoia (Bruns et al (1999) J Mol Biol 288(3):427-439) performs the alignment of homologous protein sequences, and the superposition of homologous protein atomic coordinates. Once aligned, the rmsd can be calculated using programs detailed above. When the sequences are identical or highly similar, the structural alignment of proteins can be done manually or automatically as outlined above. Another approach would be to generate a superposition of protein atomic coordinates without considering the sequence.
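The pairing of residues from a sequence alignment can be sketched as follows. The example assumes that the alignment is supplied as two gapped strings of equal length with '-' marking a gap; the function name, the toy sequences and the starting residue numbers are hypothetical.

```python
def paired_residues(aligned_a, aligned_b, start_a=1, start_b=1):
    """Map aligned columns to (residue number in A, residue number in B) pairs."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = []
    num_a, num_b = start_a, start_b
    for res_a, res_b in zip(aligned_a, aligned_b):
        # Only columns with a residue in both sequences give a usable pair.
        if res_a != "-" and res_b != "-":
            pairs.append((num_a, num_b))
        if res_a != "-":
            num_a += 1
        if res_b != "-":
            num_b += 1
    return pairs

# Toy alignment; numbering of the second sequence is invented for illustration.
pairs = paired_residues("GW-GAP", "GWAG-P", start_a=235, start_b=10)
```

The resulting list of residue pairs identifies which atoms are treated as equivalent when the two coordinate sets are superimposed and the rmsd is calculated.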
We have conducted an rmsd analysis of residue backbone atoms (i.e. the nitrogen-carbon-carbon-oxygen backbone atoms of the protein) between the CRF1R structure and various known Class A GPCR structures (see Example 2). Similar scripts can be used to calculate rmsd values for any other selected coordinates. Rmsd values have been calculated on residue backbone atoms in the complete crystallised structure and on selected regions of interest as discussed below.
The Class A GPCR that had a structure most closely related to the present CRF1R structure was dopamine D3 receptor (PDB: 3PBL) (see Example 2). Conducting an rmsd analysis of residue backbone atoms between the whole of the present CRF1R (Table C) structure and the dopamine D3 receptor structure gave an rmsd value of 4.383 Å. The same analysis using the structure of CRF1R in Tables A or B in the alignment (233 and 223 corresponding amino acids respectively) gave respective rmsd values of 4.074 Å and 3.360 Å. Thus in one embodiment, the coordinates or selected coordinates of Table A or Table B or Table C may be optionally varied within an rmsd of residue backbone atoms (i.e. the nitrogen-carbon-carbon-oxygen backbone atoms of the protein) of not more than 4.383 Å. Preferably, the coordinates or selected coordinates are varied within an rmsd of residue backbone atoms of not more than 4.3 Å, 4.2 Å, 4.1 Å, 4.0 Å, 3.9 Å, 3.8 Å, 3.7 Å, 3.6 Å, 3.5 Å, 3.4 Å, 3.3 Å, 3.2 Å, 3.1 Å, 3.0 Å, 2.9 Å, 2.8 Å, 2.7 Å, 2.6 Å, 2.5 Å, 2.4 Å, 2.3 Å, 2.2 Å, 2.1 Å, 2.0 Å, 1.9 Å, 1.8 Å, 1.7 Å, 1.6 Å, 1.5 Å, 1.4 Å, 1.3 Å, 1.2 Å, 1.1 Å, 1.0 Å, 0.9 Å, 0.8 Å, 0.7 Å, 0.6 Å, 0.5 Å, 0.4 Å, 0.3 Å, 0.2 Å or 0.1 Å. When the coordinates or selected coordinates are from Table A, it is preferred if they are optionally varied within an rmsd of residue backbone atoms of not more than 4.074 Å, and when the coordinates or selected coordinates are from Table B, it is preferred if they are optionally varied within an rmsd of residue backbone atoms of not more than 3.360 Å.
We have conducted an rmsd analysis of residue backbone atoms between the present CRF1R structure (Table C) and the dopamine D3 receptor structure within the small organic molecule binding pocket (i.e. amino acid residues Leu 158, Phe 162, His 199, Asn 202, Phe 203, Phe 204, Trp 205, Met 206, Phe 207, Gly 208, Glu 209, Gly 210, Cys 211, Leu 213, His 214, Met 276, Val 279, Leu 280, Leu 281, Ile 282, Asn 283, Phe 284, Ile 285, Phe 286, Leu 287, Phe 288, Ile 290, Ala 312, Ala 315, Thr 316, Leu 317, Leu 319, Leu 320, Pro 321, Leu 323, Gly 324, Ile 325, Tyr 327, Gln 355, Val 359 and Phe 362). The rmsd value for residue backbone atoms is 1.676 Å. A similar analysis using the CRF1R structure of Table A or Table B gave respective rmsd values of 1.642 Å and 1.655 Å. Thus in an embodiment, where the coordinates or selected coordinates used in the invention are optionally varied within the small organic molecule binding pocket, they are varied within an rmsd of residue backbone atoms of not more than 1.6 Å, 1.5 Å, 1.4 Å, 1.3 Å, 1.2 Å, 1.1 Å, 1.0 Å, 0.9 Å, 0.8 Å, 0.7 Å, 0.6 Å, 0.5 Å, 0.4 Å, 0.3 Å, 0.2 Å or 0.1 Å.
We have conducted an rmsd analysis of residue backbone atoms between the present CRF1R structure (Table C) and the dopamine D3 receptor structure within the peptide orthosteric binding site (i.e. amino acid residues Ala119, Asn123, His127, Ser130, Phe162, Arg165, Asn166, Thr168, Thr169, Val172, Gln173, Thr175, Met176, His181, Val191, Thr192, Tyr195, Asn196, His199, Asn202, Phe203, Lys257, Ala260, Lys262, Tyr272, Gln273, Met276, Leu323, Thr326, Tyr327, Ala330, Phe331, Asn333, Asp337, Arg341, Phe344, Ile345, Asn348, Glu352, Ser353 and Gln355). The rmsd value for residue backbone atoms is 4.242 Å. Thus in an embodiment, where the coordinates or selected coordinates used in the invention are optionally varied within the peptide orthosteric binding site, they are varied within an rmsd of residue backbone atoms of not more than 4.2 Å, 4.1 Å, 4.0 Å, 3.9 Å, 3.8 Å, 3.7 Å, 3.6 Å, 3.5 Å, 3.4 Å, 3.3 Å, 3.2 Å, 3.1 Å, 3.0 Å, 2.9 Å, 2.8 Å, 2.7 Å, 2.6 Å, 2.5 Å, 2.4 Å, 2.3 Å, 2.2 Å, 2.1 Å, 2.0 Å, 1.9 Å, 1.8 Å, 1.7 Å, 1.6 Å, 1.5 Å, 1.4 Å, 1.3 Å, 1.2 Å, 1.1 Å, 1.0 Å, 0.9 Å, 0.8 Å, 0.7 Å, 0.6 Å, 0.5 Å, 0.4 Å, 0.3 Å, 0.2 Å or 0.1 Å.
We have conducted an rmsd analysis of residue backbone atoms between the present CRF1R structure (Table C) and the dopamine D3 receptor structure within amino acids His 155 (2.50) and Glu 209 (3.50) which play an essential role in activation (26-28). The rmsd value for residue backbone atoms is 0.570 Å. The same analysis using the CRF1R structures of Table A or B gave rmsd values of 0.274 Å and 0.426 Å respectively. Thus in an embodiment, where the coordinates or selected coordinates used in the invention are optionally varied within amino acids His 155 (2.50) and Glu 209 (3.50), they are varied within an rmsd of residue backbone atoms of not more than 0.50 Å, 0.45 Å, 0.40 Å, 0.35 Å, 0.30 Å, 0.25 Å, 0.20 Å, 0.15 Å or 0.10 Å. When the coordinates or selected coordinates are from Table A and are optionally varied within amino acids His 155 (2.50) and Glu 209 (3.50), it is preferred that they are varied within an rmsd of residue backbone atoms of not more than 0.274 Å. When the coordinates or selected coordinates are from Table B and are optionally varied within amino acids His 155 (2.50) and Glu 209 (3.50), it is preferred that they are varied within an rmsd of residue backbone atoms of not more than 0.426 Å.
We have conducted an rmsd analysis of residue backbone atoms between the present CRF1R structure (Table C) and the dopamine D3 receptor structure within the GWGxP motif found in TM4 (i.e. amino acid residues Gly 235, Trp 236, Gly 237 and Pro 239, according to the numbering of the CRF1R as set out in
We have conducted an rmsd analysis of residue backbone atoms between the present CRF1R structure (Table C) and various Class A GPCR structures within the common transmembrane region (i.e. amino acid residues 119-143, 150-176, 186-218, 227-247, 269-294, 312-332 and 343-365 according to the numbering of the CRF1R as set out in
We have conducted an rmsd analysis of residue backbone atoms between the present CRF1R structure (Table C) and various Class A GPCR structures within transmembrane 1 (TM1) (i.e. amino acid residues 119-143 according to the numbering of the CRF1R as set out in
We have conducted an rmsd analysis of residue backbone atoms between the present CRF1R structure (Table C) and various Class A GPCR structures within transmembrane 2 (TM2) (i.e. amino acid residues 150-176 according to the numbering of the CRF1R as set out in
We have conducted an rmsd analysis of residue backbone atoms between the present CRF1R structure (Table C) and various Class A GPCR structures within transmembrane 3 (TM3) (i.e. amino acid residues 186-218 according to the numbering of the CRF1R as set out in
We have conducted an rmsd analysis of residue backbone atoms between the present CRF1R structure (Table C) and various Class A GPCR structures within transmembrane 4 (TM4) (i.e. amino acid residues 227-247 according to the numbering of the CRF1R as set out in
We have conducted an rmsd analysis of residue backbone atoms between the present CRF1R structure (Table C) and various Class A GPCR structures within transmembrane 5 (TM5) (i.e. amino acid residues 269-294 according to the numbering of the CRF1R as set out in
We have conducted an rmsd analysis of residue backbone atoms between the present CRF1R structure (Table C) and various Class A GPCR structures within transmembrane 6 (TM6) (i.e. amino acid residues 312-332 according to the numbering of the CRF1R as set out in
We have conducted an rmsd analysis of residue backbone atoms between the present CRF1R structure (Table C) and various Class A GPCR structures within transmembrane 7 (TM7) (i.e. amino acid residues 343-365 according to the numbering of the CRF1R as set out in
In this aspect of the invention, the coordinates of the CRF1R structure are used to predict a three dimensional representation of a target protein of unknown structure, or part thereof, by modelling. By “modelling”, we mean the prediction of structures using computer-assisted or other de novo prediction of structure, based upon manipulation of the coordinate data from Table A or Table B or Table C, or selected coordinates thereof.
The target protein may be any protein that shares sufficient sequence identity to the human CRF1R such that its structure can be modelled by using the CRF1R coordinates of Table A or Table B or Table C. It will be appreciated that if a structural representation of only a part of the target protein is being modelled, for example a particular domain, the target protein only has to share sufficient sequence identity to the CRF1R over that part.
It has been shown for soluble protein domains that their three dimensional structure is broadly conserved above 20% amino acid sequence identity and well conserved above 30% identity, with the level of structural conservation increasing as amino acid sequence identity increases up to 100% (Ginalski, K. Curr Op Struc Biol (2006) 16, 172-177). Thus, it is preferred if the target protein, or part thereof, shares at least 20% amino acid sequence identity with the human CRF1R sequence provided in
It will be appreciated therefore that the target protein may be a CRF1R analogue or homologue.
Analogues are defined as proteins with similar three-dimensional structures and/or functions with little evidence of a common ancestor at a sequence level.
Homologues are proteins with evidence of a common ancestor, i.e. likely to be the result of evolutionary divergence and are divided into remote, medium and close sub-divisions based on the degree (usually expressed as a percentage) of sequence identity.
By a human CRF1R homologue, we include a protein with at least 20%, 25%, 30%, 35%, 40%, 45% or at least 50% amino acid sequence identity with the sequence of CRF1R provided in
Sequence identity may be measured by the use of algorithms such as BLAST or PSI-BLAST (Altschul et al, NAR (1997), 25, 3389-3402) or methods based on Hidden Markov Models (Eddy S et al, J Comput Biol (1995) Spring 2 (1) 9-23). Typically, the percent sequence identity between two polypeptides may be determined using any suitable computer program, for example the GAP program of the University of Wisconsin Genetic Computing Group and it will be appreciated that percent identity is calculated in relation to polypeptides whose sequence has been aligned optimally. The alignment may alternatively be carried out using the Clustal W program (Thompson et al., 1994). The parameters used may be as follows: Fast pairwise alignment parameters: K-tuple(word) size; 1, window size; 5, gap penalty; 3, number of top diagonals; 5. Scoring method: x percent. Multiple alignment parameters: gap open penalty; 10, gap extension penalty; 0.05. Scoring matrix: BLOSUM.
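A minimal sketch of a percent identity calculation on an existing pairwise alignment is shown below. It assumes the alignment is given as two gapped strings of equal length and, as one possible convention among several, counts identity only over columns in which neither sequence has a gap; programs such as GAP or Clustal W may use a different denominator. The function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # gap columns are excluded under this convention
        compared += 1
        if a.upper() == b.upper():
            identical += 1
    return 100.0 * identical / compared if compared else 0.0

# Toy alignment, not a real receptor sequence.
identity = percent_identity("GWGAP-LL", "GWAAPQLI")
meets_twenty_percent = identity >= 20.0
```

In this toy case five of the seven compared columns are identical, so the pair would exceed a 20% identity threshold of the kind discussed above.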
In one embodiment the target protein is an integral membrane protein. By “integral membrane protein” we mean a protein that is normally integrated into the membrane and can only be removed using detergents, non-polar solvents or denaturing agents that physically disrupt the lipid bilayer. Examples include receptors such as GPCRs, the T-cell receptor complex and growth factor receptors; transmembrane ion channels such as ligand-gated and voltage gated channels; transmembrane transporters such as neurotransmitter transporters; enzymes; carrier proteins; and ion pumps.
The amino acid sequences (and the nucleotide sequences of the cDNAs which encode them) of many membrane proteins are readily available, for example by reference to GenBank. For example, Foord et al supra gives the human gene symbols and human, mouse and rat gene IDs from Entrez Gene (http://www.ncbi.nlm.nih.gov/entrez) for GPCRs. It should be noted, also, that because the sequence of the human genome is substantially complete, the amino acid sequences of human membrane proteins can be deduced therefrom.
In a preferred embodiment, the target protein is a GPCR. GPCRs are well known in the art and include those listed in Hopkins & Groom supra. In addition, the International Union of Pharmacology produces a list of GPCRs (Foord et al (2005) Pharmacol. Rev. 57, 279-288, incorporated herein by reference), and this list is periodically updated at http://www.iuphar-db.org/GPCR/ReceptorFamiliesForward. It will be noted that GPCRs are divided into different classes, principally based on their amino acid sequence similarities. They are also divided into families by reference to the natural ligands to which they bind. All GPCRs are included in the scope of the invention and their structure may be modelled by using the coordinates of the CRF1R. CRF1R is a Class B GPCR (sometimes known as Class 2 or Family B GPCRs which terms are used interchangeably).
In a particularly preferred embodiment, the target protein is a Class B GPCR, including a Class B GPCR in the secretin class such as any of glucagon-like peptide 1 receptor (GLP1R), glucagon-like peptide 2 receptor (GLP2R), calcitonin receptor (CT), amylin/CGRP receptor (AMY1α), amylin receptor (AMY2α), amylin/CGRP receptor (AMY3α), CGRP/adrenomedullin receptor (CGRP1α), adrenomedullin/CGRP receptor (AM1α), adrenomedullin/CGRP receptor (AM2α receptor), corticotropin releasing factor receptor (CRF1), urocortins receptor (CRF2), growth hormone releasing hormone receptor (GHRH), gastric inhibitory polypeptide receptor (GIP), glucagon receptor, secretin receptor, TIP-39 receptor (PTH2), parathyroid hormone receptor (PTH1), VIP/PACAP receptor (VPAC1), PACAP receptor (PAC2), and VIP/PACAP receptor (VPAC2). Alternatively, the target protein is a Class B GPCR in the adhesion class such as any of Brain-specific angiogenesis inhibitor 1 (BAI1), Brain-specific angiogenesis inhibitor 2 (BAI2), Brain-specific angiogenesis inhibitor 3 (BAI3), CD97, Cadherin EGF LAG seven-pass G-type receptor 1 (CELSR1), Cadherin EGF LAG seven-pass G-type receptor 2 (CELSR2), Cadherin EGF LAG seven-pass G-type receptor 3 (CELSR3), EGF latrophilin seven transmembrane domain containing 1 (ELTD1), EGF-like module receptor 1 (EMR1), EGF-like module receptor 2 (EMR2), EGF-like module receptor 3 (EMR3), EGF-like module-containing mucin-like hormone receptor-like 4 (EMR4P), G protein coupled receptor 56 (GPR56), G protein coupled receptor 64 (GPR64), G protein coupled receptor 97 (GPR97), G protein coupled receptor 98 (GPR98), G protein coupled receptors from 110 to 116 (GPR110-116), G protein coupled receptors from 123 to 126 (GPR123-126), G protein coupled receptor 128 (GPR128), G protein coupled receptor 133 (GPR133), G protein coupled receptor 144 (GPR144), G protein coupled receptor 157 (GPR157) and Latrophilin 1 to 3 (LPHN1-3).
Although the target protein may be derived from any source, it is particularly preferred if it is from a eukaryotic source. It is particularly preferred if it is derived from a vertebrate source such as a mammal. It is particularly preferred if the target protein is derived from rat, mouse, rabbit or dog or non-human primate or man.
Typically, modelling a structural representation of a target is done by homology modelling whereby homologous regions between the CRF1R and the target protein are matched and the coordinate data of the CRF1R used to predict a structural representation of the target protein.
The term “homologous regions” describes amino acid residues in two sequences that are identical or have similar (e.g. aliphatic, aromatic, polar, negatively charged, or positively charged) side-chain chemical groups. Identical and similar residues in homologous regions are sometimes described as being respectively “invariant” and “conserved” by those skilled in the art.
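The distinction between invariant and conserved residues can be sketched as a simple lookup. The grouping of one-letter amino acid codes below is one plausible assignment chosen for illustration (glycine and proline are left ungrouped); other groupings are in use, and neither the table nor the function name comes from the present specification.

```python
# Illustrative side-chain groups; membership is an assumption of this sketch.
SIDE_CHAIN_GROUPS = {
    "aliphatic": set("AVLIM"),
    "aromatic": set("FWY"),
    "polar": set("STNQC"),
    "negatively charged": set("DE"),
    "positively charged": set("KRH"),
}

def classify_pair(res_a, res_b):
    """Return 'invariant', 'conserved' or 'different' for two one-letter codes."""
    if res_a == res_b:
        return "invariant"
    for members in SIDE_CHAIN_GROUPS.values():
        if res_a in members and res_b in members:
            return "conserved"
    return "different"

tyr_vs_phe = classify_pair("Y", "F")
leu_vs_leu = classify_pair("L", "L")
asp_vs_lys = classify_pair("D", "K")
```

Under these assumed groups a tyrosine aligned with a phenylalanine is classed as conserved, whereas an aspartate aligned with a lysine is not.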
Typically, the method involves comparing the amino acid sequences of CRF1R with a target protein by aligning the amino acid sequences. Amino acids in the sequences are then compared and groups of amino acids that are homologous (conveniently referred to as “corresponding regions”) are grouped together. This method detects conserved regions of the polypeptides and accounts for amino acid insertions or deletions.
Homology between amino acid sequences can be determined using commercially available algorithms known in the art. For example, the programs BLAST, gapped BLAST, BLASTN, PSI-BLAST, BLAST 2 and WU-BLAST (provided by the National Center for Biotechnology Information) can be used to align homologous regions of two, or more, amino acid sequences. These may be used with default parameters to determine the degree of homology between the amino acid sequence of the CRF1R and other target proteins which are to be modelled.
Preferred for use according to the present invention is the WU-BLAST (Washington University BLAST) version 2.0 software. WU-BLAST version 2.0 executable programs for several UNIX platforms can be downloaded from ftp://blast.wustl.edu/blast/executables. This program is based on WU-BLAST version 1.4, which in turn is based on the public domain NCBI-BLAST version 1.4 (Altschul and Gish, 1996, Local alignment statistics, Doolittle ed., Methods in Enzymology 266: 460-480; Altschul et al., 1990, Basic local alignment search tool, Journal of Molecular Biology 215: 403-410; Gish and States, 1993, Identification of protein coding regions by database similarity search, Nature Genetics 3: 266-272; Karlin and Altschul, 1993, Applications and statistics for multiple high-scoring segments in molecular sequences, Proc. Natl. Acad. Sci. USA 90: 5873-5877; all of which are incorporated by reference herein).
In all search programs in the suite the gapped alignment routines are integral to the database search itself. Gapping can be turned off if desired. The default penalty (Q) for a gap of length one is Q=9 for proteins and BLASTP, and Q=10 for BLASTN, but may be changed to any integer. The default per-residue penalty for extending a gap (R) is R=2 for proteins and BLASTP, and R=10 for BLASTN, but may be changed to any integer. Any combination of values for Q and R can be used in order to align sequences so as to maximize overlap and identity while minimizing sequence gaps. The default amino acid comparison matrix is BLOSUM62, but other amino acid comparison matrices such as PAM can be utilized.
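The effect of the gap penalties Q and R may be illustrated by the following minimal sketch, which scores an already gapped pairwise alignment. It assumes that a gap of length n costs Q + (n − 1) × R, which follows from the definitions given above. The function names and the toy substitution function are hypothetical and stand in for a real comparison matrix such as BLOSUM62; this is not the WU-BLAST implementation.

```python
def gap_penalty(length, q=9, r=2):
    """Penalty for one gap of the given length: Q for the first residue,
    R for each further residue (defaults are the protein values quoted above)."""
    if length < 1:
        raise ValueError("gap length must be at least 1")
    return q + (length - 1) * r


def toy_substitution(a, b):
    """Hypothetical stand-in for a comparison matrix such as BLOSUM62."""
    return 4 if a == b else -1


def score_alignment(seq_a, seq_b, substitution=toy_substitution, q=9, r=2):
    """Score a pre-computed gapped alignment ('-' marks a gap)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    prev_gap = None  # which sequence carried a gap in the previous column
    for a, b in zip(seq_a, seq_b):
        if a == "-" and b == "-":
            raise ValueError("a column cannot be a gap in both sequences")
        if a == "-" or b == "-":
            this_gap = "a" if a == "-" else "b"
            # Opening a gap costs Q; extending the same gap costs R.
            score -= r if this_gap == prev_gap else q
            prev_gap = this_gap
        else:
            score += substitution(a, b)
            prev_gap = None
    return score
```

Raising Q relative to R favours fewer, longer gaps, whereas lowering Q permits many short gaps; either choice trades overlap against the number of gaps introduced.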
Once the amino acid sequences of CRF1R and the target protein of unknown structure have been aligned, the structures of the conserved amino acids in the structural representation of the CRF1R may be transferred to the corresponding amino acids of the target protein. For example, a tyrosine in the amino acid sequence of CRF1R may be replaced by a phenylalanine, the corresponding homologous amino acid in the amino acid sequence of the target protein.
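The transfer of coordinates between aligned residues may be sketched as follows. This is a simplified illustration that assumes one representative coordinate per residue (for example the Cα atom) and sequential residue numbering; the function names and data layout are hypothetical.

```python
def aligned_residue_map(template_aln, target_aln, template_start=1, target_start=1):
    """Map target residue numbers to template residue numbers for every
    alignment column in which neither sequence has a gap ('-')."""
    mapping = {}
    t_num, q_num = template_start, target_start
    for t, q in zip(template_aln, target_aln):
        if t != "-" and q != "-":
            mapping[q_num] = t_num
        if t != "-":
            t_num += 1
        if q != "-":
            q_num += 1
    return mapping


def transfer_coordinates(template_coords, mapping):
    """Copy template coordinates onto the corresponding target residues.

    template_coords: {template residue number: (x, y, z)}
    Returns {target residue number: (x, y, z)}; target residues without an
    aligned template residue are left out and must be modelled separately.
    """
    return {q: template_coords[t] for q, t in mapping.items() if t in template_coords}
```

Target residues that fall in gaps of the alignment receive no coordinates in this sketch and correspond to the non-conserved regions that are built by other means.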
The structures of amino acids located in non-conserved regions may be assigned manually by using standard peptide geometries or by molecular simulation techniques, such as molecular dynamics. The final step in the process is accomplished by refining the entire structure using molecular dynamics and/or energy minimization. Typically, the predicted three dimensional structural representation will be one in which favourable interactions are formed within the target protein and/or so that a low energy conformation is formed (“High resolution structure prediction and the crystallographic phase problem” Qian et al (2007) Nature 450: 259-264; “State of the art in studying protein folding and protein structure prediction using molecular dynamics methods” Lee et al (2001) J of Mol Graph & Modelling 19(1): 146-149).
Whereas it is preferred to base homology modelling on homologous amino acid sequences, it is appreciated that some proteins have low sequence identity (e.g. Class B and C GPCRs) and at the same time are very similar in structure. Therefore, where at least part of the structure of the target protein is known, homologous regions can also be identified by comparing structures directly.
Homology modelling as such is a technique well known in the art (see e.g. Greer, (Science, Vol. 228, (1985), 1055), and Blundell et al (Eur. J. Biochem, Vol. 172, (1988), 513)). The techniques described in these references, as well as other homology modelling techniques generally available in the art, may be used in performing the present invention.
Typically, homology modelling is performed using computer programs, for example SWISS-MODEL available through the Swiss Institute for Bioinformatics in Geneva, Switzerland; WHATIF available on EMBL servers; Schnare et al. (1996) J. Mol. Biol, 256: 701-719; Blundell et al. (1987) Nature 326: 347-352; Fetrow and Bryant (1993) Bio/Technology 11:479-484; Greer (1991) Methods in Enzymology 202: 239-252; and Johnson et al (1994) Crit. Rev. Biochem. Mol Biol. 29:1-68. An example of homology modelling is described in Szklarz G. D (1997) Life Sci. 61: 2507-2520.
Thus, in an embodiment of the first aspect of the invention, the method further comprises aligning the amino acid sequence of the target protein of unknown structure with the amino acid sequence of CRF1R listed in
The invention therefore provides a method of predicting a three dimensional structural representation of a target protein of unknown structure, or part thereof, comprising:
The coordinate data of Table A or Table B or Table C, or selected coordinates thereof, will be particularly advantageous for homology modelling of other GPCRs. For example, since the protein sequence of CRF1R and another GPCR can be aligned relative to each other, it is possible to predict structural representations of the structures of other GPCRs, particularly in the regions of the transmembrane helices and ligand binding region, using the CRF1R coordinates.
The coordinate data of the CRF1R can also be used to predict the structure of target proteins where X-ray diffraction data or NMR spectroscopic data of the protein has been generated and requires interpretation in order to provide a structure.
A second aspect of the invention provides a method of predicting the three dimensional structural representation of a target protein of unknown structure, or part thereof, comprising: providing the coordinates of the human corticotropin-releasing factor receptor-1 (CRF1R) structure listed in Table A, Table B or Table C, optionally varied by a root mean square deviation of residue backbone atoms of not more than 4.383 Å, or selected coordinates thereof; and either (a) positioning the coordinates in the crystal unit cell of the protein so as to predict its structural representation, or (b) manipulating the coordinates to assign, or account for, peaks in NMR spectra.
Thus, where X-ray crystallographic or NMR spectroscopic data is provided for a target protein of unknown structure, the coordinate data of Table A or Table B or Table C may be used to interpret that data to predict a likely structure using techniques well known in the art including phasing, in the case of X-ray crystallography, and assisting peak assignments in the case of NMR spectra.
A three dimensional structural representation of any part of any target protein that is sufficiently similar to any portion of the CRF1R can be predicted by this method. Typically, the target protein or part thereof has at least 20% amino acid sequence identity with any portion of CRF1R, such as at least 30% amino acid sequence identity or at least 40% or 50% or 60% or 70% or 80% or 90% sequence identity. For example, the coordinates may be used to predict the three-dimensional representations of other crystal forms of CRF1R, other CRF1R receptors, CRF1R mutants or co-complexes of a CRF1R receptor. Other suitable target proteins are as defined with respect to the first aspect of the invention.
One method that may be employed for these purposes is molecular replacement, which is well known in the art and described, for example, in Evans & McCoy (Acta Cryst, 2008, D64:1-10), McCoy (Acta Cryst, 2007, D63:32-42) and McCoy et al (J of App Cryst, 2007, 40:658-674). Molecular replacement enables the solution of the crystallographic phase problem by providing initial estimates of the phases of the new structure from a previously known structure, as opposed to the other major methods for solving the phase problem, i.e. experimental methods (which measure the phase from isomorphous or anomalous differences) or direct methods (which use mathematical relationships between reflection triplets and quartets to bootstrap a phase set for all reflections from phases for a small or random ‘seed’ set of reflections). Compared to molecular replacement, such methods are time consuming and generally hinder the solution of crystal structures. Thus molecular replacement provides an accurate structural form for an unknown crystal more quickly and efficiently than attempting to determine such information ab initio.
Accordingly, the invention involves generating a preliminary model of a target protein whose structure coordinates are unknown, by orienting and positioning the relevant portion of the CRF1R according to Table A or Table B or Table C within the unit cell of a crystal of the target protein so as best to account for the observed X-ray diffraction pattern of the crystal of the target protein. Phases can be calculated from this model and combined with the observed X-ray diffraction pattern amplitudes to generate an electron density map of the target protein's structure. This, in turn, can be subjected to any well-known model building and structure refinement techniques to provide a final, accurate structural representation of the target protein (E. Lattman, “Use of the Rotation and Translation Functions”, in Meth. Enzymol., 115, pp. 55-77 (1985); M. G. Rossmann, ed., “The Molecular Replacement Method”, Int. Sci. Rev. Ser., No. 13, Gordon & Breach, New York (1972)).
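The combination of model phases with observed amplitudes may be illustrated by the following one-dimensional toy sketch. It is not a molecular replacement program: the rotation and translation searches are omitted, the "crystal" is a single axis in fractional coordinates, and all function names are hypothetical.

```python
import cmath
import math


def structure_factors(atoms, h_values):
    """Calculate F(h) for a 1D model; atoms is a list of
    (fractional position, scattering weight) pairs."""
    return {
        h: sum(f * cmath.exp(2j * math.pi * h * x) for x, f in atoms)
        for h in h_values
    }


def combine_amplitudes_and_phases(observed_amplitudes, model_factors):
    """Pair each observed amplitude |F_obs(h)| with the phase of the model F(h)."""
    return {
        h: cmath.rect(amp, cmath.phase(model_factors[h]))
        for h, amp in observed_amplitudes.items()
    }


def electron_density(factors, x):
    """Fourier synthesis of the (unnormalised) density at fractional position x."""
    return sum(F * cmath.exp(-2j * math.pi * h * x) for h, F in factors.items()).real
```

In use, the amplitudes would come from the diffraction experiment on the target protein and the phases from the positioned CRF1R model; peaks in the resulting map that are not explained by the model indicate where rebuilding is needed.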
Thus the invention includes a method of predicting a three dimensional structural representation of a target protein of unknown structure, or part thereof, comprising: providing the coordinates of the CRF1R structure, optionally varied by a root mean square deviation of residue backbone atoms of not more than 4.383 Å, or selected coordinates thereof; providing an X-ray diffraction pattern of the target protein; and using the coordinates to predict at least part of the structure coordinates of the target protein.
In an embodiment, the X-ray diffraction pattern of the target protein is provided by crystallising the target protein of unknown structure; and generating an X-ray diffraction pattern from the crystallised target protein. Thus, the invention also provides a method of predicting a three dimensional structural representation of a target protein of unknown structure comprising the steps of (a) crystallising the target protein; (b) generating an X-ray diffraction pattern from the crystallised target protein; (c) applying the coordinates of the CRF1R structure, optionally varied by a root mean square deviation of residue backbone atoms of not more than 4.383 Å, or selected coordinates thereof, to the X-ray diffraction pattern to generate a three-dimensional electron density map of the target protein, or part thereof; and (d) predicting a three dimensional structural representation of the target protein from the three-dimensional electron density map.
Examples of computer programs known in the art for performing molecular replacement include CNX (Brunger A. T.; Adams P. D.; Rice L. M., Current Opinion in Structural Biology, Volume 8, Issue 5, October 1998, Pages 606-611; also commercially available from Accelrys, San Diego, Calif.), MOLREP (A. Vagin, A. Teplyakov, MOLREP: an automated program for molecular replacement, J Appl Cryst (1997) 30, 1022-1025, part of the CCP4 suite), AMoRe (Navaza, J. (1994). AMoRe: an automated package for molecular replacement. Acta Cryst A50, 157-163), or PHASER (part of the CCP4 suite).
Preferred selected coordinates of the CRF1R are as defined above with respect to the first aspect of the invention.
The invention may also be used to assign peaks of NMR spectra of target proteins, by manipulation of the data of Table A or Table B or Table C (J Magn Reson (2002) 157(1): 119-23).
The coordinates of the CRF1R structure, optionally varied by a root mean square deviation of residue backbone atoms of not more than 4.383 Å, or selected coordinates thereof may be used in the provision, design, modification or analysis of binding partners of CRF1R. Such a use will be important in drug design.
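Whether a set of coordinates falls within the stated root mean square deviation may be checked along the lines of the following sketch. It assumes that the two sets contain the same backbone atoms in the same order and have already been superimposed (for example by a least-squares fit, which is not shown); the function names are hypothetical.

```python
import math

RMSD_LIMIT = 4.383  # Å, the limit recited in the specification


def backbone_rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally sized, pre-superimposed
    lists of (x, y, z) backbone atom coordinates, in the units of the input."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(coords_a, coords_b):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(coords_a))


def within_rmsd_limit(coords_a, coords_b, limit=RMSD_LIMIT):
    """True if the two coordinate sets differ by no more than the limit."""
    return backbone_rmsd(coords_a, coords_b) <= limit
```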
By CRF1R we mean any CRF1R which has at least 75% sequence identity with human CRF1R as well as CRF1R receptors from other species and mutants thereof. Preferably, the CRF1R has at least 80% amino acid sequence identity to human CRF1R, and more preferably at least 85%, 90%, 95% or 99% amino acid sequence identity.
By “binding partner” we mean any molecule that binds to a CRF1R. Preferably, the molecule binds selectively to the CRF1R. For example, it is preferred if the binding partner has a Kd value (dissociation constant) which is at least five or ten times lower (i.e. higher affinity) than for at least one other GPCR, and preferably more than 100 or 500 times lower. More preferably, the binding partner of a CRF1R has a Kd value more than 1000 or 5000 times lower than for at least one other GPCR. However, it will be appreciated that the limits will vary dependent upon the nature of the binding partner.
Thus, for small molecule binding partners, the binding partner typically has a Kd value which is at least 10 times or 50 times or 100 times lower than for at least one other GPCR. For antibody binding partners, the binding partner typically has a Kd value which is at least 500 or 1000 times lower than for at least one other GPCR.
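The selectivity criterion may be expressed as a simple ratio of dissociation constants, as in the following sketch. The function names and the default threshold are illustrative assumptions; both Kd values must be given in the same units.

```python
def fold_selectivity(kd_target, kd_other):
    """How many times lower the Kd for the target receptor is than the Kd for
    another receptor (a value above 1 means higher affinity for the target)."""
    if kd_target <= 0 or kd_other <= 0:
        raise ValueError("Kd values must be positive")
    return kd_other / kd_target


def is_selective(kd_target, kd_others, min_fold=10.0):
    """True if the Kd for the target is at least min_fold times lower than
    the Kd for at least one of the other receptors."""
    return any(fold_selectivity(kd_target, kd) >= min_fold for kd in kd_others)
```

For example, a small molecule with a Kd of 2 nM for CRF1R and 1000 nM for another GPCR would be 500-fold selective in this sense.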
Kd values can be determined readily using methods well known in the art and as described, for example, below.
At equilibrium Kd=[R][L]/[RL]
where the terms in brackets represent the concentrations of unbound receptor [R], unbound (free) ligand [L] and receptor-ligand complex [RL].
In order to determine the Kd, the values of these terms must be known. Since the concentration of receptor is not usually known, the Hill-Langmuir equation is used, where Fractional occupancy = [L]/([L] + Kd).
In order to experimentally determine a Kd then, the concentration of free ligand and bound ligand at equilibrium must be known. Typically, this can be done by using a radio-labelled or fluorescently labelled ligand which is incubated with the receptor (present in whole cells or homogenised membranes) until equilibrium is reached. The amount of free ligand vs bound ligand must then be determined by separating the signal from bound vs free ligand. In the case of a radioligand this can be done by centrifugation or filtration to separate bound ligand present on whole cells or membranes from free ligand in solution. Alternatively a scintillation proximity assay is used. In this assay the receptor (in membranes) is bound to a bead containing scintillant and a signal is only detected by the proximity of the radioligand bound to the receptor immobilised on the bead.
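The two relationships above, Kd = [R][L]/[RL] and Fractional occupancy = [L]/([L] + Kd), may be written out as in the following sketch. The function names are hypothetical, and all concentrations are assumed to be equilibrium values in the same units.

```python
def kd_from_equilibrium(free_receptor, free_ligand, complex_conc):
    """Dissociation constant from equilibrium concentrations: [R][L]/[RL]."""
    if complex_conc <= 0:
        raise ValueError("complex concentration must be positive")
    return free_receptor * free_ligand / complex_conc


def fractional_occupancy(free_ligand, kd):
    """Hill-Langmuir fractional occupancy: [L] / ([L] + Kd)."""
    if free_ligand < 0 or kd <= 0:
        raise ValueError("ligand concentration must be >= 0 and Kd > 0")
    return free_ligand / (free_ligand + kd)
```

When the free ligand concentration equals the Kd, half of the receptor is occupied, which is why the Kd can be read from a saturation binding curve as the ligand concentration giving half-maximal binding.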
The binding partner may be any of a polypeptide; an anticalin; a peptide; an antibody; a chimeric antibody; a single chain antibody; an aptamer; a darpin; a Fab, F(ab′)2, Fv, ScFv or dAb antibody fragment; a small molecule; a natural product; an affibody; a peptidomimetic; a nucleic acid; a peptide nucleic acid molecule; a lipid; a carbohydrate; a protein based on a modular framework including ankyrin repeat proteins, armadillo repeat proteins, leucine rich proteins, tetratricopeptide repeat proteins or Designed Ankyrin Repeat Proteins (DARPins); a protein based on lipocalin or fibronectin domains or Affilin scaffolds based on either human gamma crystallin or human ubiquitin; a G protein; an RGS protein; an arrestin; a GPCR kinase; a receptor tyrosine kinase; a RAMP; a NSF; a GPCR; an NMDA receptor subunit NR1 or NR2a; calcyon; or a fragment or derivative thereof that binds to CRF1R. Typically, the binding partner is a small molecule.
It will be appreciated that the coordinates of the invention will also be useful in the analysis of solvent and ion interactions with a CRF1R, which are important factors in drug design. Thus the binding partner may be a solvent molecule, for example water or acetonitrile, or an ion, for example a sodium ion or a protein.
It is particularly preferred if the binding partner is a small molecule with a molecular weight less than 5000 daltons, for example less than 4000, 3000, 2000 or 1000 daltons, or with a molecular weight less than 500 daltons, for example less than 450 daltons, 400 daltons, 350 daltons, 300 daltons, 250 daltons, 200 daltons, 150 daltons, 100 daltons, 50 daltons or 10 daltons.
It is further preferred if the binding partner causes a change (i.e. a modulation) in the level of biological activity of the CRF1R, i.e. it has functional agonist or antagonist activity, and therefore may have the potential to be a candidate drug. Thus, the binding partner may be any of a full agonist, a partial agonist, an inverse agonist or an antagonist of CRF1R. The binding partner may bind to the orthosteric site or it may bind to an allosteric binding site. It is also appreciated that the binding partner may be one that modulates the ability of the CRF1R to dimerise. For example, the binding partner may bind to the dimerisation interface or bind to another region of the CRF1R which nevertheless modulates dimerisation.
Accordingly, a third aspect of the invention provides a method for selecting or designing one or more binding partners of CRF1R comprising using molecular modelling means to select or design one or more binding partners of the CRF1R, wherein the three-dimensional structural representation of at least part of the human CRF1R, as defined by coordinates of the CRF1R, optionally varied by a root mean square deviation of residue backbone atoms of not more than 4.383 Å, or selected coordinates thereof, is compared with a three-dimensional structural representation of one or more candidate binding partners, and one or more binding partners that are predicted to interact with CRF1R are selected.
In order to provide a three-dimensional structural representation of a candidate binding partner, the binding partner structural representation may be modelled in three dimensions using commercially available software for this purpose or, if its crystal structure is available, the coordinates of the structure may be used to provide a structural representation of the binding partner.
The design of binding partners that bind to a CRF1R generally involves consideration of two factors.
First, the binding partner must be capable of physically and structurally associating with parts or all of a CRF1R binding region (e.g. orthosteric binding site or an allosteric binding site). Non-covalent molecular interactions important in this association include hydrogen bonding, van der Waals interactions, hydrophobic interactions and electrostatic interactions.
Second, the binding partner must be able to assume a conformation that allows it to associate with a CRF1R binding region directly. Although certain portions of the binding partner will not directly participate in these associations, those portions of the binding partner may still influence the overall conformation of the molecule. This, in turn, may have a significant impact on potency. Such conformational requirements include the overall three-dimensional structure and orientation of the binding partner in relation to all or a portion of the binding region, or the spacing between functional groups of a binding partner comprising several binding partners that directly interact with the CRF1R. This is particularly relevant where the binding partner is a protein.
Thus it will be appreciated that selected coordinates which represent a binding region of the CRF1R, e.g. atoms from amino acid residues contributing to the small organic molecule binding pocket including amino acid residues Leu 158, Phe 162, His 199, Asn 202, Phe203, Phe 204, Trp205, Met 206, Phe 207, Gly 208, Glu 209, Gly 210, Cys211, Leu 213, His 214, Met 276, Val 279, Leu 280, Leu 281, Ile 282, Asn 283, Phe 284, Ile 285, Phe 286, Leu 287, Phe 288, Ile 290, Ala 312, Ala 315, Thr 316, Leu 317, Leu 319, Leu 320, Pro 321, Leu323, Gly 324, Ile 325, Tyr 327, Gln 355, Val 359 and Phe 362, may be used, or atoms from amino acid residues contributing to the peptide orthosteric binding site including amino acid residues Ala119, Asn123, His127, Ser130, Phe162, Arg165, Asn166, Thr168, Thr169, Val172, Gln173, Thr175, Met176, His181, Val191, Thr192, Tyr195, Asn196, His199, Asn202, Phe203, Lys257, Ala260, Lys262, Tyr272, Gln273, Met276, Leu323, Thr326, Tyr327, Ala330, Phe331, Asn333, Asp337, Arg341, Phe344, Ile345, Asn348, Glu352, Ser353 and Gln355, may be used. Selected coordinates representing an extracellular face would be useful to select or design for binding partners such as antibodies, and selected coordinates representing an intracellular face would be useful to select or design for agents which modulate (e.g. prevent) binding to natural binding partners such as G proteins. Additional preferences for the selected coordinates are as defined above with respect to the first aspect of the invention. Preferably, the selected coordinates comprise one or more atoms from any one or more (e.g. at least 2, 3, 4, 5, 6 or 7) of amino acids Phe 203, Met 206, Gly 210, Asn 283, Thr 316, Leu 323 and Tyr 327, according to the numbering of the CRF1R sequence as set out in
Designing of binding partners can generally be achieved in two ways, either by the step wise assembly of a binding partner or by the de novo synthesis of a binding partner. As is described in more detail below, binding partners can also be identified by virtual screening.
With respect to the step-wise assembly of a binding partner, several methods may be used. Typically the process begins by visual inspection of, for example, any of the binding regions on a computer representation of the CRF1R as defined by the coordinates in Table A or Table B or Table C optionally varied within a rmsd of residue backbone atoms of not more than 4.383 Å, or selected coordinates thereof. Selected binding partners, or fragments or moieties thereof may then be positioned in a variety of orientations, or docked, within the binding region. Docking may be accomplished using software such as QUANTA and Sybyl (Tripos Associates, St. Louis, Mo.), followed by, or performed simultaneously with, energy minimization, rigid-body minimization (Gshwend, supra) and molecular dynamics with standard molecular mechanics force fields, such as CHARMM and AMBER.
Specialized computer programs may also assist in the process of selecting binding partners or fragments or moieties thereof, as are known in the art and as detailed in WO2008/068534 incorporated herein by reference.
Once suitable binding partners or fragments have been selected, they may be assembled into a single compound or complex. Assembly may be preceded by visual inspection of the relationship of the fragments to each other on the three-dimensional image displayed on a computer screen in relation to the structure coordinates of the CRF1R. This would be followed by manual model building using software such as QUANTA or Sybyl. Useful programs known in the art (see, for example WO2008/068534 incorporated herein by reference) may aid connecting the individual chemical entities or fragments.
Thus the invention includes a method of designing a binding partner of a CRF1R comprising the steps of: (a) providing a structural representation of a CRF1R binding region as defined by the coordinates of the human CRF1R of Table A or Table B or Table C, optionally varied by a root mean square deviation of residue backbone atoms of not more than 4.383 Å or selected coordinates thereof; (b) using computational means to dock a three dimensional structural representation of a first binding partner in part of the binding region; (c) docking at least a second binding partner in another part of the binding region; (d) quantifying the interaction energy between the first or second binding partner and part of the binding region; (e) repeating steps (b) to (d) with another first and second binding partner, selecting a first and a second binding partner based on the quantified interaction energy of all of said first and second binding partners; (f) optionally, visually inspecting the relationship of the first and second binding partner to each other in relation to the binding region; and (g) assembling the first and second binding partners into one binding partner that interacts with the binding region by model building.
As an alternative to the step-wise assembly of binding partners, binding partners may be designed as a whole or “de novo” using either an empty binding region or optionally including some portion(s) of a known binding partner(s). There are many de novo ligand design methods including: 1. LUDI (H.-J. Bohm, “The Computer Program LUDI: A New Method for the De Novo Design of Enzyme Inhibitors”, J. Comp. Aid. Molec. Design, 6, pp. 61-78 (1992)). LUDI is available from Molecular Simulations Incorporated, San Diego, Calif.; 2. LEGEND (Y. Nishibata et al., Tetrahedron, 47, p. 8985 (1991)). LEGEND is available from Molecular Simulations Incorporated, San Diego, Calif.; 3. LeapFrog (available from Tripos Associates, St. Louis, Mo.); and 4. SPROUT (V. Gillet et al., “SPROUT: A Program for Structure Generation”, J. Comput. Aided Mol. Design, 7, pp. 127-153 (1993)). SPROUT is available from the University of Leeds, UK.
Other molecular modelling techniques may also be employed in accordance with this invention (see, e.g., N. C. Cohen et al., “Molecular Modeling Software and Methods for Medicinal Chemistry”, J. Med. Chem., 33, pp. 883-894 (1990); see also, M. A. Navia and M. A. Murcko, “The Use of Structural Information in Drug Design”, Current Opinions in Structural Biology, 2, pp. 202-210 (1992); L. M. Balbes et al., “A Perspective of Modern Methods in Computer-Aided Drug Design”, in Reviews in Computational Chemistry, Vol. 5, K. B. Lipkowitz and D. B. Boyd, Eds., VCH, New York, pp. 337-380 (1994); see also, W. C. Guida, “Software For Structure-Based Drug Design”, Curr. Opin. Struct. Biology, 4, pp. 777-781 (1994)).
In addition to the methods described above in relation to the design of binding partners, other computer-based methods are available to select for binding partners that interact with CRF1R.
For example the invention involves the computational screening of small molecule databases for binding partners that can bind in whole, or in part, to the CRF1R. In this screening, the quality of fit of such binding partners to a binding region of a CRF1R as defined by the coordinates of the human CRF1R of Table A or Table B or Table C, optionally varied by a root mean square deviation of residue backbone atoms of not more than 4.383 Å or selected coordinates thereof, may be judged either by shape complementarity or by estimated interaction energy (E. C. Meng et al., J. Comp. Chem., 13, pp. 505-524 (1992)).
For example, selection may involve using a computer for selecting an orientation of a binding partner with a favourable shape complementarity in a binding region comprising the steps of: (a) providing the coordinates of the human CRF1R of Table A or Table B or Table C, optionally varied by a root mean square deviation of residue backbone atoms of not more than 4.383 Å or selected coordinates thereof and a three-dimensional structural representation of one or more candidate binding partners; (b) employing computational means to dock a first binding partner in the binding region; (c) quantitating the contact score of the binding partner in different orientations; and (d) selecting an orientation with the highest contact score.
The docking may be facilitated by the contact score. The method may further comprise the step of generating a three-dimensional structural representation of the binding region and binding partner bound therein prior to step (b).
The method may further comprise the steps of: (e) repeating steps (b) through (d) with a second binding partner; and (f) selecting at least one of the first or second binding partner that has a higher contact score based on the quantitated contact score of the first or second binding partner.
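A contact score of the kind referred to above may be approximated, for illustration, by counting the atom pairs that lie within a distance cutoff. The 4.5 Å cutoff, the function names and the omission of any penalty for steric clashes are assumptions of this sketch; docking programs use more elaborate scoring schemes.

```python
import math


def contact_score(ligand_atoms, receptor_atoms, cutoff=4.5):
    """Count ligand/receptor atom pairs no further apart than cutoff (Å).
    Atoms are (x, y, z) tuples."""
    return sum(
        1
        for lig in ligand_atoms
        for rec in receptor_atoms
        if math.dist(lig, rec) <= cutoff
    )


def best_orientation(orientations, receptor_atoms, cutoff=4.5):
    """Return (name, score) of the orientation with the highest contact score.

    orientations: {orientation name: list of ligand atom coordinates}
    """
    scores = {
        name: contact_score(atoms, receptor_atoms, cutoff)
        for name, atoms in orientations.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]
```

Steps (e) and (f) then amount to calling best_orientation for each candidate binding partner and keeping the candidate whose best orientation scores highest.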
In another embodiment, selection may involve using a computer for selecting an orientation of a binding partner that interacts favourably with a binding region comprising: (a) providing the coordinates of the human CRF1R of Table A or Table B or Table C, optionally varied by a root mean square deviation of residue backbone atoms of not more than 4.383 Å or selected coordinates thereof; (b) employing computational means to dock a first binding partner in the binding region; (c) quantitating the interaction energy between the binding partner and all or part of a binding region for different orientations of the binding partner; and (d) selecting the orientation of the binding partner with the most favourable interaction energy.
The docking may be facilitated by the quantitated interaction energy and energy minimization with or without molecular dynamics simulations may be performed simultaneously with or following step (b).
The method may further comprise the steps of: (e) repeating steps (b) through (d) with a second binding partner; and (f) selecting at least one of the first or second binding partner that interacts more favourably with a binding region based on the quantitated interaction energy of the first or second binding partner.
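The quantitation of interaction energy may be sketched with a deliberately simple pairwise energy function combining a Lennard-Jones term and a Coulomb term. The parameter values (epsilon, sigma and the dielectric constant) and the function names are assumptions chosen for illustration and do not correspond to any particular force field such as CHARMM or AMBER.

```python
import math

COULOMB_CONSTANT = 332.06  # kcal·Å/(mol·e²)


def pair_energy(distance, charge_a, charge_b, epsilon=0.1, sigma=3.5, dielectric=4.0):
    """Lennard-Jones 12-6 plus Coulomb energy (kcal/mol) for one atom pair.
    epsilon, sigma and dielectric are illustrative values only."""
    ratio6 = (sigma / distance) ** 6
    lennard_jones = 4.0 * epsilon * (ratio6 * ratio6 - ratio6)
    coulomb = COULOMB_CONSTANT * charge_a * charge_b / (dielectric * distance)
    return lennard_jones + coulomb


def interaction_energy(ligand_atoms, receptor_atoms):
    """Sum pair energies; atoms are (x, y, z, partial charge) tuples."""
    return sum(
        pair_energy(math.dist(lig[:3], rec[:3]), lig[3], rec[3])
        for lig in ligand_atoms
        for rec in receptor_atoms
    )


def most_favourable_orientation(orientations, receptor_atoms):
    """Return (name, energy) of the orientation with the lowest energy."""
    energies = {
        name: interaction_energy(atoms, receptor_atoms)
        for name, atoms in orientations.items()
    }
    best = min(energies, key=energies.get)
    return best, energies[best]
```

An orientation that places atoms too close together is rejected automatically in this scheme, because the repulsive term dominates and the energy becomes strongly positive.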
In another embodiment, selection may involve screening a binding partner to associate with an energy of binding of less than −7 kcal/mol with a CRF1R binding region comprising: (a) providing the coordinates of the human CRF1R of Table A or Table B or Table C, optionally varied by a root mean square deviation of residue backbone atoms of not more than 4.383 Å or selected coordinates thereof and employing computational means which utilise the coordinates to dock the binding partner into a binding region; (b) quantifying the deformation energy of binding between the binding partner and the binding region; and (c) selecting a binding partner that associates with a CRF1R binding region with an energy of binding of less than −7 kcal/mol.
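To put the −7 kcal/mol threshold in context, a binding free energy can be related to a dissociation constant through ΔG = RT ln Kd (1 M standard state). The following sketch applies that relation and the threshold filter; the function names and compound labels are hypothetical, and computed docking energies are only approximations of experimental free energies.

```python
import math

GAS_CONSTANT = 0.0019872  # kcal/(mol·K)


def kd_from_binding_energy(delta_g, temperature=298.15):
    """Dissociation constant (mol/L) implied by a binding free energy in
    kcal/mol, using delta_g = R * T * ln(Kd)."""
    return math.exp(delta_g / (GAS_CONSTANT * temperature))


def select_binders(energies, threshold=-7.0):
    """Names of candidates whose binding energy is below the threshold.

    energies: {candidate name: binding energy in kcal/mol}
    """
    return sorted(name for name, e in energies.items() if e < threshold)
```

Under these assumptions, −7 kcal/mol at 25 °C corresponds to a Kd in the low micromolar range, so the threshold selects candidates predicted to bind at micromolar affinity or better.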
A fourth aspect of the invention provides a method for selecting or designing one or more binding partners of a CRF1R having a binding pocket in the position structurally equivalent to a binding pocket of human CRF1R that is defined by residues including (a) Leu 158, Phe 162, His 199, Asn 202, Phe203, Phe 204, Trp205, Met 206, Phe 207, Gly 208, Glu 209, Gly 210, Cys211, Leu 213, His 214, Met 276, Val 279, Leu 280, Leu 281, Ile 282, Asn 283, Phe 284, Ile 285, Phe 286, Leu 287, Phe 288, Ile 290, Ala 312, Ala 315, Thr 316, Leu 317, Leu 319, Leu 320, Pro 321, Leu323, Gly 324, Ile 325, Tyr 327, Gln 355, Val 359 and Phe 362 of human CRF1R or (b) Ala119, Asn123, His127, Ser130, Phe162, Arg165, Asn166, Thr168, Thr169, Val172, Gln173, Thr175, Met176, His181, Val191, Thr192, Tyr195, Asn196, His199, Asn202, Phe203, Lys257, Ala260, Lys262, Tyr272, Gln273, Met276, Leu323, Thr326, Tyr327, Ala330, Phe331, Asn333, Asp337, Arg341, Phe344, Ile345, Asn348, Glu352, Ser353 and Gln355 of human CRF1R, the method comprising the step of using molecular modelling means to select or design one or more binding partners that are predicted to interact with the said CRF1R, wherein a three-dimensional structural representation of one or more candidate binding partners are compared with a three-dimensional structural representation of the said binding pocket, and one or more candidate binding partners that are predicted to interact with the said binding pocket, are selected or designed.
Preferably, the binding partner selected is one that is able to interact with at least one of the amino acids that define the said binding pockets, such as at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40 or all 41 of said amino acid residues.
By a CRF1R having a binding pocket in the position structurally equivalent to the defined binding pocket of human CRF1R, we include the meaning of a protein identifiable as that of a CRF1R, and further having a predicted or determined three-dimensional structure that includes a binding pocket defined by (a) Leu 158, Phe 162, His 199, Asn 202, Phe203, Phe 204, Trp205, Met 206, Phe 207, Gly 208, Glu 209, Gly 210, Cys211, Leu 213, His 214, Met 276, Val 279, Leu 280, Leu 281, Ile 282, Asn 283, Phe 284, Ile 285, Phe 286, Leu 287, Phe 288, Ile 290, Ala 312, Ala 315, Thr 316, Leu 317, Leu 319, Leu 320, Pro 321, Leu323, Gly 324, Ile 325, Tyr 327, Gln 355, Val 359 and Phe 362 according to the numbering of the human CRF1R in
It will be appreciated that the three-dimensional structural representations of the defined binding pockets may be any suitable three-dimensional structural representation. For example, it may be a three-dimensional structural representation represented by the coordinates of the CRF1R structure in Table A or Table B or Table C, optionally varied by a root mean square deviation of residue backbone atoms of not more than 4.383 Å, or selected coordinates thereof. It is preferred if the selected coordinates are from one or more amino acid residues that define a binding region of CRF1R including those mentioned above. Alternatively, the three-dimensional structural representations of the defined binding pockets may be a three-dimensional structural representation modelled on such coordinates.
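As a minimal sketch of how the stated root mean square deviation criterion might be checked, the following computes the RMSD between two equally sized sets of backbone atom coordinates. It assumes that the two coordinate sets have already been superposed and list the same atoms in the same order; the superposition step itself is not shown, and the function and constant names are hypothetical.

```python
import math

RMSD_LIMIT = 4.383  # Å, the backbone RMSD bound recited in the text

def backbone_rmsd(coords_a, coords_b):
    """RMSD (Å) between two pre-superposed, equally ordered coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

def within_limit(coords_a, coords_b, limit=RMSD_LIMIT):
    """True if the backbone RMSD is not more than `limit` Å."""
    return backbone_rmsd(coords_a, coords_b) <= limit

# Hypothetical toy backbone: three atoms, and a copy shifted by 1 Å along x.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
shifted = [(x + 1.0, y, z) for x, y, z in reference]
print(backbone_rmsd(reference, shifted), within_limit(reference, shifted))
```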
The structural representation may then be compared with structural representations of one or more candidate binding partners and those binding partners that are predicted to interact with the binding pocket are selected.
Any suitable molecular modelling means may be employed in this selection, including those outlined above.
It is appreciated that in some instances high throughput screening of binding partners is preferred and that methods of the invention may be used as “library screening” methods, a term well known to those skilled in the art. Thus, the binding partner may be a library of binding partners. For example, the library may be a peptide or protein library produced, for example, by ribosome display or an antibody library prepared either in vivo, ex vivo or in vitro. Methodologies for preparing and screening such libraries are known in the art.
Determination of the three-dimensional structure of the CRF1R provides important information about the binding sites of CRF1R receptors, particularly when comparisons are made with other GPCRs including other corticotropin-releasing factor receptors. This information may then be used for rational design and modification of CRF1R binding partners, e.g. by computational techniques which identify possible binding ligands for the binding sites, by enabling linked-fragment approaches to drug design, and by enabling the identification and location of bound ligands using X-ray crystallographic analysis. These techniques are discussed in more detail below.
Thus, as a result of the determination of the CRF1R three-dimensional structure, more purely computational techniques for rational drug design may also be used to design structures whose interaction with CRF1R is better understood (for an overview of these techniques see e.g. Walters et al., Drug Discovery Today, Vol. 3, No. 4, (1998), 160-178; Abagyan, R.; Totrov, M. Curr. Opin. Chem. Biol. 2001, 5, 375-382). For example, automated ligand-receptor docking programs (discussed e.g. by Jones et al. in Current Opinion in Biotechnology, Vol. 6, (1995), 652-656 and Halperin, I.; Ma, B.; Wolfson, H.; Nussinov, R. Proteins 2002, 47, 409-443), which require accurate information on the atomic coordinates of target receptors, may be used.
The aspects of the invention described herein which utilize the CRF1R structure in silico may be equally applied to both the human CRF1R structure of Table A or Table B or Table C, optionally varied by a root mean square deviation of residue backbone atoms of not more than 4.383 Å, or selected coordinates thereof; and, by predicting the three-dimensional structural representation of a target protein, or part thereof, by modelling the structural representation on all or the selected coordinates of the CRF1R or selected coordinates thereof, to the models of target proteins obtained by the first and second aspects of the invention. Thus having determined a conformation of a target protein, for example a CRF1R, by the methods described above, such a conformation may be used in a computer-based method of rational drug design as described herein. In addition, the availability of the structure of the CRF1R will allow the generation of highly predictive pharmacophore models for virtual library screening or ligand design.
Accordingly, a fifth aspect of the invention provides a method for the analysis of the interaction of one or more binding partners with CRF1R, comprising: providing a three dimensional structural representation of CRF1R as defined by the coordinates of the human CRF1R structure of Table A or Table B or Table C, optionally varied by a root mean square deviation of residue backbone atoms of not more than 4.383 Å, or selected coordinates thereof; providing a three dimensional structural representation of one or more binding partners to be fitted to the structural representation of CRF1R or selected coordinates thereof; and fitting the one or more binding partners to said structure.
This method of the invention is generally applicable for the analysis of known binding partners of CRF1R, the development or discovery of binding partners of CRF1R, the modification of binding partners of CRF1R e.g. to improve or modify one or more of their properties, and the like. Moreover, the methods of the invention are useful in identifying binding partners that are selective for CRF1R receptors over other GPCRs (including other corticotropin-releasing factor receptors). For example, comparing corresponding binding regions between CRF1R receptors and other GPCRs will facilitate the design of CRF1R specific binding partners.
It will be desirable to model a sufficient number of atoms of the CRF1R as defined by the coordinates of Table A or Table B or Table C optionally varied by a root mean square deviation of residue backbone atoms of not more than 4.383 Å, or selected coordinates thereof, which represent a binding region, e.g. atoms from amino acid residues contributing to the small organic molecule binding pocket including amino acid residues Leu 158, Phe 162, His 199, Asn 202, Phe203, Phe 204, Trp205, Met 206, Phe 207, Gly 208, Glu 209, Gly 210, Cys211, Leu 213, His 214, Met 276, Val 279, Leu 280, Leu 281, Ile 282, Asn 283, Phe 284, Ile 285, Phe 286, Leu 287, Phe 288, Ile 290, Ala 312, Ala 315, Thr 316, Leu 317, Leu 319, Leu 320, Pro 321, Leu323, Gly 324, Ile 325, Tyr 327, Gln 355, Val 359 and Phe 362, or atoms from amino acid residues contributing to the peptide orthosteric binding site including amino acid residues Ala119, Asn123, His127, Ser130, Phe162, Arg165, Asn166, Thr168, Thr169, Val172, Gln173, Thr175, Met176, His181, Val191, Thr192, Tyr195, Asn196, His199, Asn202, Phe203, Lys257, Ala260, Lys262, Tyr272, Gln273, Met276, Leu323, Thr326, Tyr327, Ala330, Phe331, Asn333, Asp337, Arg341, Phe344, Ile345, Asn348, Glu352, Ser353 and Gln355. Although every different binding partner bound by CRF1R may interact with different parts of a binding region of the protein, the structure of the CRF1R allows the identification of a number of particular sites which are likely to be involved in many of the interactions of CRF1R with a drug candidate. Additional preferred selected coordinates are as described above with respect to the first aspect of the invention.
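The selection of coordinates for a binding region might, purely as an illustration, be carried out as in the following sketch. It assumes that the coordinates are held as fixed-column PDB-format ATOM records, which is an assumption about the file format rather than a statement about how Tables A to C are laid out. The function and constant names are hypothetical; the residue numbers are those listed above for the small organic molecule binding pocket.

```python
# Residue numbers of the small organic molecule binding pocket, as listed in the text.
SMALL_MOLECULE_POCKET = (
    158, 162, 199, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 213,
    214, 276, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 290, 312,
    315, 316, 317, 319, 320, 321, 323, 324, 325, 327, 355, 359, 362,
)

def select_pocket_atoms(pdb_lines, residue_numbers=SMALL_MOLECULE_POCKET):
    """Return (res_name, res_seq, atom_name, (x, y, z)) for ATOM records in the pocket.

    Assumes standard fixed-column PDB ATOM records (hypothetical input format).
    """
    wanted = set(residue_numbers)
    selected = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue  # skip HETATM, REMARK and other record types
        res_seq = int(line[22:26])
        if res_seq not in wanted:
            continue
        atom_name = line[12:16].strip()
        res_name = line[17:20].strip()
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        selected.append((res_name, res_seq, atom_name, xyz))
    return selected
```

The same function could be called with the residue numbers of the peptide orthosteric binding site in place of the default tuple.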
In order to provide a three-dimensional structural representation of a binding partner to be fitted to the CRF1R structure, the binding partner structural representation may be modelled in three dimensions using commercially available software for this purpose or, if its crystal structure is available, the coordinates of the structure may be used to provide a structural representation of the binding partner for fitting to the CRF1R structure of the invention.
By “fitting” is meant determining, by automatic or semi-automatic means, interactions between one or more atoms of a candidate binding partner and at least one atom of the CRF1R structure of the invention, and calculating the extent to which such interactions are stable. Interactions include attraction and repulsion, brought about by charge, steric and lipophilic considerations and the like. Charge and steric interactions of this type can be modelled computationally. An example of such computation would be via a force field such as Amber (Cornell et al. A Second Generation Force Field for the Simulation of Proteins, Nucleic Acids, and Organic Molecules, Journal of the American Chemical Society, (1995), 117(19), 5179-97) which would assign partial charges to atoms on the protein and binding partner and evaluate the electrostatic interaction energy between a protein and binding partner atom using the Coulomb potential. The Amber force field would also assign van der Waals energy terms to assess the attractive and repulsive steric interactions between two atoms. Lipophilic interactions can be modelled using a variety of means. For example the ChemScore function (Eldridge M D; Murray C W; Auton T R; Paolini G V; Mee R P Empirical scoring functions: I. The development of a fast empirical scoring function to estimate the binding affinity of ligands in receptor complexes, Journal of computer-aided molecular design (1997 September), 11 (5), 425-45) assigns protein and binding partner atoms as hydrophobic or polar, and a favourable energy term is specified for the interaction between two hydrophobic atoms. Other methods of assessing the hydrophobic contributions to ligand binding, and of assessing interactions more generally, are available and would be known to one skilled in the art of designing molecules. Various computer-based methods for fitting are described further herein.
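The pairwise electrostatic and van der Waals terms referred to above may be sketched as follows. This is a simplified illustration of a Coulomb potential and a 12-6 Lennard-Jones potential, not an implementation of the Amber force field; the function names, the dielectric constant and the example parameters are hypothetical, and the conversion constant of approximately 332.06 kcal·Å/(mol·e²) is the commonly used value for charges in elementary units and distances in Å.

```python
COULOMB_CONSTANT = 332.06  # approx. kcal*Å/(mol*e^2); standard conversion factor

def coulomb_energy(q_i, q_j, r, dielectric=1.0):
    """Electrostatic energy (kcal/mol) between partial charges q_i and q_j (e) at r Å."""
    return COULOMB_CONSTANT * q_i * q_j / (dielectric * r)

def lennard_jones_energy(r, epsilon, r_min):
    """12-6 van der Waals energy (kcal/mol).

    epsilon: well depth (kcal/mol); r_min: separation (Å) at the energy minimum.
    Both are illustrative parameters, not actual Amber atom-type values.
    """
    ratio = r_min / r
    return epsilon * (ratio ** 12 - 2.0 * ratio ** 6)

def pair_energy(q_i, q_j, r, epsilon, r_min, dielectric=1.0):
    """Sum of the electrostatic and van der Waals terms for one atom pair."""
    return coulomb_energy(q_i, q_j, r, dielectric) + lennard_jones_energy(r, epsilon, r_min)

# Hypothetical protein/binding partner atom pair with opposite partial charges.
print(pair_energy(q_i=-0.5, q_j=0.4, r=3.5, epsilon=0.1, r_min=3.5))
```

In a fitting calculation, such pair terms would be summed over all protein and binding partner atom pairs within a chosen cutoff; a negative total indicates a net attractive interaction under this simplified model.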
More specifically, the interaction of a binding partner with the CRF1R structure of the invention can be examined through the use of computer modelling using a docking program such as GOLD (Jones et al., J. Mol. Biol., 245, 43-53 (1995), Jones et al., J. Mol. Biol., 267, 727-748 (1997)), GRAMM (Vakser, I. A., Proteins, Suppl., 1:226-230 (1997)), DOCK (Kuntz et al, (1982) J. Mol. Biol., 161, 269-288; Makino et al, (1997) J. Comput. Chem., 18, 1812-1825), AUTODOCK (Goodsell et al, (1990) Proteins, 8, 195-202, Morris et al, (1998) J. Comput. Chem., 19, 1639-1662.), Glide (Friesner et al (2004) J. Med. Chem. 47, 1739-1749), FlexX, (Rarey et al, (1996) J. Mol. Biol., 261, 470-489) or ICM (Abagyan et al, (1994) J. Comput. Chem., 15, 488-506). This procedure can include computer fitting of binding partners to the CRF1R structure to ascertain how well the shape and the chemical structure of the binding partner will bind to a CRF1R.
Thus the invention includes a method for the analysis of the interaction of one or more binding partners with CRF1R comprising (a) constructing a computer representation of a binding region of the CRF1R as defined by the coordinates of the human CRF1R of Table A or Table B or Table C, optionally varied by a root mean square deviation of residue backbone atoms of not more than 4.383 Å, or selected coordinates thereof; (b) selecting a binding partner to be evaluated by a method selected from the group consisting of assembling said binding partner; selecting a binding partner from a small molecule database; de novo ligand design of the binding partner; and modifying a known agonist or inhibitor, or a portion thereof, of a CRF1R or homologue thereof; (c) employing computational means to dock said binding partner to be evaluated in a binding region in order to provide an energy-minimized configuration of the binding partner in a binding region; and (d) evaluating the results of said docking to quantify the interaction energy between said binding partner and the binding region.
Computer-assisted, manual examination of the binding region structure of the CRF1R may also be performed. Programs such as GRID (Goodford, (1985) J. Med. Chem., 28, 849-857), which determines probable interaction sites between molecules with various functional groups and an enzyme surface, may also be used to analyse a binding region to predict, for example, the types of modifications which will alter the rate of metabolism of a binding partner.
Computer programs can be employed to estimate the attraction, repulsion, and steric hindrance of the CRF1R structure and a binding partner.
Further modelling software that may be used in the context of the invention includes MOE (Molecular Operating Environment; Chemical Computing Group Inc., 1010 Sherbrooke St. West, Suite #910, Montreal, QC, Canada, H3A 2R7), Maestro (Schrödinger, LLC, New York, N.Y., 2012), and Discovery Studio (Accelrys Software Inc., Discovery Studio Modeling Environment, Release 3.5, San Diego: Accelrys Software Inc., 2012).
If more than one CRF1R binding region is characterized and a plurality of respective smaller molecular fragments are designed or selected, a binding partner may be formed by linking the respective small molecular fragments into a single binding partner, which maintains the relative positions and orientations of the respective small molecular fragments at the binding sites. The single larger binding partner may be formed as a real molecule or by computer modelling. Detailed structural information can then be obtained about the binding of the binding partner to CRF1R, and in the light of this information adjustments can be made to the structure or functionality of the binding partner, e.g. to alter its interaction with CRF1R. The above steps may be repeated and re-repeated as necessary.
Thus, the three dimensional structural representation of the one or more binding partners of the third, fourth and fifth aspects of the invention may be obtained by: providing structural representations of a plurality of molecular fragments; fitting the structural representation of each of the molecular fragments to the coordinates of the human CRF1R structural representation of Table A or Table B or Table C, optionally varied by a root mean square deviation of residue backbone atoms of not more than 4.383 Å, or selected coordinates thereof; and assembling the representations of the molecular fragments into one or more representations of single molecules to provide the three-dimensional structural representation of one or more candidate binding partners.
Typically the binding partner or molecule fragment is fitted to at least 5 or 10 non-hydrogen atoms of the CRF1R structure, preferably at least 20, 30, 40, 50, 60, 70, 80 or 90 non-hydrogen atoms and more preferably at least 100, 150, 200, 250, 300, 350, 400, 450, or 500 atoms and even more preferably at least 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500, 2600, 2700, 2800, 2900, 3000, 3100, 3200, 3300 or 3400 non-hydrogen atoms.
The invention includes screening methods to identify drugs or lead compounds of use in treating a disease or condition. For example, large numbers of binding partners, for example in a chemical database, can be screened for their ability to bind to CRF1R.
It is appreciated that in the methods described herein, which may be drug screening methods, a term well known to those skilled in the art, the binding partner may be a drug-like compound or lead compound for the development of a drug-like compound.
The term “drug-like compound” is well known to those skilled in the art, and may include the meaning of a compound that has characteristics that may make it suitable for use in medicine, for example as the active ingredient in a medicament. Thus, for example, a drug-like compound may be a molecule that may be synthesised by the techniques of organic chemistry, less preferably by techniques of molecular biology or biochemistry, and is preferably a small molecule, which may be of less than 5000 daltons (such as less than 500 daltons) and which may be water-soluble. A drug-like compound may additionally exhibit features of selective interaction with a particular protein or proteins and be bioavailable and/or able to penetrate target cellular membranes or the blood:brain barrier, but it will be appreciated that these features are not essential.
The term “lead compound” is similarly well known to those skilled in the art, and may include the meaning that the compound, whilst not itself suitable for use as a drug (for example because it is only weakly potent against its intended target, non-selective in its action, unstable, poorly soluble, difficult to synthesise or has poor bioavailability) may provide a starting-point for the design of other compounds that may have more desirable characteristics.
Thus in one embodiment of the methods of the third, fourth and fifth aspects of the invention, the methods further comprise modifying the structural representation of the binding partner so as to increase or decrease its interaction with CRF1R.
For example, once a binding partner has been designed or selected by the above methods, the efficiency with which that binding partner may bind to a CRF1R may be tested and optimised, for example by computational evaluation. For example, a binding partner designed or selected as binding to a CRF1R may be further computationally optimised so that in its bound state it would preferably lack repulsive electrostatic interaction with the target CRF1R and with the surrounding water molecules. Such non-complementary electrostatic interactions include repulsive charge-charge, dipole-dipole and charge-dipole interactions.
Furthermore, it is often desired that binding partners demonstrate a relatively small difference in energy between the bound and free states (i.e., a small deformation energy of binding). Thus, binding partners may be designed with a deformation energy of binding of not greater than about 10 kcal/mole, more preferably, not greater than 7 kcal/mole. Binding partners may interact with the binding region in more than one conformation that is similar in overall binding energy. In those cases, the deformation energy of binding is taken to be the difference between the energy of the free binding partner and the average energy of the conformations observed when the binding partner binds to the protein.
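The deformation energy criterion described above can be expressed as a short calculation. The sketch below takes the deformation energy as the average energy of the bound conformations minus the energy of the free binding partner, following the definition given in the text; the function names and the example energies are hypothetical, and the thresholds of 10 and 7 kcal/mole are those recited above.

```python
def deformation_energy(free_energy, bound_energies):
    """Deformation energy of binding (kcal/mol).

    free_energy: conformational energy of the free binding partner.
    bound_energies: energies of the conformation(s) observed when bound;
    their average is used when more than one bound conformation is seen.
    """
    if not bound_energies:
        raise ValueError("at least one bound conformation energy is required")
    mean_bound = sum(bound_energies) / len(bound_energies)
    return mean_bound - free_energy

def meets_threshold(free_energy, bound_energies, limit=10.0):
    """True if the deformation energy is not greater than `limit` kcal/mol."""
    return deformation_energy(free_energy, bound_energies) <= limit

# Hypothetical energies (kcal/mol): one free state and two bound conformations.
strain = deformation_energy(-20.0, [-14.0, -12.0])
print(strain, meets_threshold(-20.0, [-14.0, -12.0]), meets_threshold(-20.0, [-14.0, -12.0], limit=7.0))
```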
Specific computer software is available in the art to evaluate compound deformation energy and electrostatic interactions as detailed in WO2008/068534 (see, for example, page 34) incorporated herein by reference.
By modifying the structural representation we include, for example, adding molecular scaffolding, adding or varying functional groups, or connecting the molecule with other molecules (e.g. using a fragment linking approach) such that the chemical structure of the binding partner is changed while its original CRF1R binding capability is increased or decreased. Such optimisation is regularly undertaken during drug development programmes to e.g. enhance potency, promote pharmacological acceptability, increase chemical stability etc. of lead compounds.
Examples of modifications include substitutions or removal of groups containing residues which interact with the amino acid side chain groups of the CRF1R structure of the invention, as described further in relation to the β-adrenergic receptor in WO2008/068534 (see for example, page 35), incorporated herein by reference.
The potential binding effect of a binding partner on CRF1R may be analysed prior to its actual synthesis and testing by the use of computer modeling techniques. If the theoretical structure of the given entity suggests insufficient interaction and association between it and the CRF1R, testing of the entity is obviated. However, if computer modelling indicates a strong interaction, the molecule may then be synthesized and tested for its ability to bind to a CRF1R. In this manner, synthesis of inoperative compounds may be avoided.
Thus in a further embodiment of the third, fourth and fifth aspects of the invention, the methods further comprise the steps of obtaining or synthesising the one or more binding partners of a CRF1R; and optionally contacting the one or more binding partners with a CRF1R to determine the ability of the one or more binding partners to interact with the CRF1R.
Various methods known in the art may be used to determine binding between a CRF1R and a binding partner including those described in WO2008/068534 (see for example, pages 35-36) incorporated herein by reference.
Once computer modelling has indicated that a binding partner has a strong interaction, it is appreciated that it may be desirable to crystallise a complex of the CRF1R with that binding partner and analyse its interaction further by X-ray crystallography.
Thus in a further embodiment of the third, fourth and fifth aspects of the invention, the methods further comprise the steps of obtaining or synthesising the one or more binding partners of a CRF1R; forming one or more complexes of the CRF1R and the one or more binding partners; and analysing the one or more complexes by X-ray crystallography to determine the ability of the one or more binding partners to interact with CRF1R.
Thus, it will be appreciated that another particularly useful drug design technique enabled by this invention is iterative drug design. Iterative drug design is a method for optimizing associations between a protein and a binding partner by determining and evaluating the three-dimensional structures of successive sets of protein/compound complexes, and is described further in WO2008/068534 (see, for example, pages 36-37), incorporated herein by reference.
The ability of a binding partner to modify CRF1R function may also be tested. For example the ability of a binding partner to modulate a CRF1R function could be tested by a number of well known standard methods, described extensively in the prior art.
In addition to in silico analysis and design, the interaction of one or more binding partners with a CRF1R may be analysed directly by X-ray crystallography experiments, wherein the coordinates of the human CRF1R of Table A or Table B or Table C optionally varied by a root mean square deviation of residue backbone atoms of not more than 4.383 Å, or selected coordinates thereof, are used to analyse a crystal complex of the CRF1R receptor and binding partner. This can provide high resolution information of the interaction and can also provide insights into a mechanism by which a binding partner exerts an agonistic or antagonistic function.
Accordingly, a sixth aspect of the invention provides a method for the analysis of the interaction of one or more binding partners with CRF1R, comprising: obtaining or synthesising one or more binding partners; forming one or more crystallised complexes of a CRF1R and a binding partner; and analysing the one or more complexes by X-ray crystallography by employing the coordinates of the human CRF1R structure, of Table A or Table B or Table C optionally varied by a root mean square deviation of residue backbone atoms of not more than 4.383 Å, or selected coordinates thereof, to determine the ability of the one or more binding partners to interact with the CRF1R.
Preferences for the selected coordinates in this and all subsequent aspects of the invention are as defined above with respect to the first aspect of the invention.
The analysis of such structures may employ X-ray crystallographic diffraction data from the complex and the coordinates of the human CRF1R structure, of Table A or Table B or Table C optionally varied by a root mean square deviation of residue backbone atoms of not more than 4.383 Å, or selected coordinates thereof, to generate a difference Fourier electron density map of the complex. The difference Fourier electron density map may then be analysed.
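For orientation, a difference Fourier electron density map of the kind referred to above is conventionally written as the following synthesis. The notation is the standard crystallographic one and is supplied here as an illustration rather than taken from the specification: the observed amplitudes are those measured from the crystal of the complex, and the calculated amplitudes and phases are assumed to be derived from the CRF1R coordinates.

```latex
\Delta\rho(\mathbf{r}) \;=\; \frac{1}{V}\sum_{\mathbf{h}}
  \bigl(\,|F_{\mathrm{obs}}(\mathbf{h})| - |F_{\mathrm{calc}}(\mathbf{h})|\,\bigr)\,
  e^{\,i\alpha_{\mathrm{calc}}(\mathbf{h})}\,
  e^{-2\pi i\,\mathbf{h}\cdot\mathbf{r}}
```

Here V is the unit cell volume, the sum runs over the measured reflections h, and r is a position in the unit cell in fractional coordinates. Positive density that is not accounted for by the protein model may then be associated with the bound binding partner.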
In one embodiment, the one or more crystallised complexes are formed by soaking a crystal of CRF1R with the binding partner to form a complex. Alternatively, the complexes may be obtained by cocrystallising the CRF1R with the binding partner. For example a purified CRF1R protein sample is incubated over a period of time (usually >1 hr) with a potential binding partner and the complex can then be screened for crystallization conditions. Alternatively, protein crystals containing a first binding partner can be back-soaked to remove this binding partner by placing the crystals into a stabilising solution in which the binding partner is not present. The resultant crystals can then be transferred into a second solution containing a second binding partner and used to produce an X-ray diffraction pattern of CRF1R complexed with the second binding partner.
The complexes can be analysed using X-ray diffraction methods, e.g. according to the approach described by Greer et al., (J of Medicinal Chemistry, Vol. 37, (1994), 1035-1054), and difference Fourier electron density maps can be calculated based on X-ray diffraction patterns of soaked or co-crystallized CRF1R and the solved structure of uncomplexed CRF1R. This is described further in WO2008/068534 (see, for example, pages 38-39), incorporated herein by reference.
This information may thus be used to optimise known classes of CRF1R binding partners and to design and synthesize novel classes of CRF1R binding partners, particularly those which have agonistic or antagonistic properties, and to design drugs with modified CRF1R interactions.
In one approach, the structure of a binding partner bound to a CRF1R may be determined by experiment. This will provide a starting point in the analysis of the binding partner bound to CRF1R, thus providing those of skill in the art with a detailed insight as to how that particular binding partner interacts with CRF1R and the mechanism by which it exerts any functional effect.
Many of the techniques and approaches applied to structure-based drug design described above rely at some stage on X-ray analysis to identify the binding position of a binding partner in a ligand-protein complex. A common way of doing this is to perform X-ray crystallography on the complex, produce a difference Fourier electron density map, and associate a particular pattern of electron density with the binding partner. However, in order to produce the map (as explained e.g. by Blundell et al., in Protein Crystallography, Academic Press, New York, London and San Francisco, (1976)), it is necessary to know beforehand the protein three dimensional structure (or at least a set of structure factors for the protein crystal). Therefore, determination of the CRF1R structure also allows difference Fourier electron density maps of CRF1R-binding partner complexes to be produced and the binding position of the binding partner to be determined, and hence may greatly assist the process of rational drug design.
Accordingly, a seventh aspect of the invention provides a method of predicting the three dimensional structure of a binding partner of unknown structure, or part thereof, which binds to CRF1R, comprising: providing the coordinates of the human CRF1R structure, listed in Table A or Table B or Table C, optionally varied by a root mean square deviation of residue backbone atoms of not more than 4.383 Å, or selected coordinates thereof; providing an X-ray diffraction pattern of CRF1R complexed with the binding partner; and using said coordinates to predict at least part of the structure coordinates of the binding partner.
In one embodiment, the X-ray diffraction pattern is obtained from a crystal formed by soaking a crystal of CRF1R with the binding partner to form a complex. Alternatively, the X-ray diffraction pattern is obtained from a crystal formed by cocrystallising the CRF1R with the binding partner as described above. Alternatively, protein crystals containing a first binding partner can be back-soaked to remove this binding partner and the resultant crystals transferred into a second solution containing a second binding partner as described above.
A mixture of compounds may be soaked or co-crystallized with a CRF1R crystal, wherein only one or some of the compounds may be expected to bind to the CRF1R. The mixture of compounds may comprise a ligand known to bind to CRF1R. As well as the structure of the complex, the identity of the complexing compound(s) is/are then determined.
Preferably, the methods of the previous aspects of the invention are computer-based. For example, typically the methods of the previous aspects of the invention make use of the computer systems and computer-readable storage mediums of the tenth and eleventh aspects of the invention.
An eighth aspect of the invention provides a method for producing a binding partner of CRF1R comprising: identifying a binding partner according to the third, fourth, fifth, sixth or seventh aspects of the invention and synthesising the binding partner.
The binding partner may be synthesised using any suitable technique known in the art including, for example, the techniques of synthetic chemistry, organic chemistry and molecular biology.
It will be appreciated that it may be desirable to test the binding partner in an in vivo or in vitro biological system in order to determine its binding and/or activity and/or its effectiveness. For example, its binding to a CRF1R may be assessed using any suitable binding assay known in the art including the examples described above. Alternatively, its ability to modulate the CRF1R's ability to form dimers may be assessed.
Moreover, its effect on CRF1R function in an in vivo or in vitro assay may be tested. For example, the effect of the binding partner on the CRF1R signalling pathway may be determined. For example, the activity may be measured by using a reporter polynucleotide to measure the activity of the CRF1R signalling pathway. By a reporter polynucleotide we include genes which encode a reporter protein whose activity may easily be assayed, for example β-galactosidase, chloramphenicol acetyl transferase (CAT) gene, luciferase or Green Fluorescent Protein (see, for example, Tan et al, 1996 EMBO J 15(17): 4629-42). Several techniques are available in the art to detect and measure expression of a reporter polynucleotide which would be suitable for use in the present invention. Many of these are available in kits both for determining expression in vitro and in vivo. Alternatively, signalling may be assayed by the analysis of downstream targets. For example, a particular protein whose expression is known to be under the control of a specific signalling pathway may be quantified. Protein levels in biological samples can be determined using any suitable method known in the art. For example, protein concentration can be studied by a range of antibody based methods including immunoassays, such as ELISAs, western blotting and radioimmunoassays.
A ninth aspect of the invention provides a binding partner produced by the method of the eighth aspect of the invention.
Following identification of a binding partner, it may be manufactured and/or used in the preparation, i.e. manufacture or formulation, of a composition such as a medicament, pharmaceutical composition or drug. These may be administered to individuals.
Accordingly, the invention includes a method for producing a medicament, pharmaceutical composition or drug, the method comprising: (a) providing a binding partner according to the eighth aspect of the invention and (b) preparing a medicament, pharmaceutical composition or drug containing the binding partner.
The medicaments may be used to treat any disorder or condition ameliorated by modulation of the CRF1R. Examples include anxiety, depression, schizophrenia, stress related disorders, post-operative ileus, Alzheimer's disease, insomnia, eating disorders such as anorexia, panic disorder, cardiovascular disease including heart failure, kidney disease, Cushing's disease, disease of the immune system including psoriasis, asthma, rheumatoid arthritis, inflammatory bowel disease, stroke and migraine.
The invention also provides systems, particularly a computer system, intended to generate structures and/or perform optimisation of binding partners which interact with CRF1R, CRF1R homologues or analogues, complexes of CRF1R with binding partners, or complexes of CRF1R homologues or analogues with binding partners.
Accordingly, a tenth aspect of the invention provides a computer system, intended to generate three dimensional structural representations of CRF1R, CRF1R homologues or analogues, complexes of CRF1R with binding partners, or complexes of CRF1R homologues or analogues with binding partners, or, to analyse or optimise binding of binding partners to said CRF1R or homologues or analogues, or complexes thereof, the system containing computer-readable data comprising one or more of:
For example the computer system may comprise: (i) a computer-readable data storage medium comprising data storage material encoded with the computer-readable data; (ii) a working memory for storing instructions for processing said computer-readable data; and (iii) a central-processing unit coupled to said working memory and to said computer-readable data storage medium for processing said computer-readable data and thereby generating structures and/or performing rational drug design. The computer system may further comprise a display coupled to the central-processing unit for displaying structural representations.
The invention also provides such systems containing atomic coordinate data of target proteins of unknown structure wherein such data has been generated according to the methods of the invention described herein based on the starting data provided in Table A or Table B or Table C optionally varied by a root mean square deviation of residue backbone atoms of not more than 4.383 Å, or selected coordinates thereof.
Such data is useful for a number of purposes, including the generation of structures to analyse the mechanisms of action of binding partners and/or to perform rational drug design of binding partners which interact with CRF1R, such as compounds which are agonists or antagonists.
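By way of illustration only, the root mean square deviation limit recited above can be checked with a few lines of Python. The sketch below assumes that the two sets of residue backbone atom coordinates are of equal size, listed in the same order, and already superimposed (no fitting step is performed). The names `backbone_rmsd`, `within_limit` and `RMSD_LIMIT_ANGSTROM` are hypothetical and do not come from the specification.

```python
import math

RMSD_LIMIT_ANGSTROM = 4.383  # limit recited in the text


def backbone_rmsd(reference, candidate):
    """Return the RMSD (in Angstrom) between two pre-superimposed coordinate lists.

    Each argument is a list of (x, y, z) tuples for equivalent backbone atoms,
    given in the same order. No superposition is carried out here.
    """
    if len(reference) != len(candidate) or not reference:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(reference, candidate):
        squared += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(squared / len(reference))


def within_limit(reference, candidate, limit=RMSD_LIMIT_ANGSTROM):
    """True if the candidate deviates from the reference by no more than the limit."""
    return backbone_rmsd(reference, candidate) <= limit
```

In practice an optimal superposition would normally be computed first, so this sketch gives an upper bound on the RMSD that a fitted comparison would report.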
An eleventh aspect of the invention provides a computer-readable storage medium, comprising a data storage material encoded with computer readable data, wherein the data comprises one or more of:
The invention also includes a computer-readable storage medium comprising a data storage material encoded with a first set of computer-readable data comprising a Fourier transform of at least a portion of the structural coordinates of human CRF1R listed in Table A or Table B or Table C, optionally varied by a root mean square deviation of residue backbone atoms of not more than 4.383 Å, or selected coordinates thereof; which data, when combined with a second set of machine readable data comprising an X-ray diffraction pattern of a molecule or molecular complex of unknown structure, using a machine programmed with the instructions for using said first set of data and said second set of data, can determine at least a portion of the structure coordinates corresponding to the second set of machine readable data.
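The Fourier transform of a set of atomic coordinates referred to above corresponds to the calculation of structure factors. The following is a deliberately simplified Python sketch, assuming fractional coordinates, a unit scattering factor for every atom, no temperature factors and no space group symmetry; the function name `structure_factor` is hypothetical.

```python
import cmath


def structure_factor(fractional_coords, h, k, l):
    """Sum unit point-atom contributions exp(2*pi*i*(h*x + k*y + l*z)).

    fractional_coords is a list of (x, y, z) tuples expressed as fractions
    of the unit cell edges; h, k and l are the Miller indices.
    """
    total = 0j
    for x, y, z in fractional_coords:
        total += cmath.exp(2j * cmath.pi * (h * x + k * y + l * z))
    return total
```

A real molecular replacement calculation would use element-specific scattering factors and the crystal symmetry, which are omitted here.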
It will be appreciated that the computer-readable storage media of the invention may comprise a data storage material encoded with any of the data generated by carrying out any of the methods of the invention relating to structure solution and selection/design of binding partners to CRF1R and drug design.
The invention also includes a method of preparing the computer-readable storage media of the invention comprising encoding a data storage material with the computer-readable data.
As used herein, “computer readable media” refers to any medium or media, which can be read and accessed directly by a computer. Such media include, but are not limited to: magnetic storage media such as floppy discs, hard disc storage medium and magnetic tape; optical storage media such as optical discs or CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these categories such as magnetic/optical storage media.
By providing such computer readable media, the atomic coordinate data of the invention can be routinely accessed to model CRF1R or selected coordinates thereof. For example, RASMOL (Sayle et al., TIBS, Vol. 20, (1995), 374) is a publicly available computer software package, which allows access and analysis of atomic coordinate data for structure determination and/or rational drug design.
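As a minimal sketch of how atomic coordinate data might be read from such media, the Python below assumes, purely for illustration, that the coordinates are stored as PDB-format ATOM records with the standard fixed column layout. The function names and the choice of backbone atom names are hypothetical.

```python
BACKBONE_ATOMS = {"N", "CA", "C", "O"}


def parse_atom_records(pdb_text):
    """Parse ATOM lines using the fixed column layout of the PDB format."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM  "):
            continue  # skip HETATM, REMARK, END and any other record types
        atoms.append({
            "name": line[12:16].strip(),       # atom name, columns 13-16
            "res_name": line[17:20].strip(),   # residue name, columns 18-20
            "chain": line[21],                 # chain identifier, column 22
            "res_seq": int(line[22:26]),       # residue number, columns 23-26
            "xyz": (float(line[30:38]),        # x, columns 31-38
                    float(line[38:46]),        # y, columns 39-46
                    float(line[46:54])),       # z, columns 47-54
        })
    return atoms


def select_backbone(atoms, residue_numbers):
    """Return the backbone atoms belonging to the given residue numbers."""
    wanted = set(residue_numbers)
    return [a for a in atoms
            if a["name"] in BACKBONE_ATOMS and a["res_seq"] in wanted]
```

Selecting by residue number in this way is one possible route to the "selected coordinates" referred to throughout, for example the residues lining a binding pocket.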
As used herein, “a computer system” refers to the hardware means, software means and data storage means used to analyse the atomic coordinate data of the invention. The minimum hardware means of the computer-based systems of the present invention comprises a central processing unit (CPU), input means, output means and data storage means. Desirably, a monitor is provided to visualize structure data. The data storage means may be RAM or means for accessing computer readable media of the invention. Examples of such systems are microcomputer workstations available from Silicon Graphics Incorporated and Sun Microsystems running Unix based, Windows XP or IBM OS/2 operating systems. Apple and Linux based systems may be used.
A twelfth aspect of the invention provides a method for providing data for generating three dimensional structural representations of CRF1R, CRF1R homologues or analogues, complexes of CRF1R with binding partners, or complexes of CRF1R homologues or analogues with binding partners, or, for analysing or optimising binding of binding partners to said CRF1R or homologues or analogues, or complexes thereof, the method comprising:
The computer-readable data received from said remote device, particularly when in the form of the coordinates of the human CRF1R structure of Table A or Table B or Table C, optionally varied by a root mean square deviation of residue backbone atoms of not more than 4.383 Å, or selected coordinates thereof, may be used in the methods of the invention described herein, e.g. for the analysis of a binding partner structure with a CRF1R structure.
Thus the remote device may comprise e.g. a computer system or computer readable media of one of the previous aspects of the invention. The device may be in a different country or jurisdiction from where the computer-readable data is received.
The communication may be via the internet, intranet, e-mail etc, transmitted through wires or by wireless means such as by terrestrial radio or by satellite. Typically the communication will be electronic in nature, but some or all of the communication pathway may be optical, for example, over optical fibers.
A thirteenth aspect of the invention provides a method of obtaining a three dimensional structural representation of a crystal of a CRF1R, which method comprises providing the coordinates of the human CRF1R structure of Table A or Table B or Table C, optionally varied by a root mean square deviation of residue backbone atoms of not more than 4.383 Å, or selected coordinates thereof, and generating a three-dimensional structural representation of said coordinates.
For example, the structural representation may be a physical representation or a computer generated representation. Examples of representations are described above and include, for example, any of a wire-frame model, a chicken-wire model, a ball-and-stick model, a space-filling model, a stick model, a ribbon model, a snake model, an arrow and cylinder model, an electron density map or a molecular surface model.
Computer representations can be generated or displayed by commercially available software programs including for example QUANTA (Accelrys ©2001, 2002), O (Jones et al., Acta Crystallogr. A47, pp. 110-119 (1991)), RIBBONS (Carson, J. Appl. Crystallogr., 24, pp. 958-961 (1991)) and PyMol (The PyMOL Molecular Graphics System, Schrödinger LLC).
Typically, the computer used to generate the representation comprises (i) a computer-readable data storage medium comprising a data storage material encoded with computer-readable data, wherein said data comprise the coordinates of the human CRF1R structure of Table A or Table B or Table C, optionally varied by a root mean square deviation of residue backbone atoms of not more than 4.383 Å, or selected coordinates thereof; and (ii) instructions for processing the computer-readable data into a three-dimensional structural representation. The computer may further comprise a display for displaying said three-dimensional representation.
A fourteenth aspect of the invention provides a method of predicting one or more sites of interaction of a CRF1R or a homologue thereof, the method comprising: providing the coordinates of the human CRF1R structure of Table A or Table B or Table C, optionally varied by a root mean square deviation of residue backbone atoms of not more than 4.383 Å, or selected coordinates thereof; and analysing said coordinates to predict one or more sites of interaction.
For example, a binding region of a CRF1R for a particular binding partner can be predicted by modelling where the structure of the binding partner is known. Typically, the fitting and docking methods described above would be used. This method may be used, for example, to predict the site of interaction of a G protein of known structure as described in Gray J J (2006) Curr Op Struc Biol Vol 16, pp 183-193.
A fifteenth aspect of the invention provides a method for assessing the activation state of a structure for CRF1R, comprising: providing the coordinates of the human CRF1R structure of Table A or Table B or Table C, optionally varied by a root mean square deviation of residue backbone atoms of not more than 4.383 Å, or selected coordinates thereof; performing a statistical and/or topological analysis of the coordinates; and comparing the results of the analysis with the results of an analysis of coordinates of proteins of known activation states.
For example, protein structures may be compared for similarity by statistical and/or topological analyses (suitable analyses are known in the art and include, for example those described in Grindley et al (1993) J Mol Biol Vol 229: 707-721 and Holm & Sander (1997) Nucl Acids Res Vol 25: 231-234). Highly similar scores would indicate a shared conformational and therefore functional state eg the inactive antagonist state in this case.
One example of statistical analysis is multivariate analysis which is well known in the art and can be done using techniques including principal components analysis, hierarchical cluster analysis, genetic algorithms and neural networks.
By performing a multivariate analysis of the coordinate data of the human CRF1R structure of Table A or Table B or Table C, optionally varied by a root mean square deviation of residue backbone atoms of not more than 4.383 Å, or selected coordinates thereof, and comparing the result of the analysis with the results of the analysis performed on coordinates of proteins with known activation states, it is possible to determine the activation state of the coordinate set analysed. For example, the activation state may be classified as ‘active’ or ‘inactive’.
A sixteenth aspect of the invention provides a method of producing a protein with a binding region that has substrate specificity substantially identical to that of CRF1R, the method comprising
b) identifying the amino acid residues in the target protein that correspond to any one or more of the following positions according to the numbering of the CRF1R as set out in
By “an amino acid residue that corresponds to” we include an amino acid residue that aligns to the given amino acid residue in CRF1R when the CRF1R and target protein are aligned using e.g. MacVector and CLUSTALW.
For example, amino acid residues contributing to the small organic molecule binding pocket of CRF1R include amino acid residues Leu 158, Phe 162, His 199, Asn 202, Phe 203, Phe 204, Trp 205, Met 206, Phe 207, Gly 208, Glu 209, Gly 210, Cys 211, Leu 213, His 214, Met 276, Val 279, Leu 280, Leu 281, Ile 282, Asn 283, Phe 284, Ile 285, Phe 286, Leu 287, Phe 288, Ile 290, Ala 312, Ala 315, Thr 316, Leu 317, Leu 319, Leu 320, Pro 321, Leu 323, Gly 324, Ile 325, Tyr 327, Gln 355, Val 359 and Phe 362, and amino acid residues contributing to the peptide orthosteric binding site include Ala 119, Asn 123, His 127, Ser 130, Phe 162, Arg 165, Asn 166, Thr 168, Thr 169, Val 172, Gln 173, Thr 175, Met 176, His 181, Val 191, Thr 192, Tyr 195, Asn 196, His 199, Asn 202, Phe 203, Lys 257, Ala 260, Lys 262, Tyr 272, Gln 273, Met 276, Leu 323, Thr 326, Tyr 327, Ala 330, Phe 331, Asn 333, Asp 337, Arg 341, Phe 344, Ile 345, Asn 348, Glu 352, Ser 353 and Gln 355. Thus a binding site of a particular protein may be engineered using well known molecular biology techniques to contain any one or more of these residues to give it the same substrate specificity. This technique is well known in the art and is described in, for example, Ikuta et al (J Biol Chem (2001) 276, 27548-27554) where the authors modified the active site of cdk2, for which they could obtain structural data, to resemble that of cdk4, for which no X-ray structure was available.
In the context of the small organic molecule binding site, preferably, all 41 amino acids in the target portion which correspond to amino acid residues Leu 158, Phe 162, His 199, Asn 202, Phe 203, Phe 204, Trp 205, Met 206, Phe 207, Gly 208, Glu 209, Gly 210, Cys 211, Leu 213, His 214, Met 276, Val 279, Leu 280, Leu 281, Ile 282, Asn 283, Phe 284, Ile 285, Phe 286, Leu 287, Phe 288, Ile 290, Ala 312, Ala 315, Thr 316, Leu 317, Leu 319, Leu 320, Pro 321, Leu 323, Gly 324, Ile 325, Tyr 327, Gln 355, Val 359 and Phe 362 of the CRF1R are, if different, replaced. However, it will be appreciated that only 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1 amino acid residues may be replaced.
In the context of the peptide orthosteric binding site, preferably, all 41 amino acids in the target portion which correspond to amino acid residues Ala 119, Asn 123, His 127, Ser 130, Phe 162, Arg 165, Asn 166, Thr 168, Thr 169, Val 172, Gln 173, Thr 175, Met 176, His 181, Val 191, Thr 192, Tyr 195, Asn 196, His 199, Asn 202, Phe 203, Lys 257, Ala 260, Lys 262, Tyr 272, Gln 273, Met 276, Leu 323, Thr 326, Tyr 327, Ala 330, Phe 331, Asn 333, Asp 337, Arg 341, Phe 344, Ile 345, Asn 348, Glu 352, Ser 353 and Gln 355 of the CRF1R are, if different, replaced. However, it will be appreciated that only 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1 amino acid residues may be replaced.
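The identification of corresponding residues, and of those that differ and are therefore candidates for replacement, can be sketched in Python as follows. The sketch assumes that a pairwise alignment of CRF1R and the target protein has already been produced (for example by an alignment program) and is supplied as two equal-length strings in which '-' marks a gap. The function names, and any sequences used to exercise them, are hypothetical.

```python
def map_corresponding_residues(aligned_ref, aligned_target, ref_positions,
                               ref_start=1, target_start=1):
    """Map reference residue numbers to their aligned target residues.

    Returns a dict: reference number -> (ref_aa, target_number, target_aa).
    target_number and target_aa are None where the reference residue is
    aligned to a gap in the target.
    """
    if len(aligned_ref) != len(aligned_target):
        raise ValueError("aligned sequences must have equal length")
    wanted = set(ref_positions)
    mapping = {}
    ref_num = ref_start - 1
    tgt_num = target_start - 1
    for ref_aa, tgt_aa in zip(aligned_ref, aligned_target):
        if ref_aa != "-":
            ref_num += 1
        if tgt_aa != "-":
            tgt_num += 1
        if ref_aa != "-" and ref_num in wanted:
            if tgt_aa == "-":
                mapping[ref_num] = (ref_aa, None, None)
            else:
                mapping[ref_num] = (ref_aa, tgt_num, tgt_aa)
    return mapping


def positions_to_replace(mapping):
    """Reference positions whose aligned target residue exists but differs."""
    return sorted(n for n, (ref_aa, tgt_num, tgt_aa) in mapping.items()
                  if tgt_aa is not None and tgt_aa != ref_aa)
```

Only positions returned by `positions_to_replace` would need to be mutated in the target, reflecting the "if different, replaced" wording above.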
Preferences for the target protein are as defined above with respect to the first aspect of the invention.
A seventeenth aspect of the invention provides a method of predicting the location of internal and/or external parts of the structure of CRF1R or a homologue thereof, the method comprising: providing the coordinates of the CRF1R structure of Table A or Table B or Table C, optionally varied by a root mean square deviation of residue backbone atoms of not more than 4.383 Å, or selected coordinates thereof and analysing said coordinates to predict the location of internal and/or external parts of the structure.
For example, from the three dimensional representation, it is possible to read off external parts of the structure, eg surface residues, as well as internal parts, eg residues within the protein core. It will be appreciated that the identification of external protein sequences will be especially useful in the generation of antibodies against a CRF1R.
The crystallisation of the CRF1R has led to many interesting observations about its structure. Thus it will be appreciated that the invention allows for the generation of mutant CRF1Rs wherein residues corresponding to these areas of interest are mutated.
Accordingly, an eighteenth aspect of the invention provides a mutant CRF1R which, when compared to the corresponding wild-type CRF1R, has a different amino acid at a position which corresponds to any one or more of the following positions according to the numbering of the human CRF1R as set out in
The invention also provides a mutant CRF1R which, when compared to the corresponding wild-type CRF1R, has a different amino acid at a position which corresponds to any one or more of the following positions according to the numbering of the human CRF1R as set out in
A nineteenth aspect of the invention provides a mutant CRF1R which, when compared to the corresponding wild-type CRF1R has a different amino acid at a position which corresponds to any one or more of the following positions according to the numbering of the human CRF1R as set out in
The inventors have found that these mutations increase the conformational stability of the GPCR (ie increase the stability of the mutant GPCR in a particular conformation compared to the stability of the parent GPCR in the same particular conformation), and so the mutant GPCR of the nineteenth aspect of the invention may be one which has increased conformational stability to any denaturant or denaturing condition such as to any one or more of heat, a detergent, a chaotropic agent or an extreme of pH. Suitable methods for assessing conformational stability are well known in the art and are described, for example, in WO 2008/114020. Conveniently, conformational stability is measured by an extended lifetime of the mutant under the imposed conditions which may lead to instability (such as heat, harsh detergent conditions, chaotropic agents and so on). Destabilisation under the imposed condition is typically determined by measuring denaturation or loss of structure. This may manifest itself by loss of ligand binding ability or loss of secondary or tertiary structure indicators.
Preferably, the mutant GPCR of the nineteenth aspect of the invention has increased stability in an agonist or antagonist conformation.
It is particularly preferred if the mutant CRF1R of the eighteenth or nineteenth aspects of the invention is one which has at least 20% amino acid sequence identity when compared to the given human CRF1R, as determined using MacVector and CLUSTALW. Preferably, the mutant CRF1R receptor has at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 95% or 99% amino acid sequence identity.
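A percentage identity of the kind referred to above can be computed from an existing pairwise alignment as sketched below. Alignment programs differ in how they choose the denominator; this sketch counts only columns in which neither sequence has a gap, which is an assumption and not necessarily the convention used by MacVector or CLUSTALW. The function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of identical residues over columns where neither row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # gapped columns are excluded from the comparison
        compared += 1
        if a.upper() == b.upper():
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared
```

The result can then be compared against a chosen threshold, such as the 20% figure given above.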
The mutant CRF1R receptor may be a mutant of any CRF1R receptor provided that it is mutated at one or more of the amino acid positions as stated by reference to the given human CRF1R amino acid sequence.
Thus, the invention includes a mutant human CRF1R in which, compared to its parent, one or more of these amino acid residues have been replaced by another amino acid residue. The invention also includes mutant CRF1Rs from other sources in which one or more corresponding amino acids in the parent receptor are replaced by another amino acid residue. For the avoidance of doubt the parent may be a CRF1R which has a naturally-occurring sequence, or it may be a truncated form or it may be a fusion, either to the naturally-occurring protein or to a fragment thereof, or it may contain mutations compared to the naturally-occurring sequence, providing that it retains its natural ligand-binding ability, ie it retains binding to CRF1.
For the avoidance of doubt, the mutant CRF1R of the invention, as described in the eighteenth and nineteenth aspects, is not a CRF1R with a naturally-occurring amino acid sequence.
In an embodiment of the eighteenth aspect, the mutant CRF1R of the invention has a combination of 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 or 10 or 11 or 12 or 13 or 14 or 15 or 16 or 17 or 18 or 19 or 20 or 21 or 22 or 23 or 24 or 25 or 26 or 27 or 28 or 29 or 30 or 31 or 32 or 33 or 34 or 35 or 36 or 37 or 38 or 39 or 40 or 41 mutations as described above.
In an embodiment of the nineteenth aspect, the mutant CRF1R of the invention has a combination of 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 or 10 or 11 or 12 mutations as described above.
It will be appreciated that it may be desirable to replace the intracellular loop (ICL)-2 of the mutant GPCR of the invention (eg of the nineteenth aspect of the invention) with T4 lysozyme so as to make the mutant CRF1R more amenable to crystallisation (see Example 1). By doing so, it may be desirable to remove the mutation at the position corresponding to Ser 222 according to the numbering of the human CRF1R as set out in
By “corresponding amino acid residue” we include the meaning of the amino acid residue in another CRF1R receptor which aligns to the given amino acid residue in the human CRF1R when the human CRF1R and the other CRF1R are compared using MacVector and CLUSTALW.
Residues in proteins can be mutated using standard molecular biology techniques as are well known in the art.
Although the amino acid used to replace a given amino acid at a particular position is typically a naturally occurring amino acid, typically an “encodeable” amino acid, it may be a non-natural amino acid (in which case the protein is typically made by chemical synthesis or by use of non-natural amino-acyl tRNAs). An “encodeable” amino acid is one which is incorporated into a polypeptide by translation of mRNA. It is also possible to create non-natural amino acids or introduce non-peptide linkages at a given position by covalent chemical modification, for example by post-translational treatment of the protein or semisynthesis. These post-translational modifications may be natural, such as phosphorylation, glycosylation or palmitoylation, or synthetic or biosynthetic.
A twentieth aspect of the invention provides a method of making a CRF1R crystal comprising: providing purified CRF1R; and crystallising the CRF1R by using a lipidic cubic phase technique, using a precipitant solution comprising sodium citrate, lithium sulphate, and PEG. Preferably, the sodium citrate buffer has a concentration of between 20 and 200 mM such as 100 mM, and a pH of 4.5-6.5 such as a pH of 5.5. Any suitable PEG may be used. Generally, low molecular weight PEGs are used such as PEG200, PEG300, PEG400, PEG550mme, PEG600 and PEG1000. However, it is preferred if PEG400 is used.
In a particularly preferred embodiment, the precipitant solution comprises 100 mM sodium citrate pH 5.5, 200 mM lithium sulphate, and 30% (v/v) PEG400.
In a preferred embodiment, a CRF1R ligand is included during the crystallisation process, for example CP-376395.
Preferably, the crystals are grown in lipidic cubic phase using a monoolein/cholesterol mixture, for example as described further in Example 1.
Accordingly, it will be appreciated that the precipitant solution may comprise 100 mM sodium citrate pH 5.5, 200 mM lithium sulphate, and 30% (v/v) PEG400; a CRF1R ligand may be included during the crystallisation process, for example CP-376395; and the crystals may be grown in lipidic cubic phase using a monoolein/cholesterol mixture.
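The volumes needed to make up the particularly preferred precipitant solution can be worked out from the dilution relation C1·V1 = C2·V2, as in the following Python sketch. The stock concentrations used (1 M sodium citrate pH 5.5, 2 M lithium sulphate and undiluted PEG400) and the 10 ml batch size are hypothetical choices made for illustration and are not taken from the specification.

```python
def stock_volume_ml(final_conc, stock_conc, final_volume_ml):
    """Volume of stock required, from C1*V1 = C2*V2 (same units for both concentrations)."""
    if final_conc > stock_conc:
        raise ValueError("stock must be at least as concentrated as the final solution")
    return final_conc * final_volume_ml / stock_conc


def precipitant_recipe(final_volume_ml=10.0):
    """Volumes (ml) of each component for the preferred precipitant solution."""
    # Stock concentrations below are assumed values, not from the source text.
    citrate = stock_volume_ml(100.0, 1000.0, final_volume_ml)  # mM, pH 5.5 stock
    lithium = stock_volume_ml(200.0, 2000.0, final_volume_ml)  # mM
    peg400 = stock_volume_ml(30.0, 100.0, final_volume_ml)     # % (v/v)
    water = final_volume_ml - (citrate + lithium + peg400)
    return {
        "sodium_citrate_ml": citrate,
        "lithium_sulphate_ml": lithium,
        "peg400_ml": peg400,
        "water_ml": water,
    }
```

With these assumed stocks, a 10 ml batch would take 1 ml citrate stock, 1 ml lithium sulphate stock, 3 ml PEG400 and 5 ml water.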
A twenty-first aspect of the invention provides a crystal of CRF1R having the structure defined by the coordinates of the human CRF1R structure, listed in Table A or Table B or Table C, optionally varied by a root mean square deviation of residue backbone atoms of not more than 4.383 Å, or selected coordinates thereof. Typically, the crystal has a resolution of 3.15 Å or better, such as 2.97 Å or better.
The space group of the crystal may be P22₁2₁.
Thus, in one embodiment, the crystal has P22₁2₁ symmetry and unit cell dimensions a=86.6 (±15) Å, b=124.0 (±15) Å, c=166.8 (±15) Å. It will be appreciated that with P22₁2₁ symmetry all α, β and γ angles are 90°.
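Because all three cell angles are 90°, the unit cell volume is simply the product of the three edge lengths, and an observed cell can be compared with the stated dimensions edge by edge. The Python sketch below illustrates both calculations; the function names are hypothetical.

```python
REFERENCE_CELL = (86.6, 124.0, 166.8)  # a, b, c in Angstrom, as stated in the text
TOLERANCE = 15.0                       # +/- tolerance in Angstrom, as stated in the text


def orthorhombic_cell_volume(a, b, c):
    """Volume (cubic Angstrom) of a cell whose angles are all 90 degrees."""
    return a * b * c


def cell_matches(a, b, c, reference=REFERENCE_CELL, tolerance=TOLERANCE):
    """True if every edge lies within the tolerance of the reference edge."""
    return all(abs(obs - ref) <= tolerance
               for obs, ref in zip((a, b, c), reference))
```

For the nominal dimensions the volume is about 1.79 × 10^6 Å^3.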
The invention also includes a co-crystal of CRF1R having the structure defined by the coordinates of the human CRF1R structure, listed in Table A or Table B or Table C, optionally varied by a root mean square deviation of residue backbone atoms of not more than 4.383 Å, or selected coordinates thereof, and a binding partner. Typically, the crystal has a resolution of 3.15 Å or better, such as 2.97 Å or better. The binding partner may be CP-376395.
In an embodiment of the twentieth and twenty-first aspects of the invention, the CRF1R is one in which intracellular loop (ICL) 2 is replaced with T4-lysozyme (T4L). Methods for inserting T4L into an ICL of a GPCR are routine practice in the art, and are described for example in Bill et al (Nat Biotechnol 29(4) 335-340 (2011)) and Kobilka et al (Science 240(4857) 1310-6 (1988)).
The invention includes the use of the coordinates of the CRF1R structure of Table A or Table B or Table C, optionally varied by a root mean square deviation of residue backbone atoms of not more than 4.383 Å, or selected coordinates thereof to solve the structure of target proteins of unknown structure.
The invention includes the use of the coordinates of the CRF1R structure of Table A or Table B or Table C, optionally varied by a root mean square deviation of residue backbone atoms of not more than 4.383 Å, or selected coordinates thereof to identify binding partners of a CRF1R.
The invention includes the use of the coordinates of the CRF1R structure of Table A or Table B or Table C, optionally varied by a root mean square deviation of residue backbone atoms of not more than 4.383 Å, or selected coordinates thereof in methods of drug design where the drugs are aimed at modifying the activity of the CRF1R.
The invention will now be described in more detail with the aid of the following Figures and Examples.
G protein-coupled receptors (GPCRs) transmit extracellular signals across cell membranes and can be classified into three families (A, B, and C) based on sequence similarity (1). Class B GPCRs include receptors for peptides such as secretin, glucagon, glucagon-like peptide, calcitonin and parathyroid peptide hormone and have been studied as drug targets in the treatment of various diseases, including diabetes, osteoporosis, depression and anxiety. They feature an N-terminal extracellular domain (ECD) involved in peptide-binding and a transmembrane domain (TMD) containing seven transmembrane α-helices. Recently determined structures of Class A receptors have greatly advanced our understanding of the function of GPCRs at a molecular level. However, structural information on Class B receptors is currently limited to the ECD and no structure of a Class B TMD, the main target for small-molecule drugs (2), has been determined to date. Here we report the crystal structure of the TMD of the human corticotropin-releasing factor receptor 1 (CRF1R) (3), a Class B GPCR essential for the stress-induced activation of the hypothalamic-pituitary-adrenal axis, in complex with the non-peptide antagonist CP-376395 (4). The structure reveals significant differences to those of Class A receptors. The extracellular half of the receptor assumes a very open conformation, presumably to allow binding of the large ECD-peptide complex. Furthermore, in contrast to Class A GPCRs where the ligand-binding sites are located close to the extracellular boundaries of the receptors, in CRF1R the antagonist binds in a hydrophobic pocket located deep in the cytoplasmic half of the receptor. This structure provides new insight into the architecture of Class B GPCRs and may aid in the design of novel therapeutics.
To obtain a structure of CRF1R, we generated a thermostabilized receptor (StaR) that preferentially adopts the inactive conformation using a conformational thermostabilization approach (5), previously employed to determine the structures of GPCRs (Table 1). This StaR contained twelve amino-acid substitutions, none of which were located in or adjacent to the ligand-binding site. To facilitate crystallization, both termini were truncated, removing the ECD and amino-acids beyond transmembrane helix 7. Additionally, intracellular loop (ICL) 2 was replaced with T4-lysozyme (T4L) (
The core fold of CRF1R features seven transmembrane helices (TM1-TM7) in a generally similar arrangement to those observed in previously determined GPCR structures (
Comparison of the structures of CRF1R with previously solved GPCRs provides insight into the architectural differences between Class A and Class B receptors. Unlike the compact overall architecture of Class A GPCRs, CRF1R adopts a pronounced V-shape, presenting a large, polar cavity accessible from the extracellular side (
Despite the limited sequence similarity between Class A and Class B GPCRs, both receptor classes signal through the same effector proteins. The comparison of CRF1R with D3R revealed that in contrast to their extracellular portions their cytoplasmic parts superimpose well (
In Class A GPCRs, a conserved salt bridge connects TM6 to TM3 in the inactive state. The sequence motifs for this ‘ionic lock’ are absent in Class B receptors. Instead, biochemical data suggest that the interaction of His155^2.50 and Glu209^3.50 plays an essential role in activation. In our structure, these two side-chains are within hydrogen-bonding distance (3.1 Å), forming a potentially important functional micro-switch (
Unexpectedly, we found strong electron density for the small-molecule antagonist CP-376395 in a pocket located deep in the cytoplasmic half of the receptor (
Access to this binding site from the extracellular side is restricted to a small channel by the side-chains of Phe203^3.44 and Tyr327^6.53 (
In addition, the antagonist-binding site is separated from the interior of the membrane merely by a single layer of side-chains provided by amino-acids in TM5 and TM6 and, hence, lateral opening of the binding site would require only minor rearrangements in the receptor. Further studies are needed to elucidate the precise mechanisms of antagonist binding.
The structure of the inactive state of CRF1R reported here provides valuable insight into the overall architecture of Class B GPCRs as well as into the molecular basis of Class B receptor antagonism.
The CRF1R StaR was generated using a mutagenesis approach as previously described (Robertson et al, 2011). Mutants were analyzed for thermostability in the presence of the antagonist radioligand [3H]CP-376395. The CRF1R StaR contained 12 mutations: V120^1.40A, L144A, W156^2.51A, S160^2.55A, S222A, K228^4.42A, F260A, I277^5.44A, Y309^6.35A, F330^6.56A, S349^7.43A, and Y363^7.57A (
HEK293T cells were maintained in culture in DMEM supplemented with 10% (v/v) fetal bovine serum (FBS, Sigma-Aldrich) and passaged twice weekly. Cells were transfected with CRF1R constructs using GeneJuice (Merck Millipore) according to manufacturer's instructions and harvested after 48 hours using PBS supplemented with EDTA-free protease inhibitors (Roche). Membranes for use in radioligand binding assays were prepared as previously described (Robertson et al, 2011).
HEK293T cells transiently transfected with CRF1R constructs were incubated in buffer (50mMTris-HCl pH 7.5, 150mMNaCl, EDTA-free protease inhibitors) with 30 nM [3H]CP-376395, and 120 nM cold CP-376395 (Tocris) for 18 hours at room temperature. Reactions were transferred to ice and all subsequent steps performed at 4° C. Cells were solubilized in 1% (w/v) n-dodecyl-β-
For saturation binding experiments, membranes isolated from HEK293T cells transiently expressing wild-type CRF1R (15 μg/well), CRF1R StaR (6 μg/well), CRF1R StaR with T4L fusion (20 μg/well), and CRF1R-#105 (20 μg/well) were incubated in buffer (50mMTris-HCl pH 7.5, 150 mM NaCl, 0.1% (w/v) PEI, EDTA-free protease inhibitors) with [3H]CP-376395 (0-60 nM) in the presence or absence of 30 uM cold CP-376395 in a final volume of 500 μl. Final DMSO concentration in each reaction was 5% (v/v). Membranes were incubated for 18 hours at room temperature before being terminated by rapid filtration through 96-well GF/C UniFilter plates pre-soaked in 0.3% (w/v) PEI, followed by washing with PBS with 0.15% (w/v) CHAPS. Plates were dried, 50 μl Ultima Gold-F added per well and bound ligand measured using a Packard Microbeta counter. Data were analyzed using a global fitted one-site binding hyperbola in GraphPad Prism v5 to generate Kd. For solubilized whole cell ligand binding experiments, HEK293T cells transiently expressing eGFP-tagged wild-type CRF1R or single point mutants were treated as described above for thermostability experiments, without the 30 minute heating step. Specific binding was determined by subtracting untransfected controls. Expression of each construct was quantified by eGFP fluorescence of whole cells measured at Ex/Em of 488/520 nm.
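The one-site binding hyperbola referred to above has the form B = Bmax·[L]/(Kd + [L]). The following Python sketch fits this model to a single set of specific-binding values by least squares. It is not the global fit performed in GraphPad Prism: it handles one data set at a time, scans Kd on a logarithmic grid and solves for Bmax in closed form at each step. All function names and default search bounds are hypothetical.

```python
def one_site(ligand_nm, bmax, kd):
    """Specific binding predicted by the one-site hyperbola."""
    return bmax * ligand_nm / (kd + ligand_nm)


def best_bmax(ligand, bound, kd):
    """Least-squares Bmax for a fixed Kd (the model is linear in Bmax)."""
    basis = [l / (kd + l) for l in ligand]
    denom = sum(b * b for b in basis)
    if denom == 0.0:
        raise ValueError("at least one non-zero ligand concentration is required")
    return sum(b * y for b, y in zip(basis, bound)) / denom


def sse(ligand, bound, kd):
    """Sum of squared residuals at a given Kd, using the best Bmax for that Kd."""
    bmax = best_bmax(ligand, bound, kd)
    return sum((y - one_site(l, bmax, kd)) ** 2 for l, y in zip(ligand, bound))


def fit_one_site(ligand, bound, kd_min=0.01, kd_max=1000.0,
                 steps=400, refinements=4):
    """Estimate (Bmax, Kd) by repeated log-grid scans that narrow around the best Kd."""
    lo, hi = kd_min, kd_max
    best_kd = lo
    for _ in range(refinements):
        ratio = (hi / lo) ** (1.0 / steps)
        grid = [lo * ratio ** i for i in range(steps + 1)]
        best_kd = min(grid, key=lambda kd: sse(ligand, bound, kd))
        lo, hi = best_kd / ratio, best_kd * ratio  # bracket the current best
    return best_bmax(ligand, bound, best_kd), best_kd
```

The inputs are assumed to be specific binding values, that is, with non-specific binding already subtracted, and Kd is returned in the same concentration units as the ligand values.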
A panel of N- and C-terminal truncations of the human CRF1 receptor was designed based on secondary structure prediction and hydropathy plots. Truncated receptors were expressed in HEK293T cells as C-terminal fusions with eGFP. Receptors were solubilized in 50 mM Tris-HCl pH 8.0, 150 mM NaCl, and 2% (w/v) n-decyl-β-
CRF1R carrying a C-terminal deca-histidine tag was expressed in Trichoplusia ni (High Five) cells in EX-CELL 405 medium (Sigma-Aldrich) supplemented with 10% (v/v) FBS, 1% (v/v) CD lipid concentrate (GIBCO) and 1% (v/v) Penicillin/Streptomycin (PAA Laboratories). Cells were infected at a density of 2×10⁶ cells/ml with 10 ml of baculovirus per liter of culture, corresponding to an approximate multiplicity of infection (moi) of 1. Cultures were grown at 27° C. with constant shaking and harvested 72 hours post infection. Cells were pelleted, washed with 250 ml PBS and stored at −80° C. All subsequent purification steps were carried out at 4° C. unless indicated otherwise. To prepare membranes, cells were thawed at room temperature and resuspended in 400 ml ice-cold 50 mM Tris-HCl pH 8.0, 500 mM NaCl supplemented with EDTA-free protease inhibitors. The cell suspension was incubated with 0.3 μM CP-376395 for 1 hour to allow the ligand to bind. Cells were disrupted by ultra-sonication and cell debris was removed by centrifugation at 10,000×g. Membranes were collected by ultracentrifugation at 140,000×g, resuspended and stored at −80° C. until further use. Membranes were thawed at room temperature and solubilized with 2% (w/v) DM for 1.5 hours. Insoluble material was removed by ultra-centrifugation and the receptors were immobilized by batch binding to TALON metal-affinity resin (Clontech) for 2 hours. The resin was packed into an XK-16 column (GE Healthcare) and washed with steps of 8 and 30 mM imidazole in 50 mM Tris-HCl pH 8.0, 500 mM NaCl, 0.15% (w/v) DM, and 0.3 μM CP-376395 for a total of 15-20 column volumes before bound material was eluted with 200 mM imidazole.
The protein was then concentrated using an Amicon Ultra-15 centrifugal filter unit (Millipore) and subjected to preparative gel filtration in 20 mM Tris-HCl pH 8.0, 150 mM NaCl, 0.15% (w/v) DM, and 0.3 μM CP-376395 on a Superdex 200 10/300 GL gel filtration column (GE Healthcare) to remove remaining contaminating proteins and aggregates. Notably, preparations of CRF1R-#105 yielded significantly more aggregated material than preparations of CRF1R-#76. For improved yields and a higher degree of homogeneity, the procedure was altered as follows. After elution from the metal affinity resin, the buffer was exchanged to 50 mM Tris-HCl pH 8.0, 500 mM NaCl, 0.15% (w/v) DM, 0.3 μM CP-376395 and 5 mM EDTA by desalting. In addition, the final buffer was supplemented with 1-palmitoyl-2-oleoyl-sn-glycero-3-phospho-(1′-rac-glycerol) (POPG, Avanti Polar Lipids) at a concentration of 0.005% (w/v). Receptor purity was analyzed using SDS-PAGE and mass spectrometry, and receptor mono-dispersity was assayed by FSEC monitoring tryptophan fluorescence (
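The infection conditions given above for the High Five cultures relate virus volume, cell density and multiplicity of infection through moi = (titer × virus volume) / (cell density × culture volume). The short sketch below shows this arithmetic. The baculovirus titer is not stated in the text, so the value returned by the hypothetical helper implied_titer is only what the stated numbers would imply, not a measured quantity.

```python
def moi(titer_pfu_per_ml, virus_ml, cells_per_ml, culture_ml):
    """Multiplicity of infection: infectious particles per cell."""
    return titer_pfu_per_ml * virus_ml / (cells_per_ml * culture_ml)

def implied_titer(target_moi, virus_ml, cells_per_ml, culture_ml):
    """Titer (pfu/ml) that would give target_moi under the stated conditions."""
    return target_moi * cells_per_ml * culture_ml / virus_ml

# Conditions from the text: 10 ml virus per liter of culture,
# 2e6 cells/ml, approximate moi of 1.
titer = implied_titer(1.0, 10.0, 2e6, 1000.0)
print(f"implied titer: {titer:.1e} pfu/ml")
```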
CRF1R was crystallized in lipidic cubic phase (LCP) at 22.5° C. The protein was concentrated to 20-30 mg/ml by ultrafiltration and mixed with monoolein (Nu-Check) supplemented with 10% (w/w) cholesterol (Sigma) and 5 μM CP-376395 using the twin-syringe method (Caffrey & Cherezov, 2009). The final protein:lipid ratio was 1:1.5 (w/w). With the help of a dispensing robot (Mosquito LCP, TTP Labtech), 40-60 nl boli were dispensed on 96-well Laminex Glass Bases (Molecular Dimensions), overlaid with 0.75 μl precipitant solution and sealed off with Laminex Film Covers (Molecular Dimensions). 20-30 μm crystals of construct CRF1R-#76 were obtained in 100 mM Na-citrate pH 5.5, 200 mM Li2SO4, 30% (v/v) polyethylene glycol 400, and 0.6 μM CP-376395, and we were able to collect a complete dataset to 3.2 Å by combining data from multiple crystals. The crystals belonged to hexagonal space group P6 and the data featured a 30% off-origin peak in a native Patterson map, indicating translational non-crystallographic symmetry (tNCS). Extensive trials to solve the structure by molecular replacement failed, most likely due to the presence of tNCS. We hypothesized that conformational flexibility in the connection between the receptor and T4L was the cause of the observed pseudo-symmetry in the crystals and that deletion of residues in this part of the CRF1R-T4L fusion would reduce flexibility of the construct and, hence, enable growth of a different crystal form without tNCS. The resulting construct CRF1R-#105 (
X-ray diffraction data were measured on a Pilatus 6M hybrid-pixel detector at Diamond Light Source beamline I24 using a 5 μm×5 μm microbeam. Crystals displayed isotropic diffraction to beyond 3.0 Å following exposure to an unattenuated beam for 7.5 seconds per degree of oscillation. Consequently, radiation damage was severe and wedges of typically only 2-3 degrees per crystal could be used for data merging. Data from individual crystals were integrated using XDS (Kabsch, 2010) and a complete dataset was compiled using the data collection strategy option of the program Mosflm (Leslie & Powell, 2007). Data merging and scaling were carried out with AIMLESS (Evans & Murshudov, 2012; Collaborative Computational Project, Number 4, 1994). The final dataset comprised data from 35 crystals and was scaled to 3.15 Å with a completeness of 93.3% overall using a combination of isotropic resolution cut-off criteria such as <I>/<σI> and Rmerge. Crystals belonged to orthorhombic space group P22121 with unit cell dimensions of a=86.6 Å, b=124.0 Å, c=166.8 Å, α=β=γ=90°. Using the microdiffraction assembly method as described previously (Hanson et al, 2012) we were able to extend the resolution of the dataset to 2.97 Å. Briefly, data from each crystal were split into wedges of reflection observations corresponding to 1° of oscillation and then scaled individually to a medium-resolution (4.3 Å) reference dataset, collected from a single crystal, using XSCALE (Kabsch, 2010) without merging reflections. Initially, as rejection criterion for reflections, the peak profile correlation threshold was set to zero and increased in increments of 1% until all reflection observations could be scaled with an Rmerge lower than 14%. The resulting multi-record reflection file was then scaled using AIMLESS. Data collection statistics for both methods are presented in Table 3. For subsequent structure solution and refinement, the data processed using the micro-diffraction assembly method were used.
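The stepwise rejection criterion described above (raising the peak profile correlation threshold in 1% increments until Rmerge falls below 14%) may be illustrated by the following minimal sketch. It is not the XSCALE procedure itself: scaling against the reference dataset is omitted, observations are assumed to be already on a common scale, and the tuple layout (hkl, intensity, profile correlation in percent) and function names are assumptions. Rmerge is taken as Σ|I_i − <I>| / Σ I_i over reflections measured more than once.

```python
from collections import defaultdict

def rmerge(observations):
    """Rmerge over reflections with at least two observations.

    observations: iterable of (hkl, intensity, profile_corr_pct) tuples.
    Returns None when no reflection is multiply observed.
    """
    groups = defaultdict(list)
    for hkl, intensity, _corr in observations:
        groups[hkl].append(intensity)
    num = den = 0.0
    for intensities in groups.values():
        if len(intensities) < 2:
            continue  # singletons carry no agreement information
        mean = sum(intensities) / len(intensities)
        num += sum(abs(i - mean) for i in intensities)
        den += sum(intensities)
    return num / den if den > 0 else None

def lowest_passing_threshold(observations, rmerge_limit=0.14, step_pct=1):
    """Raise the correlation threshold from 0 until Rmerge < rmerge_limit.

    Returns (threshold_pct, rmerge, n_kept), or None if no threshold passes.
    """
    for threshold in range(0, 101, step_pct):
        kept = [o for o in observations if o[2] >= threshold]
        r = rmerge(kept)
        if r is not None and r < rmerge_limit:
            return threshold, r, len(kept)
    return None
```

In the procedure described in the text the criterion was applied wedge by wedge during scaling to the 4.3 Å reference dataset; the sketch applies it to a single pooled list purely to show the control flow.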
Cell content analysis using the Matthews volume (Matthews, 1968) suggested the presence of three copies of receptor-T4L fusion in the asymmetric unit, resulting in a solvent content of 57%. The structure was solved by molecular replacement (MR) with the program Phaser (McCoy et al, 2007; Collaborative Computational Project, Number 4, 1994) using two independent search models, T4L from the adenosine A2A receptor structure (PDB ID 3EML) and a truncated version (TM helices only, no loops) of the dopamine D3 receptor (PDB ID 3PBL). Solutions were found for two out of the three T4L copies, which were subsequently fixed to locate three copies of the truncated receptor. Manual model building was done in COOT (Emsley et al, 2010) using sigma-A weighted 2Fo-Fc and Fo-Fc maps as well as simulated-annealing composite omit maps calculated using Phenix (Adams et al, 2010). Initial refinement was carried out with REFMAC5 (Murshudov et al, 2011; Collaborative Computational Project, Number 4, 2007) using the maximum-likelihood restrained refinement protocol in combination with the jelly-body method and imposing tight non-crystallographic symmetry (NCS) restraints. Later stages of the refinement were performed with Phenix using a combination of simulated annealing, positional and individual isotropic B-factor refinement. NCS restraints were gradually loosened and finally fully released. The resulting model was then submitted to backbone torsion optimization followed by automated all-atom real-space refinement against a 2Fo-Fc electron density map, a method developed by Haddadian and co-workers (Haddadian et al, 2011), resulting in improved stereochemistry and electron density maps. The quality of the model was further enhanced by manual adjustments until the crystallographic R-factors Rwork and Rfree reached 24.0% and 26.3%, respectively, and the structure quality as assessed with Molprobity (Chen et al, 2010) was satisfactory.
With increasing quality of the model, weak electron density became visible for the first and last few residues of the missing copy of T4L at the junctions to TM3 and TM4 in chain C, revealing that the orientation of the T4L insertion relative to its corresponding receptor was significantly different from those observed in the other two receptor-T4L fusions. Very poor or no density was, however, observed for the remaining parts of T4L. It is conceivable that due to the absence of lattice contacts in this region this portion remains disordered in a solvent-filled cavity of the crystal lattice. The final refinement statistics are presented in Table 3. Figures were prepared using PyMOL (Schrödinger).
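The cell content analysis mentioned above rests on the Matthews coefficient V_M = V_cell / (Z × n × M), from which the solvent fraction is commonly estimated as 1 − 1.23/V_M. The sketch below applies this to the orthorhombic cell reported for the P22121 crystals (a=86.6 Å, b=124.0 Å, c=166.8 Å, four symmetry operators). The molecular mass of the receptor-T4L fusion is not given in this section, so the 50,000 Da used in the example is a placeholder assumption, and the function name is likewise hypothetical.

```python
def matthews(a, b, c, n_sym_ops, copies_per_asu, mass_da):
    """Matthews coefficient and solvent fraction for an orthogonal cell.

    a, b, c: cell edges in Angstrom (all cell angles assumed to be 90 degrees).
    n_sym_ops: number of asymmetric units per unit cell (4 for P22121).
    copies_per_asu: molecules per asymmetric unit.
    mass_da: molecular mass of one molecule in Dalton (assumed input).
    """
    cell_volume = a * b * c
    vm = cell_volume / (n_sym_ops * copies_per_asu * mass_da)  # A^3 per Da
    solvent_fraction = 1.0 - 1.23 / vm
    return vm, solvent_fraction

# Compare candidate copy numbers for the placeholder mass of 50 kDa.
for copies in (2, 3, 4):
    vm, solvent = matthews(86.6, 124.0, 166.8, 4, copies, 50_000.0)
    print(f"{copies} copies: V_M = {vm:.2f} A^3/Da, solvent = {solvent:.0%}")
```

Running the comparison across several copy numbers mirrors how such an analysis is typically used: the copy number giving a physically plausible solvent content is taken as the most likely one.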
Superposition of D3R onto CRF1R
D3R (molecule A in PDB ID 3PBL) was superimposed onto CRF1R molecule C using the Cα-atoms of the following amino-acid ranges comprising the cytoplasmic halves of TM1, TM2, TM4 and TM5 as well as the entire TM3 (CRF1R/D3R): 130-143/43-56 (TM1), 150-162/63-75 (TM2), 193-216/108-131 (TM3), 228-234/150-156 (TM4), 282-295/203-216 (TM5). TM6 and TM7, exhibiting obvious conformational differences, were excluded.
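A superposition of this kind requires a one-to-one list of equivalent Cα atoms. As a minimal sketch, the residue ranges listed above can be expanded into explicit CRF1R/D3R residue pairs, checking that each pair of ranges has the same length. The parsing format and function names are assumptions made for illustration; the superposition itself is not performed here.

```python
# Ranges copied from the text, written as CRF1R/D3R.
RANGES = ("130-143/43-56, 150-162/63-75, 193-216/108-131, "
          "228-234/150-156, 282-295/203-216")

def expand(residue_range):
    """'130-143' -> [130, 131, ..., 143] (inclusive)."""
    start, end = (int(x) for x in residue_range.split("-"))
    return list(range(start, end + 1))

def paired_residues(spec):
    """Return a list of (crf1r_resnum, d3r_resnum) pairs."""
    pairs = []
    for item in spec.split(","):
        crf1r_part, d3r_part = item.strip().split("/")
        crf1r, d3r = expand(crf1r_part), expand(d3r_part)
        if len(crf1r) != len(d3r):
            raise ValueError(f"unequal range lengths in {item.strip()!r}")
        pairs.extend(zip(crf1r, d3r))
    return pairs

pairs = paired_residues(RANGES)
print(len(pairs), "C-alpha pairs; first:", pairs[0], "last:", pairs[-1])
```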
We defined the global common TM region between Class A and B GPCRs as the CRF1R residues 119-143, 150-176, 186-218, 227-247, 269-294, 312-332 and 343-365. They correspond to the Class A Ballesteros-Weinstein residues 1.35-1.59, 2.38-2.64, 3.23-3.55, 4.41-4.61, 5.40-5.65, 6.33-6.53 and 7.33-7.55. For every pair of CRF1R and Class A GPCR crystal structures, the RMSDs were calculated as follows: both molecules were initially read into Maestro and their sequences were aligned using the ‘Pairwise Alignment’ algorithm contained within the ‘Multiple Sequence Viewer’ toolbar within Maestro. Manual adjustment within the ‘Multiple Sequence Viewer’ using the ‘Grab and drag’ tool was performed so that corresponding residues in the TM region (Table 4) were correctly aligned. Residues not in the defined global common TM region were selected within the ‘Multiple Sequence Viewer’ using the ‘Select and slide’ tool. They were deleted using the ‘delete’ menu in the main window of Maestro by pressing the ‘select’ tool and, in the ‘Atom Selection’ pop-up box, pressing ‘Selection’ and ‘OK’. Protein side chains were deleted using the ‘delete’ menu in the main window of Maestro by pressing the ‘select’ tool and, in the ‘Atom Selection’ pop-up box, selecting the ‘Residue’ tab, selecting ‘Backbone/side chain’, ticking the ‘side chain’ box and pressing ‘Add’ and ‘OK’. For the global superposition of the 7 TMs, the ‘Superposition’ tool was selected from the ‘Tools’ menu in the main window of Maestro. The ‘Superimpose by ASL’ tab was selected and the ‘All’ button was pressed. The global backbone RMSD for the 7 TMs is then returned in the box at the bottom of the ‘Superposition’ tool.
Starting from this structurally aligned position of the 7 TMs, an RMSD value was calculated for every TM. All TMs excluding the TM considered were selected within the ‘Multiple Sequence Viewer’ using the ‘Select and slide’ tool. These selected TMs not of interest were deleted using the ‘delete’ menu in the main window of Maestro by pressing the ‘select’ tool and, in the ‘Atom Selection’ pop-up box, pressing ‘Selection’ and ‘OK’. For example, to analyze TM1, TM2 to TM7 were deleted for all GPCRs considered.
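The correspondence between the CRF1R residue ranges and the Ballesteros-Weinstein ranges stated at the start of this definition can be expressed as a simple lookup. The sketch below assumes a gap-free, position-by-position correspondence within each helix, which is what the equal lengths of the paired ranges suggest; the table layout and the function name bw_label are hypothetical.

```python
# (TM number, first CRF1R residue, last CRF1R residue, first BW position)
# Values are taken from the ranges given in the text.
COMMON_TM_REGION = [
    (1, 119, 143, 35),
    (2, 150, 176, 38),
    (3, 186, 218, 23),
    (4, 227, 247, 41),
    (5, 269, 294, 40),
    (6, 312, 332, 33),
    (7, 343, 365, 33),
]

def bw_label(crf1r_resnum):
    """Return the Class A Ballesteros-Weinstein label, or None if the
    residue lies outside the defined common TM region."""
    for tm, first, last, bw_first in COMMON_TM_REGION:
        if first <= crf1r_resnum <= last:
            return f"{tm}.{bw_first + (crf1r_resnum - first)}"
    return None

print(bw_label(119), bw_label(294), bw_label(144))
```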
On the resulting individual TM the global backbone RMSD was calculated using the ‘Superposition’ tool from the ‘Tools’ menu in the main window of Maestro. The ‘Calculate in place (no transformation)’ box was ticked. The ‘Superimpose by ASL’ tab was selected and the ‘All’ button was pressed. The RMSD for the individual TM (using the starting global superimposition based on the 7 TMs) is then returned in the box at the bottom of the ‘Superposition’ tool.
Similarly starting from the individual TM obtained above the backbone RMSD was calculated after superimposition of the individual TM. The ‘Superposition’ tool from the ‘Tools’ menu in the main window of Maestro was used. The ‘Calculate in place (no transformation)’ box was not ticked. The ‘Superimpose by ASL’ tab was selected and the ‘All’ button was pressed. The RMSD for the individual TM (after superimposition of the individual TM) is then returned in the box at the bottom of the ‘Superposition’ tool.
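The two per-TM values described above differ only in whether the helix is re-superimposed before the RMSD is taken. The ‘calculate in place’ variant is simply the root-mean-square distance between equivalent backbone atoms in their current positions, as in the minimal sketch below. The coordinate layout (lists of (x, y, z) tuples in matching atom order) and the function name are assumptions; the optimal re-superposition performed by Maestro in the second variant is not reproduced here.

```python
import math

def rmsd_in_place(coords_a, coords_b):
    """RMSD between two equally ordered atom lists without any fitting.

    coords_a, coords_b: sequences of (x, y, z) tuples in Angstrom, with
    atom i of coords_a equivalent to atom i of coords_b.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("atom lists must have the same length")
    if not coords_a:
        raise ValueError("atom lists must not be empty")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))
```

Because no transformation is applied, this value reflects both the shape difference of the helix and its displacement within the globally superimposed 7TM bundle, whereas the value obtained after superimposing the individual TM reflects the shape difference alone.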
Tables A-C show the x, y and z coordinates by amino acid residue of each non-hydrogen atom in the polypeptide structure for molecules A, B and C respectively, in addition to the antagonist CP-376395 atoms. The crystallised polypeptide is shown in
The fourth column of the tables indicates whether the atom is from an amino acid residue of the CRF1R protein (residues 115-368) (by three-letter amino acid code, e.g. TRP, GLU, ALA etc), an amino acid residue of T4L (residues 1002-1161) or the CP-376395 ligand (CP3). Parameters used for the modelling are listed in the REMARK section.
POP is: 1-palmitoyl-2-oleoyl-sn-glycero-3-phospho-(1′-rac-glycerol) abbreviated as POPG; MOO is: 1-oleoyl-rac-glycerol aka monoolein.
This application claims the benefit under 35 U.S.C. §119(e) of U.S. Provisional Application Ser. No. 61/783,914, entitled “CRYSTAL STRUCTURE,” filed on Mar. 14, 2013, the entire disclosure of which is herein incorporated by reference in its entirety.