The present invention relates generally to the structure of the ligand-binding domain of CAR, and more particularly to the structure of the ligand-binding domain of CAR in complex with a ligand. The present invention also relates to CAR binding compounds and to the design of compounds that bind to CAR.
amu—atomic mass unit(s)
ATP—adenosine triphosphate
ADP—adenosine diphosphate
BSA—bovine serum albumin
CaMV—cauliflower mosaic virus
CAR—constitutive androstane receptor
CARα—constitutive androstane receptor alpha
CBP—CREB binding protein
CCDB—Cambridge Crystallographic Data Bank
cDNA—complementary DNA
CPU—central processing unit
RAM—random access memory
CRT—cathode-ray tube
DBD—DNA binding domain
DMSO—dimethyl sulfoxide
DNA—deoxyribonucleic acid
DTT—dithiothreitol
EDTA—ethylenediaminetetraacetic acid
Et2O—diethyl ether
FEDs—field emission displays
GST—glutathione S-transferase
HEPES—N-2-hydroxyethylpiperazine-N′-2-ethanesulfonic acid
kDa—kilodalton(s)
LBD—ligand-binding domain
LCDs—liquid crystal displays
LED—light emitting diode
MPD—methyl-pentanediol
mCAR—mouse constitutive androstane receptor
MIR—multiple isomorphous replacement
N-COR—nuclear co-repressor
NDP—nucleotide diphosphate
NR—nuclear receptor
nt—nucleotide(s)
NTP—nucleotide triphosphate
PAGE—polyacrylamide gel electrophoresis
PCR—polymerase chain reaction
PEG—polyethylene glycol
pI—isoelectric point
PXR—pregnane X receptor
PBREM—phenobarbital-responsive enhancer module
RAR—retinoic acid receptor
RAREs—retinoic acid response elements
rCAR—rat constitutive androstane receptor
RUBISCO—ribulose bisphosphate carboxylase
RXR—retinoid X receptor
SDS—sodium dodecyl sulfate
SDS-PAGE—sodium dodecyl sulfate polyacrylamide gel electrophoresis
SMRT—silencing mediator for retinoid and thyroid receptors
SRC-1—steroid receptor coactivator-1
SR—steroid receptor
TFA—trifluoroacetic acid
TMV—tobacco mosaic virus
TR—thyroid receptor
VDR—vitamin D receptor
The constitutive androstane receptor (CAR; Unified Nomenclature Committee designation NR1I3) was isolated in 1994 by screening a human liver library with a degenerate oligonucleotide probe based on the P box region (Baes et al., 1994). CAR was subsequently shown to be a heterodimer partner for RXR that acts as a specific, retinoid-independent activator of a subset of retinoic acid response elements (RAREs). The mouse CAR homologue was also isolated in 1994 (Honkakoski et al., 1998). Mouse CAR studies showed that RXR and CAR bind to a site in the phenobarbital-responsive enhancer module (PBREM) of the cytochrome P-450 Cyp2b10 gene in response to phenobarbital induction. Expression of RXR and CAR in mammalian cell lines activated PBREM, indicating that a CAR-RXR heterodimer is a trans-acting factor for the mouse Cyp2b10 gene. These studies were the first to indicate that CAR might play a role in response to xenobiotics.
The ability to respond to a wide range of potentially toxic chemicals is essential in a complex environment. Evidence is accumulating that CAR and its closest mammalian homologue, the pregnane X receptor (PXR; Unified Nomenclature Committee designation NR1I2), evolved to detect xenobiotics as part of the body's detoxification machinery (Waxman, 1999). Both receptors are highly expressed in the liver and intestine and both regulate the expression of specific detoxification genes. PXR and CAR regulate genes whose protein products are involved in the hydroxylation (phase I), conjugation (phase II), and transport (phase III) of xenobiotics. CAR is activated by some of the same ligands as PXR (Moore et al., 2000), regulates at least partially overlapping sets of genes (e.g., CYP3A and CYP2B; Xie et al., 2000a), and can signal through the same response elements (Goodwin et al., 2001; Handschin et al., 2001).
Despite these similarities, CAR differs from PXR in several respects. CAR ligand binding has been shown to be more restricted than that of PXR (Moore et al., 2000). Furthermore, CAR displays a high basal level of activity relative to PXR that can be reduced by the binding of either naturally occurring androstanes or xenobiotics such as clotrimazole (Baes et al., 1994; Moore et al., 2000). Finally, CAR displays fundamental differences from PXR with regard to its cellular regulation. In mouse primary hepatocytes and in mouse liver in vivo, CAR is cytoplasmic in the naive state and translocates to the nucleus upon activation (Kawamoto et al., 1999), a process thought to be regulated in part by dephosphorylation of the receptor (Honkakoski et al., 1998). Induction of CAR nuclear translocation does not necessarily depend upon ligand-binding, as phenobarbital has been shown to be an activator of CAR in vivo and in hepatocytes, but does not appear to interact directly with the CAR ligand-binding domain (Moore et al., 2000). Thus, CAR has a high basal level of transcriptional activity even in the absence of an exogenous ligand. An important goal of future efforts will be to further differentiate the physical and functional properties of CAR from PXR, and to ultimately distinguish the unique physiological role of CAR.
Towards this goal, the CAR gene has recently been “knocked-out” by targeted gene disruption (Xie et al., 2000b). The loss of CAR expression did not result in any overt phenotype. Homozygous CAR−/− animals were born at the expected Mendelian frequency, and both male and female CAR-deficient animals were fertile. It was further demonstrated that the nuclear receptor CAR mediates the Cyp2b10 gene response evoked by phenobarbital-like inducers, as well as by the more potent TCPOBOP compound (Xie et al., 2000b). When challenged, these animals showed decreased metabolism of the classic CYP substrate zoxazolamine and a complete loss of the liver hypertrophic and hyperplastic responses to these compounds. These experiments were thus consistent with the notion that at least one aspect of the physiological role of CAR involves xenobiotic metabolism.
Further insight into CAR is expected to be gleaned from CAR structural studies. The availability of the CAR structure will allow an understanding of ligand modulation of CAR activity and will facilitate the design of novel CAR ligands. The present invention addresses these and other needs in the art.
The present invention provides a crystalline form comprising a substantially pure constitutive androstane receptor (CAR) ligand-binding domain polypeptide. In one embodiment, the crystalline form comprises a substantially pure constitutive androstane receptor (CAR) ligand-binding domain polypeptide in complex with a ligand. In one embodiment, a ligand is 2-(benzhydrylamino)-1-(2-phenylethyl)-1H-benzimidazole-6-carboxamide.
The present invention also provides a method of generating a crystalline form comprising a constitutive androstane receptor (CAR) ligand-binding domain polypeptide in complex with a ligand, the method comprising: (a) incubating a solution comprising a constitutive androstane receptor (CAR) ligand-binding domain and a ligand with an equal volume of reservoir solution; and (b) crystallizing the constitutive androstane receptor (CAR) ligand-binding domain polypeptide and ligand using the hanging drop method, whereby a crystalline form of a constitutive androstane receptor (CAR) ligand-binding domain polypeptide in complex with a ligand is generated. Also provided is a crystalline form formed by the above-recited method. In one embodiment, a ligand is 2-(benzhydrylamino)-1-(2-phenylethyl)-1H-benzimidazole-6-carboxamide.
The present invention also provides a method of designing a chemical compound that modulates the biological activity of a target constitutive androstane receptor (CAR) polypeptide. In one embodiment, the method comprises: obtaining one or more three-dimensional structures for the ligand-binding domain (LBD) of constitutive androstane receptor (CAR) in a repressed conformation, and one or more three-dimensional structures of the LBD of constitutive androstane receptor (CAR) in an activated conformation; rotating and translating the three-dimensional structures as rigid bodies so as to superimpose corresponding backbone atoms of a core region of the constitutive androstane receptor (CAR) LBD; comparing one or both of: (i) the superimposed three-dimensional structures to identify volume near the ligand-binding pocket of the constitutive androstane receptor (CAR) LBD that is available to a ligand in the one or more activated structures, or in one or more repressed structures, but that is not available to the ligand in one or more structures of the opposite class; and (ii) the superimposed three-dimensional structures to identify interactions that a ligand could make in one or more of the activated structures, or in one or more of the repressed structures, but which the ligand could not make in one or more structures of the opposite class; and designing a chemical compound that occupies the volume, makes the interaction, or both occupies the volume and makes the interaction.
Optionally the method further comprises synthesizing the designed chemical compound; and testing the designed chemical compound in a biological assay to determine whether it acts as a ligand of constitutive androstane receptor (CAR) with an effect on constitutive androstane receptor (CAR) biological activities, whereby a ligand of a constitutive androstane receptor (CAR) polypeptide is designed.
In another embodiment, the volume or interaction is available in one or more of the repressed structures of constitutive androstane receptor (CAR), but not available in one or more of the activated structures of constitutive androstane receptor (CAR). In another embodiment, the method further comprises designing a chemical compound that promotes the binding of co-repressor to the constitutive androstane receptor (CAR) LBD by making direct favorable interactions with the co-repressor. In another embodiment, the method further comprises designing a chemical compound that reduces binding of a co-repressor to the constitutive androstane receptor (CAR) LBD by making direct unfavorable interactions with the co-repressor. In another embodiment, the method further comprises designing a chemical compound that promotes coactivator binding by displacing an AF2 helix of the constitutive androstane receptor (CAR) LBD and making direct favorable interactions with a coactivator, where the designing allows for an expected movement of the coactivator within a coactivator/co-repressor binding pocket. In yet another embodiment, the method further comprises designing a chemical compound by considering a known agonist of the constitutive androstane receptor (CAR) and adding a substituent that protrudes into the volume identified in the comparing step or that makes a desired interaction.
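By way of non-limiting illustration, the comparison of superimposed structures to identify volume available in one conformational class but not the other can be sketched as a simple grid search. The following Python sketch assumes that the activated and repressed structures have already been superimposed, that each structure is reduced to a list of atom centers, and that a single probe radius stands in for the size of a ligand atom. The function names, the probe radius, and the toy coordinates are hypothetical and are not taken from Table 2 or Table 3.

```python
import math
from itertools import product


def free_grid_points(atoms, grid, probe_radius):
    """Return the grid points lying farther than probe_radius from every atom."""
    return {p for p in grid
            if all(math.dist(p, a) > probe_radius for a in atoms)}


def differential_volume(activated_atoms, repressed_atoms, grid, probe_radius=3.0):
    """Grid points that are free in the activated structure but blocked in the repressed one."""
    free_activated = free_grid_points(activated_atoms, grid, probe_radius)
    free_repressed = free_grid_points(repressed_atoms, grid, probe_radius)
    return free_activated - free_repressed


# Toy 5 x 5 planar grid with 1 angstrom spacing (hypothetical data).
grid = [(float(x), float(y), 0.0) for x, y in product(range(5), repeat=2)]

# Hypothetical atom centers; the repressed structure carries one extra atom
# that fills a corner of the pocket.
activated = [(0.0, 0.0, 0.0)]
repressed = [(0.0, 0.0, 0.0), (4.0, 4.0, 0.0)]

only_in_activated = differential_volume(activated, repressed, grid, probe_radius=1.5)
print(sorted(only_in_activated))
```

Swapping the two structure arguments yields the converse set, namely volume available in the repressed structure but not in the activated one. A designed substituent would then be positioned to occupy one or more of the returned points.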
The present invention also provides a binding site in a human constitutive androstane receptor (CAR) polypeptide for a constitutive androstane receptor ligand, wherein the ligand is in van der Waals, hydrogen bonding, or both van der Waals and hydrogen bonding contact with at least one residue of the human constitutive androstane receptor polypeptide.
The present invention also provides a complex of a human constitutive androstane receptor (CAR) ligand-binding domain and a ligand, wherein the ligand is in van der Waals, hydrogen bonding, or both van der Waals and hydrogen bonding contact with at least one of the following residues of the human constitutive androstane receptor polypeptide: Phe161, Ile164, Asn165, Val199, His203, Phe217, Trp224, Thr225, Ile226, Asp228, Gly229, Gln234, Phe238, Leu239, Leu242, Phe243, Tyr326, Met339, Met340.
The present invention also provides a crystal of a complex of a human constitutive androstane receptor (CAR) ligand-binding domain and a ligand, wherein the ligand is in van der Waals, hydrogen bonding, or both van der Waals and hydrogen bonding contact with at least one of the following residues of the human constitutive androstane receptor polypeptide: Phe161, Ile164, Asn165, Val199, His203, Phe217, Trp224, Thr225, Ile226, Asp228, Gly229, Gln234, Phe238, Leu239, Leu242, Phe243, Tyr326, Met339, Met340. In one embodiment, the constitutive androstane receptor is a human constitutive androstane receptor and the crystal has the following physical measurements: space group P2₁2₁2₁ and unit cell: a=83.0 angstroms, b=116.8 angstroms, c=131.9 angstroms, and α=β=γ=90 degrees.
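By way of non-limiting illustration, residues in contact with a ligand can be identified from atomic coordinates by a distance criterion. The following Python sketch assumes a single 4.0 Å cutoff as a stand-in for van der Waals or hydrogen bonding contact; that cutoff, the function name, and the toy coordinates are hypothetical. Only the residue labels are borrowed from the list above, and the coordinates are not those of Table 2 or Table 3.

```python
import math


def contact_residues(ligand_atoms, protein_atoms, cutoff=4.0):
    """Return sorted residue labels having any atom within cutoff of any ligand atom.

    ligand_atoms:  list of (x, y, z) tuples
    protein_atoms: list of (residue_label, (x, y, z)) tuples
    """
    hits = set()
    for label, xyz in protein_atoms:
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_atoms):
            hits.add(label)
    return sorted(hits)


# Hypothetical coordinates, in angstroms, for illustration only.
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
protein = [
    ("Phe161", (3.5, 0.0, 0.0)),   # 2.0 from the second ligand atom
    ("Phe161", (0.0, 0.0, 8.0)),   # a distant atom of the same residue
    ("His203", (0.0, 3.9, 0.0)),   # 3.9 from the first ligand atom
    ("Tyr326", (10.0, 0.0, 0.0)),  # beyond the cutoff
]

contacts = contact_residues(ligand, protein)
print(contacts)
```

A residue is reported once even when several of its atoms lie within the cutoff, which matches the "at least one residue" phrasing used above.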
The present invention also provides a method for designing a ligand of a constitutive androstane receptor (CAR) polypeptide, the method comprising: (a) forming a complex of a compound bound to the constitutive androstane receptor (CAR) polypeptide; (b) determining a structural feature of the complex formed in (a); wherein the structural feature is of a binding site for the compound; and (c) using the structural feature determined in (b) to design a ligand of a constitutive androstane receptor (CAR) polypeptide capable of binding to the binding site of the present invention. In one embodiment, the method of the present invention further comprises using a computer-based model of the complex formed in (a) in designing the ligand.
The present invention also provides a method of designing a ligand that selectively modulates the activity of a constitutive androstane receptor (CAR) polypeptide, the method comprising: (a) evaluating a three-dimensional structure of a crystallized constitutive androstane receptor (CAR) ligand-binding domain polypeptide in complex with a ligand; and (b) synthesizing a potential ligand based on the three-dimensional structure of the crystallized constitutive androstane receptor (CAR) ligand-binding domain polypeptide in complex with a ligand, whereby a ligand that selectively modulates the activity of a constitutive androstane receptor (CAR) polypeptide is designed. In one embodiment, the constitutive androstane receptor (CAR) ligand-binding domain polypeptide comprises the amino acid sequence of SEQ ID NO: 4. In one embodiment, the crystalline form is such that the three-dimensional structure of the crystallized constitutive androstane receptor (CAR) ligand-binding domain polypeptide in complex with a ligand can be determined to a resolution of about 2.15 Å or better. In one embodiment, the method further comprises contacting a constitutive androstane receptor (CAR) ligand-binding domain polypeptide with the potential ligand and a ligand; and assaying the constitutive androstane receptor (CAR) ligand-binding domain polypeptide for binding of the potential ligand, for a change in activity of the constitutive androstane receptor (CAR) ligand-binding domain polypeptide, or both. In one embodiment, the ligand is 2-(benzhydrylamino)-1-(2-phenylethyl)-1H-benzimidazole-6-carboxamide.
The present invention also provides a method of screening a plurality of compounds for a ligand of a constitutive androstane receptor (CAR) ligand-binding domain polypeptide, the method comprising: (a) providing a library of test samples; (b) contacting a crystalline form comprising a constitutive androstane receptor (CAR) polypeptide in complex with a ligand with each test sample; (c) detecting an interaction between a test sample and the crystalline constitutive androstane receptor (CAR) polypeptide in complex with a ligand; (d) identifying a test sample that interacts with the crystalline constitutive androstane receptor (CAR) polypeptide in complex with a ligand; and (e) isolating a test sample that interacts with the crystalline constitutive androstane receptor (CAR) polypeptide in complex with a ligand, whereby a plurality of compounds is screened for a ligand of a constitutive androstane receptor (CAR) ligand-binding domain polypeptide. In one embodiment, the CAR polypeptide comprises a CAR ligand-binding domain. In another embodiment, the CAR polypeptide is a human CAR polypeptide. In yet another embodiment, the CAR polypeptide comprises the amino acid sequence of SEQ ID NO: 4. In one embodiment, the library of test samples is bound to a substrate. In another embodiment, the library of test samples is synthesized directly on a substrate. In one embodiment, the ligand is 2-(benzhydrylamino)-1-(2-phenylethyl)-1H-benzimidazole-6-carboxamide.
The present invention also provides a method for identifying a constitutive androstane receptor (CAR) ligand, the method comprising: (a) providing atomic coordinates of a constitutive androstane receptor (CAR) ligand-binding domain in complex with a ligand to a computerized modeling system; and (b) modeling a ligand that fits spatially into the binding pocket of the constitutive androstane receptor (CAR) ligand-binding domain to thereby identify a constitutive androstane receptor (CAR) ligand. In one embodiment, the method further comprises identifying in an assay for constitutive androstane receptor (CAR)-mediated activity a modeled ligand that increases or decreases the activity of the constitutive androstane receptor (CAR). In one embodiment, the CAR is a human CAR. In one embodiment, the CAR ligand-binding domain comprises the amino acid sequence of SEQ ID NO: 4. In one embodiment, the ligand is 2-(benzhydrylamino)-1-(2-phenylethyl)-1H-benzimidazole-6-carboxamide.
The present invention also provides a method of identifying a constitutive androstane receptor (CAR) ligand that selectively binds a constitutive androstane receptor (CAR) polypeptide compared to other polypeptides, the method comprising: (a) providing atomic coordinates of a constitutive androstane receptor (CAR) ligand-binding domain in complex with a ligand to a computerized modeling system; and (b) modeling a ligand that fits into the binding pocket of a constitutive androstane receptor (CAR) ligand-binding domain and that interacts with residues of a constitutive androstane receptor (CAR) ligand-binding domain that are conserved among constitutive androstane receptor (CAR) subtypes to thereby identify a constitutive androstane receptor (CAR) ligand that selectively binds a constitutive androstane receptor (CAR) polypeptide compared to other polypeptides. In one embodiment, the method further comprises identifying in a biological assay for constitutive androstane receptor (CAR) activity a modeled ligand that selectively binds to said constitutive androstane receptor (CAR) and increases or decreases the activity of the constitutive androstane receptor (CAR). In one embodiment, the CAR ligand-binding domain comprises the amino acid sequence shown in SEQ ID NO: 4. In one embodiment, the ligand is 2-(benzhydrylamino)-1-(2-phenylethyl)-1H-benzimidazole-6-carboxamide.
The present invention also provides a method of designing a ligand of a constitutive androstane receptor (CAR) polypeptide, the method comprising: (a) selecting a candidate constitutive androstane receptor (CAR) ligand; (b) determining which amino acid or amino acids of a constitutive androstane receptor (CAR) polypeptide interact with the ligand using a three-dimensional model of a crystallized protein, the model comprising a constitutive androstane receptor (CAR) ligand-binding domain in complex with a ligand; (c) identifying in a biological assay for constitutive androstane receptor (CAR) activity a degree to which the ligand modulates the activity of the constitutive androstane receptor (CAR) polypeptide; (d) selecting a chemical modification of the ligand wherein the interaction between the amino acids of the constitutive androstane receptor (CAR) polypeptide and the ligand is predicted to be modulated by the chemical modification; (e) synthesizing a ligand having the chemical modification to form a modified ligand; (f) contacting the modified ligand with the constitutive androstane receptor (CAR) polypeptide; (g) identifying in a biological assay for constitutive androstane receptor (CAR) activity a degree to which the modified ligand modulates the biological activity of the constitutive androstane receptor (CAR) polypeptide; and (h) comparing the biological activity of the constitutive androstane receptor (CAR) polypeptide in the presence of modified ligand with the biological activity of the constitutive androstane receptor (CAR) polypeptide in the presence of the unmodified ligand, whereby a ligand of a constitutive androstane receptor (CAR) polypeptide is designed.
In one embodiment, the method further comprises repeating steps (a) through (f) if the biological activity of the constitutive androstane receptor (CAR) polypeptide in the presence of the modified ligand varies from the biological activity of the constitutive androstane receptor (CAR) polypeptide in the presence of the unmodified ligand.
The present invention also provides a crystallized, recombinant polypeptide comprising: (a) an amino acid sequence set forth in SEQ ID NO: 2 or SEQ ID NO: 4; (b) an amino acid sequence having at least about 95% identity with the amino acid sequence set forth in SEQ ID NO: 2 or SEQ ID NO: 4; or (c) an amino acid sequence encoded by a polynucleotide that hybridizes under stringent conditions to the complementary strand of a polynucleotide having SEQ ID NO: 1 or SEQ ID NO: 3 and has at least one biological activity of constitutive androstane receptor (CAR); wherein the polypeptide of (a), (b) or (c) is in crystal form. In one embodiment, the crystallized, recombinant polypeptide diffracts X-rays to a resolution of about 2.5 Å or better. In another embodiment, the polypeptide comprises at least one heavy atom label. In another embodiment, the polypeptide is labeled with seleno-methionine.
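By way of non-limiting illustration, a percent identity threshold such as the "at least about 95% identity" recited above can be evaluated from a pairwise alignment. The following Python sketch assumes that the two sequences have already been aligned, with "-" marking gaps, and that identity is counted over aligned (ungapped) positions only. Other denominators are also in common use, so this convention is an assumption. The function name and the toy sequences are hypothetical and are not SEQ ID NO: 2 or SEQ ID NO: 4.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of ungapped aligned positions at which the two residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which neither sequence has a gap.
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


# Toy aligned sequences (hypothetical): one mismatch in ten ungapped columns.
reference = "ACDEFGHIKL"
candidate = "ACDEFGHIKV"

identity = percent_identity(reference, candidate)
meets_threshold = identity >= 95.0
print(identity, meets_threshold)
```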
The present invention also provides a method for designing a modulator for the prevention or treatment of a disease or disorder, comprising: (a) providing a three-dimensional structure for a crystallized, recombinant polypeptide; (b) identifying a potential modulator for the prevention or treatment of a disease or disorder by reference to the three-dimensional structure; (c) contacting a polypeptide or a constitutive androstane receptor (CAR) with the potential modulator; and (d) assaying the activity of the polypeptide after contact with the modulator, wherein a change in the activity of the polypeptide indicates that the modulator can be useful for prevention or treatment of a disease or disorder.
The present invention also provides a method for obtaining structural information of a crystallized polypeptide, the method comprising: (a) crystallizing a recombinant polypeptide, wherein the polypeptide comprises: (1) an amino acid sequence set forth in SEQ ID NO: 2 or SEQ ID NO: 4; (2) an amino acid sequence having at least about 95% identity with the amino acid sequence set forth in SEQ ID NO: 2 or SEQ ID NO: 4; or (3) an amino acid sequence encoded by a polynucleotide that hybridizes under stringent conditions to the complementary strand of a polynucleotide having SEQ ID NO: 1 or SEQ ID NO: 3 and has at least one biological activity of human constitutive androstane receptor (CAR); and wherein the crystallized polypeptide is capable of diffracting X-rays to a resolution of 2.5 Å or better; and (b) analyzing the crystallized polypeptide by X-ray diffraction to determine the three-dimensional structure of at least a portion of the crystallized polypeptide. In one embodiment, the three-dimensional structure of the portion of the crystallized polypeptide is determined to a resolution of 2.5 Å or better.
The present invention also provides a method for identifying a druggable region of a polypeptide, the method comprising: (a) obtaining crystals of a polypeptide comprising (1) an amino acid sequence set forth in SEQ ID NO: 2 or SEQ ID NO: 4; (2) an amino acid sequence having at least about 95% identity with the amino acid sequence set forth in SEQ ID NO: 2 or SEQ ID NO: 4; or (3) an amino acid sequence encoded by a polynucleotide that hybridizes under stringent conditions to the complementary strand of a polynucleotide having SEQ ID NO: 1 or SEQ ID NO: 3 and has at least one biological activity of human constitutive androstane receptor (CAR), such that the three dimensional structure of the crystallized polypeptide can be determined to a resolution of 2.5 Å or better; (b) determining the three dimensional structure of the crystallized polypeptide using X-ray diffraction; and (c) identifying a druggable region of the crystallized polypeptide based on the three-dimensional structure of the crystallized polypeptide. In one embodiment, the druggable region is an active site. In another embodiment, the druggable region is on the surface of the polypeptide.
The present invention also provides a crystalline human constitutive androstane receptor (CAR) comprising a crystal having unit cell dimensions a=83.0 Å; b=116.8 Å; c=131.9 Å; α=β=γ=90°; with an orthorhombic space group P2₁2₁2₁ and 4 molecules per asymmetric unit.
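By way of non-limiting illustration, the recited cell parameters fix the unit cell volume and the number of molecules per cell. The following Python sketch relies on two general crystallographic facts: an orthorhombic cell (α=β=γ=90°) has volume a·b·c, and space group P2₁2₁2₁ has four symmetry-equivalent general positions per cell. The function names are hypothetical.

```python
def orthorhombic_cell_volume(a, b, c):
    """Volume in cubic angstroms of a cell whose three angles are all 90 degrees."""
    return a * b * c


def molecules_per_cell(asymmetric_units_per_cell, molecules_per_asymmetric_unit):
    """Total molecules in one unit cell."""
    return asymmetric_units_per_cell * molecules_per_asymmetric_unit


# Cell edges, in angstroms, as recited above.
a, b, c = 83.0, 116.8, 131.9

# P2(1)2(1)2(1) has four general positions, hence four asymmetric units per cell.
ASYMMETRIC_UNITS_P212121 = 4

volume = orthorhombic_cell_volume(a, b, c)
total_molecules = molecules_per_cell(ASYMMETRIC_UNITS_P212121, 4)

print(f"unit cell volume: {volume:.1f} cubic angstroms")
print(f"molecules per unit cell: {total_molecules}")
```

With the recited values, the cell volume is approximately 1.28 × 10⁶ Å³ and the cell contains 16 molecules.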
The present invention also provides a crystallized polypeptide comprising: (1) an amino acid sequence set forth in SEQ ID NO: 2 or SEQ ID NO: 4; (2) an amino acid sequence having at least about 95% identity with the amino acid sequence set forth in SEQ ID NO: 2 or SEQ ID NO: 4; or (3) an amino acid sequence encoded by a polynucleotide that hybridizes under stringent conditions to the complementary strand of a polynucleotide having SEQ ID NO: 1 or SEQ ID NO: 3 and has at least one biological activity of human constitutive androstane receptor (CAR); wherein the crystal has a P2₁2₁2₁ space group.
The present invention also provides a crystallized polypeptide comprising a structure of a polypeptide that is defined by a substantial portion of the atomic coordinates set forth in Table 2 or Table 3.
The present invention also provides a method for determining the crystal structure of a homolog of a polypeptide, the method comprising: (a) providing the three dimensional structure of a first crystallized polypeptide comprising (1) an amino acid sequence set forth in SEQ ID NO: 2 or SEQ ID NO: 4; (2) an amino acid sequence having at least about 95% identity with the amino acid sequence set forth in SEQ ID NO: 2 or SEQ ID NO: 4; or (3) an amino acid sequence encoded by a polynucleotide that hybridizes under stringent conditions to the complementary strand of a polynucleotide having SEQ ID NO: 1 or SEQ ID NO: 3 and has at least one biological activity of human constitutive androstane receptor (CAR); (b) obtaining crystals of a second polypeptide comprising an amino acid sequence that is at least 70% identical to the amino acid sequence set forth in SEQ ID NO: 2 or SEQ ID NO: 4, such that the three dimensional structure of the second crystallized polypeptide can be determined to a resolution of 2.5 Å or better; and (c) determining the three dimensional structure of the second crystallized polypeptide by X-ray crystallography based on the atomic coordinates of the three dimensional structure provided in step (a). In one embodiment, the atomic coordinates for the second crystallized polypeptide have a root mean square deviation from the backbone atoms of the first polypeptide of not more than 1.5 Å for all backbone atoms shared in common with the first polypeptide and the second polypeptide.
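By way of non-limiting illustration, the root mean square deviation criterion recited above can be computed from paired backbone coordinates. The following Python sketch assumes that the two coordinate sets list the shared backbone atoms in the same order and have already been superimposed; it does not itself perform the superposition. The function name and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally ordered lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Sum of squared distances between corresponding atoms.
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Toy backbone atoms (hypothetical): the second set is shifted 1 angstrom along z.
first = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
second = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]

deviation = rmsd(first, second)
within_limit = deviation <= 1.5  # the 1.5 angstrom limit recited above
print(deviation, within_limit)
```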
The present invention also provides a method for homology modeling a homolog of human constitutive androstane receptor (CAR), comprising: (a) aligning the amino acid sequence of a homolog of human constitutive androstane receptor (CAR) with an amino acid sequence of SEQ ID NO: 2 or SEQ ID NO: 4 and incorporating the sequence of the homolog of human CAR into a model of human constitutive androstane receptor (CAR) derived from structure coordinates as listed in Table 2 or Table 3 to yield a preliminary model of the homolog of human CAR; (b) subjecting the preliminary model to energy minimization to yield an energy minimized model; and (c) remodeling regions of the energy minimized model where stereochemistry restraints are violated to yield a final model of the homolog of human constitutive androstane receptor (CAR).
The present invention also provides a method for obtaining structural information about a molecule or a molecular complex of unknown structure comprising: (a) crystallizing the molecule or molecular complex; (b) generating an X-ray diffraction pattern from the crystallized molecule or molecular complex; and (c) applying at least a portion of the structure coordinates set forth in Table 2 or Table 3 to the X-ray diffraction pattern to generate a three-dimensional electron density map of at least a portion of the molecule or molecular complex whose structure is unknown.
The present invention also provides a method for attempting to make a crystallized complex comprising a polypeptide and a modulator having a molecular weight of less than 5 kDa, the method comprising: (a) crystallizing a polypeptide comprising (1) an amino acid sequence set forth in SEQ ID NO: 2 or SEQ ID NO: 4; (2) an amino acid sequence having at least about 95% identity with the amino acid sequence set forth in SEQ ID NO: 2 or SEQ ID NO: 4; or (3) an amino acid sequence encoded by a polynucleotide that hybridizes under stringent conditions to the complementary strand of a polynucleotide having SEQ ID NO: 1 or SEQ ID NO: 3 and has at least one biological activity of human constitutive androstane receptor (CAR); such that crystals of the crystallized polypeptide will diffract X-rays to a resolution of 5 Å or better; and (b) soaking the crystals in a solution comprising a potential modulator having a molecular weight of less than 5 kDa.
The present invention also provides a method for incorporating a potential modulator in a crystal of a polypeptide, comprising placing a crystal of human constitutive androstane receptor (CAR) having unit cell dimensions a=83.0 Å; b=116.8 Å; c=131.9 Å; α=β=γ=90°, with an orthorhombic space group P2₁2₁2₁, in a solution comprising the potential modulator.
The present invention also provides a computer readable storage medium comprising digitally encoded structural data, wherein the data comprises structural coordinates as listed in Table 2 or Table 3 for the backbone atoms of at least about six amino acid residues from a druggable region of human constitutive androstane receptor (CAR).
The present invention also provides a scalable three-dimensional configuration of points, at least a portion of the points derived from some or all of the structure coordinates as listed in Table 2 or Table 3 for a plurality of amino acid residues from a druggable region of human constitutive androstane receptor (CAR). In one embodiment, the structure coordinates as listed in Table 2 or Table 3 for the backbone atoms of at least about five amino acid residues from a druggable region of human constitutive androstane receptor (CAR) are used to derive part or all of the portion of points. In another embodiment, the structure coordinates as listed in Table 2 or Table 3 for the backbone and optionally the side chain atoms of at least about ten amino acid residues from a druggable region of human constitutive androstane receptor (CAR) are used to derive part or all of the portion of points. In another embodiment, the structure coordinates as listed in Table 2 or Table 3 for the backbone atoms of at least about fifteen amino acid residues from a druggable region of human constitutive androstane receptor (CAR) are used to derive part or all of the portion of points. In another embodiment, substantially all of the points are derived from structure coordinates as listed in Table 2 or Table 3. In still another embodiment, the structure coordinates as listed in Table 2 or Table 3 for the atoms of the amino acid residues from any of the above-described druggable regions of human constitutive androstane receptor (CAR) are used to derive part or all of the portion of points.
The present invention also provides a scalable three-dimensional configuration of points, comprising points having a root mean square deviation of less than about 1.5 Å from the three dimensional coordinates as listed in Table 2 or Table 3 for the backbone atoms of at least five amino acid residues, wherein the five amino acid residues are from a druggable region of human constitutive androstane receptor (CAR). In one embodiment, any point-to-point distance, calculated from the three dimensional coordinates as listed in Table 2 or Table 3, between one of the backbone atoms for one of the five amino acid residues and another backbone atom of a different one of the five amino acid residues is not more than about 10 Å.
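By way of non-limiting illustration, the point-to-point distance limitation recited above can be checked by examining every pair of backbone atoms that belong to different residues. The following Python sketch assumes each atom is supplied with an identifier for its residue. The function name and the toy coordinates are hypothetical and are not taken from Table 2 or Table 3.

```python
import math
from itertools import combinations


def inter_residue_distances_within(atoms, limit=10.0):
    """True if every pair of atoms from different residues is at most limit apart.

    atoms: list of (residue_id, (x, y, z)) tuples
    """
    return all(
        math.dist(p, q) <= limit
        for (res_a, p), (res_b, q) in combinations(atoms, 2)
        if res_a != res_b  # pairs within one residue are not compared
    )


# Toy backbone atoms spaced 3.8 angstroms apart along x (hypothetical data).
compact = [(1, (0.0, 0.0, 0.0)), (2, (3.8, 0.0, 0.0)), (3, (7.6, 0.0, 0.0))]
extended = compact + [(4, (11.4, 0.0, 0.0))]

compact_ok = inter_residue_distances_within(compact)
extended_ok = inter_residue_distances_within(extended)
print(compact_ok, extended_ok)
```

In the toy data the fourth residue lies 11.4 Å from the first, so the extended set fails the 10 Å limit while the compact set satisfies it.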
The present invention also provides a scalable three-dimensional configuration of points comprising points having a root mean square deviation of less than about 1.5 Å from the three dimensional coordinates as listed in Table 2 or Table 3 for the atoms of the amino acid residues from any of the above-described druggable regions of human constitutive androstane receptor (CAR).
The present invention also provides a computer readable storage medium comprising digitally encoded structural data, wherein the data comprise the identity and three-dimensional coordinates as listed in Table 2 or Table 3 for the atoms of the amino acid residues from any of the above-described druggable regions of human constitutive androstane receptor (CAR).
The present invention also provides a scalable three-dimensional configuration of points, wherein the points have a root mean square deviation of less than about 1.5 Å from the three dimensional coordinates as listed in Table 2 or Table 3 for the atoms of the amino acid residues from any of the above-described druggable regions of human constitutive androstane receptor (CAR), wherein up to one amino acid residue in each of the regions can have a conservative substitution thereof.
The present invention also provides a scalable three-dimensional configuration of points derived from a druggable region of a polypeptide, wherein the points have a root mean square deviation of less than about 1.5 Å from the three dimensional coordinates as listed in Table 2 or Table 3 for the backbone atoms of at least ten amino acid residues that participate in the intersubunit contacts of human constitutive androstane receptor (CAR).
The present invention also provides a computer-assisted method for identifying an inhibitor of the activity of human constitutive androstane receptor (CAR), comprising: (a) supplying a computer modeling application with a set of structure coordinates as listed in Table 2 or Table 3 for the atoms of the amino acid residues from any of the above-described druggable regions of human constitutive androstane receptor (CAR) so as to define part or all of a molecule or complex; (b) supplying the computer modeling application with a set of structure coordinates of a chemical entity; and (c) determining whether the chemical entity is expected to bind to or interfere with the molecule or complex. In one embodiment, determining whether the chemical entity is expected to bind to or interfere with the molecule or complex comprises performing a fitting operation between the chemical entity and a druggable region of the molecule or complex, followed by computationally analyzing the results of the fitting operation to quantify the association between the chemical entity and the druggable region. In one embodiment, the method further comprises screening a library of chemical entities.
The present invention also provides a computer-assisted method for designing an inhibitor of constitutive androstane receptor (CAR) activity comprising: (a) supplying a computer modeling application with a set of structure coordinates having a root mean square deviation of less than about 1.5 Å from the structure coordinates as listed in Table 2 or Table 3 for the atoms of the amino acid residues from any of the above-described druggable regions of human constitutive androstane receptor (CAR) so as to define part or all of a molecule or complex; (b) supplying the computer modeling application with a set of structure coordinates for a chemical entity; (c) evaluating the potential binding interactions between the chemical entity and the molecule or complex; (d) structurally modifying the chemical entity to yield a set of structure coordinates for a modified chemical entity; and (e) determining whether the modified chemical entity is an inhibitor expected to bind to or interfere with the molecule or complex, wherein binding to or interfering with the molecule or molecular complex is indicative of potential inhibition of constitutive androstane receptor (CAR) activity. In one embodiment, determining whether the modified chemical entity is an inhibitor expected to bind to or interfere with the molecule or complex comprises performing a fitting operation between the chemical entity and the molecule or complex, followed by computationally analyzing the results of the fitting operation to evaluate the association between the chemical entity and the molecule or complex. In another embodiment, the set of structure coordinates for the chemical entity is obtained from a chemical library.
The present invention also provides a computer-assisted method for designing an inhibitor of constitutive androstane receptor (CAR) activity de novo comprising: (a) supplying a computer modeling application with a set of three-dimensional coordinates derived from the structure coordinates as listed in Table 2 or Table 3 for the atoms of the amino acid residues from any of the above-described druggable regions of human constitutive androstane receptor (CAR) so as to define part or all of a molecule or complex; (b) computationally building a chemical entity represented by a set of structure coordinates; and (c) determining whether the chemical entity is an inhibitor expected to bind to or interfere with the molecule or complex, wherein binding to or interfering with the molecule or complex is indicative of potential inhibition of constitutive androstane receptor (CAR) activity. In one embodiment, determining whether the chemical entity is an inhibitor expected to bind to or interfere with the molecule or complex comprises performing a fitting operation between the chemical entity and a druggable region of the molecule or complex, followed by computationally analyzing the results of the fitting operation to quantify the association between the chemical entity and the druggable region.
The present invention also provides a method for identifying a potential modulator for the prevention or treatment of a disease or disorder, the method comprising: (a) providing the three dimensional structure of a crystallized polypeptide comprising: (1) an amino acid sequence set forth in SEQ ID NO: 2 or SEQ ID NO: 4; (2) an amino acid sequence having at least about 95% identity with the amino acid sequence set forth in SEQ ID NO: 2 or SEQ ID NO: 4; or (3) an amino acid sequence encoded by a polynucleotide that hybridizes under stringent conditions to the complementary strand of a polynucleotide having SEQ ID NO: 1 or SEQ ID NO: 3 and has at least one biological activity of human constitutive androstane receptor (CAR); (b) obtaining a potential modulator for the prevention or treatment of a disease or disorder based on the three dimensional structure of the crystallized polypeptide; (c) contacting the potential modulator with a second polypeptide comprising: (i) an amino acid sequence set forth in SEQ ID NO: 2 or SEQ ID NO: 4; (ii) an amino acid sequence having at least about 95% identity with the amino acid sequence set forth in SEQ ID NO: 2 or SEQ ID NO: 4; or (iii) an amino acid sequence encoded by a polynucleotide that hybridizes under stringent conditions to the complementary strand of a polynucleotide having SEQ ID NO: 1 or SEQ ID NO: 3 and has at least one biological activity of human constitutive androstane receptor (CAR); which second polypeptide can optionally be the same as the crystallized polypeptide; and (d) assaying the activity of the second polypeptide, wherein a change in the activity of the second polypeptide indicates that the compound can be useful for prevention or treatment of a disease or disorder.
The present invention also provides a method for designing a candidate modulator for screening for inhibitors of a polypeptide, the method comprising: (a) providing the three dimensional structure of a druggable region of a polypeptide comprising (1) an amino acid sequence set forth in SEQ ID NO: 2 or SEQ ID NO: 4; (2) an amino acid sequence having at least about 95% identity with the amino acid sequence set forth in SEQ ID NO: 2 or SEQ ID NO: 4; or (3) an amino acid sequence encoded by a polynucleotide that hybridizes under stringent conditions to the complementary strand of a polynucleotide having SEQ ID NO: 1 or SEQ ID NO: 3 and has at least one biological activity of human constitutive androstane receptor (CAR); and (b) designing a candidate modulator based on the three dimensional structure of the druggable region of the polypeptide.
The present invention also provides a method for identifying a potential modulator of a polypeptide from a database, the method comprising: (a) providing the three-dimensional coordinates for a plurality of the amino acids of a polypeptide comprising (1) an amino acid sequence set forth in SEQ ID NO: 2 or SEQ ID NO: 4; (2) an amino acid sequence having at least about 95% identity with the amino acid sequence set forth in SEQ ID NO: 2 or SEQ ID NO: 4; or (3) an amino acid sequence encoded by a polynucleotide that hybridizes under stringent conditions to the complementary strand of a polynucleotide having SEQ ID NO: 1 or SEQ ID NO: 3 and has at least one biological activity of human constitutive androstane receptor (CAR); (b) identifying a druggable region of the polypeptide; and (c) selecting from a database at least one potential modulator comprising three dimensional coordinates which indicate that the modulator can bind or interfere with the druggable region. In one embodiment, the modulator is a small molecule.
The present invention also provides a method for preparing a potential modulator of a druggable region contained in a polypeptide, the method comprising: (a) using the atomic coordinates for the backbone atoms of at least about six amino acid residues from a polypeptide of SEQ ID NO: 4, with a root mean square deviation from the backbone atoms of the amino acid residues of not more than 1.5 Å, to generate one or more three-dimensional structures of a molecule comprising a druggable region from the polypeptide; (b) employing one or more of the three dimensional structures of the molecule to design or select a potential modulator of the druggable region; and (c) synthesizing or obtaining the modulator.
The present invention also provides an apparatus for determining whether a compound is a potential modulator of a polypeptide, the apparatus comprising: (a) a memory that comprises: (i) the three dimensional coordinates and identities of at least about fifteen atoms from a druggable region of a polypeptide comprising (1) an amino acid sequence set forth in SEQ ID NO: 2 or SEQ ID NO: 4; (2) an amino acid sequence having at least about 95% identity with the amino acid sequence set forth in SEQ ID NO: 2 or SEQ ID NO: 4; or (3) an amino acid sequence encoded by a polynucleotide that hybridizes under stringent conditions to the complementary strand of a polynucleotide having SEQ ID NO: 1 or SEQ ID NO: 3 and has at least one biological activity of human constitutive androstane receptor (CAR); (ii) executable instructions; and (b) a processor that is capable of executing instructions to: (i) receive three-dimensional structural information for a candidate modulator; (ii) determine if the three-dimensional structure of the candidate modulator is complementary to the three dimensional coordinates of the atoms from the druggable region; and (iii) output the results of the determination.
The present invention also provides a method for making an inhibitor of constitutive androstane receptor (CAR) activity, the method comprising chemically or enzymatically synthesizing a chemical entity to yield an inhibitor of constitutive androstane receptor (CAR) activity, the chemical entity having been identified during a computer-assisted process comprising supplying a computer modeling application with a set of structure coordinates of a molecule or complex, the molecule or complex comprising at least a portion of at least one druggable region from human constitutive androstane receptor (CAR); supplying the computer modeling application with a set of structure coordinates of a chemical entity; and determining whether the chemical entity is expected to bind or to interfere with the molecule or complex at a druggable region, wherein binding to or interfering with the molecule or complex is indicative of potential inhibition of constitutive androstane receptor (CAR) activity.
The present invention also provides a computer readable storage medium comprising digitally encoded data, wherein the data comprises structural coordinates for a druggable region that is structurally homologous to the structure coordinates as listed in Table 2 or Table 3 for a druggable region of human constitutive androstane receptor (CAR).
The present invention also provides a computer readable storage medium comprising digitally encoded structural data, wherein the data comprise a majority of the three-dimensional structure coordinates as listed in Table 2 or Table 3. In one embodiment, the computer readable storage medium further comprises the identity of the atoms for the majority of the three-dimensional structure coordinates as listed in Table 2 or Table 3. In another embodiment, the data comprise substantially all of the three-dimensional structure coordinates as listed in Table 2 or Table 3.
The present invention also provides a method for building a model for an activated conformation of a constitutive androstane receptor (CAR), the method comprising: (a) employing coordinates for CAR residues 107 to 332 as shown in Table 2; (b) rotating and translating an X-ray structure of the Vitamin D receptor (VDR), so as to superimpose its core backbone atoms onto corresponding atoms from CAR; (c) combining a superimposed VDR AF2 helix, residues 416-423, with residues 107-332 from CAR from step (a), to provide a starting model for residues 107-332 and 341-348 of CAR in the activated conformation; (d) computationally mutating Val418, Leu419, Val421, Phe422 and Gly423 in the VDR AF2 helix to corresponding amino acids in a CAR AF2 helix, wherein the corresponding amino acids in the CAR AF2 helix are Leu343, Gln344, Ile346, Cys347 and Ser348, respectively; and (e) adjusting the conformations of the mutated amino acid side chains in residues 343, 344, and 346-348 of the AF2 helix of CAR to avoid overlaps, wherein the adjusting is accomplished by one of manual manipulation and conformational search and energy minimization. In one embodiment, the method further comprises modeling a CAR AF2 linker region, residues 333-340, by using a computational loop modeling technique.
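The residue bookkeeping in steps (c) and (d) can be illustrated with a short sketch. The pairs recited above (for example Val418 to Leu343 and Gly423 to Ser348) imply a constant numbering offset of 75 between the VDR AF2 helix and the CAR AF2 helix. The sketch below only tabulates that mapping; it does not perform the superposition, side-chain replacement, or energy minimization, and all function and variable names are hypothetical.

```python
# VDR residues 416-423 are grafted as CAR residues 341-348, an offset of 75.
VDR_TO_CAR_OFFSET = 416 - 341

# Mutations recited in step (d): VDR residue number -> (VDR amino acid, CAR amino acid).
STEP_D_MUTATIONS = {
    418: ("Val", "Leu"),
    419: ("Leu", "Gln"),
    421: ("Val", "Ile"),
    422: ("Phe", "Cys"),
    423: ("Gly", "Ser"),
}


def car_number(vdr_resnum):
    """Map a VDR AF2 helix residue number (416-423) to its CAR number (341-348)."""
    if not 416 <= vdr_resnum <= 423:
        raise ValueError("only VDR AF2 helix residues 416-423 are grafted")
    return vdr_resnum - VDR_TO_CAR_OFFSET


def graft_plan():
    """List (VDR number, CAR number, action) for each grafted residue."""
    plan = []
    for vdr in range(416, 424):
        car = car_number(vdr)
        if vdr in STEP_D_MUTATIONS:
            old, new = STEP_D_MUTATIONS[vdr]
            action = f"mutate {old}{vdr} -> {new}{car}"
        else:
            # Not listed in step (d), so the side chain is left unchanged here.
            action = "renumber only"
        plan.append((vdr, car, action))
    return plan
```

In this sketch, the residues flagged for mutation are exactly CAR residues 343, 344, and 346-348, which are the side chains adjusted in step (e).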
Accordingly, it is an object of the present invention to provide a three-dimensional structure of the ligand-binding domain of CAR in complex with a ligand. The object is achieved in whole or in part by the present invention.
An object of the invention having been stated hereinabove, other objects will be evident as the description proceeds, when taken in connection with the accompanying Drawings and Examples as described hereinbelow.
SEQ ID NO: 1 is a DNA sequence encoding a full-length human CAR polypeptide.
SEQ ID NO: 2 is an amino acid sequence of a full-length human CAR polypeptide.
SEQ ID NO: 3 is a DNA sequence encoding human CAR residues 103-340, the ligand-binding domain of CAR polypeptide.
SEQ ID NO: 4 is an amino acid sequence of residues 103-340, the ligand-binding domain of CAR polypeptide.
SEQ ID NO: 5 is a His tag amino acid sequence.
SEQ ID NO: 6 is a DNA sequence of a primer used in combination with the primer of SEQ ID NO: 7 to amplify a DNA fragment encoding amino acid residues 103-348 of a human CAR polypeptide. In addition to amplifying these coding nucleotides, the primer also includes sequences that will result in the amplified product (a) encoding a His tag as in SEQ ID NO: 5; and (b) having an NdeI endonuclease restriction site (CATATG) just 5′ to the His tag-encoding residues.
SEQ ID NO: 7 is a DNA sequence of a primer used in combination with the primer of SEQ ID NO: 6 to amplify a DNA fragment encoding residues 103-348 of a human CAR polypeptide. The sequence of this primer includes a BamHI endonuclease restriction site (GGATCC) 3′ to the human CAR polypeptide coding residues. When this primer is used in combination with the primer of SEQ ID NO: 6, the amplified product will have the following arrangement of features: NdeI site—His tag—nucleotides encoding human CAR amino acids 103 to 348—BamHI site.
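As an illustration of the arrangement of features described above (NdeI site, His tag, CAR coding nucleotides, BamHI site), the following minimal sketch locates the two restriction sites in an amplified product and returns the sequence lying between them. Only the recognition sequences CATATG and GGATCC are taken from the text; the function name is hypothetical, and any sequence passed to it in testing is a toy string, not SEQ ID NO: 6 or SEQ ID NO: 7.

```python
NDEI_SITE = "CATATG"   # NdeI recognition sequence, 5' to the His tag
BAMHI_SITE = "GGATCC"  # BamHI recognition sequence, 3' to the coding residues


def extract_insert(amplicon):
    """Return the sequence between the 5'-most NdeI site and the 3'-most BamHI site."""
    seq = amplicon.upper()
    nde = seq.find(NDEI_SITE)     # first (5'-most) NdeI site
    bam = seq.rfind(BAMHI_SITE)   # last (3'-most) BamHI site
    if nde == -1 or bam == -1:
        raise ValueError("amplicon must contain both an NdeI and a BamHI site")
    start = nde + len(NDEI_SITE)
    if bam < start:
        raise ValueError("BamHI site must lie 3' to the NdeI site")
    return seq[start:bam]
```

The sketch takes the outermost sites so that an internal occurrence of either recognition sequence within the coding region does not truncate the returned insert.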
Until disclosure of the present invention presented herein, the ability to obtain crystalline forms of a CAR LBD, particularly in complex with an antagonist ligand, has not been realized. And until disclosure of the present invention presented herein, a detailed three-dimensional crystal structure of an unliganded CAR polypeptide or a CAR polypeptide in complex with a ligand has not been solved.
In addition to providing structural information, crystalline polypeptides provide other advantages. For example, the crystallization process itself further purifies the polypeptide, and satisfies one of the classical criteria for homogeneity. In fact, crystallization frequently provides unparalleled purification quality, removing impurities that are not removed by other purification methods such as HPLC, dialysis, conventional column chromatography, etc. Moreover, crystalline polypeptides are often stable at ambient temperatures and free of protease contamination and degradation associated with solution storage. Crystalline polypeptides can also be useful as pharmaceutical preparations. Finally, crystallization techniques are generally free of problems such as denaturation associated with other stabilization methods (e.g., lyophilization).
Once crystallization has been accomplished, crystallographic data provides useful structural information that can assist the design of compounds that can serve as agonists or antagonists, as described herein below. In addition, the crystal structure provides information that can be used to map the molecular surface of the ligand-binding domain of CAR. A small non-peptide molecule designed to mimic portions of this surface could serve as a modulator of CAR activity.
I. Definitions
Before the present proteins, nucleotide sequences, and methods are described, it is understood that this invention is not limited to the particular methodology, protocols, cell lines, vectors, and reagents described, as these can vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to limit the scope of the present invention, the invention being defined by the claims.
Unless defined otherwise, all technical and scientific terms used herein are intended to have their ordinary meanings as understood by one of ordinary skill in the art to which this invention pertains. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, representative methods, devices, and materials are now described. All publications mentioned herein are incorporated by reference for the purpose of describing the cell lines, vectors, reagents, and methodologies they disclose.
Following long-standing patent law convention, the articles “a” and “an” are used herein to refer to one or to more than one (i.e., to at least one) of the grammatical object of the article. By way of example, “an element” means one element or more than one element.
As used herein, the term “AF2 helix” refers to a short alpha-helix, usually including 5-8 residues, located at the C-terminal end of a LBD sequence, that can usually adopt multiple positions, orientations, and conformations in the structure, and which is involved in binding to coactivators. In the hypothetical activated conformation of CAR, the AF2 helix is expected to include residues 341 to 347. These residues do not adopt an alpha-helical conformation in the structure of CAR bound to Compound 1.
As used herein, the terms “Compound 1” and “Formula (A)” are used interchangeably and refer to 2-(benzhydrylamino)-1-(2-phenylethyl)-1H-benzimidazole-6-carboxamide.
As used herein, the term “AF2 glutamate” refers to a glutamate residue in the AF2 helix that can make hydrogen bond interactions with the exposed NH groups of the LXXLL-containing peptide from a coactivator if the AF2 helix is in the active position. In CAR, the AF2 glutamate is residue number 345.
As used herein, the terms “activated”, “active conformation”, and “activated conformation” of an LBD are used interchangeably and refer to a conformation where the AF2 helix is in the active position, thereby placing the AF2 glutamate residue in a position and orientation that creates a charge clamp that can recruit coactivator peptides. Similarly, the terms “active position of the AF2 helix” and “active conformation of the AF2 helix” are used interchangeably and mean an AF2 helix having a position and/or orientation similar to that of the AF2 helix in the PPARγ/SRC-1/rosiglitazone structure of Nolte et al., 1998, allowing the AF2 glutamate residue to make interactions with the exposed NH groups of a coactivator peptide. The position and/or orientation of the AF2 helix in an NR structure can be compared with that of the AF2 helix in another NR structure by rotating and/or translating one structure so as to superimpose the backbone atoms of helices 1 through 10 onto the corresponding atoms of the other structure, where corresponding residues are determined by sequence alignment. If, after superimposition, a majority of the backbone atoms of the core of the AF2 helix lie within 2.0 angstroms of the corresponding atoms from the PPARγ/SRC-1/rosiglitazone structure, then the AF2 helix is defined as being in an active position or active conformation.
Other examples of a nuclear receptor where the AF2 helix is in an “active position” include the X-ray structures of the estrogen receptor α (ERα) bound to estradiol (Brzozowski et al., 1997) and diethylstilbestrol (DES) (Shiau et al., 1998). Examples of a nuclear receptor where the AF2 helix is not in an “active position” are the X-ray structures of the estrogen receptor α (ERα) bound to raloxifene (Brzozowski et al., 1997) and tamoxifen (Shiau et al., 1998). Binding of a coactivator, and AF2-dependent activation of gene transcription, normally requires that the AF2 helix be in the “active position” (Nolte et al., 1998; Shiau et al., 1998). This creates a “charge-clamp” structure that holds the coactivator in its required position (Nolte et al., 1998).
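The majority criterion in the foregoing definition can be sketched as follows. This is a minimal illustration under stated assumptions: the superposition on the backbone atoms of helices 1 through 10 is taken to have been carried out already, the two lists hold corresponding AF2 backbone atoms in matching order, “majority” is read as more than half, and “within 2.0 angstroms” is read as inclusive of 2.0. The function names are hypothetical.

```python
import math


def fraction_within(af2_atoms, reference_atoms, cutoff=2.0):
    """Fraction of AF2 backbone atoms within `cutoff` Å of their reference atoms.

    Both arguments are equal-length lists of (x, y, z) points from structures
    that have already been superimposed on helices 1 through 10.
    """
    if len(af2_atoms) != len(reference_atoms) or not af2_atoms:
        raise ValueError("atom lists must be non-empty and of equal length")
    close = sum(
        1 for p, q in zip(af2_atoms, reference_atoms) if math.dist(p, q) <= cutoff
    )
    return close / len(af2_atoms)


def af2_in_active_position(af2_atoms, reference_atoms, cutoff=2.0):
    """True when more than half of the AF2 backbone atoms lie within the cutoff."""
    return fraction_within(af2_atoms, reference_atoms, cutoff) > 0.5
```

Under this reading, an AF2 helix with exactly half of its backbone atoms within the cutoff would not be classified as being in the active position.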
As used herein, the terms “repressed”, “inactive conformation”, and “repressed conformation” of an LBD are used interchangeably and refer to a conformation where the AF2 helix is not in the active position, and where the AF2 glutamate residue is not in a position that could create the charge clamp that can recruit coactivator peptides.
As used herein, the term “agonist” refers to an agent that supplements or potentiates the biological activity of a functional CAR gene or protein, or of a polypeptide encoded by a gene that is up- or down-regulated by a CAR polypeptide and/or a polypeptide encoded by a gene that contains a CAR binding site or response element in its promoter region. An agent is also an agonist when the changes in gene expression, considered over many genes, are similar in direction to those induced by other agents that are commonly regarded as agonists. In one embodiment, an agonist of CAR is an androstane.
As used herein, the term “antagonist” refers to an agent that decreases or inhibits the biological activity of a functional gene or protein (for example, a functional CAR gene or protein), or that supplements or potentiates the biological activity of a naturally occurring or engineered non-functional gene or protein (for example, a non-functional CAR gene or protein). Alternatively, an antagonist can decrease or inhibit the biological activity of a functional gene or polypeptide encoded by a gene that is up- or down-regulated by a CAR polypeptide and/or contains a CAR binding site or response element in its promoter region. An antagonist can also supplement or potentiate the biological activity of a naturally occurring or engineered non-functional gene or polypeptide encoded by a gene that is up- or down-regulated by a CAR polypeptide, and/or contains a CAR binding site or response element in its promoter region. An agent is also an antagonist when the changes in gene expression, considered over many genes, are opposite in direction to those induced by other agents that are commonly regarded as agonists.
As used herein, the terms “α-helix” and “alpha-helix” are used interchangeably and refer to a conformation of a polypeptide chain wherein the polypeptide backbone is wound around the long axis of the molecule in a left-handed or right-handed direction, and the R groups of the amino acids protrude outward from the helical backbone, wherein the repeating unit of the structure is a single turn of the helix, which extends about 0.56 nm along the long axis.
As used herein, the terms “amino acid”, “amino acid residue”, and “residue” are used interchangeably and refer to an amino acid formed upon chemical digestion (hydrolysis) of a peptide or polypeptide at its peptide linkages. Amino acids can also be synthesized individually or as components of a peptide. In one embodiment, the amino acid residues described herein are in the “L” isomeric form. However, residues in the “D” isomeric form can be substituted for any L-amino acid residue, provided that the desired functional property is retained by the polypeptide. In the context of an amino acid, NH2 refers to the free amino group present at the amino terminus of a polypeptide, although some amino acids can have NH2 groups at other positions in the amino acid. COOH refers to the free carboxy group present at the carboxy terminus of a polypeptide. In keeping with standard polypeptide nomenclature, abbreviations for amino acid residues are presented above. The term “amino acid” is intended to embrace all molecules, whether natural or synthetic, that include both an amino functionality and an acid functionality and that are capable of being included in a polymer of naturally occurring amino acids. Exemplary amino acids include naturally occurring amino acids; analogs, derivatives and congeners thereof; amino acid analogs having variant side chains; and all stereoisomers of any of the foregoing.
It is noted that amino acid residue sequences represented herein by formulae have a left-to-right orientation in the conventional direction of amino terminus to carboxy terminus. In addition, the terms “amino acid”, “amino acid residue”, and “residue” are broadly defined to include the amino acids listed in the above table and modified or unusual amino acids. Furthermore, it is noted that a dash at the beginning or end of an amino acid residue sequence indicates a peptide bond to a further sequence of one or more amino acid residues or a covalent bond to an amino-terminal group such as NH2 or acetyl or to a carboxy-terminal group such as COOH.
As used herein, the terms “β-sheet” and “beta-sheet” are used interchangeably and refer to the conformation of a polypeptide chain stretched into an extended zigzag conformation. Portions of polypeptide chains that run “parallel” all run in the same direction. Polypeptide chains that are “anti-parallel” run in the opposite direction from the parallel chains or from each other.
The term “binding” refers to an association, which can be a stable association, between two molecules, i.e., between a polypeptide of the invention and a binding partner, due to, for example, electrostatic, hydrophobic, ionic, and/or hydrogen-bond interactions under physiological conditions.
As used herein, the terms “binding pocket of the CAR ligand-binding domain”, “CAR ligand-binding pocket” and “CAR binding pocket” are used interchangeably, and refer to the large cavity within the CAR ligand-binding domain where a ligand (e.g., Compound 1) binds. This cavity can be empty, or can contain water molecules or other molecules from the solvent, or can contain ligand atoms. The “main” binding pocket includes the region of space not occupied by atoms of CAR that is approximately encompassed or bounded by residues Phe132, Phe161, Ile164, Asn165, Thr166, Met168, Val169, Ala198, Val199, Cys202, His203, Leu206, Phe217, Tyr224, Thr225, Ile226, Glu227, Asp228, Gly229, Ala230, Phe234, Phe238, Leu239, Leu242, Phe243, His246, Tyr326, Ile330, Leu336, Ser337, Met339, and Met340. The binding pocket also includes small regions near to and contiguous with the “main” binding pocket that are not occupied by atoms of CAR.
As used herein the term “biological activity” refers to any biochemical function of a biological molecule. A biological activity includes, but is not limited to, an interaction with another biological molecule (for example, a polypeptide or a nucleic acid, or a combination thereof). As such, a biological activity results in a biochemical effect including, but not limited to the initiation or inhibition of transcription of a gene.
The term “complex” refers to an association between at least two moieties (i.e. chemical or biochemical) that have an affinity for one another. Examples of complexes include associations between antigen/antibodies, lectin/avidin, target polynucleotide/probe oligonucleotide, antibody/anti-antibody, receptor/ligand, enzyme/ligand, polypeptide/polypeptide, polypeptide/polynucleotide, polypeptide/co-factor, polypeptide/substrate, polypeptide/inhibitor, polypeptide/small molecule, and the like. “Member of a complex” refers to one moiety of the complex, such as an antigen or ligand. “Protein complex” or “polypeptide complex” refers to a complex comprising at least one polypeptide.
The term “conserved residue” refers to an amino acid that is a member of a group of amino acids having certain common properties. The term “conservative amino acid substitution” refers to the substitution (conceptually or otherwise) of an amino acid from one such group with a different amino acid from the same group. A functional way to define common properties between individual amino acids is to analyze the normalized frequencies of amino acid changes between corresponding proteins of homologous organisms (Schulz & Schirmer, 1979). According to such analyses, groups of amino acids can be defined where amino acids within a group exchange preferentially with each other, and therefore resemble each other most in their impact on the overall protein structure (Schulz & Schirmer, 1979). Representative examples of sets of amino acid groups defined in this manner include: (i) a charged group, consisting of Glu and Asp, Lys, Arg and His, (ii) a positively-charged group, consisting of Lys, Arg and His, (iii) a negatively-charged group, consisting of Glu and Asp, (iv) an aromatic group, consisting of Phe, Tyr and Trp, (v) a nitrogen ring group, consisting of His and Trp, (vi) a large aliphatic nonpolar group, consisting of Val, Leu and Ile, (vii) a slightly-polar group, consisting of Met and Cys, (viii) a small-residue group, consisting of Ser, Thr, Asp, Asn, Gly, Ala, Glu, Gln and Pro, (ix) an aliphatic group consisting of Val, Leu, Ile, Met and Cys, and (x) a small hydroxyl group consisting of Ser and Thr.
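The groups enumerated above lend themselves to a simple lookup. The following minimal sketch encodes groups (i) through (x) exactly as listed and reports whether a proposed substitution stays within at least one group. The dictionary keys and function names are hypothetical labels chosen for illustration.

```python
# Groups (i)-(x) as enumerated in the definition above, keyed by a short label.
CONSERVATIVE_GROUPS = {
    "charged": {"Glu", "Asp", "Lys", "Arg", "His"},
    "positively charged": {"Lys", "Arg", "His"},
    "negatively charged": {"Glu", "Asp"},
    "aromatic": {"Phe", "Tyr", "Trp"},
    "nitrogen ring": {"His", "Trp"},
    "large aliphatic nonpolar": {"Val", "Leu", "Ile"},
    "slightly polar": {"Met", "Cys"},
    "small residue": {"Ser", "Thr", "Asp", "Asn", "Gly", "Ala", "Glu", "Gln", "Pro"},
    "aliphatic": {"Val", "Leu", "Ile", "Met", "Cys"},
    "small hydroxyl": {"Ser", "Thr"},
}


def shared_groups(original, replacement):
    """Sorted labels of every group that contains both amino acids."""
    return sorted(
        label
        for label, members in CONSERVATIVE_GROUPS.items()
        if original in members and replacement in members
    )


def is_conservative_substitution(original, replacement):
    """True when two different amino acids share at least one listed group."""
    return original != replacement and bool(shared_groups(original, replacement))
```

Because group (i) places Glu and Asp together with Lys, Arg and His, a literal lookup of this kind treats Glu to Lys as conservative; a stricter test could restrict the lookup to a chosen subset of the groups.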
As used herein, the term “DNA segment” refers to a DNA molecule that has been isolated free of total genomic DNA of a particular species. In one embodiment, a DNA segment encoding a CAR polypeptide refers to a nucleic acid comprising SEQ ID NO: 1. In another embodiment, a DNA segment encoding a CAR polypeptide refers to a nucleic acid comprising SEQ ID NO: 3. DNA segments can comprise a portion of a recombinant vector, including, for example, a plasmid, a cosmid, a phage, a virus, and the like.
As used herein, the term “DNA sequence encoding a CAR polypeptide” refers to one or more coding sequences within a particular individual. Moreover, certain differences in nucleotide sequences can exist between individual organisms, which are called alleles. It is possible that such allelic differences might or might not result in differences in amino acid sequence of the encoded polypeptide yet still encode a protein with the same biological activity. As is well known, genes for a particular polypeptide can exist in single or multiple copies within the genome of an individual. Such duplicate genes can be identical or can have certain modifications, including nucleotide substitutions, additions, or deletions, all of which still code for polypeptides having substantially the same activity.
The term “domain”, when used in connection with a polypeptide, refers to a specific region within the polypeptide that comprises a particular structure or mediates a particular function. In the typical case, a domain of a polypeptide of the invention is a fragment of the polypeptide. In certain instances, a domain is a structurally stable domain, as evidenced, for example, by mass spectroscopy, or by the fact that a modulator can bind to a druggable region of the domain. In one embodiment, a domain of a CAR polypeptide is a ligand-binding domain. In another embodiment, a domain of a CAR polypeptide is a DNA-binding domain.
The term “druggable region”, when used in reference to a polypeptide, nucleic acid, complex and the like, refers to a region of the molecule that is a target or is a likely target for binding a modulator. For a polypeptide, a druggable region generally refers to a region wherein several amino acids of a polypeptide would be capable of interacting with a modulator or other molecule. For a polypeptide or complex thereof, exemplary druggable regions include binding pockets and sites, enzymatic active sites, interfaces between domains of a polypeptide or complex, and surface grooves, contours, or surfaces of a polypeptide or complex that are capable of participating in interactions with another molecule. In certain instances, the interacting molecule is another polypeptide, which can be naturally occurring. In other instances, the druggable region is on the surface of the molecule. In one embodiment, a druggable region of a CAR polypeptide comprises the binding site defined by amino acid residues 103-340. In another embodiment, a druggable region of a CAR polypeptide comprises amino acid residues and surfaces of the CAR polypeptide that interact with an RXR polypeptide during CAR-RXR heterodimer formation. In another embodiment, a druggable region of a CAR polypeptide comprises the AF2 helix. In another embodiment, a druggable region of a CAR polypeptide comprises Glu345. In still another embodiment, a druggable region of a CAR polypeptide comprises a DNA-binding domain.
Druggable regions can be described and characterized in a number of ways. For example, a druggable region can be characterized by some or all of the amino acids that make up the region, or the backbone atoms thereof, or the side chain atoms thereof (optionally with or without the Cα atoms). Alternatively, in certain instances, the volume of a druggable region corresponds to that of a carbon based molecule of at least about 200 atomic mass units (amu) and often up to about 800 amu. In other instances, it will be appreciated that the volume of such region can correspond to a molecule of at least about 600 amu and often up to about 1600 amu or more.
Alternatively, a druggable region can be characterized by comparison to other regions on the same or other molecules. For example, the term “affinity region” refers to a druggable region on a molecule (such as a polypeptide of the invention) that is present in several other molecules, insofar as the structures of the same affinity regions are sufficiently similar that they are expected to bind the same or related structural analogs. An example of an affinity region is an ATP-binding site of a protein kinase that is found in several protein kinases (whether or not of the same origin). Another example of an affinity region is a DNA-binding domain: for example, the DNA-binding domain of a CAR polypeptide.
In contrast to an affinity region, the term “selectivity region” refers to a druggable region of a molecule that cannot be found on other molecules, insofar as the structures of different selectivity regions are sufficiently different that they are not expected to bind the same or related structural analogs. An exemplary selectivity region is a catalytic domain of a protein kinase that exhibits specificity for one substrate. In certain instances, a single modulator can bind to the same affinity region across a number of proteins that have a substantially similar biological function, whereas the same modulator can bind to only one selectivity region of one of those proteins.
Continuing with examples of different druggable regions, the term “undesired region” refers to a druggable region of a molecule that upon interacting with another molecule results in an undesirable effect. For example, a binding site that oxidizes the interacting molecule and thereby results in increased toxicity for the oxidized molecule can be deemed an “undesired region”. Other examples of potential undesired regions include regions that upon interaction with a drug decrease the membrane permeability of the drug, increase the excretion of the drug, or increase the blood-brain transport of the drug. It can be the case that, in certain circumstances, an undesired region will no longer be deemed an undesired region because the effect of the region will be favorable, i.e., a drug intended to treat a brain condition would benefit from interacting with a region that resulted in increased blood-brain transport, whereas the same region could be deemed undesirable for drugs that were not intended to be delivered to the brain.
When used in reference to a druggable region, the “selectivity” or “specificity” of a molecule such as a modulator to a druggable region can be used to describe the binding between the molecule and a druggable region. For example, the selectivity of a modulator with respect to a druggable region can be expressed by comparison to another modulator, using the respective values of Kd (i.e., the dissociation constants for each modulator-druggable region complex) or, in cases where a biological effect is observed below the Kd, the ratio of the respective EC50's (i.e., the concentrations that produce 50% of the maximum response for the modulator interacting with each druggable region).
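For illustration, the comparison of two modulators at the same druggable region can be reduced to a ratio of their dissociation constants. The following is a minimal sketch; the function name, the direction of the ratio (reference Kd divided by modulator Kd, so that values above 1 favor the modulator), and the numerical values are assumptions made for the example and are not taken from the disclosure.

```python
def selectivity_ratio(kd_modulator, kd_reference):
    """Ratio of dissociation constants for two modulators at one druggable region.

    Both constants must be in the same units (e.g., molar). A result greater
    than 1 means the modulator binds more tightly than the reference, under
    the (assumed) convention ratio = Kd(reference) / Kd(modulator).
    """
    if kd_modulator <= 0 or kd_reference <= 0:
        raise ValueError("dissociation constants must be positive")
    return kd_reference / kd_modulator


if __name__ == "__main__":
    # Hypothetical values: modulator Kd = 10 nM, reference Kd = 1 micromolar.
    print(selectivity_ratio(10e-9, 1e-6))
```

The same function can be applied to EC50 values where a biological effect is observed below the Kd, provided both concentrations are measured in the same assay and expressed in the same units.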
As used herein, the term “expression” generally refers to the cellular processes by which a biologically active polypeptide is produced. As such, the term “expression” generally includes those cellular processes that begin with transcription and end with the production of a functional polypeptide. As used herein, “expression” is also intended to refer to cellular processes by which a polypeptide is produced that would otherwise be functional except for the presence of mutations in the nucleotide sequence encoding it. Consistent with this usage, “expression” includes, but is not limited to, such processes as transcription, translation, post-translational modification, and transport of a polypeptide.
A “fusion protein” or “fusion polypeptide” refers to a chimeric protein as that term is known in the art and can be constructed using methods known in the art. In many examples of fusion proteins, there are two different polypeptide sequences, and in certain cases, there can be more. The sequences can be linked in frame. A fusion protein can include a domain that is found (albeit in a different protein) in an organism that also expresses the first protein, or it can be an “interspecies”, “intergenic”, etc. fusion expressed by different kinds of organisms. In various embodiments, the fusion polypeptide can comprise one or more amino acid sequences linked to a first polypeptide. In the case where more than one amino acid sequence is fused to a first polypeptide, the fusion sequences can be multiple copies of the same sequence, or alternatively, can be different amino acid sequences. The fusion polypeptides can be fused to the N-terminus, the C-terminus, or both the N- and C-termini of the first polypeptide. Exemplary fusion proteins include polypeptides comprising a glutathione S-transferase tag (GST-tag), histidine tag (His-tag), an immunoglobulin domain, or an immunoglobulin-binding domain.
As used herein, the term “gene” is used for simplicity to refer to a nucleotide sequence that encodes a protein, a polypeptide, or a peptide. As such, the term “gene” refers to a nucleic acid comprising an open reading frame encoding a polypeptide having exon sequences and, optionally, intron sequences. The term “intron” refers to a DNA sequence present in a given gene that is not translated into protein and is generally found between exons. As will be understood by those of skill in the art, this functional term includes both genomic sequences and cDNA sequences. Representative embodiments of such sequences are disclosed herein.
The term “having substantially similar biological activity”, when used in reference to two polypeptides, refers to a biological activity of a first polypeptide which is substantially similar to at least one of the biological activities of a second polypeptide. A substantially similar biological activity means that the polypeptides carry out a similar function, i.e., a similar enzymatic reaction or a similar physiological process, etc. For example, two homologous proteins can have a substantially similar biological activity if they are involved in a similar enzymatic reaction, i.e., they are both kinases which catalyze phosphorylation of a substrate polypeptide, however, they can phosphorylate different regions on the same protein substrate or different substrate proteins altogether. Alternatively, two homologous proteins can also have a substantially similar biological activity if they are both involved in a similar physiological process, i.e., regulation of transcription. For example, two proteins can be transcription factors, however, they can bind to different DNA sequences or bind to different polypeptide interactors. Substantially similar biological activities can also be associated with proteins carrying out a similar structural role, for example, two membrane proteins.
As used herein, the term “interact” refers to detectable interactions between molecules, such as can be detected using, for example, a yeast two-hybrid assay. The term “interact” is also meant to include “binding” interactions between molecules. Interactions include, but are not limited to protein-protein, protein-nucleic acid, and protein-small molecule interactions. These interactions can be in the form of covalent or non-covalent interactions including, but not limited to ionic, hydrogen bonding, and van der Waals interactions.
As used herein, the term “isolated” refers to a nucleic acid substantially free of other nucleic acids, proteins, lipids, carbohydrates, or other materials with which it can be associated, such association being either in cellular material or in a synthesis medium. The term can also be applied to polypeptides, in which case the polypeptide is substantially free of nucleic acids, carbohydrates, lipids, and other undesired polypeptides. The term “isolated polypeptide” refers to a polypeptide, in certain embodiments prepared from recombinant DNA or RNA, or of synthetic origin, or some combination thereof, which (1) is not associated with proteins that it is normally found with in nature, (2) is isolated from the cell in which it normally occurs, (3) is isolated free of other proteins from the same cellular source, (4) is expressed by a cell from a different species, or (5) does not occur in nature.
The term “isolated nucleic acid” refers to a polynucleotide of genomic, cDNA, or synthetic origin or some combination thereof, which (1) is not associated with the cell in which the “isolated nucleic acid” is found in nature, or (2) is operably linked to a polynucleotide to which it is not linked in nature.
The terms “label” or “labeled” refer to incorporation or attachment, optionally covalently or non-covalently, of a detectable marker into a molecule, such as a polypeptide. Various methods of labeling polypeptides are known in the art and can be used. Examples of labels for polypeptides include, but are not limited to the following: radioisotopes, fluorescent labels, heavy atoms, enzymatic labels or reporter genes, chemiluminescent groups, biotinyl groups, predetermined polypeptide epitopes recognized by a secondary reporter (i.e., leucine zipper pair sequences, binding sites for secondary antibodies, metal binding domains, epitope tags). Examples and use of such labels are well known by the skilled artisan. In some embodiments, spacer arms of various lengths can be attached to labels to reduce potential steric hindrance.
The term “mammal” is known in the art, and exemplary mammals include humans, primates, bovines, porcines, canines, felines, and rodents (i.e., mice and rats).
The term “modulation”, when used in reference to a functional property or biological activity or process (i.e., enzyme activity or receptor binding), refers to the capacity to up regulate (i.e., activate or stimulate), down regulate (i.e., inhibit or suppress), or otherwise change a quality of such property, activity, or process. In certain instances, such regulation can be contingent on the occurrence of a specific event, such as activation of a signal transduction pathway, and/or can be manifest only in particular cell types.
The term “modulator” refers to a polypeptide, nucleic acid, macromolecule, complex, molecule, small molecule, compound, species, or the like (naturally-occurring or non-naturally-occurring), or an extract made from biological materials such as bacteria, plants, fungi, or animal cells or tissues, that can be capable of causing modulation. Modulators can be evaluated for potential activity as inhibitors or activators (directly or indirectly) of a functional property, biological activity or process, or combination thereof, (i.e., agonist, partial antagonist, partial agonist, inverse agonist, antagonist, anti-microbial agents, inhibitors of microbial infection or proliferation, and the like) by inclusion in assays. In such assays, many modulators can be screened at one time. The activity of a modulator can be known, unknown, or partially known.
As used herein, the term “molecular replacement” refers to a method that involves generating a preliminary model of the wild-type CAR ligand-binding domain, or a CAR mutant crystal the structure for which coordinates are unknown, by orienting and positioning a molecule the structure for which coordinates are known (e.g., the vitamin D receptor; VDR) within the unit cell of the unknown crystal so as best to account for the observed diffraction pattern of the unknown crystal. Phases can then be calculated from this model and combined with the observed amplitudes to give an approximate Fourier synthesis of the structure the coordinates for which are unknown. This, in turn, can be subjected to any of the several forms of refinement known in the art to provide a final, accurate structure of the unknown crystal (see e.g. Lattman, 1985; Rossmann, 1972). Using the structure coordinates of the ligand-binding domain of CAR provided by this invention, molecular replacement can be used to determine the structure coordinates of a crystal of a mutant or of a homologue of the CAR ligand-binding domain, or of a different crystal form of the CAR ligand-binding domain.
The term “motif” refers to an amino acid sequence that is commonly found in a protein of a particular structure or function. Typically, a consensus sequence is defined to represent a particular motif. The consensus sequence need not be strictly defined and can contain positions of variability, degeneracy, variability of length, etc. The consensus sequence can be used to search a database to identify other proteins that can have a similar structure or function due to the presence of the motif in its amino acid sequence. For example, on-line databases can be searched with a consensus sequence in order to identify other proteins containing a particular motif. Various search algorithms and/or programs can be used, including FASTA, BLAST, or ENTREZ. FASTA and BLAST are available as a part of the GCG sequence analysis package (Accelrys, Inc., San Diego, Calif., United States of America). ENTREZ is available through the National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Md., United States of America.
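As a simple illustration of searching a sequence with a consensus, the sketch below converts a consensus string into a regular expression and reports every occurrence, including overlapping ones. It is offered under stated assumptions: a lowercase “x” is taken to mean any residue, bracketed alternatives such as [ST] are passed through as regular-expression character classes, and the function name find_motif and the short test sequence are invented for the example. The LxxLL consensus is used only as a familiar example of a short motif; the database programs named above (FASTA, BLAST, ENTREZ) operate differently and are not represented here.

```python
import re


def find_motif(consensus, sequence):
    """Return (1-based start position, matched text) for each motif occurrence.

    Assumed consensus syntax: one-letter residue codes, lowercase 'x' for any
    residue, and optional bracketed alternatives such as [ST].
    """
    body = consensus.replace("x", ".")
    # A lookahead is used so that overlapping occurrences are all reported.
    pattern = re.compile("(?=(" + body + "))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence)]


if __name__ == "__main__":
    # Invented ten-residue sequence used purely for demonstration.
    print(find_motif("LxxLL", "MKLAELLQRS"))
```

Positions are reported using 1-based numbering, consistent with the way residues are conventionally numbered in a polypeptide sequence.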
As used herein, the term “mutation” carries its traditional connotation and refers to a change, inherited, naturally occurring, or introduced, in a nucleic acid or polypeptide sequence, and is used in its sense as generally known to those of skill in the art.
The term “naturally occurring”, as applied to an object, refers to the fact that an object can be found in nature. For example, a polypeptide or polynucleotide sequence that is present in an organism (including bacteria) that can be isolated from a source in nature and which has not been intentionally modified by man in the laboratory is naturally occurring.
The term “nucleic acid” refers to a polymeric form of nucleotides, either ribonucleotides or deoxynucleotides or a modified form of either type of nucleotide. The term should also be understood to include, as equivalents, analogs of either RNA or DNA made from nucleotide analogs, and, as applicable to the embodiment being described, single-stranded (such as sense or antisense) and double-stranded polynucleotides.
The term “nucleic acid of the invention” refers to a nucleic acid encoding a polypeptide of the invention, i.e., a nucleic acid comprising a sequence consisting of, or consisting essentially of, the polynucleotide sequence set forth in SEQ ID NO: 1 or SEQ ID NO: 3. A nucleic acid of the invention can comprise all, or a portion of: the nucleotide sequence of SEQ ID NO: 1 or SEQ ID NO: 3; a nucleotide sequence at least 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98% or 99% identical to SEQ ID NO: 1 or SEQ ID NO: 3; a nucleotide sequence that hybridizes under stringent conditions to SEQ ID NO: 1 or SEQ ID NO: 3; nucleotide sequences encoding polypeptides that are functionally equivalent to polypeptides of the invention; nucleotide sequences encoding polypeptides at least about 60%, 70%, 80%, 85%, 90%, 95%, 98%, 99% homologous or identical with an amino acid sequence of SEQ ID NO: 2 or SEQ ID NO: 4; nucleotide sequences encoding polypeptides having an activity of a polypeptide of the invention and having at least about 60%, 70%, 80%, 85%, 90%, 95%, 98%, 99% or more homology or identity with SEQ ID NO: 2 or SEQ ID NO: 4; nucleotide sequences that differ by 1 to about 2, 3, 5, 7, 10, 15, 20, 30, 50, 75 or more nucleotide substitutions, additions or deletions, such as allelic variants, of SEQ ID NO: 1 and SEQ ID NO: 3; nucleic acids derived from and evolutionarily related to SEQ ID NO: 1 or SEQ ID NO: 3; and complements of and nucleotide sequences resulting from the degeneracy of the genetic code, for all of the foregoing and other nucleic acids of the invention. Nucleic acids of the invention also include homologs, i.e., orthologs and paralogs, of SEQ ID NO: 1 or SEQ ID NO: 3 and also variants of SEQ ID NO: 1 or SEQ ID NO: 3 which have been codon optimized for expression in a particular organism (i.e., host cell).
The term “operably linked”, when describing the relationship between two nucleic acid regions, refers to a juxtaposition wherein the regions are in a relationship permitting them to function in their intended manner. For example, a control sequence “operably linked” to a coding sequence is ligated in such a way that expression of the coding sequence is achieved under conditions compatible with the control sequences, such as when the appropriate molecules (i.e., inducers and polymerases) are bound to the control or regulatory sequence(s).
As used herein, “orthorhombic unit cell” refers to a unit cell wherein a≠b≠c, and α=β=γ=90°. The vectors a, b, and c describe the unit cell edges and the angles α, β, and γ describe the unit cell angles.
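The definition above can be checked numerically for a given set of cell parameters. The following is a minimal sketch that applies the definition literally (three mutually unequal edge lengths and three right angles) within a tolerance; the function name, the tolerance, and the cell dimensions in the example are hypothetical.

```python
import math


def is_orthorhombic(a, b, c, alpha, beta, gamma, tol=1e-3):
    """Apply the definition: a, b, c mutually unequal; all angles 90 degrees.

    Edge lengths are in angstroms and angles in degrees. The tolerance is an
    assumed value for comparing measured parameters.
    """
    angles_ok = all(
        math.isclose(angle, 90.0, abs_tol=tol) for angle in (alpha, beta, gamma)
    )
    edges_distinct = not (
        math.isclose(a, b, abs_tol=tol)
        or math.isclose(b, c, abs_tol=tol)
        or math.isclose(a, c, abs_tol=tol)
    )
    return angles_ok and edges_distinct


if __name__ == "__main__":
    # Hypothetical cell parameters, not taken from any crystal described herein.
    print(is_orthorhombic(50.0, 60.0, 70.0, 90.0, 90.0, 90.0))
```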
As used herein, the term “CAR” refers to any polypeptide with an amino acid sequence that can be aligned with at least one of human, mouse, or rat CAR, such that at least 50% of the amino acids are identical to the corresponding amino acid in the human, mouse, or rat CAR. The term “CAR” also encompasses nucleic acids for which the corresponding translated protein sequence can be considered to be a CAR. The term “CAR” includes vertebrate homologs of CAR family members including, but not limited to mammalian and avian homologs. Representative mammalian homologs of CAR family members include, but are not limited to murine and human homologs.
As used herein, the terms “CAR gene” and “recombinant CAR gene” are used interchangeably and refer to a nucleic acid molecule comprising an open reading frame encoding a CAR polypeptide, including both exon and (optionally) intron sequences.
As used herein, the terms “CAR gene product”, “CAR protein”, “CAR polypeptide”, and “CAR peptide” are used interchangeably and refer to peptides having amino acid sequences which are substantially identical to native CAR amino acid sequences from the organism of interest and which are biologically active in that they comprise all or a part of the amino acid sequence of a CAR polypeptide, or cross-react with antibodies raised against a CAR polypeptide, or retain all or some of the biological activity (e.g., DNA or ligand-binding ability and/or dimerization ability) of the native amino acid sequence or protein. Such biological activity can include immunogenicity.
As used herein, the terms “CAR gene product”, “CAR protein”, “CAR polypeptide”, and “CAR peptide” are used interchangeably and refer to a subtype of the CAR family. In one embodiment, a CAR gene product is CAR. In another embodiment, a CAR gene product comprises the amino acid sequence of SEQ ID NO: 2.
As used herein, the terms “CAR gene product”, “CAR protein”, “CAR polypeptide”, and “CAR peptide” also include analogs of a CAR polypeptide. By “analog” is intended that a DNA or peptide sequence can contain alterations relative to the sequences disclosed herein, yet retain all or some of the biological activity of those sequences. Analogs can be derived from genomic nucleotide sequences as are disclosed herein or those from other organisms, or can be created synthetically. Those skilled in the art will appreciate that other analogs, as yet undisclosed or undiscovered, can be used to design and/or construct CAR analogs. There is no need for a “CAR gene product”, “CAR protein”, “CAR polypeptide”, or “CAR peptide” to comprise all or substantially all of the amino acid sequence of a CAR polypeptide gene product. Shorter or longer sequences are anticipated to be of use in the invention; shorter sequences are herein referred to as “segments”. Thus, the terms “CAR gene product”, “CAR protein”, “CAR polypeptide”, and “CAR peptide” also include fusion or recombinant CAR polypeptides and proteins comprising sequences of the present invention. Methods of preparing such proteins are disclosed herein and are known in the art.
The term “phenotype” refers to the entire physical, biochemical, and physiological makeup of a cell, i.e., having any one trait or any group of traits.
As used herein, the term “polypeptide” refers to any polymer comprising any of the 20 protein amino acids, regardless of its size. Although “protein” is often used in reference to relatively large polypeptides and “peptide” is often used in reference to small polypeptides, usage of these terms in the art overlaps and varies. The term “polypeptide” as used herein refers to peptides, polypeptides, and proteins, unless otherwise noted. As used herein, the terms “protein”, “polypeptide” and “peptide” are used interchangeably herein when referring to a gene product. The term “polypeptide”, and the terms “protein” and “peptide” which are used interchangeably herein, refers to a polymer of amino acids. Exemplary polypeptides include gene products, naturally occurring proteins, homologs, orthologs, paralogs, fragments, as well as other equivalents, variants, and analogs of the foregoing.
The terms “polypeptide fragment” and “fragment”, when used to refer to a reference polypeptide, refer to a polypeptide in which amino acid residues are deleted as compared to the reference polypeptide itself, but where the remaining amino acid sequence is usually identical to the corresponding positions in the reference polypeptide. Such deletions can occur at the amino-terminus or carboxy-terminus of the reference polypeptide, or alternatively both. Fragments typically are at least 5, 6, 8 or 10 amino acids long, at least 14 amino acids long, at least 20, 30, 40 or 50 amino acids long, at least 75 amino acids long, or at least 100, 150, 200, 300, 500 or more amino acids long. A fragment can retain one or more of the biological activities of the reference polypeptide. In certain embodiments, a fragment can comprise a druggable region, and optionally additional amino acids on one or both sides of the druggable region, which additional amino acids can number from 5, 10, 15, 20, 30, 40, 50, or up to 100 or more residues. Further, fragments can include a sub-fragment of a specific region, which sub-fragment retains a function of the region from which it is derived. In one embodiment, a fragment can have immunogenic properties.
The term “polypeptide of the invention” refers to a polypeptide comprising the amino acid sequence set forth in SEQ ID NO: 2 or SEQ ID NO: 4, or an equivalent or fragment thereof: i.e., a polypeptide comprising a sequence consisting of, or consisting essentially of, the amino acid sequence set forth in SEQ ID NO: 2 or SEQ ID NO: 4. Polypeptides of the invention include polypeptides comprising all or a portion of the amino acid sequence set forth in SEQ ID NO: 2 or SEQ ID NO: 4; the amino acid sequence set forth in SEQ ID NO: 2 or SEQ ID NO: 4 with 1 to about 2, 3, 5, 7, 10, 15, 20, 30, 50, 75 or more conservative amino acid substitutions; an amino acid sequence that is at least 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 2 or SEQ ID NO: 4; and functional fragments thereof. Polypeptides of the invention also include homologs, i.e., orthologs and paralogs, of SEQ ID NO: 2 or SEQ ID NO: 4.
As used herein, the term “primer” refers to a nucleic acid comprising in one embodiment 2 or more deoxyribonucleotides or ribonucleotides, in another embodiment more than 3, in another embodiment more than 8, and in yet another embodiment at least about 20 nucleotides of an exonic or intronic region. In one embodiment, an oligonucleotide is between 10 and 30 bases in length.
The term “purified” refers to an object species that is the predominant species present (i.e., on a molar basis it is more abundant than any other individual species in the composition). A “purified fraction” is a composition wherein the object species comprises at least about 50 percent (on a molar basis) of all species present. In making the determination of the purity of a species in solution or dispersion, the solvent or matrix in which the species is dissolved or dispersed is usually not included in such determination; instead, only the species (including the one of interest) dissolved or dispersed are taken into account. Generally, a purified composition will have one species that comprises more than about 80 percent of all species present in the composition, more than about 85%, 90%, 95%, 99% or more of all species present. The object species can be purified to essential homogeneity (contaminant species cannot be detected in the composition by conventional detection methods) wherein the composition consists essentially of a single species. A skilled artisan can purify a polypeptide of the invention using standard techniques for protein purification in light of the teachings herein. Purity of a polypeptide can be determined by a number of methods known to those of skill in the art, including for example, amino-terminal amino acid sequence analysis, gel electrophoresis, mass-spectrometry analysis and the methods described herein.
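The molar-basis criteria above lend themselves to a short calculation. The sketch below is illustrative only: it takes the molar amounts of the dissolved or dispersed species (excluding solvent or matrix, as described above), and tests whether an object species is predominant and whether it meets a fractional threshold. The function names, the default threshold of 0.5 (corresponding to “at least about 50 percent”), and the example amounts are assumptions.

```python
def molar_fraction(moles_by_species, species):
    """Fraction of the named species among all listed species (solvent excluded)."""
    total = sum(moles_by_species.values())
    if total <= 0:
        raise ValueError("total molar amount must be positive")
    return moles_by_species[species] / total


def is_predominant(moles_by_species, species):
    """True when the species is more abundant than any other individual species."""
    amount = moles_by_species[species]
    return all(
        amount > other
        for name, other in moles_by_species.items()
        if name != species
    )


def is_purified_fraction(moles_by_species, species, threshold=0.5):
    """True when the species makes up at least the threshold fraction of all species."""
    return molar_fraction(moles_by_species, species) >= threshold


if __name__ == "__main__":
    # Hypothetical composition in arbitrary molar units.
    sample = {"object species": 8.0, "contaminant A": 1.5, "contaminant B": 0.5}
    print(molar_fraction(sample, "object species"))
    print(is_predominant(sample, "object species"))
    print(is_purified_fraction(sample, "object species"))
```

A species can be predominant without constituting a purified fraction, for example when it accounts for 40 percent of the species present and two contaminants account for 30 percent each.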
The terms “recombinant protein” and “recombinant polypeptide” refer to a polypeptide that is produced by recombinant DNA techniques. An example of such techniques includes when DNA encoding a polypeptide is inserted into a suitable expression vector that is in turn used to transform a host cell to produce the polypeptide encoded by the DNA.
A “reference sequence” is a defined sequence used as a basis for a sequence comparison. A reference sequence can be a subset of a larger sequence, for example, as a segment of a full-length protein given in a sequence listing such as SEQ ID NO: 2 or SEQ ID NO: 4, or can comprise a complete protein sequence. Generally, a reference sequence is at least 200, 300 or 400 nucleotides in length, frequently at least 600 nucleotides in length, and often at least 800 nucleotides in length (or the protein equivalent if it is shorter or longer in length). Because two proteins can each (1) comprise a sequence (i.e., a portion of the complete protein sequence) that is similar between the two proteins, and (2) can further comprise a sequence that is divergent between the two proteins, sequence comparisons between two (or more) proteins are typically performed by comparing sequences of the two proteins over a “comparison window” to identify and compare local regions of sequence similarity.
A “comparison window,” as used herein, refers to a conceptual segment of at least 20 contiguous amino acid positions wherein a protein sequence can be compared to a reference sequence of at least 20 contiguous amino acids and wherein the portion of the protein sequence in the comparison window can comprise additions or deletions (i.e., gaps) of 20 percent or less as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. Optimal alignment of sequences for aligning a comparison window can be conducted by the local homology algorithm of Smith & Waterman, 1981, by the homology alignment algorithm of Needleman & Wunsch, 1970, by the search for similarity method of Pearson & Lipman, 1988, by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, available from Accelrys, Inc., San Diego, Calif., United States of America), or by inspection, and the best alignment (i.e., resulting in the highest percentage of homology over the comparison window) generated by the various methods can be identified.
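To illustrate the kind of local alignment referred to above, the following sketch computes the best local alignment score between two sequences by dynamic programming in the manner of Smith & Waterman. It is a simplified illustration, not the implementation found in any software package: it uses a fixed match score, mismatch penalty, and linear gap penalty (all assumed values) rather than a substitution matrix and affine gap penalties, and it returns only the score, not the alignment itself.

```python
def smith_waterman_score(seq1, seq2, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between two sequences.

    Scoring parameters are assumed illustrative values. Only two rows of the
    dynamic programming matrix are kept in memory at a time.
    """
    cols = len(seq2) + 1
    prev = [0] * cols
    best = 0
    for i in range(1, len(seq1) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            substitution = match if seq1[i - 1] == seq2[j - 1] else mismatch
            curr[j] = max(
                0,                          # start a new local alignment
                prev[j - 1] + substitution, # align the two residues
                prev[j] + gap,              # gap in seq2
                curr[j - 1] + gap,          # gap in seq1
            )
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # Invented sequences sharing the local segment "ACDE".
    print(smith_waterman_score("XXACDEYY", "ZZACDEWW"))
```

Resetting negative cell values to zero is what makes the alignment local: a region of similarity is scored on its own, regardless of divergent flanking sequence.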
The term “regulatory sequence” is a generic term used throughout the specification to refer to polynucleotide sequences, such as initiation signals, enhancers, regulators and promoters, that are necessary or desirable to affect the expression of coding and non-coding sequences to which they are operably linked. Exemplary regulatory sequences are described in Goeddel, 1990, and include, for example, the early and late promoters of SV40, adenovirus or cytomegalovirus immediate early promoter, the lac system, the trp system, the TAC or TRC system, T7 promoter whose expression is directed by T7 RNA polymerase, the major operator and promoter regions of phage lambda, the control regions for fd coat protein, the promoter for 3-phosphoglycerate kinase or other glycolytic enzymes, the promoters of acid phosphatase, i.e., Pho5, the promoters of the yeast α-mating factors, the polyhedron promoter of the baculovirus system and other sequences known to control the expression of genes of prokaryotic or eukaryotic cells or their viruses, and various combinations thereof. The nature and use of such control sequences can differ depending upon the host organism. In prokaryotes, such regulatory sequences generally include promoter, ribosomal binding site, and transcription termination sequences. The term “regulatory sequence” is intended to include, at a minimum, components whose presence can influence expression, and can also include additional components whose presence is advantageous, for example, leader sequences and fusion partner sequences. In certain embodiments, transcription of a polynucleotide sequence is under the control of a promoter sequence (or other regulatory sequence) that controls the expression of the polynucleotide in a cell-type in which expression is intended. 
It will also be understood that the polynucleotide can be under the control of regulatory sequences that are the same or different from those sequences which control expression of the naturally occurring form of the polynucleotide.
The term “reporter gene” refers to a nucleic acid comprising a nucleotide sequence encoding a protein that is readily detectable either by its presence or activity, including, but not limited to, luciferase, fluorescent protein (e.g., green fluorescent protein), chloramphenicol acetyl transferase, β-galactosidase, secreted placental alkaline phosphatase, β-lactamase, human growth hormone, and other secreted enzyme reporters. Generally, a reporter gene encodes a polypeptide not otherwise produced by the host cell, which is detectable by analysis of the cell(s), e.g., by the direct fluorometric, radioisotopic or spectrophotometric analysis of the cell(s) and preferably without the need to kill the cells for signal analysis. In certain instances, a reporter gene encodes an enzyme, which produces a change in fluorometric properties of the host cell, which is detectable by qualitative, quantitative, or semiquantitative function or transcriptional activation. Exemplary enzymes include esterases, β-lactamase, phosphatases, peroxidases, proteases (e.g., tissue plasminogen activator or urokinase) and other enzymes whose function can be detected by appropriate chromogenic or fluorogenic substrates known to those skilled in the art or developed in the future.
The term “sequence homology” refers to the proportion of base matches between two nucleic acid sequences or the proportion of amino acid matches between two amino acid sequences. When sequence homology is expressed as a percentage, e.g., 50%, the percentage denotes the proportion of matches over the length of sequence from a desired sequence (e.g., SEQ ID NO: 1) that is compared to some other sequence. Gaps (in either of the two sequences) are permitted to maximize matching; gap lengths of 15 bases or less are usually used, 6 bases or less are used more frequently, with 2 bases or less used even more frequently. The term “sequence identity” means that sequences are identical (i.e., on a nucleotide-by-nucleotide basis for nucleic acids or amino acid-by-amino acid basis for polypeptides) over a window of comparison. The term “percentage of sequence identity” is calculated by comparing two optimally aligned sequences over the comparison window, determining the number of positions at which the identical amino acid occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the comparison window, and multiplying the result by 100 to yield the percentage of sequence identity. Methods to calculate sequence identity are known to those of skill in the art and described in further detail herein.
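By way of illustration only, the calculation of “percentage of sequence identity” described above can be sketched as follows. The function name `percent_identity` and the use of “-” as a gap character are assumptions made for this sketch; the sequences are assumed to have been optimally aligned beforehand (e.g., by GAP or BESTFIT), and the example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of sequence identity over a comparison window.

    Both strings are assumed to be already aligned and of equal length,
    with `gap` marking an inserted gap position (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("comparison window must not be empty")

    # Count positions at which the identical residue occurs in both sequences.
    matched = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    # Divide by the total number of positions in the window, multiply by 100.
    return 100.0 * matched / len(aligned_a)


if __name__ == "__main__":
    # Hypothetical 10-residue window with one mismatch at the last position.
    print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))
```

In this sketch a gap position counts toward the window length but never as a match, which is one of several conventions in use.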
As used herein, the term “sequencing” refers to determining the ordered linear sequence of nucleotides or amino acids of a DNA, RNA, or protein target sample, using conventional manual or automated laboratory techniques.
The term “small molecule” refers to a compound, which has a molecular weight of less than about 5 kilodalton (kD), less than about 2.5 kD, less than about 1.5 kD, or less than about 0.9 kD. Small molecules can be, for example, nucleic acids, peptides, polypeptides, peptide nucleic acids, peptidomimetics, carbohydrates, lipids, or other organic (carbon containing) or inorganic molecules. The term “small organic molecule” refers to a small molecule that is often identified as being an organic or medicinal compound, and does not include molecules that are exclusively nucleic acids, peptides, or polypeptides.
The term “soluble” as used herein with reference to a polypeptide of the invention or other protein means that upon expression in cell culture, at least some portion of the polypeptide or protein expressed remains in the cytoplasmic fraction of the cell and does not fractionate with the cellular debris upon lysis and centrifugation of the lysate. Solubility of a polypeptide can be increased by a variety of art recognized methods, including fusion to a heterologous amino acid sequence, deletion of amino acid residues, amino acid substitution (e.g., enriching the sequence with amino acid residues having hydrophilic side chains), and chemical modification (e.g., addition of hydrophilic groups). The solubility of polypeptides can be measured using a variety of art recognized techniques, including dynamic light scattering to determine aggregation state, UV absorption, centrifugation to separate aggregated from non-aggregated material, and SDS gel electrophoresis (i.e., the amount of protein in the soluble fraction is compared to the amount of protein in the soluble and insoluble fractions combined). When expressed in a host cell, the polypeptides of the invention can be at least about 1%, 2%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90% or more soluble, i.e., at least about 1%, 2%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90% or more of the total amount of protein expressed in the cell is found in the cytoplasmic fraction. In certain embodiments, a one liter culture of cells expressing a polypeptide of the invention will produce at least about 0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 30, 40, 50 milligrams or more of soluble protein. In an exemplary embodiment, a polypeptide of the invention is at least about 10% soluble and will produce at least about 1 milligram of protein from a one liter cell culture.
As used herein, the term “space group” refers to the arrangement of symmetry elements of a crystal.
The term “specifically hybridizes” refers to detectable and specific nucleic acid binding. Polynucleotides, oligonucleotides, and nucleic acids of the invention selectively hybridize to nucleic acid strands under hybridization and wash conditions that minimize appreciable amounts of detectable binding to nonspecific nucleic acids. Stringent conditions can be used to achieve selective hybridization conditions as known in the art and discussed herein. Generally, the nucleic acid sequence homology between the polynucleotides, oligonucleotides, and nucleic acids of the invention and a nucleic acid sequence of interest will be at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 95%, 98%, 99%, or more. In certain instances, hybridization and washing conditions are performed under stringent conditions according to conventional hybridization procedures and as described further herein.
As used herein, the terms “structure coordinates”, “atomic coordinates”, and “structural coordinates” are used interchangeably and refer to coordinates derived from mathematical equations related to the patterns obtained on diffraction of a monochromatic beam of X-rays by the atoms (scattering centers) of a molecule in crystal form. The diffraction data are used to calculate an electron density map of the repeating unit of the crystal. The electron density maps are used to establish the positions of the individual atoms within the unit cell of the crystal.
Those of skill in the art understand that a set of coordinates determined by X-ray crystallography is not without experimental error. In general, the error in the coordinates tends to be reduced as the resolution is increased, since more experimental diffraction data is available for the model fitting and refinement. Thus, for example, more diffraction data can be collected from a crystal that diffracts to a resolution of 2.0 angstroms than from a crystal that diffracts to a lower resolution, such as 2.5 or 3.0 angstroms. Consequently, the refined structural coordinates will usually be more accurate when fitted and refined using data from a crystal that diffracts to higher resolution. The design of ligands for a CAR polypeptide depends on the accuracy of the structural coordinates. If the coordinates are not sufficiently accurate, then the design process will be ineffective. In most cases, it is very difficult or impossible to collect sufficient diffraction data to define atomic coordinates precisely when the crystals diffract to a resolution of 3.0 angstroms or poorer. Thus, in most cases, it is difficult to use X-ray structures in structure-based ligand design when the X-ray structures are based on crystals that diffract to a resolution of only 3.0 angstroms or poorer. However, common experience has shown that crystals diffracting to 2.0-2.5 angstroms or better can yield X-ray structures with sufficient accuracy to greatly facilitate structure-based drug design. Further improvement in the resolution can further facilitate structure-based design, but the coordinates obtained at 2.0-2.5 angstroms resolution are generally considered adequate for most purposes.
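As a rough illustration of why higher resolution supplies more diffraction data, the number of reciprocal-lattice points inside the resolution sphere scales with the inverse cube of the resolution limit. The sketch below uses the standard geometric estimate N ≈ 4πV/(3d³); it ignores crystal symmetry, Friedel pairs, and data completeness, and the function name and the unit cell volume used are hypothetical values chosen only for the example.

```python
import math


def approx_reflection_count(cell_volume_A3: float, d_min_A: float) -> float:
    """Approximate count of reciprocal-lattice points within resolution d_min.

    Geometric estimate only: sphere of radius 1/d_min in reciprocal space
    multiplied by the unit cell volume. Symmetry and completeness are ignored.
    """
    if cell_volume_A3 <= 0 or d_min_A <= 0:
        raise ValueError("volume and resolution must be positive")
    return 4.0 * math.pi * cell_volume_A3 / (3.0 * d_min_A ** 3)


if __name__ == "__main__":
    volume = 500_000.0  # hypothetical unit cell volume in cubic angstroms
    for d in (2.0, 2.5, 3.0):
        print(f"{d:.1f} A: ~{approx_reflection_count(volume, d):,.0f} lattice points")
    # The ratio between two resolution limits is independent of the cell volume.
    ratio = approx_reflection_count(volume, 2.0) / approx_reflection_count(volume, 3.0)
    print(f"2.0 A vs 3.0 A: ~{ratio:.2f}x more data")
```

Under this estimate, a crystal diffracting to 2.0 angstroms provides roughly (3.0/2.0)³, or about 3.4 times, as many measurable reflections as the same crystal form diffracting to 3.0 angstroms.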
Also, those of skill in the art will understand that nuclear receptors can adopt different conformations when different ligands are bound, or in the absence of any ligand. In particular, in most nuclear receptors, the AF2 helix can adopt different conformations when agonists and antagonists (or inverse agonists) are bound. More subtle conformational changes occur in other parts of the LBD when the AF2 helix is shifted. Generally, structure-based design of ligands that modulate CAR activity requires an understanding of the “activated” conformation that occurs when agonists are bound (or in the absence of ligand), as well as the “repressed” conformation that occurs when antagonists (or inverse agonists) are bound. The crystal structure of CAR bound to Compound 1 provides the “repressed” structure of CAR. In one embodiment, the “activated” conformation of CAR can be modeled approximately by using the “repressed” CAR structure as a starting structure, and then adjusting the conformation of the residues at the C-terminal end of the structure, residues 332-348, to form an AF2 helix with conformation, position, and orientation similar to that observed in the “activated” conformations of other nuclear receptors. It should be noted that the X-ray structure of CAR bound to Compound 1, which is an inverse agonist, revealed a completely novel, unexpected conformation for the residues that normally comprise the AF2 helix and the AF2 linking segment. No conventional modeling procedure could have predicted this novel “repressed” structure from an X-ray structure of the “activated” conformation of CAR.
The terms “stringent conditions” or “stringent hybridization conditions” refer to conditions that promote specific hybridization between two complementary polynucleotide strands so as to form a duplex. Stringent conditions can be selected to be about 5° C. lower than the thermal melting point (Tm) for a given polynucleotide duplex at a defined ionic strength and pH. The length of the complementary polynucleotide strands and their GC content will determine the Tm of the duplex, and thus the hybridization conditions necessary for obtaining a desired specificity of hybridization. The Tm is the temperature (under defined ionic strength and pH) at which 50% of a polynucleotide sequence hybridizes to a perfectly matched complementary strand. In certain cases it can be desirable to increase the stringency of the hybridization conditions to be about equal to the Tm for a particular duplex.
A variety of techniques for estimating the Tm are available. Typically, G-C base pairs in a duplex are estimated to contribute about 3° C. to the Tm, while A-T base pairs are estimated to contribute about 2° C., up to a theoretical maximum of about 80-100° C. However, more sophisticated models of Tm are available in which G-C stacking interactions, solvent effects, the desired assay temperature and the like are taken into account. For example, probes can be designed to have a dissociation temperature (Td) of approximately 60° C., using the formula: Td=(((((3×#GC)+(2×#AT))×37)−562)/#bp)−5; where #GC, #AT, and #bp are the number of guanine-cytosine base pairs, the number of adenine-thymine base pairs, and the number of total base pairs, respectively, involved in the formation of the duplex.
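For illustration, the dissociation temperature formula given above can be evaluated as in the following sketch. The function name is hypothetical, and the sketch assumes that the total number of base pairs (#bp) equals the sum of the G-C and A-T base pairs in the duplex.

```python
def dissociation_temperature(n_gc: int, n_at: int) -> float:
    """Td in degrees C from Td = ((((3*#GC) + (2*#AT)) * 37 - 562) / #bp) - 5.

    Assumes #bp = #GC + #AT (every position in the duplex is paired).
    """
    if n_gc < 0 or n_at < 0:
        raise ValueError("base pair counts must be non-negative")
    n_bp = n_gc + n_at
    if n_bp == 0:
        raise ValueError("duplex must contain at least one base pair")
    return (((3 * n_gc) + (2 * n_at)) * 37 - 562) / n_bp - 5


if __name__ == "__main__":
    # Hypothetical 20-mer probe with 10 G-C and 10 A-T base pairs.
    print(f"Td = {dissociation_temperature(10, 10):.1f} deg C")
```

For the hypothetical 20-mer shown, the formula gives a Td of about 59.4° C., close to the approximately 60° C. design target mentioned above.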
Hybridization can be carried out in 5×SSC, 4×SSC, 3×SSC, 2×SSC, 1×SSC or 0.2×SSC for at least about 1 hour, 2 hours, 5 hours, 12 hours, or 24 hours. The temperature of the hybridization can be increased to adjust the stringency of the reaction, for example, from about 25° C. (room temperature), to about 45° C., 50° C., 55° C., 60° C., or 65° C. The hybridization reaction can also include another agent affecting the stringency; for example, hybridization conducted in the presence of 50% formamide increases the stringency of hybridization at a defined temperature.
The hybridization reaction can be followed by a single wash step, or two or more wash steps, which can be at the same or a different salinity and temperature. For example, the temperature of the wash can be increased to adjust the stringency from about 25° C. (room temperature), to about 45° C., 50° C., 55° C., 60° C., 65° C., or higher. The wash step can be conducted in the presence of a detergent, e.g., 0.1 or 0.2% SDS. For example, hybridization can be followed by two wash steps at 65° C. each for about 20 minutes in 2×SSC, 0.1% SDS, and optionally two additional wash steps at 65° C. each for about 20 minutes in 0.2×SSC, 0.1% SDS.
Exemplary stringent hybridization conditions include overnight hybridization at 65° C. in a solution comprising, or consisting of, 50% formamide, 10× Denhardt's Solution (0.2% Ficoll, 0.2% Polyvinylpyrrolidone, 0.2% bovine serum albumin) and 200 μg/ml of denatured carrier DNA, e.g., sheared salmon sperm DNA, followed by two wash steps at 65° C. each for about 20 minutes in 2×SSC, 0.1% SDS, and two wash steps at 65° C. each for about 20 minutes in 0.2×SSC, 0.1% SDS.
Hybridization can include hybridizing two nucleic acids in solution, or a nucleic acid in solution to a nucleic acid attached to a solid support, e.g., a filter. When one nucleic acid is on a solid support, a prehybridization step can be conducted prior to hybridization. Prehybridization can be carried out for at least about 1 hour, 3 hours or 10 hours in the same solution and at the same temperature as the hybridization solution (without the complementary polynucleotide strand).
Appropriate stringency conditions are known to those skilled in the art or can be determined experimentally by the skilled artisan. See, e.g., Ausubel et al., 1994; Sambrook & Russell, 2001; Agrawal, 1993; Tibanyenda et al., 1984; Ebel et al., 1992.
The term “structural motif”, when used in reference to a polypeptide, refers to a polypeptide that, although it can have different amino acid sequences, can result in a similar structure, wherein by structure is meant that the motif forms generally the same tertiary structure, or that certain amino acid residues within the motif, or alternatively their backbone or side chains (which may or may not include the Cα atoms of the side chains) are positioned in a like relationship with respect to one another in the motif.
As applied to proteins, the term “substantial identity” means that two protein sequences, when optimally aligned, such as by the programs GAP or BESTFIT using default gap weights, typically share at least about 70 percent sequence identity, alternatively at least about 80, 85, 90, 95 percent sequence identity or more. In certain instances, residue positions that are not identical differ by conservative amino acid substitutions, which are described above.
As used herein, the term “substantially pure” refers to a polynucleotide or polypeptide that is substantially free of the sequences and molecules with which it is associated in its natural state, as well as from those molecules used in the isolation procedure. The term “substantially free” means that the sample is in one embodiment at least 50%, in another embodiment at least 70%, in another embodiment at least 80%, and in still another embodiment at least 90% free of the sequences and molecules with which it is associated in nature.
As used herein, the term “target cell” refers to a cell, into which it is desired to insert a nucleic acid sequence or polypeptide, or to otherwise effect a modification from conditions known to be present in the unmodified cell. A nucleic acid sequence introduced into a target cell can be of variable length. Additionally, a nucleic acid sequence can enter a target cell as a component of a plasmid or other vector or as a naked sequence.
The term “test compound” refers to a molecule to be tested by one or more screening method(s) as a putative modulator of a polypeptide of the invention or other biological entity or process. A test compound is usually not known to bind to a target of interest. The term “control test compound” refers to a compound known to bind to the target (e.g., a known agonist, antagonist, partial agonist or inverse agonist). The term “test compound” does not include a chemical added as a control condition that alters the function of the target to determine signal specificity in an assay. Such control chemicals or conditions include chemicals that 1) nonspecifically or substantially disrupt protein structure (e.g., denaturing agents (e.g., urea or guanidinium), chaotropic agents, sulfhydryl reagents (e.g., dithiothreitol and β-mercaptoethanol), and proteases), 2) generally inhibit cell metabolism (e.g., mitochondrial uncouplers) and 3) non-specifically disrupt electrostatic or hydrophobic interactions of a protein (e.g., high salt concentrations, or detergents at concentrations sufficient to non-specifically disrupt hydrophobic interactions). Further, the term “test compound” also does not include compounds known to be unsuitable for a therapeutic use for a particular indication due to toxicity to the subject. In certain embodiments, various predetermined concentrations of test compounds are used for screening such as 0.01 μM, 0.1 μM, 1.0 μM, and 10.0 μM. Examples of test compounds include, but are not limited to, peptides, nucleic acids, carbohydrates, and small molecules. The term “novel test compound” refers to a test compound that is not in existence as of the filing date of this application. In certain assays using novel test compounds, the novel test compounds comprise at least about 50%, 75%, 85%, 90%, 95% or more of the test compounds used in the assay or in any particular trial of the assay.
The term “therapeutically effective amount” refers to that amount of a modulator, drug, or other molecule that is sufficient to effect treatment when administered to a subject in need of such treatment. The therapeutically effective amount will vary depending upon the subject and disease condition being treated, the weight and age of the subject, the severity of the disease condition, the manner of administration and the like, which can readily be determined by one of ordinary skill in the art.
The term “transfection” means the introduction of a nucleic acid, e.g., an expression vector, into a recipient cell, which in certain instances involves nucleic acid-mediated gene transfer. The term “transformation” refers to a process in which a cell's genotype is changed as a result of the cellular uptake of exogenous nucleic acid. For example, a transformed cell can express a recombinant form of a polypeptide of the invention or antisense expression can occur from the transferred gene so that the expression of a naturally occurring form of the gene is disrupted.
The term “transgene” means a nucleic acid sequence that is partly or entirely heterologous to a transgenic animal or cell into which it is introduced, or is homologous to an endogenous gene of the transgenic animal or cell into which it is introduced, but which is designed to be inserted, or is inserted, into the animal's genome in such a way as to alter the genome of the cell into which it is inserted (e.g., it is inserted at a location which differs from that of the natural gene or its insertion results in a knockout). A transgene can include one or more regulatory sequences and any other nucleic acids, such as introns, that can be necessary for optimal expression.
The term “transgenic animal” refers to any animal, for example, a mouse, rat or other non-human mammal, a bird or an amphibian, in which one or more of the cells of the animal contain heterologous nucleic acid introduced by way of human intervention, such as by transgenic techniques well known in the art. The nucleic acid is introduced into the cell, directly or indirectly, by way of deliberate genetic manipulation, such as by microinjection or by infection with a recombinant virus. The term genetic manipulation does not include classical cross-breeding, or in vitro fertilization, but rather is directed to the introduction of a recombinant DNA molecule. This molecule can be integrated within a chromosome, or it can be extrachromosomally replicating DNA. In the typical transgenic animals described herein, the transgene causes cells to express a recombinant form of a protein. However, transgenic animals in which the recombinant gene is silent are also contemplated.
As used herein, the term “unit cell” refers to a basic parallelepiped shaped block. Each unit cell comprises a complete representation of the unit of pattern, the repetition of which builds up the crystal. Thus, the term “unit cell” refers to the fundamental portion of a crystal structure that is repeated infinitely by translation in three dimensions. A unit cell is characterized by three vectors, a, b, and c, not located in one plane, which form the edges of a parallelepiped. Angles α, β and γ define the angles between the vectors: angle α is the angle between vectors b and c; angle β is the angle between vectors a and c; and angle γ is the angle between vectors a and b. The entire volume of a crystal can be constructed by regular assembly of unit cells.
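By way of example, the volume of a unit cell can be computed from the edge lengths a, b, and c and the angles α, β, and γ defined above using the standard formula for the volume of a parallelepiped. The following is a minimal sketch; the function name and the cell dimensions in the usage example are hypothetical and are not taken from Table 1.

```python
import math


def unit_cell_volume(a: float, b: float, c: float,
                     alpha_deg: float, beta_deg: float, gamma_deg: float) -> float:
    """Volume of a unit cell (general triclinic formula).

    a, b, c are edge lengths (e.g., in angstroms); alpha is the angle between
    b and c, beta between a and c, gamma between a and b, all in degrees.
    """
    ca = math.cos(math.radians(alpha_deg))
    cb = math.cos(math.radians(beta_deg))
    cg = math.cos(math.radians(gamma_deg))
    # V = abc * sqrt(1 - cos^2(alpha) - cos^2(beta) - cos^2(gamma)
    #                + 2 cos(alpha) cos(beta) cos(gamma))
    radicand = 1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg
    if radicand <= 0.0:
        raise ValueError("angles do not describe a valid (non-planar) cell")
    return a * b * c * math.sqrt(radicand)


if __name__ == "__main__":
    # Hypothetical monoclinic cell: alpha = gamma = 90 degrees.
    print(f"{unit_cell_volume(50.0, 60.0, 70.0, 90.0, 105.0, 90.0):.1f} cubic angstroms")
```

The check on the radicand reflects the requirement stated above that the three vectors not lie in one plane.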
Unless otherwise indicated, all numbers expressing quantities of ingredients, reaction conditions, and so forth used in the specification and claims are to be understood as being modified in all instances by the term “about”. Accordingly, unless indicated to the contrary, the numerical parameters set forth in this specification and attached claims are approximations that can vary depending upon the desired properties sought to be obtained by the present invention.
II. Description of Tables
Table 1 is a table summarizing the crystal and data statistics obtained from the crystallized ligand-binding domain of CAR in complex with the ligand Compound 1. Data on the unit cell are presented, including data on the crystal space group, unit cell dimensions, molecules per asymmetric unit and crystal resolution.
Table 2 is a table of the atomic coordinate data obtained from X-ray diffraction from the ligand-binding domain of CAR in complex with the ligand Compound 1.
Table 3 is a table of the atomic structure coordinate data of the poly-alanine model of the conserved vitamin D receptor ligand-binding domain.
III. General Considerations
The present invention is applicable mutatis mutandis to all CARs, as discussed herein, based in part on the patterns of CAR structure and modulation that have emerged as a consequence of determining the three dimensional structure of CAR with bound ligand. Analysis and alignment of amino acid sequences, and X-ray and NMR structure determinations, have shown that nuclear receptors have a modular architecture with three main domains:
1) a variable amino-terminal domain;
2) a highly conserved DNA-binding domain (DBD); and
3) a less conserved carboxy-terminal ligand-binding domain (LBD).
In addition, nuclear receptors can have linker segments of variable length between these major domains. Sequence analysis and X-ray crystallography, including the work of the present invention, have confirmed that CARs, and indeed many NRs, also have the same general modular architecture, with the same three domains. The function of the CARs in human cells presumably requires all three domains in a single amino acid sequence. However, the modularity of the CARs permits different domains of each protein to separately accomplish certain functions.
Previous analysis of the nuclear receptors has revealed multiple discrete functional modules within the family that display generalized functional characteristics (for review see Beato et al., 1995; Kastner et al., 1995; Mangelsdorf & Evans, 1995; Tzukerman et al., 1994). A variable amino-terminal domain (A/B) is present that sometimes contains a strong and autonomous activation function (AF1), shown to be critical for cell and target gene specificity (Tora et al., 1988). A more carboxyl-terminal central region contains a DNA binding domain (DBD) characterized by two C4-type zinc fingers. The DBD binds to specific genomic response elements and thereby regulates the transcriptional activity of select genes containing the response elements. At the distal carboxyl terminus, a ligand-binding domain (LBD) is present containing a highly conserved second transactivation function (AF2) that is important for hormone-dependent transcriptional transactivation (Lanz & Rusconi, 1994).
Typically, the LBD forms a three-layered anti-parallel helical sandwich composed of 10-14 α helices and a β-sheet with 2-4 strands. The helices pack together so as to leave a binding pocket near the middle of the bundle, capped on one side by the β-sheet, and, in the “activated” state, capped on the other side by the AF2-helix. Comparison of apo, agonist-bound, and antagonist-bound nuclear receptor structures has led to a model for ligand-inducible receptor action. In this model, the agonist (activating) ligands tend to hold the AF2 helix in a conformation where it “caps” the binding pocket. Antagonistic ligands usually shift the AF2 helix out of this “active” position. The AF2 helix can also shift into other conformations, positions, and orientations in the absence of ligand. Constitutively active receptors such as CAR should presumably utilize a similar mechanism of action, except that the AF2 helix adopts the “active” position, capping the ligand-binding pocket, even in the absence of ligand. Inverse agonists would presumably tend to shift the AF2 helix out of this “active” position, whereas superagonists would presumably tend to hold the AF2 helix more tightly in the active position. Central to the efficient ligand-induced transcriptional activation is the recruitment of co-regulator proteins (coactivators and co-repressors), which interact with the LBD and activate or repress transactivation, respectively (Moras & Gronemeyer, 1998; Weatherman et al., 1999; McKenna & O'Malley, 2000). In general, the conformational changes described above involving the AF2 helix cause changes in the affinity of the LBD for co-repressors versus coactivators. The binding of an agonist results in a dissociation of co-repressors and brings the AF2 into a context where it can interact with transcriptional coactivators. Likewise, an antagonist would be expected to disrupt the binding of coactivators.
Sequences that function in nuclear localization, receptor dimerization, and interaction with heat-shock proteins (Gronemeyer & Laudet, 1995) are also present within the nuclear receptor substructure. Through the coordinated action of these separate functional domains, nuclear receptor activation by ligand culminates in modulation of target gene expression through DNA interactions (Tsai & O'Malley, 1994) or in certain other cases through cross-talk with other cell signaling pathways (Stein & Yang, 1995; Paech et al., 1998). In short, a ligand alters nuclear receptor function by altering the conformation of the receptor and consequently the constellation of protein-protein interactions in which the receptor is engaged (Freedman, 1999).
Some of the functions of a domain within the full-length receptor are preserved when that particular domain is isolated from the remainder of the protein. Using conventional protein chemistry techniques, a modular domain can sometimes be separated from the parent protein. Using conventional molecular biology techniques, each domain can usually be separately expressed with its original function intact or, as discussed herein below, chimeras comprising two different proteins can be constructed, wherein the chimeras retain the properties of the individual functional domains of the respective nuclear receptors from which the chimeras were generated.
The LBD is the second most highly conserved of these three domains. As its name suggests, the LBD binds ligands. With many nuclear receptors, binding of the ligand can induce a conformational change in the LBD that can, in turn, increase or decrease transcription of certain target genes. The LBD also participates in other functions, including dimerization and nuclear translocation.
X-ray structures have shown that most nuclear receptor LBDs adopt the same general folding pattern. This fold includes 10-12 alpha helices arranged in a bundle, together with several beta-strands, additional alpha helices and linking segments. The major alpha helices and beta-strands have been numbered differently in different publications. The present disclosure follows the numbering scheme of Nolte et al., 1998, where the major alpha-helices and beta-strands in PPARγ were designated sequentially through the amino acid sequence as H1, H2, S1, H2′, H3, H3′, H4, H5, S2, S3, S4, H6, H7, H8, H9, H10 and HAF. The alpha helix at the C-terminal end, HAF, is also called “helix-AF”, “helix-AF2”, the “AF2 helix” or “helix-12”. Most, but not all, of these alpha helices and beta-strands are observed in the structure of CAR. An additional helix, designated here as “helix-X”, is observed in the structure of CAR bound to Compound 1 on the C-terminal side of H10.
As described herein, the LBD of a CAR can be expressed, crystallized, its three dimensional structure determined with a ligand bound as disclosed in the present invention, and computational methods can be used to design ligands to its LBD.
IV. Synthesis of CAR Ligands and Intermediates
IV.A. Compound 1—An Embodiment of a Synthetic CAR Ligand
In one embodiment, the present invention provides compounds of Compound 1 (Formula (A) below) and tautomeric forms, pharmaceutically acceptable salts and solvates thereof:
IV.B. Synthesis of Compound 1 and Intermediates
Compound 1, which was co-crystallized with the CAR LBD in the present invention, can be prepared as described in Example 6 and shown in
V. Production of CAR Polypeptides
The native and mutated CAR polypeptides, and fragments thereof, of the present invention can be chemically synthesized in whole or part using techniques that are well known in the art (see e.g., Creighton, 1983, incorporated herein in its entirety). Alternatively, methods which are well known to those skilled in the art can be used to construct expression vectors containing a partial or the entire native or mutated CAR polypeptide coding sequence and appropriate transcriptional/translational control signals. These methods include in vitro recombinant DNA techniques, synthetic techniques, and in vivo recombination/genetic recombination (see e.g., the techniques described throughout Sambrook & Russell, 2001, and Ausubel et al., 1994, both incorporated herein in their entirety).
A variety of host-expression vector systems can be utilized to express a CAR coding sequence. These include but are not limited to microorganisms such as bacteria transformed with recombinant bacteriophage DNA, plasmid DNA or cosmid DNA expression vectors containing a CAR coding sequence; yeast transformed with recombinant yeast expression vectors containing a CAR coding sequence; insect cell systems infected with recombinant virus expression vectors (e.g., baculovirus) containing a CAR coding sequence; plant cell systems infected with recombinant virus expression vectors (e.g., cauliflower mosaic virus, CaMV; tobacco mosaic virus, TMV) or transformed with recombinant plasmid expression vectors (e.g., Ti plasmid) containing a CAR coding sequence; or animal cell systems. The expression elements of these systems vary in their strength and specificities.
Depending on the host/vector system utilized, any of a number of suitable transcription and translation elements, including constitutive and inducible promoters, can be used in the expression vector. For example, when cloning in bacterial systems, inducible promoters such as pL of bacteriophage λ, plac, ptrp, ptac (ptrp-lac hybrid promoter) and the like can be used. When cloning in insect cell systems, promoters such as the baculovirus polyhedrin promoter can be used. When cloning in plant cell systems, promoters derived from the genome of plant cells, such as heat shock promoters; the promoter for the small subunit of ribulose bisphosphate carboxylase (RUBISCO); the promoter for the chlorophyll a/b binding protein; or from plant viruses (e.g., the 35S RNA promoter of CaMV; the coat protein promoter of TMV) can be used. When cloning in mammalian cell systems, promoters derived from the genome of mammalian cells (e.g., metallothionein promoter) or from mammalian viruses (e.g., the adenovirus late promoter; the vaccinia virus 7.5K promoter) can be used.
In each of these systems, one of ordinary skill in the art will appreciate that other promoters can be used, and as such, the list presented is not intended to be exhaustive.
VI. Analysis of Protein Properties
VI.A. Analysis of Proteins by X-ray Crystallography Generally
VI.A.1. X-ray Structure Determination
Exemplary methods for obtaining the three dimensional structure of the crystalline form of a molecule or complex are described herein and, in view of this specification, variations on these methods will be apparent to those skilled in the art (see Ducruix & Geige, 1992).
A variety of methods involving X-ray crystallography are contemplated by the present invention. For example, the present invention contemplates producing a crystallized polypeptide of the invention, or a fragment thereof, by: (a) introducing into a host cell an expression vector comprising a nucleic acid encoding for a polypeptide of the invention, or a fragment thereof; (b) culturing the host cell in a cell culture medium to express the polypeptide or fragment; (c) isolating the polypeptide or fragment from the cell culture; and (d) crystallizing the polypeptide or fragment thereof. Alternatively, the present invention contemplates determining the three dimensional structure of a crystallized polypeptide of the invention, or a fragment thereof, by: (a) crystallizing a polypeptide of the invention, or a fragment thereof, such that the crystals will diffract X-rays to a resolution of 2.5 Å or better; and (b) analyzing the polypeptide or fragment by X-ray diffraction to determine the three-dimensional structure of the crystallized polypeptide.
X-ray crystallography techniques generally require that the protein molecules be available in the form of a crystal. Crystals can be grown from a solution containing a purified polypeptide of the invention, or a fragment thereof (e.g., a ligand-binding domain), by a variety of conventional processes. These processes include, for example, batch, liquid bridge, dialysis, and vapor diffusion (e.g., hanging drop or sitting drop methods). See e.g., McPherson, 1982; McPherson, 1990; Webe, 1991.
In certain embodiments, native crystals of the invention can be grown by adding precipitants to the concentrated solution of the polypeptide. The precipitants are added at a concentration just below that necessary to precipitate the protein. Water can be removed by controlled evaporation to produce precipitating conditions, which are maintained until crystal growth ceases.
The formation of crystals is dependent on a number of different parameters, including pH, temperature, protein concentration, the nature of the solvent and precipitant, as well as the presence of added ions or ligands to the protein. In addition, the sequence of the polypeptide being crystallized will have a significant effect on the success of obtaining crystals. Many routine crystallization experiments can be needed to screen all these parameters for the few combinations that might give crystals suitable for X-ray diffraction analysis. See e.g., Jancarik & Kim, 1991.
Crystallization robots can automate and speed up the work of reproducibly setting up large numbers of crystallization experiments. Once a suitable set of conditions for growing the crystal is found, variations of those conditions can be systematically screened in order to find the set of conditions that allows the growth of sufficiently large, single, well-ordered crystals. In certain instances, a polypeptide of the invention is co-crystallized with a ligand: in one embodiment, Compound 1.
A number of methods are available to produce suitable radiation for X-ray diffraction. For example, X-ray beams can be produced by synchrotron rings where electrons (or positrons) are accelerated through an electromagnetic field while traveling at close to the speed of light. Because the emitted wavelength can also be controlled, synchrotrons can be used as a tunable X-ray source (Hendrickson, 2000). For less conventional Laue diffraction studies, polychromatic X-rays covering a broad wavelength window are used to observe many diffraction intensities simultaneously (Stoddard, 1998). Neutrons can also be used for solving protein crystal structures (Gutberlet et al., 2001).
Before data collection commences, a protein crystal can be frozen to protect it from radiation damage. A number of different cryo-protectants can be used to assist in freezing the crystal, such as methyl pentanediol (MPD), isopropanol, ethylene glycol, glycerol, formate, citrate, mineral oil, or a low-molecular-weight polyethylene glycol (PEG). The present invention contemplates a composition comprising a polypeptide of the invention and a cryo-protectant. As an alternative to freezing the crystal, the crystal can also be used for diffraction experiments performed at temperatures above the freezing point of the solution. In these instances, the crystal can be protected from desiccation by placing it in a narrow capillary of a suitable material (generally glass or quartz) with some of the crystal growth solution included in order to maintain vapor pressure.
X-ray diffraction results can be recorded in a number of ways known to one of skill in the art. Examples of area electronic detectors include charge coupled device detectors, multi-wire area detectors, and phosphorimager detectors (Amemiya, 1997; Westbrook & Naday, 1997; Kahn & Fourme, 1997).
A suitable system for laboratory data collection might include a Bruker AXS Proteum R system, equipped with a copper rotating anode source, Confocal MAX-FLUX™ optics and a SMART 6000 charge coupled device detector. Collection of X-ray diffraction patterns is well known to those skilled in the art (see e.g. Ducruix & Geige, 1992).
The theory behind diffraction by a crystal upon exposure to X-rays is well known. Because phase information is not directly measured in the diffraction experiment and is needed to reconstruct the electron density map, methods that can recover this missing information are required. One method of solving structures ab initio is the real/reciprocal space cycling technique. Suitable real/reciprocal space cycling search programs include Shake-and-Bake (Miller et al., 1993; Weeks et al., 1994).
Other methods for deriving phases might also be needed. These techniques generally rely on the idea that if two or more measurements of the same reflection are made where strong, measurable, differences are attributable to the characteristics of a small subset of the atoms alone, then the contributions of other atoms can be, to a first approximation, ignored, and the positions of these atoms can be determined from the difference in scattering by one of the above techniques. Knowing the position and scattering characteristics of those atoms, one can calculate what phase the overall scattering must have had to produce the observed differences.
One version of this technique is the isomorphous replacement technique, which requires the introduction of new, well-ordered X-ray scatterers into the crystal. These additions are usually heavy metal atoms (so that they make a significant difference in the diffraction pattern); and if the additions do not change the structure of the molecule or of the crystal cell, the resulting crystals should be isomorphous. Isomorphous replacement experiments are usually performed by diffusing different heavy-metal compounds into the channels of a pre-existing protein crystal. Growing the crystal from protein that has been soaked in the heavy atom is also possible (Petsko, 1985). Alternatively, the heavy atom can also be reactive and attached covalently to exposed amino acid side chains (such as the sulfur atom of cysteine) or it can be associated through non-covalent interactions. It is sometimes possible to replace endogenous light metals in metallo-proteins with heavier ones, e.g., zinc by mercury, or calcium by samarium (Petsko, 1985). Exemplary sources for such heavy compounds include, but are not limited to, sodium bromide, sodium selenate, trimethyl lead acetate, mercuric chloride, methyl mercury acetate, platinum tetracyanide, platinum tetrachloride, nickel chloride, and europium chloride.
A second technique for generating differences in scattering involves the phenomenon of anomalous scattering. X-rays that cause the displacement of an electron in an inner shell to a higher shell are subsequently rescattered, but there is a time lag that shows up as a phase delay. This phase delay is observed as a (generally quite small) difference in intensity between reflections known as Friedel mates that would be identical if no anomalous scattering were present. A second effect related to this phenomenon is that differences in the intensity of scattering of a given atom will vary in a wavelength-dependent manner, giving rise to what are known as dispersive differences. In principle, anomalous scattering occurs with all atoms, but the effect is strongest with heavy atoms, and can be maximized by using X-rays at a wavelength where the energy is equal to the difference in energy between shells. The technique therefore requires the incorporation of some heavy atom much as is needed for isomorphous replacement, although for anomalous scattering a wider variety of atoms are suitable, including lighter metal atoms (copper, zinc, iron) in metallo-proteins. One method for preparing a protein for anomalous scattering involves replacing the methionine residues in whole or in part with selenium-containing seleno-methionine. Soaking with halide salts such as bromides and other non-reactive ions can also be effective (Dauter et al., 2001).
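By way of illustration only, the breakdown of the equality between Friedel mates can be sketched with a one-dimensional toy crystal. The atom positions and scattering factors below are hypothetical values chosen for the example, and the imaginary part of the heavy-atom scattering factor merely stands in for the anomalous component; none of these numbers are taken from the present disclosure.

```python
import cmath
import math


def structure_factor(atoms, h):
    """Return F(h) for a 1D toy crystal.

    atoms: list of (fractional_x, scattering_factor) pairs. The scattering
    factor may be complex; its imaginary part stands in for the anomalous
    component (hypothetical values, not tabulated scattering factors).
    """
    return sum(f * cmath.exp(2j * math.pi * h * x) for x, f in atoms)


def friedel_difference(atoms, h):
    """Return |F(h)| - |F(-h)|, the amplitude difference between Friedel mates."""
    return abs(structure_factor(atoms, h)) - abs(structure_factor(atoms, -h))


# Hypothetical three-atom cell: two light atoms and one heavy atom.
normal = [(0.10, 6.0), (0.35, 8.0), (0.70, 20.0)]
# Same cell, but the heavy atom now scatters anomalously.
anomalous = [(0.10, 6.0), (0.35, 8.0), (0.70, 20.0 + 4.0j)]

print(friedel_difference(normal, 2))     # essentially zero
print(friedel_difference(anomalous, 2))  # clearly nonzero
```

With purely real scattering factors, F(-h) is the complex conjugate of F(h), so the two amplitudes coincide; adding an imaginary component to one atom removes that symmetry and produces a measurable difference.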
In another process, known as multiwavelength anomalous diffraction or MAD, data are collected at two to four suitable wavelengths (Hendrickson & Ogata, 1997). Phasing by various combinations of single and multiple isomorphous replacement and anomalous scattering is possible too. For example, SIRAS (single isomorphous replacement with anomalous scattering) utilizes both the isomorphous and anomalous differences for one derivative to derive phases. More traditionally, several different heavy atoms are soaked into different crystals to get sufficient phase information from isomorphous differences while ignoring anomalous scattering, in the technique known as multiple isomorphous replacement (MIR) (Petsko, 1985).
Additional restraints on the phases can be derived from density modification techniques. These techniques use either generally known features of electron density distribution or known facts about that particular crystal to improve the phases. For example, because protein regions of the crystal scatter more strongly than solvent regions, solvent flattening/flipping can be used to adjust phases to make solvent density a uniform flat value (Zhang et al., 1997). If more than one molecule of the protein is present in the asymmetric unit, the fact that the different molecules should be virtually identical can be exploited to further reduce phase error using non-crystallographic symmetry averaging (Villieux & Read, 1997). Suitable programs for performing these processes include DM and other programs of the CCP4 suite (Collaborative Computational Project, 1994) and CNX.
The unit cell dimensions, symmetry, vector amplitude and derived phase information can be used in a Fourier transform function to calculate the electron density in the unit cell, i.e., to generate an experimental electron density map. This can be accomplished using programs of the CNX or CCP4 packages. The resolution is measured in Ångström (Å) units, and is closely related to how far apart two objects need to be before they can be reliably distinguished. The smaller this number is, the higher the resolution and therefore the greater the amount of detail that can be seen. In alternative embodiments, crystals of the invention diffract X-rays to a resolution of about 4.0, 3.5, 3.0, 2.5, 2.0, 1.5, 1.0, or 0.5 Å, or better.
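The Fourier synthesis described above can be sketched, under simplifying assumptions, for a one-dimensional toy cell. This is not the procedure implemented in CNX or CCP4; the function names, the two hypothetical point atoms, and the omission of the 1/V scale factor are all assumptions made for illustration.

```python
import cmath
import math


def structure_factors(atoms, h_max):
    """Compute F(h) for h = -h_max..h_max from (fractional_x, weight) point atoms."""
    return {
        h: sum(w * cmath.exp(2j * math.pi * h * x) for x, w in atoms)
        for h in range(-h_max, h_max + 1)
    }


def electron_density(amplitudes, phases, n_grid):
    """Fourier synthesis on a 1D grid from amplitudes and phases (radians).

    rho(x) = sum_h |F(h)| * exp(i * phi(h)) * exp(-2 * pi * i * h * x),
    evaluated at x = k / n_grid. The 1/V scale factor is omitted.
    """
    rho = []
    for k in range(n_grid):
        x = k / n_grid
        total = sum(
            amp * cmath.exp(1j * phases[h]) * cmath.exp(-2j * math.pi * h * x)
            for h, amp in amplitudes.items()
        )
        rho.append(total.real)  # imaginary parts cancel between h and -h
    return rho


# Hypothetical cell with two point atoms at x = 0.25 and x = 0.60.
F = structure_factors([(0.25, 6.0), (0.60, 8.0)], h_max=10)
amplitudes = {h: abs(f) for h, f in F.items()}
phases = {h: cmath.phase(f) for h, f in F.items()}

rho = electron_density(amplitudes, phases, n_grid=100)
print(rho.index(max(rho)) / 100)  # position of the strongest density peak
```

In a real experiment only the amplitudes are measured, so the `phases` dictionary would come from one of the phasing methods discussed above rather than from a known model. Truncating the sum at a smaller `h_max` broadens the peaks, which is the toy analogue of lower resolution.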
As used herein, the term “modeling” includes the quantitative and qualitative analysis of molecular structure and/or function based on atomic structural information and interaction models. The term “modeling” includes conventional numeric-based molecular dynamic and energy minimization models, interactive computer graphic models, modified molecular mechanics models, distance geometry and other structure-based constraint models.
Model building can be accomplished by either the crystallographer using a computer graphics program such as TURBO or O (Jones et al., 1991) or, under suitable circumstances, by using a fully automated model building program, such as wARP (Perrakis et al., 1999) or MAID (Levitt, 2001). This structure can be used to calculate model-derived diffraction amplitudes and phases. The model-derived and experimental diffraction amplitudes can be compared and the agreement between them can be described by a parameter referred to as R-factor. A high degree of correlation in the amplitudes corresponds to a low R-factor value, with 0.0 representing exact agreement and 0.59 representing a completely random structure. Because the R-factor can be lowered by introducing more free parameters into the model, an unbiased, cross-validated version of the R-factor known as the R-free gives a more objective measure of model quality. For the calculation of this parameter a subset of reflections (generally around 10%) is set aside at the beginning of the refinement and not used as part of the refinement target. These reflections are then compared to those predicted by the model (Kleywegt & Brunger, 1996).
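As a minimal sketch of the agreement statistics just described, the conventional R-factor can be written as the sum of the absolute differences between observed and calculated amplitudes divided by the sum of the observed amplitudes, with R-free being the same quantity computed over the reserved reflections only. The function names and the four example amplitudes are hypothetical, and the calculated amplitudes are assumed to have already been placed on the same scale as the observed ones.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over the supplied reflections."""
    if len(f_obs) != len(f_calc):
        raise ValueError("amplitude lists must have the same length")
    denominator = sum(f_obs)
    if denominator == 0:
        raise ValueError("observed amplitudes sum to zero")
    return sum(abs(o - c) for o, c in zip(f_obs, f_calc)) / denominator


def r_work_and_r_free(f_obs, f_calc, free_flags):
    """Split reflections by free_flags and return (R-work, R-free).

    free_flags[i] is True for reflections set aside before refinement.
    """
    work = [(o, c) for o, c, free in zip(f_obs, f_calc, free_flags) if not free]
    free = [(o, c) for o, c, free in zip(f_obs, f_calc, free_flags) if free]
    r_work = r_factor([o for o, _ in work], [c for _, c in work])
    r_free = r_factor([o for o, _ in free], [c for _, c in free])
    return r_work, r_free


# Hypothetical amplitudes for four reflections; the third is in the free set.
f_obs = [10.0, 20.0, 30.0, 40.0]
f_calc = [11.0, 18.0, 33.0, 40.0]
flags = [False, False, True, False]

print(r_factor(f_obs, f_calc))
print(r_work_and_r_free(f_obs, f_calc, flags))
```

Because the free reflections never enter the refinement target, an R-free that is much higher than the working R-factor suggests that the model has been over-fitted.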
The model can be improved using computer programs that maximize the probability that the observed data was produced from the predicted model, while simultaneously optimizing the model geometry. For example, the CNX program can be used for model refinement, as can the XPLOR program (Murshudov et al., 1997). In order to maximize the convergence radius of refinement, simulated annealing refinement using torsion angle dynamics can be employed in order to reduce the degrees of freedom of motion of the model (Adams et al., 1997). Where experimental phase information is available (e.g., where MAD data were collected), Hendrickson-Lattman phase probability targets can be employed. Isotropic or anisotropic domain, group, or individual temperature factor refinement can be used to model variance of the atomic position from its mean. Well-defined peaks of electron density not attributable to protein atoms are generally modeled as water molecules. Water molecules can be found by manual inspection of electron density maps, or with automatic water picking routines. Additional small molecules, including ions, cofactors, buffer molecules, or substrates can be included in the model if sufficiently unambiguous electron density is observed in a map.
In general, the R-free is rarely as low as 0.15 and can be as high as 0.35 or greater for a reasonably well-determined protein structure. The residual difference is a consequence of approximations in the model (inadequate modeling of residual structure in the solvent, modeling atoms as isotropic Gaussian spheres, assuming all molecules are identical rather than having a set of discrete conformers, etc.) and errors in the data (Lattman, 1996). In refined structures at high resolution, there are usually no major errors in the orientation of individual residues, and the estimated errors in atomic positions are usually around 0.1-0.2 up to 0.3 Å.
The three dimensional structure of a new crystal can be modeled using molecular replacement. The term “molecular replacement” refers to a method that involves generating a preliminary model of a molecule or complex whose structure coordinates are unknown, by orienting and positioning a molecule whose structure coordinates are known within the unit cell of the unknown crystal, so as best to account for the observed diffraction pattern of the unknown crystal. Phases can then be calculated from this model and combined with the observed amplitudes to give an approximate Fourier synthesis of the structure whose coordinates are unknown. This, in turn, can be subject to any of the several forms of refinement to provide a final, accurate structure of the unknown crystal (Lattman, 1985; Rossmann, 1972).
Commonly used computer software packages for molecular replacement are CNX, X-PLOR (Brunger 1992, Nature 355:472-475), AMORE (Navaza, 1994, Acta Crystallogr. A50:157-163), the CCP4 package, the MERLOT package (Fitzgerald, 1988) and XTALVIEW (McCree et al., 1992). The quality of the model can be analyzed using a program such as PROCHECK or 3D-Profiler (Laskowski et al., 1993; Luthy et al., 1992; Bowie et al., 1991).
Homology modeling (also known as comparative modeling or knowledge-based modeling) methods can also be used to develop a three dimensional model from a polypeptide sequence based on the structures of known proteins. The method utilizes a computer model of a known protein, a computer representation of the amino acid sequence of the polypeptide with an unknown structure, and standard computer representations of the structures of amino acids. This method is well known to those skilled in the art (Greer, 1985; Blundell et al., 1988; Knighton et al., 1992). Computer programs that can be used in homology modeling are QUANTA and the Homology module in the Insight II modeling package distributed by Molecular Simulations Inc. (now part of Accelrys Inc., San Diego, Calif., United States of America), or MODELLER (Rockefeller University, New York, N.Y., United States of America). These computer programs can also be used for computational loop modeling techniques. See also Tosatto et al., 2002; Fiser et al., 2000.
Once a homology model has been generated it is analyzed to determine its correctness. A computer program available to assist in this analysis is the Protein Health module in QUANTA that provides a variety of tests. Other programs that provide structure analysis along with output include PROCHECK and 3D-Profiler (Luthy et al., 1992; Bowie et al., 1991). Once any irregularities have been resolved, the entire structure can be further refined.
Other molecular modeling techniques can also be employed in accordance with this invention. See e.g., Cohen et al., 1990; Navia & Murcko, 1992.
Under suitable circumstances, the entire process of solving a crystal structure can be accomplished in an automated fashion by a system such as ELVES (http://ucxray.berkeley.edu/-jamesh/elves/index.html) with little or no user intervention.
VI.A.2. X-ray Structure
The present invention provides methods for determining some or all of the structural coordinates for amino acids of a polypeptide of the invention, or a complex thereof.
In another aspect, the present invention provides methods for identifying a druggable region of a polypeptide of the invention. For example, one such method includes: (a) obtaining crystals of a polypeptide of the invention or a fragment thereof such that the three dimensional structure of the crystallized protein can be determined to a resolution of 2.5 Å or better; (b) determining the three dimensional structure of the crystallized polypeptide or fragment using X-ray diffraction; and (c) identifying a druggable region of a polypeptide of the invention based on the three-dimensional structure of the polypeptide or fragment.
A three dimensional structure of a molecule or complex can be described by the set of atoms that best predict the observed diffraction data (that is, which possesses a minimal R value). Files can be created for the structure that defines each atom by its chemical identity, spatial coordinates in three dimensions, root mean squared deviation from the mean observed position and fractional occupancy of the observed position.
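One possible in-memory representation of such a per-atom record is sketched below. The class and field names are hypothetical. The sketch also assumes that the positional spread is stored as an isotropic temperature factor B, which is related to the mean-square displacement by B = 8π²⟨u²⟩; the disclosure itself does not prescribe a file layout.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomRecord:
    """One atom of a structure model (hypothetical field names)."""

    element: str            # chemical identity, e.g. "C", "N", "O"
    x: float                # Cartesian coordinates in Angstroms
    y: float
    z: float
    b_factor: float         # isotropic temperature factor in square Angstroms
    occupancy: float = 1.0  # fraction of molecules with the atom at this site

    def rms_displacement(self):
        """Root-mean-square displacement implied by B = 8 * pi^2 * <u^2>."""
        return math.sqrt(self.b_factor / (8.0 * math.pi ** 2))


atom = AtomRecord("C", 12.1, 4.7, -3.2, b_factor=8.0 * math.pi ** 2 * 0.25)
print(atom.rms_displacement())
```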
Those of skill in the art understand that a set of structure coordinates for a protein, complex, or a portion thereof, is a relative set of points that define a shape in three dimensions. Thus, it is possible that an entirely different set of coordinates could define a similar or identical shape. Moreover, slight variations in the individual coordinates can have little effect on overall shape. Such variations in coordinates can be generated because of mathematical manipulations of the structure coordinates. For example, structure coordinates could be manipulated by crystallographic permutations of the structure coordinates, fractionalization of the structure coordinates, integer additions or subtractions to sets of the structure coordinates, inversion of the structure coordinates or any combination of the above. Alternatively, modifications in the crystal structure due to mutations, additions, substitutions, and/or deletions of amino acids, or other changes in any of the components that make up the crystal, could also yield variations in structure coordinates. Such slight variations in the individual coordinates will have little effect on overall shape. If such variations are within an acceptable standard error as compared to the original coordinates, the resulting three-dimensional shape is considered to be structurally equivalent. It should be noted that slight variations in individual structure coordinates of a polypeptide of the invention or a complex thereof would not be expected to significantly alter the nature of modulators that could associate with a druggable region thereof. Thus, for example, a modulator that bound to the active site of a polypeptide of the invention would also be expected to bind to or interfere with another active site whose structure coordinates define a shape that falls within the acceptable error.
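The point that different coordinate sets can describe the same shape can be illustrated with a short sketch. The four coordinates and the particular rotation and shift are hypothetical. The sketch checks only that interatomic distances are preserved, which holds for rotation, translation, and inversion alike (inversion additionally produces the mirror image).

```python
import math
from itertools import combinations


def pairwise_distances(coords):
    """Return all interatomic distances, which are independent of the frame."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def rotate_z(coords, angle_deg):
    """Rotate every point about the z axis by angle_deg degrees."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]


def translate(coords, shift):
    """Add a constant shift vector to every point."""
    dx, dy, dz = shift
    return [(x + dx, y + dy, z + dz) for x, y, z in coords]


def invert(coords):
    """Invert every point through the origin."""
    return [(-x, -y, -z) for x, y, z in coords]


# Hypothetical coordinates (in Angstroms) for four atoms.
original = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.4, 0.0), (1.0, 2.1, 1.2)]
moved = invert(translate(rotate_z(original, 37.0), (10.0, -4.0, 2.5)))

same_shape = all(
    math.isclose(d1, d2, abs_tol=1e-9)
    for d1, d2 in zip(pairwise_distances(original), pairwise_distances(moved))
)
print(same_shape)
```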
A crystal structure of the present invention can be used to make a structural or computer model of the polypeptide, complex, or portion thereof. A model can represent the secondary, tertiary, and/or quaternary structure of the polypeptide, complex, or portion. The configurations of points in space derived from structure coordinates according to the invention can be visualized as, for example, a holographic image, a stereodiagram, a model, or a computer-displayed image, and the invention thus includes such images, diagrams, or models.
VI.A.3. Structural Equivalents
Various computational analyses can be used to determine whether a molecule or the active site portion thereof is structurally equivalent with respect to its three-dimensional structure, to all or part of a structure of a polypeptide of the invention or a portion thereof.
For the purpose of this invention, any molecule or complex or portion thereof, that has a root mean square deviation of conserved residue backbone atoms (N, Cα, C, O) of less than about 1.75 Å, when superimposed on the relevant backbone atoms described by the reference structure coordinates of a polypeptide of the invention, is considered “structurally equivalent” to the reference molecule. That is to say, the crystal structures of those portions of the two molecules are substantially identical, within acceptable error. Alternatively, the root mean square deviation can be less than about 1.50, 1.40, 1.25, 1.0, 0.75, 0.5 or 0.35 Å.
The term “root mean square deviation” is understood in the art and means the square root of the arithmetic mean of the squares of the deviations. It is a way to express the deviation or variation from a trend or object.
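A minimal sketch of this calculation follows. It assumes that the two sets of backbone atoms have already been paired atom-for-atom and superimposed; the superposition step itself is not shown. The function names and example coordinates are hypothetical, and the default cutoff simply reuses the 1.75 Å figure recited above.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long, pre-superimposed
    lists of (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def structurally_equivalent(coords_a, coords_b, cutoff=1.75):
    """Return True when the RMSD (in Angstroms) is below the chosen cutoff."""
    return rmsd(coords_a, coords_b) < cutoff


# Hypothetical backbone atoms: every atom in the model is displaced by 1 Angstrom.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]

print(rmsd(reference, model))
print(structurally_equivalent(reference, model))
print(structurally_equivalent(reference, model, cutoff=0.5))
```

Tighter embodiments correspond to passing a smaller `cutoff`, such as 1.50 or 0.35.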
In another aspect, the present invention provides a scalable three-dimensional configuration of points, at least a portion of said points, and preferably all of said points, derived from structural coordinates of at least a portion of a polypeptide of the invention and having a root mean square deviation from the structure coordinates of the polypeptide of the invention of less than 1.50, 1.40, 1.25, 1.0, 0.75, 0.5 or 0.35 Å. In certain embodiments, the portion of a polypeptide of the invention is 25%, 33%, 50%, 66%, 75%, 85%, 90%, or 95% or more of the amino acid residues contained in the polypeptide.
In another aspect, the present invention provides a molecule or complex including a druggable region of a polypeptide of the invention, the druggable region being defined by a set of points having a root mean square deviation of less than about 1.75 Å from the structural coordinates for points representing (a) the backbone atoms of the amino acids contained in a druggable region of a polypeptide of the invention, (b) the side chain atoms (and optionally the Cα atoms) of the amino acids contained in such druggable region, or (c) all the atoms of the amino acids contained in such druggable region. In certain embodiments, only a portion of the amino acids of a druggable region can be included in the set of points, such as 25%, 33%, 50%, 66%, 75%, 85%, 90% or 95% or more of the amino acid residues contained in the druggable region. In certain embodiments, the root mean square deviation can be less than 1.50, 1.40, 1.25, 1.0, 0.75, 0.5, or 0.35 Å. In still other embodiments, instead of a druggable region, a stable domain, fragment, or structural motif is used in place of a druggable region.
VI.A.4. Machine Displays and Machine Readable Storage Media
The invention provides a machine-readable storage medium including a data storage material encoded with machine readable data which, when using a machine programmed with instructions for using said data, displays a graphical three-dimensional representation of any of the molecules or complexes, or portions thereof, of this invention. In another embodiment, the graphical three-dimensional representation of such molecule, complex, or portion thereof includes the root mean square deviation of certain atoms of such molecule by a specified amount, such as the backbone atoms by less than 1.5 Å. In another embodiment, a structural equivalent of such molecule, complex, or portion thereof, can be displayed. In another embodiment, the portion can include a druggable region of the polypeptide of the invention.
According to one embodiment, the invention provides a computer for determining at least a portion of the structure coordinates corresponding to X-ray diffraction data obtained from a molecule or complex, wherein said computer includes: (a) a machine-readable data storage medium comprising a data storage material encoded with machine-readable data, wherein said data comprises at least a portion of the structural coordinates of a polypeptide of the invention; (b) a machine-readable data storage medium comprising a data storage material encoded with machine-readable data, wherein said data comprises X-ray diffraction data from said molecule or complex; (c) a working memory for storing instructions for processing said machine-readable data of (a) and (b); (d) a central-processing unit coupled to said working memory and to said machine-readable data storage medium of (a) and (b) for performing a Fourier transform of the machine readable data of (a) and for processing said machine readable data of (b) into structure coordinates; and (e) a display coupled to said central-processing unit for displaying said structure coordinates of said molecule or complex. In certain embodiments, the structural coordinates displayed are structurally equivalent to the structural coordinates of a polypeptide of the invention.
In an alternative embodiment, the machine-readable data storage medium includes a data storage material encoded with a first set of machine readable data which includes the Fourier transform of the structure coordinates of a polypeptide of the invention or a portion thereof, and which, when using a machine programmed with instructions for using said data, can be combined with a second set of machine readable data including the X-ray diffraction pattern of a molecule or complex to determine at least a portion of the structure coordinates corresponding to the second set of machine readable data.
For example, a system for reading a data storage medium can include a computer including a central processing unit (CPU), a working memory which can be, e.g., random access memory (RAM) or “core” memory, mass storage memory (such as one or more disk drives or CD-ROM drives), one or more display devices (e.g., cathode-ray tube (“CRT”) displays, light emitting diode (LED) displays, liquid crystal displays (LCDs), electroluminescent displays, vacuum fluorescent displays, field emission displays (FEDs), plasma displays, or projection panels), one or more user input devices (e.g., keyboards, microphones, mice, or touch screens), one or more input lines, and one or more output lines, all of which are interconnected by a conventional bidirectional system bus. The system can be a stand-alone computer, or can be networked (e.g., through local area networks, wide area networks, intranets, extranets, or the internet) to other systems (e.g., computers, hosts, or servers). The system can also include additional computer controlled devices such as consumer electronics and appliances.
Input hardware can be coupled to the computer by input lines and can be implemented in a variety of ways. Machine-readable data of this invention can be inputted via the use of a modem or modems connected by a telephone line or dedicated data line. Alternatively or additionally, the input hardware can include CD-ROM drives or disk drives. In conjunction with a display terminal, a keyboard can also be used as an input device.
Output hardware can be coupled to the computer by output lines and can similarly be implemented by conventional devices. By way of example, the output hardware can include a display device for displaying a graphical representation of an active site of this invention using a program such as QUANTA as described herein. Output hardware might also include a printer, so that hard copy output can be produced, or a disk drive, to store system output for later use.
In operation, a CPU coordinates the use of the various input and output devices, coordinates data accesses from mass storage devices, accesses to and from working memory, and determines the sequence of data processing steps. A number of programs can be used to process the machine-readable data of this invention. Such programs are discussed in reference to the computational methods of drug discovery as described herein. References to components of the hardware system are included as appropriate throughout the following description of the data storage medium.
Machine-readable storage devices useful in the present invention include, but are not limited to, magnetic devices, electrical devices, optical devices, and combinations thereof. Examples of such data storage devices include, but are not limited to, hard disk devices, CD devices, digital video disk devices, floppy disk devices, removable hard disk devices, magneto-optic disk devices, magnetic tape devices, flash memory devices, bubble memory devices, holographic storage devices, and any other mass storage peripheral device. It should be understood that these storage devices include necessary hardware (e.g., drives, controllers, or power supplies) as well as any necessary media (e.g., disks or flash cards) to enable the storage of data.
In one embodiment, the present invention contemplates a computer readable storage medium comprising structural data, wherein the data include the identity and three-dimensional coordinates of a polypeptide of the invention or portion thereof. In another aspect, the present invention contemplates a database comprising the identity and three-dimensional coordinates of a polypeptide of the invention or a portion thereof. Alternatively, the present invention contemplates a database comprising a portion or all of the atomic coordinates of a polypeptide of the invention or portion thereof.
VI.A.5. Structurally Similar Molecules and Complexes
Structural coordinates for a polypeptide of the invention can be used to aid in obtaining structural information about another molecule or complex. This method of the invention allows determination of at least a portion of the three-dimensional structure of molecules or molecular complexes that contain one or more structural features that are similar to structural features of a polypeptide of the invention. Similar structural features can include, for example, regions of amino acid identity, conserved active site or binding site motifs, and similarly arranged secondary structural elements (e.g., α helices and β sheets). Many of the methods described above for determining the structure of a polypeptide of the invention can be used for this purpose as well.
For the present invention, a “structural homolog” is a polypeptide that contains one or more amino acid substitutions, deletions, additions, or rearrangements with respect to the amino acid sequence of SEQ ID NOs: 2 or 4 or other polypeptide of the invention, but that, when folded into its native conformation, exhibits or is reasonably expected to exhibit at least a portion of the tertiary (three-dimensional) structure of the polypeptide encoded by SEQ ID NOs: 2 or 4 or such other polypeptide of the invention. For example, structurally homologous molecules can contain deletions or additions of one or more contiguous or noncontiguous amino acids, such as a loop or a domain. Structurally homologous molecules also include modified polypeptide molecules that have been chemically or enzymatically derivatized at one or more constituent amino acids, including side chain modifications, backbone modifications, and N- and C-terminal modifications including acetylation, hydroxylation, methylation, amidation, and the attachment of carbohydrate or lipid moieties, cofactors, and the like.
By using molecular replacement, all or part of the structure coordinates of a polypeptide of the invention can be used to determine the structure of a crystallized molecule or complex whose structure is unknown more quickly and efficiently than attempting to determine such information ab initio. For example, in one embodiment this invention provides a method of utilizing molecular replacement to obtain structural information about a molecule or complex whose structure is unknown including: (a) crystallizing the molecule or complex of unknown structure; (b) generating an X-ray diffraction pattern from said crystallized molecule or complex; and (c) applying at least a portion of the structure coordinates for a polypeptide of the invention to the X-ray diffraction pattern to generate a three-dimensional electron density map of the molecule or complex whose structure is unknown.
In another aspect, the present invention provides a method for generating a preliminary model of a molecule or complex whose structure coordinates are unknown, by orienting and positioning the relevant portion of a polypeptide of the invention within the unit cell of the crystal of the unknown molecule or complex so as best to account for the observed X-ray diffraction pattern of the crystal of the molecule or complex whose structure is unknown.
Structural information about a portion of any crystallized molecule or complex that is sufficiently structurally similar to a portion of a polypeptide of the invention can be resolved by this method. In addition to a molecule that shares one or more structural features with a polypeptide of the invention, a molecule that has similar bioactivity, such as the same catalytic activity, substrate specificity or ligand-binding activity as a polypeptide of the invention, can also be sufficiently structurally similar to a polypeptide of the invention to permit use of the structure coordinates for a polypeptide of the invention to solve its crystal structure.
In another aspect, the method of molecular replacement is utilized to obtain structural information about a complex containing a polypeptide of the invention, such as a complex between a modulator and a polypeptide of the invention (or a domain, fragment, ortholog, homolog etc. thereof). In certain instances, the complex includes a polypeptide of the invention (or a domain, fragment, ortholog, homolog etc. thereof) co-complexed with a modulator. For example, in one embodiment, the present invention contemplates a method for making a crystallized complex comprising a polypeptide of the invention, or a fragment thereof, and a compound having a molecular weight of less than 5 kDa, the method comprising: (a) crystallizing a polypeptide of the invention such that the crystals will diffract X-rays to a resolution of 2.5 Å or better; and (b) soaking the crystal in a solution comprising the compound having a molecular weight of less than 5 kDa, thereby producing a crystallized complex comprising the polypeptide and the compound.
Using homology modeling, a computer model of a structural homolog or other polypeptide can be built or refined without crystallizing the molecule. For example, in another aspect, the present invention provides a computer-assisted method for homology modeling a structural homolog of a polypeptide of the invention including: aligning the amino acid sequence of a known or suspected structural homolog with the amino acid sequence of a polypeptide of the invention and incorporating the sequence of the homolog into a model of a polypeptide of the invention derived from atomic structure coordinates to yield a preliminary model of the homolog; subjecting the preliminary model to energy minimization to yield an energy minimized model; and remodeling regions of the energy minimized model where stereochemistry restraints are violated to yield a final model of the homolog.
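By way of illustration only, the sequence alignment step of the foregoing homology modeling method can be sketched as a Needleman-Wunsch global alignment. The function names, the match, mismatch, and gap scores, and the choice to report identity over aligned (non-gap) columns are assumptions made for this sketch; a practical implementation would typically use a substitution matrix and affine gap penalties.

```python
def global_align(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    The scoring values are illustrative assumptions, not values from the text.
    """
    n, m = len(seq_a), len(seq_b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i, j):
        # Score for aligning residue i of seq_a with residue j of seq_b.
        return match if seq_a[i - 1] == seq_b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align the two residues
                score[i - 1][j] + gap,             # gap in seq_b
                score[i][j - 1] + gap,             # gap in seq_a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(seq_a[i - 1])
            out_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Percent identical residues over columns where neither sequence has a gap."""
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    return 100.0 * sum(a == b for a, b in columns) / len(columns)
```

The aligned strings returned by `global_align` identify which template residues each homolog residue would replace when the homolog sequence is incorporated into the model.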
In another embodiment, the present invention contemplates a method for determining the crystal structure of a homolog of a polypeptide having SEQ ID NO: 2 or SEQ ID NO: 4, or equivalent thereof, the method comprising: (a) providing the three dimensional structure of a crystallized polypeptide having SEQ ID NO: 2 or SEQ ID NO: 4, or a fragment thereof; (b) obtaining crystals of a homologous polypeptide comprising an amino acid sequence that is at least 80% identical to the amino acid sequence set forth in SEQ ID NO: 2 or SEQ ID NO: 4 such that the three dimensional structure of the crystallized homologous polypeptide can be determined to a resolution of 2.5 Å or better; and (c) determining the three dimensional structure of the crystallized homologous polypeptide by X-ray crystallography based on the atomic coordinates of the three dimensional structure provided in step (a). In certain instances of the foregoing method, the atomic coordinates for the homologous polypeptide have a root mean square deviation from the backbone atoms of the polypeptide having SEQ ID NO: 2 or SEQ ID NO: 4, or a fragment thereof, of not more than 1.5 Å for all backbone atoms shared in common with the homologous polypeptide and the polypeptide having SEQ ID NO: 2 or SEQ ID NO: 4, or a fragment thereof.
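The root mean square deviation criterion recited above can be illustrated with the following minimal sketch. It assumes that the two sets of backbone atom coordinates have already been paired atom-for-atom and superimposed; the superposition itself is not shown, and the function names are hypothetical.

```python
import math


def backbone_rmsd(coords_a, coords_b):
    """Root mean square deviation (in the units of the input, e.g. angstroms)
    between two equally long lists of (x, y, z) tuples.

    Assumes the structures are already superimposed and that atom k of
    coords_a corresponds to atom k of coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def within_rmsd_limit(coords_a, coords_b, limit=1.5):
    """True when the RMSD does not exceed the limit (1.5 angstroms in the text)."""
    return backbone_rmsd(coords_a, coords_b) <= limit
```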
In another aspect, the present invention provides a method for building a model for the activated conformation of CAR, using the repressed structure of Table 2 as a template. In one embodiment, the method comprises: (a) taking the coordinates for residues 107 to 332 directly from Table 2, effectively assuming that the conformation of this portion of CAR is similar or identical in the activated and repressed states; (b) rotating and translating an X-ray structure of VDR, the Vitamin-D receptor, so as to superimpose its core backbone atoms onto corresponding atoms from CAR; (c) combining the superimposed VDR AF2 helix, residues 416-423, with residues 107-332 from the initial CAR model of step (a), to serve as the starting model for residues 107-332 and 341-348 of the CAR protein in the activated conformation; (d) computationally mutating Val418, Leu419, Val421, Phe422 and Gly423 in the transplanted VDR AF2 helix to the corresponding amino acid types in the CAR AF2 helix, which are Leu343, Gln344, Ile346, Cys347 and Ser348, respectively; and (e) adjusting the conformations of the mutated amino acid side-chains in the AF2 helix of the CAR model, residues 343, 344, and 346-348, to avoid overlaps by using either manual manipulation within molecular graphics programs or conformational search and energy minimization. In one embodiment, the method further comprises modeling the CAR AF2 linker region, residues 333-340, by using a computational loop modeling technique, recognizing that the calculated linker conformation would probably deviate considerably from the actual linker conformation.
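The bookkeeping implied by steps (c) and (d) above can be sketched as follows. The residue numbers and amino acid types are those recited in the text; the constant and function names are hypothetical, and the sketch only produces a renumbering and mutation plan, without manipulating coordinates.

```python
# VDR AF2 helix residues 416-423 are mapped onto CAR residues 341-348,
# i.e. a constant numbering offset of 75.
VDR_TO_CAR_OFFSET = 416 - 341

# VDR residue number -> (VDR residue type, CAR residue type), as listed in step (d).
AF2_MUTATIONS = {
    418: ("VAL", "LEU"),
    419: ("LEU", "GLN"),
    421: ("VAL", "ILE"),
    422: ("PHE", "CYS"),
    423: ("GLY", "SER"),
}


def af2_transplant_plan(vdr_residue_numbers):
    """Return a list of (CAR residue number, new residue type or None).

    None means the text lists no mutation for that position, so the
    transplanted residue type is kept unchanged.
    """
    plan = []
    for number in vdr_residue_numbers:
        if not 416 <= number <= 423:
            raise ValueError(f"residue {number} is outside the VDR AF2 helix 416-423")
        change = AF2_MUTATIONS.get(number)
        plan.append((number - VDR_TO_CAR_OFFSET, change[1] if change else None))
    return plan
```

Calling `af2_transplant_plan(range(416, 424))` yields CAR positions 341 to 348, with new residue types at 343, 344, and 346 to 348, which are the side chains whose conformations are adjusted in step (e).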
VII. Formation of CAR Ligand-Binding Domain-Ligand Crystals
The present invention provides crystals of CAR LBD in complex with the ligand. The crystals were obtained using the methodology disclosed in the Examples. The CAR LBD-ligand crystals, which can be native or derivative crystals, have orthorhombic unit cells (an orthorhombic unit cell is a unit cell wherein a≠b≠c, and wherein α=β=γ=90°) and space group symmetry P2₁2₁2₁. There are four CAR LBD molecules in the asymmetric unit. In this CAR crystalline form, the unit cell has dimensions of a=83.0 Å, b=116.8 Å, c=131.9 Å, and α=β=γ=90°. This crystal form can be formed in a crystallization reservoir comprising 1 μl of the protein-ligand solutions disclosed herein, and 1 μl of well buffer (e.g. 100-400 mM sodium potassium tartrate, pH 7.1-7.4).
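As a worked illustration of the unit cell parameters recited above, the cell volume can be computed from the general formula for a unit cell, which reduces to a·b·c when all angles are 90°. The function and variable names are hypothetical. The count of four general positions for the space group is standard crystallographic knowledge rather than a statement from the text.

```python
import math


def unit_cell_volume(a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """Unit cell volume (cubic angstroms) from edge lengths in angstroms
    and angles in degrees, using the general triclinic formula."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)


# Cell dimensions recited in the text for the CAR LBD-ligand crystal form.
volume = unit_cell_volume(83.0, 116.8, 131.9)

# Assumption stated in the lead-in: four general positions in this space group.
GENERAL_POSITIONS = 4
MOLECULES_PER_ASYMMETRIC_UNIT = 4  # from the text
molecules_per_cell = GENERAL_POSITIONS * MOLECULES_PER_ASYMMETRIC_UNIT
volume_per_molecule = volume / molecules_per_cell

print(f"cell volume: {volume:.1f} cubic angstroms")
print(f"molecules per cell: {molecules_per_cell}")
print(f"volume per molecule: {volume_per_molecule:.1f} cubic angstroms")
```

For the recited dimensions the cell volume is approximately 1.28 × 10⁶ Å³.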
The native and derivative co-crystals comprising a CAR LBD and a ligand disclosed in the present invention can be obtained by a variety of techniques, including batch, liquid bridge, dialysis, vapor diffusion and hanging drop methods (see e.g., McPherson, 1982; McPherson, 1990; Weber, 1991). In one embodiment, the vapor diffusion and hanging drop methods are used for the crystallization of CAR polypeptides and fragments thereof.
Native crystals of the present invention can be grown by dissolving a substantially pure CAR polypeptide or a fragment thereof, and optionally a ligand, in an aqueous buffer containing a precipitant at a concentration just below that necessary to precipitate the protein. Water is removed by controlled evaporation to produce precipitating conditions, which are maintained until crystal growth ceases.
In one embodiment of the invention, native crystals are grown by vapor diffusion (See e.g., McPherson, 1982; McPherson, 1990). In this method, the polypeptide/precipitant solution is allowed to equilibrate in a closed container with a larger aqueous reservoir having a precipitant concentration optimal for producing crystals. Generally, less than about 25 μL of CAR polypeptide solution is mixed with an equal volume of reservoir solution, giving a precipitant concentration about half that required for crystallization. This solution is suspended as a droplet underneath a coverslip, which is sealed onto the top of the reservoir. The sealed container is allowed to stand until crystals grow. Crystals generally form within two to six weeks, and are suitable for data collection within approximately seven to ten weeks. Of course, those of skill in the art will recognize that the above-described crystallization procedures and conditions can be varied.
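The statement that mixing equal volumes gives a precipitant concentration of about half that of the reservoir follows from simple dilution, as the minimal sketch below shows. It assumes that the polypeptide solution itself contains no precipitant; the function name and the example value of 200 mM are illustrative choices.

```python
def initial_drop_concentration(reservoir_conc, protein_volume_ul, reservoir_volume_ul):
    """Precipitant concentration in a freshly mixed drop.

    Assumes the polypeptide solution contributes volume but no precipitant.
    The result is in the same units as reservoir_conc.
    """
    total = protein_volume_ul + reservoir_volume_ul
    if total <= 0:
        raise ValueError("total drop volume must be positive")
    return reservoir_conc * reservoir_volume_ul / total


# Example: 1 microliter of polypeptide solution plus 1 microliter of a
# 200 mM reservoir solution gives a drop at half the reservoir concentration.
drop_conc = initial_drop_concentration(200.0, 1.0, 1.0)
print(drop_conc)
```

During equilibration, water leaves the drop until its precipitant concentration approaches that of the reservoir.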
VIII. Solving a Crystal Structure of the Present Invention
Crystal structures of the present invention can be solved using a variety of techniques including, but not limited to, isomorphous replacement, anomalous scattering, or molecular replacement methods. Computer software packages can also be used to solve a crystal structure of the present invention. Applicable software packages include, but are not limited to, the X-PLOR™ program (Brünger, 1992; available from Accelrys Inc, San Diego, Calif., United States of America); Xtal View (McRee, 1992; available from the San Diego Supercomputer Center, San Diego, Calif., United States of America); SHELXS 97 (Sheldrick, 1990; available from the Institute of Inorganic Chemistry, Georg-August-Universität, Göttingen, Germany); HEAVY (Terwilliger, Los Alamos National Laboratory); and SHAKE-AND-BAKE (Hauptman, 1997; Weeks et al., 1993; available from the Hauptman-Woodward Medical Research Institute, Buffalo, N.Y., United States of America). See also, Ducruix & Geige, 1992, and references cited therein.
IX. The Overall Structure of CARα in Complex With a Ligand
The structure of the LBD of CAR bound with Compound 1 has been determined to 2.15 Å. The statistics of the data and the refined structure are summarized in Table 1.
R.M.S.D. is the root mean square deviation from ideal geometry.
a: $R_{\mathrm{sym}} = \sum |I_{\mathrm{avg}} - I_i| / \sum I_i$
b: $R_{\mathrm{factor}} = \sum |F_P - F_{P,\mathrm{calc}}| / \sum F_P$, where $F_P$ and $F_{P,\mathrm{calc}}$ are the observed and calculated structure factors. $R_{\mathrm{free}}$ is calculated from a randomly chosen 10% of reflections that were never used in refinement, and $R_{\mathrm{factor}}$ is calculated for the remaining 90% of reflections.
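The residuals defined in the footnotes above can be illustrated with the following minimal sketch. The function names, the use of a seeded pseudo-random shuffle to select the free set, and the representation of the data as plain Python lists are assumptions made for illustration; the refinement program actually used is not reproduced here.

```python
import random


def r_sym(measurements):
    """Rsym over groups of repeated intensity measurements.

    measurements: one list of intensities per unique reflection.
    """
    numerator = denominator = 0.0
    for intensities in measurements:
        average = sum(intensities) / len(intensities)
        numerator += sum(abs(average - i) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


def r_factor(f_obs, f_calc):
    """Sum of |Fobs - Fcalc| divided by the sum of Fobs."""
    return sum(abs(o - c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)


def split_free_set(n_reflections, free_fraction=0.10, seed=0):
    """Randomly set aside a fraction of reflection indices for Rfree.

    Returns (working indices, free indices); the free set is excluded
    from refinement and used only to compute Rfree.
    """
    rng = random.Random(seed)
    indices = list(range(n_reflections))
    rng.shuffle(indices)
    free = set(indices[: round(n_reflections * free_fraction)])
    work = [i for i in range(n_reflections) if i not in free]
    return work, sorted(free)
```

Given the two index lists, Rfactor is obtained by passing the working reflections to `r_factor`, and Rfree by passing the free reflections to the same function.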
In its complex with Compound 1, an inverse agonist, the CAR LBD has a structure with approximately 11 alpha helices and a beta-sheet with 3 strands, as shown in
An inverse agonist such as Compound 1 or an antagonist could reduce gene transcription by shifting the AF2 helix into an alternative position, as has been observed with estrogen receptor (ER) bound to antagonists such as tamoxifen and raloxifene (Shiau et al., 1998). Alternatively, an inverse agonist or antagonist could act by unwinding the AF2 helix without necessarily moving it from its active position. Further analysis of the CAR X-ray structure suggests that helix-X interferes with the formation of the AF2 helix. Also, side-chains from Met339 and Met340, in and adjacent to helix-X, make extensive interactions with Compound 1. This suggests that Compound 1 induces the formation of helix-X, which in turn unwinds the AF2 helix, thereby preventing coactivator binding and shutting down gene transcription.
More generally, the analysis of the X-ray structure suggests that CAR exists in equilibrium with at least two major conformations. One conformation is an “activated conformation”, not yet observed by X-ray crystallography, where the AF2 helix is properly formed and resides in its active position. The second major conformation is an inactivated conformation, exemplified by the complex of CAR with Compound 1, where helix-X is present and the AF2 helix is absent. While the inventors do not wish to be bound by any particular hypothesized mechanism of action, it appears that, in the absence of ligand, CAR exists predominantly in the activated conformation. Agonist and “superagonist” compounds would tend to shift the equilibrium even farther towards this activated form, effectively increasing the fraction of the CAR receptor in the activated state to a level higher than that observed in the absence of ligand. Inverse agonists, such as Compound 1, would act by shifting the equilibrium towards the inactivated conformation, effectively decreasing the fraction of the CAR receptor in the activated state.
The structure of CAR revealed a number of other major structural differences when compared with the structures of PXR and VDR. The CAR X-ray structure allowed an accurate alignment of helix-1, confirming that PXR and VDR have 45 and 51 additional residues, respectively, in the region between helix-1 and helix-3. The conformation of this insert is unknown in VDR, as the available X-ray structures were determined with a construct where this insert was deleted. The full insert was present in the construct used for the PXR X-ray structure, and most of the insert was visible in the electron density. Surprisingly, in PXR, a segment from this insert acts to displace helix-6 from its usual position where it covers the ligand-binding pocket. This segment adopts an extended conformation that occupies less volume than helix-6, effectively opening up additional volume for the ligand in the PXR ligand-binding pocket. While the inventors do not wish to be bound by any particular hypothesized mechanism of action, based on the PXR X-ray structure and the similarity of the CAR amino acid sequence to PXR, one might expect that, in CAR, helix-6 would be absent or displaced away from the ligand-binding pocket, and that the ligand-binding pocket would be similarly voluminous. However, the X-ray structure of CAR reveals that helix-6 is present in CAR, and located in a position similar to that in VDR where it serves as one wall for the ligand-binding pocket. This reduces the volume available to the ligand in the ligand-binding pocket, and changes the shape of the pocket substantially. The pocket volume was calculated with the GRASP program using the atomic radii of Bondi (1964), following a procedure in which the MVP program is used to close channels to the external solvent.
With this procedure, the CAR pocket has a volume of 824 Å³, similar to that of VDR, which has a volume of 871 Å³ when bound to Vitamin D, but much smaller than PXR, which has a volume of 1150-1544 Å³, depending on the ligand complexed to the protein.
The structure of the LBD of CAR comprises 11 main alpha helices, a beta sheet with 4 strands, and additional irregular structure and shorter helices. The key features are shown in
The ligand-binding site can be divided into two chambers (
As shown in
X. Rational Drug Design
X.A. Generally
Modulators to polypeptides of the invention and other structurally related molecules, and complexes containing the same, can be identified and developed as set forth below and otherwise using techniques and methods known to those of skill in the art.
The present invention contemplates making any molecule that is shown to modulate the activity of a polypeptide of the invention.
In another embodiment, inhibitors, modulators of the subject polypeptides, or biological complexes containing them, can be used in the manufacture of a medicament for any number of uses, including, for example, treating any disease or other treatable condition of a patient (including humans and animals), and particularly a disease caused by aberrant CAR regulation or activity.
A number of techniques can be used to screen, identify, select, and design chemical entities capable of associating with polypeptides of the invention, structurally homologous molecules, and other molecules. Knowledge of the structure for a polypeptide of the invention, determined in accordance with the methods described herein, permits the design and/or identification of molecules and/or other modulators which have a shape complementary to the conformation of a polypeptide of the invention, or more particularly, a druggable region thereof. It is understood that such techniques and methods can use, in addition to the exact structural coordinates and other information for a polypeptide of the invention, structural equivalents thereof described above (including, for example, those structural coordinates that are derived from the structural coordinates of amino acids contained in a druggable region as described above).
The term “chemical entity”, as used herein, refers to chemical compounds, complexes of two or more chemical compounds, and fragments of such compounds or complexes. In certain instances, it is desirable to use chemical entities exhibiting a wide range of structural and functional diversity, such as compounds exhibiting different shapes (e.g., flat aromatic ring(s), puckered aliphatic ring(s), straight and branched chain aliphatics with single, double, or triple bonds) and diverse functional groups (e.g., carboxylic acids, esters, ethers, amines, aldehydes, ketones, and various heterocyclic rings).
In one aspect, the method of drug design generally includes computationally evaluating the potential of a selected chemical entity to associate with any of the molecules or complexes of the present invention (or portions thereof). For example, this method can include the steps of (a) employing computational means to perform a fitting operation between the selected chemical entity and a druggable region of the molecule or complex; and (b) analyzing the results of said fitting operation to quantify the association between the chemical entity and the druggable region.
A chemical entity can be examined either through visual inspection or through the use of computer modeling using a docking program such as GRAM, DOCK, or AUTODOCK (Dunbrack et al., 1997). This procedure can include computer fitting of chemical entities to a target to ascertain how well the shape and the chemical structure of each chemical entity will complement or interfere with the structure of the subject polypeptide (Bugg et al., 1993; West et al., 1995). Computer programs can also be employed to estimate the attraction, repulsion, and steric hindrance of the chemical entity to a druggable region, for example. Generally, the tighter the fit (i.e., the lower the steric hindrance, and/or the greater the attractive force), the more potent the chemical entity will be, because these properties are consistent with a tighter binding constant. Furthermore, the more specific the design of a chemical entity, the less likely it is that the chemical entity will interfere with related proteins, which can minimize potential side-effects due to unwanted interactions.
A variety of computational methods for molecular design, in which the steric and electronic properties of druggable regions are used to guide the design of chemical entities, are known. See e.g., Cohen et al., 1990; Kuntz et al., 1982; DesJarlais, 1988; Bartlett et al., 1989; Goodford et al., 1985; DesJarlais et al., 1986. Directed methods generally fall into two categories: (1) design by analogy in which 3-D structures of known chemical entities (such as from a crystallographic database) are docked to the druggable region and scored for goodness-of-fit; and (2) de novo design, in which the chemical entity is constructed piece-wise in the druggable region. The chemical entity can be screened as part of a library or a database of molecules. Databases which can be used include ACD (MDL Systems Inc., San Leandro, Calif., United States of America), NCI (National Cancer Institute, Bethesda, Md., United States of America), CCDC (Cambridge Crystallographic Data Center, Cambridge, England, United Kingdom), CAST (Chemical Abstract Service), Derwent (Derwent Information Limited, London, England, United Kingdom), Maybridge (Maybridge Chemical Company Ltd., Cornwall, England, United Kingdom), Aldrich (Aldrich Chemical Company, St. Louis, Mo., United States of America), DOCK (University of California in San Francisco, San Francisco, Calif., United States of America), and the Directory of Natural Products (Chapman & Hall). Computer programs such as CONCORD (Tripos Inc., St. Louis, Mo., United States of America) or DB-Converter (Molecular Simulations Limited, Cambridge, England, United Kingdom) can be used to convert a data set represented in two dimensions to one represented in three dimensions.
Chemical entities can be tested for their capacity to fit spatially with a druggable region or other portion of a target protein. As used herein, the term “fits spatially” means that the three-dimensional structure of the chemical entity is accommodated geometrically by a druggable region. A favorable geometric fit occurs when the surface area of the chemical entity is in close proximity with the surface area of the druggable region without forming unfavorable interactions. A favorable complementary interaction occurs where the chemical entity interacts by hydrophobic, aromatic, ionic, dipolar, or hydrogen donating and accepting forces. Unfavorable interactions can be steric hindrance between atoms in the chemical entity and atoms in the druggable region.
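A simple geometric test of spatial fit can be sketched as follows. Atom pairs closer than the sum of their van der Waals radii, less a tolerance, are counted as steric clashes, and pairs near that sum are counted as close contacts. The radii are those of Bondi, 1964; the tolerance and margin values, the tuple layout, and the function name are arbitrary assumptions chosen for illustration, and the sketch ignores hydrogen bonding and other specific interactions.

```python
import math

# Van der Waals radii in angstroms (Bondi, 1964).
BONDI_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


def count_clashes_and_contacts(entity_atoms, site_atoms,
                               clash_tolerance=0.4, contact_margin=0.5):
    """Count steric clashes and close contacts between two sets of atoms.

    Each atom is a tuple (element, x, y, z). The tolerance and margin
    (in angstroms) are illustrative assumptions, not values from the text.
    Returns (number of clashes, number of close contacts).
    """
    clashes = contacts = 0
    for elem_e, *pos_e in entity_atoms:
        for elem_s, *pos_s in site_atoms:
            distance = math.dist(pos_e, pos_s)
            radii_sum = BONDI_RADII[elem_e] + BONDI_RADII[elem_s]
            if distance < radii_sum - clash_tolerance:
                clashes += 1          # atoms interpenetrate: unfavorable
            elif distance <= radii_sum + contact_margin:
                contacts += 1         # surfaces in close proximity
    return clashes, contacts
```

Under this sketch, a favorable geometric fit corresponds to many close contacts and no clashes.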
If a model of the present invention is a computer model, the chemical entities can be positioned in a druggable region through computational docking. If, on the other hand, the model of the present invention is a structural model, the chemical entities can be positioned in the druggable region by, for example, manual docking. As used herein the term “docking” refers to a process of placing a chemical entity in close proximity with a druggable region, or a process of finding low energy conformations of a chemical entity/druggable region complex.
In an illustrative embodiment, the design of a potential modulator begins from the general perspective of shape complementarity for the druggable region of a polypeptide of the invention, and a search algorithm is employed which is capable of scanning a database of small molecules of known three-dimensional structure for chemical entities which fit geometrically with the target druggable region. Most algorithms of this type provide a method for finding a wide assortment of chemical entities that are complementary to the shape of a druggable region of the subject polypeptide. Each of a set of chemical entities from a particular database, such as the Cambridge Crystallographic Data Bank (CCDB) (Allen et al., 1973), is individually docked to the druggable region of a polypeptide of the invention in a number of geometrically permissible orientations with use of a docking algorithm. In certain embodiments, a set of computer algorithms called DOCK can be used to characterize the shape of invaginations and grooves that form the active sites and recognition surfaces of the druggable region (Kuntz et al., 1982). The program can also search a database of small molecules for templates whose shapes are complementary to particular binding sites of a polypeptide of the invention (DesJarlais et al., 1988).
The orientations are evaluated for goodness-of-fit and the best are kept for further examination using molecular mechanics programs, such as AMBER or CHARMM. Such algorithms have previously proven successful in finding a variety of chemical entities that are complementary in shape to a druggable region.
Goodford et al., 1985 and Boobbyer et al., 1989 have produced a computer program (GRID) that seeks to determine regions of the druggable region that have high affinity for different chemical groups (termed probes). GRID hence provides a tool for suggesting modifications to known chemical entities that might enhance binding. It can be anticipated that some of the sites discerned by GRID as regions of high affinity correspond to “pharmacophoric patterns” determined inferentially from a series of known ligands. As used herein, a “pharmacophoric pattern” is a geometric arrangement of features of chemical entities that is believed to be important for binding. Attempts have been made to use pharmacophoric patterns as a search screen for novel ligands (Jakes et al., 1987; Brint & Willett, 1987; Jakes et al., 1986).
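The general idea of mapping probe affinity over a grid can be sketched as follows. This is not the GRID program or its energy function: a single 12-6 Lennard-Jones term with hypothetical parameters stands in for the empirical potential, and the function names, well depth, and optimal distance are assumptions made for illustration.

```python
import math
from itertools import product


def probe_energy(point, site_atoms, epsilon=0.2, r_min=3.5):
    """Illustrative 12-6 Lennard-Jones energy of a probe at a point.

    epsilon (well depth) and r_min (distance of the energy minimum, in
    angstroms) are hypothetical values, identical for every site atom.
    """
    energy = 0.0
    for atom in site_atoms:
        distance = max(math.dist(point, atom), 0.5)  # guard against division by zero
        ratio = r_min / distance
        energy += epsilon * (ratio ** 12 - 2.0 * ratio ** 6)
    return energy


def scan_grid(site_atoms, lower, upper, spacing=1.0):
    """Evaluate the probe energy on a regular grid spanning lower..upper.

    Returns a dict mapping each grid point (x, y, z) to its energy.
    """
    axes = []
    for lo, hi in zip(lower, upper):
        steps = int(math.floor((hi - lo) / spacing)) + 1
        axes.append([lo + k * spacing for k in range(steps)])
    return {point: probe_energy(point, site_atoms) for point in product(*axes)}


def best_sites(grid, count=5):
    """Grid points with the lowest (most favorable) probe energy."""
    return sorted(grid, key=grid.get)[:count]
```

Points returned by `best_sites` play the role of candidate high-affinity positions for the probe in this simplified picture.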
Yet a further embodiment of the present invention utilizes a computer algorithm such as CLIX which searches such databases as CCDB for chemical entities which can be oriented with the druggable region in a way that is both sterically acceptable and has a high likelihood of achieving favorable chemical interactions between the chemical entity and the surrounding amino acid residues. The method is based on characterizing the region in terms of an ensemble of favorable binding positions for different chemical groups and then searching for orientations of the chemical entities that cause maximum spatial coincidence of individual candidate chemical groups with members of the ensemble. The algorithmic details of CLIX are described in Lawrence et al., 1992.
In this way, the efficiency with which a chemical entity can bind to or interfere with a druggable region can be tested and optimized by computational evaluation. For example, for a favorable association with a druggable region, a chemical entity must preferably demonstrate a relatively small difference in energy between its bound and free states (i.e., a small deformation energy of binding). Thus, certain, more desirable chemical entities will be designed with a deformation energy of binding of not greater than about 10 kcal/mole, and more preferably, not greater than 7 kcal/mole. Chemical entities can interact with a druggable region in more than one conformation that is similar in overall binding energy. In those cases, the deformation energy of binding is taken to be the difference between the energy of the free entity and the average energy of the conformations observed when the chemical entity binds to the target.
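The deformation energy criterion described above amounts to a short calculation, sketched below. The sign convention (average bound-conformation energy minus free-state energy, so that strain is positive), the function names, and the category labels are assumptions made for illustration; the thresholds of 10 and 7 kcal/mole are those recited in the text.

```python
def deformation_energy(free_energy, bound_energies):
    """Deformation energy of binding in kcal/mole.

    free_energy: conformational energy of the unbound chemical entity.
    bound_energies: conformational energies of the entity in each observed
    bound conformation; their average is used when there is more than one.
    """
    if not bound_energies:
        raise ValueError("at least one bound conformation is required")
    return sum(bound_energies) / len(bound_energies) - free_energy


def classify_deformation(energy):
    """Apply the thresholds recited in the text (labels are illustrative)."""
    if energy <= 7.0:
        return "more preferred"
    if energy <= 10.0:
        return "preferred"
    return "disfavored"


# Example with two bound conformations of similar overall binding energy.
example = deformation_energy(-50.0, [-44.0, -46.0])
print(example, classify_deformation(example))
```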
In this way, the present invention provides computer-assisted methods for identifying or designing a potential modulator of the activity of a polypeptide of the invention including: supplying a computer modeling application with a set of structure coordinates of a molecule or complex, the molecule or complex including at least a portion of a druggable region from a polypeptide of the invention; supplying the computer modeling application with a set of structure coordinates of a chemical entity; and determining whether the chemical entity is expected to bind to the molecule or complex, wherein binding to the molecule or complex is indicative of potential modulation of the activity of a polypeptide of the invention.
In another aspect, the present invention provides a computer-assisted method for identifying or designing a potential modulator to a polypeptide of the invention, the method including: supplying a computer modeling application with a set of structure coordinates of a molecule or complex, the molecule or complex including at least a portion of a druggable region of a polypeptide of the invention; supplying the computer modeling application with a set of structure coordinates for a chemical entity; evaluating the potential binding interactions between the chemical entity and the active site of the molecule or molecular complex; structurally modifying the chemical entity to yield a set of structure coordinates for a modified chemical entity; and determining whether the modified chemical entity is expected to bind to the molecule or complex, wherein binding to the molecule or complex is indicative of potential modulation of the polypeptide of the invention.
In one embodiment, a potential modulator can be obtained by screening a peptide library (Scott & Smith, 1990; Cwirla et al., 1990; Devlin et al., 1990). A potential modulator selected in this manner could then be systematically modified by computer modeling programs until one or more promising potential drugs are identified. Such analysis has been shown to be effective in the development of HIV protease inhibitors (Lam et al., 1994; Wlodawer et al., 1993; Appelt, 1993; Erickson, 1993). Alternatively a potential modulator can be selected from a library of chemicals such as those that can be licensed from third parties, such as chemical and pharmaceutical companies. A third alternative is to synthesize the potential modulator de novo.
For example, in certain embodiments, the present invention provides a method for making a potential modulator for a polypeptide of the invention, the method including synthesizing a chemical entity or a molecule containing the chemical entity to yield a potential modulator of a polypeptide of the invention, the chemical entity having been identified during a computer-assisted process including supplying a computer modeling application with a set of structure coordinates of a molecule or complex, the molecule or complex including at least one druggable region from a polypeptide of the invention; supplying the computer modeling application with a set of structure coordinates of a chemical entity; and determining whether the chemical entity is expected to bind to the molecule or complex at the active site, wherein binding to the molecule or complex is indicative of potential modulation. This method can further include the steps of evaluating the potential binding interactions between the chemical entity and the active site of the molecule or molecular complex and structurally modifying the chemical entity to yield a set of structure coordinates for a modified chemical entity, which steps can be repeated one or more times.
Once a potential modulator is identified, it can then be tested in any standard assay for the macromolecule, including high-throughput assays, the choice of assay depending on the macromolecule. Further refinements to the structure of the modulator will generally be necessary and can be made by successive iterations of any or all of the steps provided by the particular screening assay, in particular by further structural analysis, e.g., 15N NMR relaxation rate determinations or X-ray crystallography with the modulator bound to the subject polypeptide. These studies can be performed in conjunction with biochemical assays.
Once identified, a potential modulator can be used as a model structure, and analogs to the compound can be obtained. The analogs are then screened for their ability to bind the subject polypeptide. An analog of the potential modulator might be chosen as a modulator when it binds to the subject polypeptide with a higher binding affinity than the predecessor modulator.
In a related approach, iterative drug design is used to identify modulators of a target protein. Iterative drug design is a method for optimizing associations between a protein and a modulator by determining and evaluating the three-dimensional structures of successive sets of protein/modulator complexes. In iterative drug design, crystals of a series of protein/modulator complexes are obtained and then the three-dimensional structure of each complex is solved. Such an approach provides insight into the association between the proteins and modulators of each complex. For example, this approach can be accomplished by selecting modulators with inhibitory activity, obtaining crystals of each new protein/modulator complex, solving the three-dimensional structure of the complex, and comparing the associations between the new protein/modulator complex and previously solved protein/modulator complexes. By observing how changes in the modulator affect the protein/modulator associations, these associations can be optimized.
In addition to designing and/or identifying a chemical entity to associate with a druggable region, as described above, the same techniques and methods can be used to design and/or identify chemical entities that either associate, or do not associate, with affinity regions, selectivity regions or undesired regions of protein targets. By such methods, selectivity for one or a few targets, or alternatively for multiple targets, from the same species or from multiple species, can be achieved.
For example, a chemical entity can be designed and/or identified for which the binding energy for one druggable region, e.g., an affinity region or selectivity region, is more favorable than that for another region, e.g., an undesired region, by about 20%, 30%, or 50% to about 60% or more. It can be the case that the difference is observed between (a) more than two regions, (b) different regions (selectivity, affinity or undesired) from the same target, (c) regions of different targets, (d) regions of homologs from different species, or (e) other combinations. Alternatively, the comparison can be made by reference to the Kd, usually the apparent Kd, of said chemical entity with the two or more regions in question.
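By way of illustration only, the comparison of binding energies and apparent Kd values described above can be sketched as follows. The sketch assumes the standard relation ΔG = RT ln(Kd) at 298.15 K; the function names and the Kd values are hypothetical and are not taken from the present disclosure.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant: dG = RT ln(Kd)."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)

def percent_more_favorable(dg_preferred, dg_other):
    """Percent by which dg_preferred is more favorable (more negative) than dg_other."""
    if dg_other >= 0:
        raise ValueError("reference binding energy must be negative (favorable)")
    return 100.0 * (dg_preferred - dg_other) / dg_other

# Hypothetical apparent Kd values, for illustration only.
dg_selectivity = binding_free_energy(1e-9)  # 1 nM at a selectivity region
dg_undesired = binding_free_energy(1e-6)    # 1 uM at an undesired region
print(round(dg_selectivity, 2), round(dg_undesired, 2))
print(round(percent_more_favorable(dg_selectivity, dg_undesired), 1))
```

Because the free energy scales with the logarithm of Kd, a thousand-fold difference in Kd between nanomolar and micromolar binding corresponds to a difference in binding energy of roughly half the weaker value.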
In another aspect, prospective modulators are screened for binding to two nearby druggable regions on a target protein. For example, a modulator that binds a first region of a target polypeptide does not bind a second nearby region. Binding to the second region can be determined by monitoring changes in a different set of amide chemical shifts in either the original screen or a second screen conducted in the presence of a modulator (or potential modulator) for the first region. From an analysis of the chemical shift changes, the approximate location of a potential modulator for the second region is identified. Optimization of the second modulator for binding to the region is then carried out by screening structurally related compounds (i.e., analogs as described above).
When modulators for the first region and the second region are identified, their location and orientation in the ternary complex can be determined experimentally. On the basis of this structural information, a linked compound, i.e., a consolidated modulator, is synthesized in which the modulator for the first region and the modulator for the second region are linked. In certain embodiments, the two modulators are covalently linked to form a consolidated modulator. This consolidated modulator can be tested to determine if it has a higher binding affinity for the target than either of the two individual modulators. A consolidated modulator is selected as a modulator when it has a higher binding affinity for the target than either of the two modulators. Larger consolidated modulators can be constructed in an analogous manner, e.g., by linking three modulators that bind to three nearby regions on the target to form a multilinked consolidated modulator that has an even higher affinity for the target than the linked modulator. In this example, it is assumed that it is desirable to have the modulator bind to all the druggable regions. However, it can be the case that binding to certain of the druggable regions is not desirable, so that the same techniques can be used to identify modulators and consolidated modulators that show increased specificity based on binding to at least one but not all druggable regions of a target.
The present invention provides a number of methods that use drug design as described above. For example, in one aspect, the present invention contemplates a method for designing a candidate compound for screening for inhibitors of a polypeptide of the invention, the method comprising: (a) determining the three dimensional structure of a crystallized polypeptide of the invention or a fragment thereof; and (b) designing a candidate inhibitor based on the three dimensional structure of the crystallized polypeptide or fragment.
In another aspect, the present invention provides a method for identifying a potential inhibitor of a polypeptide of the invention, the method comprising: (a) providing the three-dimensional coordinates of a polypeptide of the invention or a fragment thereof; (b) identifying a druggable region of the polypeptide or fragment; and (c) selecting from a database at least one compound that comprises three-dimensional coordinates which indicate that the compound can bind the druggable region, wherein the selected compound is a potential inhibitor of a polypeptide of the invention.
In another aspect, the present invention contemplates a method for identifying a potential modulator of a molecule comprising a druggable region similar to that of SEQ ID NO: 2 or SEQ ID NO: 4, the method comprising: (a) using the atomic coordinates of amino acid residues from SEQ ID NO: 2 or SEQ ID NO: 4, or a fragment thereof, ± a root mean square deviation from the backbone atoms of the amino acids of not more than 1.5 Å, to generate a three-dimensional structure of a molecule comprising a druggable region that is a portion of SEQ ID NO: 2 or SEQ ID NO: 4; (b) employing the three dimensional structure to design or select the potential modulator; (c) synthesizing the modulator; and (d) contacting the modulator with the molecule to determine the ability of the modulator to interact with the molecule.
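By way of illustration only, the root mean square deviation criterion recited above can be checked as sketched below. The sketch assumes that the two sets of backbone atom coordinates have already been superimposed and are listed in the same atom order; the function names and the coordinates are hypothetical.

```python
import math

def backbone_rmsd(coords_a, coords_b):
    """Root mean square deviation (angstroms) between paired, pre-superimposed atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

def within_rmsd_cutoff(coords_a, coords_b, cutoff=1.5):
    """True when the backbone RMSD is not more than the cutoff (1.5 angstroms here)."""
    return backbone_rmsd(coords_a, coords_b) <= cutoff

# Hypothetical backbone coordinates, for illustration only.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
model = [(0.0, 0.5, 0.0), (1.5, 0.5, 0.0), (3.0, 0.5, 0.0)]
print(backbone_rmsd(reference, model), within_rmsd_cutoff(reference, model))
```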
In another aspect, the present invention contemplates an apparatus for determining whether a compound is a potential inhibitor of a polypeptide having SEQ ID NO: 2 or SEQ ID NO: 4, the apparatus comprising: (a) a memory that comprises: (i) the three dimensional coordinates and identities of the atoms of a polypeptide of the invention or a fragment thereof that form a druggable site; and (ii) executable instructions; and (b) a processor that is capable of executing instructions to: (i) receive three-dimensional structural information for a candidate compound; (ii) determine if the three-dimensional structure of the candidate compound is complementary to the structure of the interior of the druggable site; and (iii) output the results of the determination.
In another aspect, the present invention contemplates a method for designing a potential compound for the prevention or treatment of a disease or disorder, the method comprising: (a) providing the three dimensional structure of a crystallized polypeptide of the invention, or a fragment thereof; (b) synthesizing a potential compound for the prevention or treatment of a disease or disorder based on the three dimensional structure of the crystallized polypeptide or fragment; (c) contacting a polypeptide of the present invention or a PDE with the potential compound; and (d) assaying the activity of a polypeptide of the present invention, wherein a change in the activity of the polypeptide indicates that the compound can be useful for prevention or treatment of a disease or disorder.
In another aspect, the present invention contemplates a method for designing a potential compound for the prevention or treatment of a disease or disorder, the method comprising: (a) providing structural information of a druggable region derived from NMR spectroscopy of a polypeptide of the invention, or a fragment thereof; (b) synthesizing a potential compound for the prevention or treatment of a disease or disorder based on the structural information; (c) contacting a polypeptide of the present invention or a PDE with the potential compound; and (d) assaying the activity of a polypeptide of the present invention, wherein a change in the activity of the polypeptide indicates that the compound can be useful for prevention or treatment of a disease or disorder.
X.B. Methods of Designing CAR LBD Ligand Compounds
As discussed above, the analysis of the CAR X-ray structure suggests that CAR can adopt at least two major conformations. One major conformation corresponds to the activated state of CAR, where helix-X is absent, and where the AF2 helix is properly formed and resides in its active position. The second major conformation corresponds to the inactivated conformation, exemplified by the complex of CAR with Compound 1, where helix-X is present and where the AF2 helix is absent. In both conformations, the ligand-binding pocket is capped by the C-terminal tail, residues 340-348. These residues adopt different conformations in the activated and inactivated states of CAR, effectively covering the pocket with a cap that can assume at least two alternative shapes. Some CAR ligands might bind preferentially to the activated conformation of CAR, whereas some other CAR ligands might bind preferentially to the inactivated conformation of CAR. There might also be some ligands that bind equally well to either conformation of CAR. When a ligand binds preferentially to a particular conformational state, it will lower the energy of that state, thereby shifting the equilibrium towards that state, and increasing the fraction of the CAR receptor that exists in that state. This thermodynamic principle can be used together with the three dimensional structure of CAR to design chemical compounds that bind to specific conformational states of CAR, thereby increasing or decreasing the level of transcription in genes regulated by CAR.
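By way of illustration only, the thermodynamic principle described above can be sketched with a simple two-state, single-site linkage model. The model, the function name, and all numerical values are assumptions made for illustration; in particular, the ligand-free ratio of activated to inactivated receptor is not a value disclosed herein.

```python
def fraction_activated(l0, ligand_conc, kd_active, kd_inactive):
    """Fraction of receptor in the activated state for a two-state, single-site model.

    l0 is the ligand-free ratio [activated]/[inactivated]; the ligand
    concentration and both dissociation constants share the same units.
    """
    if l0 <= 0 or kd_active <= 0 or kd_inactive <= 0 or ligand_conc < 0:
        raise ValueError("l0 and Kd values must be positive; concentration non-negative")
    # Statistical weight of each state: free receptor plus ligand-bound receptor.
    weight_active = l0 * (1.0 + ligand_conc / kd_active)
    weight_inactive = 1.0 + ligand_conc / kd_inactive
    return weight_active / (weight_active + weight_inactive)

# Hypothetical values: equal populations without ligand, 1 uM ligand.
print(fraction_activated(1.0, 0.0, 1e-9, 1e-6))   # no ligand
print(fraction_activated(1.0, 1e-6, 1e-9, 1e-6))  # ligand prefers the activated state
print(fraction_activated(1.0, 1e-6, 1e-6, 1e-9))  # ligand prefers the inactivated state
```

In this sketch, a ligand with the lower Kd for the activated state raises the activated fraction, a ligand with the lower Kd for the inactivated state lowers it, and a ligand that binds both states equally leaves the equilibrium unchanged.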
The present X-ray structure of CAR bound to Compound 1 provides an accurate three-dimensional structure of the ligand-binding pocket in the inactivated conformational state of CAR. Novel ligands can be designed to fit this specific pocket using a variety of computational methods, discussed below. Alternatively, known ligands can be docked into the ligand-binding pocket, using a variety of docking programs and algorithms. These docked structures can be examined graphically to suggest chemical modifications that would improve their fit to the pocket, or their binding to the receptor. Alternatively, known ligands can be complexed with the CAR protein and crystallized using the methods of this invention, allowing the structure of the complex to be determined by X-ray crystallography. The three dimensional structures can be examined graphically to suggest chemical modifications that would improve their fit to the pocket, or their binding to the receptor.
The present X-ray structure of CAR can also be used as a template to build a three-dimensional model of the structure of the activated form of CAR. For example, residues 107 to 332, corresponding to helix-1 through most of helix-10, are taken to have exactly the same coordinates as in the template CAR structure. The AF2 helix, CAR residues 341-348, is then built using the structure of VDR as the template. The VDR template structure is superimposed onto the CAR structure using standard methods as disclosed herein and as would be apparent to one of ordinary skill in the art after a review of the present disclosure. The AF2 helix from VDR, residues 416-423, is then removed from the VDR template and transplanted into the model for CAR, without any adjustment of its coordinates. Five of the residues in the VDR AF2 helix have amino acid types different from the corresponding residues in the CAR AF2 helix. These residues are VDR Val418, Leu419, Val421, Phe422, and Gly423, which correspond to CAR Leu343, Gln344, Ile346, Cys347, and Ser348, respectively. These five residues are computationally “mutated” in the model, to obtain the covalent structure corresponding to the desired amino acids in CAR. The C-terminal Ser348 is further modified to obtain a free carboxylate as normally occurs at the C-terminal end of a protein chain.
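By way of illustration only, the renumbering and computational "mutation" of the transplanted AF2 helix can be tabulated as sketched below. The residue numbers and amino acid types are those recited above; the function names are hypothetical, and residues whose types are not stated above are simply marked as retained.

```python
VDR_AF2_FIRST, VDR_AF2_LAST = 416, 423
CAR_AF2_FIRST = 341
OFFSET = VDR_AF2_FIRST - CAR_AF2_FIRST  # VDR numbering minus CAR numbering

# VDR residue number -> (VDR amino acid type, CAR amino acid type), as recited above.
SUBSTITUTIONS = {
    418: ("Val", "Leu"),
    419: ("Leu", "Gln"),
    421: ("Val", "Ile"),
    422: ("Phe", "Cys"),
    423: ("Gly", "Ser"),
}

def car_number(vdr_number):
    """Map a VDR AF2 helix residue number onto the corresponding CAR residue number."""
    if not VDR_AF2_FIRST <= vdr_number <= VDR_AF2_LAST:
        raise ValueError("residue is outside the VDR AF2 helix")
    return vdr_number - OFFSET

def mutation_plan():
    """List (CAR number, VDR type, CAR type); types are None where the residue is retained."""
    plan = []
    for vdr_number in range(VDR_AF2_FIRST, VDR_AF2_LAST + 1):
        old_type, new_type = SUBSTITUTIONS.get(vdr_number, (None, None))
        plan.append((car_number(vdr_number), old_type, new_type))
    return plan

for entry in mutation_plan():
    print(entry)
```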
These computational mutations can be carried out using amino acid replacement and builder functionality in molecular graphics programs such as Insight-II, available from Accelrys, or using non-graphical molecular mechanics software such as MVP. The side-chain conformations are then adjusted using computer graphics, such as Insight-II, or other energy-based procedures, such as in MVP, to obtain a reasonable overall fit. It is more difficult to obtain a reasonable conformation for the eight residues in the AF2 linker, CAR residues 333-340. The VDR linker, residues 407-415, cannot be used as the template for the CAR linker because it has nine residues, and because its N-terminal end-point is different from that required in CAR. Likewise, the PXR linker, residues 418-422, is too short to serve as a template for the CAR linker. For structure-based drug design, a conservative approach is to omit the linker residues rather than to model the linker incorrectly. Consequently, in one embodiment the linker, residues 333-340, is omitted from the activated CAR model. This model for the activated state of CAR then provides a binding site for the ligand design processes described elsewhere herein. Specifically, various computer software programs can be used to design novel ligands that would fit the specific pocket in the model for the activated form of CAR. Docking calculations can be used to predict how known CAR activators will bind to the activated form of CAR or to identify other available compounds that might bind. These predicted complex structures can then be examined by computer graphics to suggest specific chemical modifications that would enhance the binding to the activated state of CAR.
To be useful as a therapeutic agent, a chemical compound that acts through CAR must induce the appropriate level of CAR activity in relevant tissues. In principle, this can be achieved by adjusting the CAR conformational equilibrium so that appropriate fractions of the CAR protein exist in the activated and inactivated states. This in turn can be achieved with ligands that bind almost exclusively to one or the other of the two major conformational states. The design of ligands that are selective for a specific conformational state is facilitated by consideration of how these ligands might bind to each of the two conformational states. Binding modes can be obtained using docking calculations, and then examined graphically to suggest chemical modifications that would make binding to a particular conformational state either more favorable or less favorable. Iterative application of these techniques can yield ligands with the desired level of selectivity for the particular conformational state of CAR, thereby achieving the desired level of CAR activity. Ligands that can bind to both conformational states of the CAR protein can also be designed. This is also facilitated by consideration of how the ligands might bind to each of the two conformational states, using the same approach as discussed above, but this time seeking chemical structures and chemical modifications that would permit binding to both conformational states.
The methods of this invention can also be used to suggest possible chemical modifications of a compound that might reduce or minimize its effect on CAR. This approach can be useful in drug discovery projects aiming to find compounds that modulate the activity of some other target molecule, where modulation of CAR activity is an undesirable side effect. This approach is useful in engineering CAR activity out of other, non-drug molecules. Humans and other animals are exposed to a wide range of different chemical compounds, some of which might act on CAR in an undesirable manner. Such a compound could be complexed with CAR and crystallized using the methods of the present invention. The structure could then be determined by X-ray crystallography. Alternatively, the structure of the complex could be predicted computationally using molecular docking software. In this case, compounds that tend to activate CAR would be docked into a model or structure of the activated form of CAR, whereas compounds that tend to reduce the activity of CAR would be docked into a model or structure of an inactivated form of CAR, such as its complex with Compound 1 presented here.
Whether the structure is obtained by X-ray crystallography or computational methods, the structure would be examined by computer graphics to suggest chemical modifications that would minimize the tendency to bind to CAR. For example, substituents could be introduced onto the compound that would project into volume occupied by the CAR protein. Alternatively, a region of the molecule that binds to a lipophilic region of the CAR binding site could be modified to make it more polar, thus reducing its tendency to bind to CAR. Alternatively, a polar group of the compound that makes a hydrogen bonding interaction with CAR could be identified and modified to an alternative group that fails to make the hydrogen bond. Appropriate chemical modifications can be chosen such that the desirable properties and behavior of the compound would be retained.
The design of candidate substances, also referred to as “compounds” or “candidate compounds”, that bind to or modulate nuclear receptor (NR) LBD (for example, CAR LBD)-mediated activity according to the present invention generally involves consideration of two factors. First, the compound must be capable of chemically and structurally associating with a NR LBD. Non-covalent molecular interactions important in the association of a NR LBD with its substrate include hydrogen bonding, van der Waals interactions, and hydrophobic interactions. The interaction between an atom of an LBD amino acid and an atom of an LBD ligand can be made by any force or attraction described in nature. Usually the interaction between the atom of the amino acid and the ligand will be the result of a hydrogen bonding interaction, charge interaction, hydrophobic interaction, van der Waals interaction, or dipole interaction. In the case of the hydrophobic interaction, it is recognized that this is not a per se interaction between the amino acid and ligand, but rather the usual result, in part, of the repulsion of water or other hydrophilic groups from a hydrophobic surface. Reducing or enhancing the interaction of the LBD and a ligand can be measured by calculating or testing binding energies, either computationally or using thermodynamic or kinetic methods known in the art.
Second, the compound must be able to assume a conformation that allows it to associate with a NR LBD. Although certain portions of the compound will not directly participate in this association with a NR LBD, those portions can still influence the overall conformation of the molecule. This influence on conformation, in turn, can have a significant impact on potency. Such conformational requirements include the overall three-dimensional structure and orientation of the chemical entity or compound in relation to all or a portion of the binding site, e.g., the ligand-binding pocket or an accessory binding site of a NR LBD, or the spacing between functional groups of a compound comprising several chemical entities that directly interact with a NR LBD.
Chemical modifications can enhance or reduce interactions of an atom of a LBD amino acid and an atom of an LBD ligand. Steric hindrance can be a common approach for changing the interaction of a LBD binding pocket with an activation domain. Chemical modifications are introduced in one embodiment at C—H, C—, and C—OH positions in a ligand, where the carbon is part of the ligand structure that remains the same after modification is complete. In the case of C—H, C could have 1, 2, or 3 hydrogens, but usually only one hydrogen will be replaced. The H or OH can be removed after modification is complete and replaced with a desired chemical moiety.
The potential binding effect of a chemical compound on a NR LBD can be analyzed prior to its actual synthesis and testing by the use of computer modeling techniques that employ the coordinates of a crystalline NR LBD, for example a CAR LBD polypeptide of the present invention. If the theoretical structure of the given compound suggests insufficient interaction and association between it and a NR LBD, synthesis and testing of the compound is obviated. However, if computer modeling indicates a strong interaction, the molecule can then be synthesized and tested for its ability to bind and modulate the activity of a NR LBD. In this manner, synthesis of unproductive or inactive compounds can be avoided.
A binding compound of a NR LBD polypeptide (in one embodiment a CAR LBD) can be computationally evaluated and designed via a series of steps in which chemical entities or fragments are screened and selected for their ability to associate with an individual binding site or other area of a crystalline CAR LBD polypeptide of the present invention and to interact with the amino acids disposed in the binding sites.
Interacting amino acids form contacts with a ligand, and the atoms of the interacting amino acids are usually 2 to 4 angstroms away from the centers of the atoms of the ligand. Generally these distances are determined by computer as discussed herein and in McRee, 1993. However, distances can be determined manually once the three-dimensional model is made. More commonly, the atoms of the ligand and the atoms of interacting amino acids are 3 to 4 angstroms apart. A ligand can also interact with distant amino acids, after chemical modification of the ligand to create a new ligand. Distant amino acids are generally not in contact with the ligand before chemical modification. A chemical modification can change the structure of the ligand to make a new ligand that interacts with a distant amino acid usually at least 4.5 angstroms away from the ligand. Distant amino acids rarely line the surface of the binding cavity for the ligand, as they are too far away from the ligand to be part of a pocket or surface of the binding cavity.
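By way of illustration only, the distance criteria described above can be applied by computer as sketched below. The 4 angstrom and 4.5 angstrom thresholds are taken from the preceding paragraph; the function names, the residue labels, the "intermediate" category, and the coordinates are hypothetical.

```python
import math

def min_distance(ligand_atoms, residue_atoms):
    """Shortest atom-to-atom distance (angstroms) between a ligand and a residue."""
    return min(math.dist(a, b) for a in ligand_atoms for b in residue_atoms)

def classify_residues(ligand_atoms, residues, contact_max=4.0, distant_min=4.5):
    """Label each residue as 'contact', 'distant', or 'intermediate' by minimum distance."""
    labels = {}
    for name, atoms in residues.items():
        d = min_distance(ligand_atoms, atoms)
        if d <= contact_max:
            labels[name] = "contact"
        elif d >= distant_min:
            labels[name] = "distant"
        else:
            labels[name] = "intermediate"
    return labels

# Hypothetical coordinates and residue labels, for illustration only.
ligand = [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0)]
residues = {
    "residue_A": [(0.0, 3.5, 0.0)],
    "residue_B": [(0.0, 7.0, 0.0), (1.4, 6.0, 0.0)],
}
print(classify_residues(ligand, residues))
```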
A compound designed or selected as binding to an NR polypeptide (in one embodiment a CAR LBD polypeptide) can be further computationally optimized so that in its bound state it would lack repulsive electrostatic interaction with the target polypeptide. Such non-complementary (e.g., electrostatic) interactions include repulsive charge-charge, dipole-dipole, and charge-dipole interactions. Specifically, the sum of all electrostatic interactions between the ligand and the polypeptide when the ligand is bound to an NR LBD makes a neutral or favorable contribution to the enthalpy of binding.
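By way of illustration only, the sum of electrostatic interactions between a bound ligand and the polypeptide can be approximated as sketched below. The sketch assumes a simple pairwise Coulomb sum over atomic partial charges with a constant dielectric of 4.0, which is a simplification chosen for illustration and not a method specified herein; the function names, coordinates, and charges are hypothetical.

```python
import math

COULOMB_KCAL = 332.0637  # converts e^2/angstrom to kcal/mol

def electrostatic_energy(ligand_atoms, protein_atoms, dielectric=4.0):
    """Sum pairwise Coulomb energies (kcal/mol) between ligand and protein atoms.

    Each atom is (x, y, z, partial_charge), with coordinates in angstroms
    and charges in units of the elementary charge.
    """
    total = 0.0
    for lx, ly, lz, lq in ligand_atoms:
        for px, py, pz, pq in protein_atoms:
            r = math.dist((lx, ly, lz), (px, py, pz))
            if r == 0.0:
                raise ValueError("overlapping atoms")
            total += COULOMB_KCAL * lq * pq / (dielectric * r)
    return total

def is_neutral_or_favorable(energy, tolerance=1e-9):
    """True when the summed electrostatic energy is zero or negative."""
    return energy <= tolerance

# Hypothetical partial charges and positions, for illustration only.
ligand = [(0.0, 0.0, 0.0, -0.5), (1.2, 0.0, 0.0, 0.1)]
protein = [(0.0, 3.0, 0.0, 0.4), (4.0, 0.0, 0.0, -0.3)]
energy = electrostatic_energy(ligand, protein)
print(round(energy, 3), is_neutral_or_favorable(energy))
```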
One of several methods can be used to screen chemical entities or fragments for their ability to associate with a NR LBD and, more particularly, with the individual binding sites of a NR LBD, such as a ligand-binding pocket or an accessory binding site. This process can begin by visual inspection of, for example, a ligand-binding pocket on a computer screen based on the CAR LBD atomic coordinates disclosed in Tables 2-3. Selected fragments or chemical entities can then be positioned in a variety of orientations, or docked, within an individual binding site of a CAR LBD as defined herein above. Docking can be accomplished using software programs such as those available under the trade names QUANTA™ (available from Accelrys Inc, San Diego, Calif., United States of America) and SYBYL™ (available from Tripos, Inc., St. Louis, Mo., United States of America), followed by energy minimization and molecular dynamics with standard molecular mechanics force fields, such as CHARMM (Brooks et al., 1993) and AMBER 5 (Case et al., 1997; Pearlman et al., 1995).
Specialized computer programs can also assist in the process of selecting fragments or chemical entities. These include:
1. GRID™ program, version 17 (Goodford, 1985), which is available from Molecular Discovery Ltd. of Oxford, United Kingdom;
2. MCSS™ program (Miranker & Karplus, 1991), which is available from Accelrys Inc, San Diego, Calif., United States of America;
3. AUTODOCK™ 3.0 program (Goodsell & Olsen, 1990), which is available from the Scripps Research Institute, La Jolla, Calif., United States of America;
4. DOCK™ 4.0 program (Kuntz et al., 1992), which is available from the University of California, San Francisco, Calif., United States of America;
5. FLEX-X™ program (See Rarey et al., 1996), which is available from Tripos, Inc., St. Louis, Mo., United States of America;
6. MVP program (Lambert, 1997); and
7. LUDI™ program (Bohm, 1992), which is available from Accelrys Inc, San Diego, Calif., United States of America.
Once suitable chemical entities or fragments have been selected, they can be assembled into a single compound or ligand. Assembly can proceed by visual inspection of the relationship of the fragments to each other on the three-dimensional image displayed on a computer screen in relation to the structure coordinates of a CAR LBD in complex with a co-regulator, optionally in further complex with a ligand. Manual model building using software such as QUANTA™ or SYBYL™ typically follows.
Useful programs to aid one of ordinary skill in the art in connecting the individual chemical entities or fragments include:
1. CAVEAT™ program (Bartlett et al., 1989), which is available from the University of California, Berkeley, Calif., United States of America;
2. 3D Database systems, such as MACCS-3D™ system program, which is available from MDL Information Systems, San Leandro, Calif., United States of America. This area is reviewed in Martin, 1992; and
3. HOOK™ program (Eisen et al., 1994), which is available from Accelrys Inc, San Diego, Calif., United States of America.
Instead of proceeding to build a NR LBD polypeptide ligand (in one embodiment a CAR LBD ligand) in a step-wise fashion one fragment or chemical entity at a time as described above, ligand compounds can be designed as a whole or de novo using the structural coordinates of a crystalline CAR LBD polypeptide of the present invention and either an empty binding site or optionally including some portion(s) of a known ligand(s). Applicable methods can employ the following software programs:
1. LUDI™ program (Bohm, 1992), which is available from Accelrys Inc, San Diego, Calif., United States of America;
2. LEGEND™ program (Nishibata & Itai, 1991); and
3. LEAPFROG™, which is available from Tripos Associates, St. Louis, Mo., United States of America.
Other molecular modeling techniques can also be employed in accordance with this invention. See, e.g., Cohen et al., 1990; Navia & Murcko, 1992; and U.S. Pat. No. 6,008,033 to Abdel-Meguid et al., all of which are incorporated herein by reference.
Once a compound has been designed or selected by the above methods, the efficiency with which that compound can bind to a NR LBD can be tested and optimized by computational evaluation. By way of a particular example, a compound that has been designed or selected to function as a CAR LBD ligand can traverse a volume not overlapping that occupied by the binding site when it is bound to its native ligand. Additionally, an effective NR LBD ligand can demonstrate a relatively small difference in energy between its bound and free states (i.e., a small deformation energy of binding). Thus, the most efficient NR LBD ligands can be designed with a deformation energy of binding that is in one embodiment not greater than about 10 kcal/mole, and in another embodiment not greater than 7 kcal/mole. It is possible for NR LBD ligands to interact with the polypeptide in more than one conformation, each similar in overall binding energy. In those cases, the deformation energy of binding is taken to be the difference between the energy of the free compound and the thermodynamic average energy of the conformations observed when the ligand binds to the polypeptide.
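By way of illustration only, the deformation energy of binding for a ligand that binds in more than one conformation can be computed as sketched below. The sketch assumes that the "thermodynamic average" is a Boltzmann-weighted average at 298.15 K, which is one reasonable reading and not a definition given herein; the function names and energies are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def boltzmann_average(energies, temperature_k=298.15):
    """Boltzmann-weighted average of conformational energies (kcal/mol)."""
    if not energies:
        raise ValueError("at least one energy is required")
    rt = R_KCAL * temperature_k
    e_min = min(energies)  # shift by the minimum for numerical stability
    weights = [math.exp(-(e - e_min) / rt) for e in energies]
    return sum(e * w for e, w in zip(energies, weights)) / sum(weights)

def deformation_energy(energy_free, bound_energies, temperature_k=298.15):
    """Average energy of the bound conformations minus the energy of the free compound."""
    return boltzmann_average(bound_energies, temperature_k) - energy_free

def passes_cutoff(deformation, cutoff=10.0):
    """True when the deformation energy is not greater than the cutoff (kcal/mol)."""
    return deformation <= cutoff

# Hypothetical ligand internal energies in kcal/mol, for illustration only.
deformation = deformation_energy(12.0, [17.5, 18.0, 21.0])
print(round(deformation, 2), passes_cutoff(deformation), passes_cutoff(deformation, 7.0))
```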
A compound designed or selected as binding to a NR LBD polypeptide (preferably a CAR polypeptide, more preferably a CAR LBD polypeptide) can be further computationally optimized so that in its bound state it would preferably lack repulsive electrostatic interaction with the target polypeptide. Such non-complementary (e.g., electrostatic) interactions include repulsive charge-charge, dipole-dipole, and charge-dipole interactions. Specifically, the sum of all electrostatic interactions between the ligand and the polypeptide when the ligand is bound to a NR LBD preferably makes a neutral or favorable contribution to the enthalpy of binding.
Specific computer software is available in the art to evaluate compound deformation energy and electrostatic interaction. Examples of programs designed for such uses include:
1. GAUSSIAN 98™, which is available from Gaussian, Inc., Pittsburgh, Pa., United States of America;
2. AMBER™ program, version 6.0, which is available from the University of California, San Francisco, Calif., United States of America;
3. QUANTA™ program, which is available from Accelrys Inc, San Diego, Calif., United States of America;
4. CHARMM® program, which is available from Accelrys Inc, San Diego, Calif., United States of America; and
5. INSIGHT II® program, which is available from Accelrys Inc, San Diego, Calif., United States of America.
These programs can be implemented using a suitable computer system. Other hardware systems and software packages will be apparent to those skilled in the art after review of the disclosure of the present invention presented herein.
Once a NR LBD modulating compound has been optimally selected or designed, as described above, substitutions can then be made in some of its atoms or side groups in order to improve or modify its binding properties. In some cases, initial substitutions might be conservative, e.g., the replacement group will have approximately the same size, shape, hydrophobicity, and charge as the original group. In other cases, the replacement group will have different properties as desired to make specific interactions with the protein. Such substituted chemical compounds can then be analyzed for efficiency of fit to a NR LBD binding site using the same computer-based approaches described in detail above.
X.C. Sterically Similar Compounds
A further aspect of the present invention is that sterically similar compounds can be formulated to mimic the key portions of a CAR LBD structure. Such compounds are functional equivalents. The generation of a structural functional equivalent can be achieved by the techniques of modeling and chemical design known to those of skill in the art and described herein. Modeling and chemical design of CAR and CAR LBD structural equivalents can be based on the structure coordinates of a crystalline CAR LBD polypeptide of the present invention. It will be understood that all such sterically similar constructs fall within the scope of the present invention.
XI. CAR Polypeptides
The generation of mutant and chimeric CAR polypeptides is also an aspect of the present invention. A chimeric polypeptide can comprise a CAR LBD polypeptide or a portion of a CAR LBD, (e.g. a CAR LBD) which is fused to a candidate polypeptide or a suitable region of the candidate polypeptide. Throughout the present disclosure it is intended that the term “mutant” encompass not only mutants of a CAR LBD polypeptide but chimeric proteins generated using a CAR LBD as well. It is thus intended that the following discussion of mutant CAR LBDs apply mutatis mutandis to chimeric CAR and CAR LBD polypeptides and to structural equivalents thereof.
In accordance with the present invention, a mutation can be directed to a particular site or combination of sites of a wild-type CAR LBD. For example, an accessory binding site or the binding pocket can be chosen for mutagenesis. Similarly, a residue having a location on, at or near the surface of the polypeptide can be replaced, resulting in an altered surface charge of one or more charge units, as compared to the wild-type CAR and CAR LBD. Alternatively, an amino acid residue in a CAR or a CAR LBD can be chosen for replacement based on its hydrophilic or hydrophobic characteristics.
Such mutants can be characterized by any one of several different properties as compared with the wild-type CAR LBD. For example, such mutants can have an altered surface charge of one or more charge units, or can have an increase in overall stability. Other mutants can have altered ligand specificity in comparison with, or a higher specific activity than, a wild type CAR or CAR LBD.
CAR and CAR LBD mutants of the present invention can be generated in a number of ways. For example, the wild-type sequence of a CAR or a CAR LBD can be mutated at those sites identified using this invention as desirable for mutation by employing oligonucleotide-directed mutagenesis or other conventional methods. Alternatively, mutants of a CAR or a CAR LBD can be generated by the site-specific replacement of a particular amino acid with an unnaturally occurring amino acid. In addition, CAR or CAR LBD mutants can be generated through replacement of an amino acid residue, for example, a particular cysteine or methionine residue, with selenocysteine or selenomethionine. This can be achieved by growing a host organism capable of expressing either the wild type or mutant polypeptide on a growth medium depleted of either natural cysteine or methionine (or both) but enriched in selenocysteine or selenomethionine (or both).
Mutations can be introduced into a DNA sequence coding for a CAR or a CAR LBD using synthetic oligonucleotides. These oligonucleotides contain nucleotide sequences flanking the desired mutation sites. Mutations can be generated in the full-length DNA sequence of a CAR or a CAR LBD or in any sequence coding for polypeptide fragments of a CAR or a CAR LBD.
According to the present invention, a mutated CAR or CAR LBD DNA sequence produced by the methods described above, or any alternative methods known in the art, can be expressed using an expression vector. An expression vector, as is well known to those of skill in the art, typically includes elements that permit autonomous replication in a host cell independent of the host genome, and one or more phenotypic markers for selection purposes. Either prior to or after insertion of the DNA sequences surrounding the desired CAR or CAR LBD mutant coding sequence, an expression vector includes control sequences encoding a promoter, operator, ribosome binding site, translation initiation signal, and, optionally, a repressor gene or various activator genes and a signal for termination. Where secretion of the produced mutant is desired, nucleotides encoding a “signal sequence” can be inserted prior to a CAR or a CAR LBD mutant coding sequence. For expression under the direction of the control sequences, a desired DNA sequence is operatively linked to the control sequences; that is, the sequence has an appropriate start signal in front of the DNA sequence encoding the CAR or CAR LBD mutant, and the correct reading frame to permit expression of that sequence under the control of the control sequences and production of the desired product encoded by that CAR or CAR LBD sequence.
Any of a wide variety of well-known available expression vectors can be used to express the mutated CAR or CAR LBD coding sequences of this invention. These include, for example, vectors consisting of segments of chromosomal, non-chromosomal, and synthetic DNA sequences, such as known derivatives of SV40, known bacterial plasmids, e.g., plasmids from E. coli including colE1, pCR1, pBR322, pMB9 and their derivatives, wider host range plasmids, e.g., RP4, phage DNAs, e.g., derivatives of phage λ, e.g., NM 989, and other DNA phages, e.g., M13 and filamentous single stranded DNA phages, yeast plasmids and vectors derived from combinations of plasmids and phage DNAs, such as plasmids which have been modified to employ phage DNA or other expression control sequences. In one embodiment of the present invention, a vector amenable to expression in a pRSETA-based expression system is employed. The pRSETA expression system is available from Invitrogen, Inc., Carlsbad, Calif., United States of America.
In addition, any of a wide variety of expression control sequences (i.e., sequences that control the expression of a DNA sequence when operatively linked to it) can be used in these vectors to express the mutated DNA sequences according to this invention. Such useful expression control sequences include, but are not limited to, the early and late promoters of SV40 for animal cells; the lac system, the trp system, the TAC or TRC system, the major operator and promoter regions of phage λ, and the control regions of fd coat protein for E. coli; the promoter for 3-phosphoglycerate kinase or other glycolytic enzymes, the promoters of acid phosphatase (for example, Pho5), and the promoters of the yeast α-mating factors for yeast; as well as other sequences known to control the expression of genes of prokaryotic or eukaryotic cells or their viruses, and various combinations thereof.
A wide variety of hosts can be employed for producing mutated CAR and CAR LBD polypeptides according to this invention. These hosts include, for example, bacteria, such as E. coli, Bacillus, and Streptomyces; fungi, such as yeasts; animal cells, such as CHO and COS-1 cells; plant cells; insect cells, such as Sf9 cells; and transgenic host cells.
It should be understood that not all expression vectors and expression systems function in the same way to express mutated DNA sequences of this invention, and to produce modified CAR and CAR LBD polypeptides or CAR or CAR LBD mutants. Neither do all hosts function equally well with the same expression system. One of skill in the art can, however, make a selection among these vectors, expression control sequences and hosts without undue experimentation and without departing from the scope of this invention. For example, an important consideration in selecting a vector will be the ability of the vector to replicate in a given host. The copy number of the vector, the ability to control that copy number, and the expression of any other proteins encoded by the vector, such as antibiotic markers, should also be considered.
In selecting an expression control sequence, a variety of factors should also be considered. These include, for example, the relative strength of the system, its controllability and its compatibility with the DNA sequence encoding a modified CAR or CAR LBD polypeptide of this invention, with particular regard to the formation of potential secondary and tertiary structures.
Hosts should be selected by consideration of their compatibility with the chosen vector, the toxicity of a modified CAR or CAR LBD to them, their ability to express mature products, their ability to fold proteins correctly, their fermentation requirements, the ease of purification of a modified CAR or CAR LBD and safety. Within these parameters, one of skill in the art can select various vector/expression control system/host combinations that will produce useful amounts of a mutant CAR or CAR LBD. A mutant CAR or CAR LBD produced in these systems can be purified by a variety of conventional steps and strategies, including those used to purify the wild type CAR or CAR LBD.
Once one or more CAR LBD mutations have been generated in the desired location, such as an active site or dimerization site, the mutants can be tested for any one of several properties of interest. For example, mutants can be screened for an altered charge at physiological pH. This is determined by measuring the mutant CAR or CAR LBD isoelectric point (pI) and comparing the observed value with that of the wild-type parent. Isoelectric point can be measured by gel-electrophoresis according to the method of Wellner, 1971. A mutant CAR or CAR LBD polypeptide containing a replacement amino acid located at the surface of the polypeptide, as provided by the structural information of this invention, can lead to an altered surface charge and an altered pI.
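The comparison of mutant and wild-type pI described above is made experimentally, but the expected direction of a shift can be sketched computationally before a mutant is prepared. The following is a minimal sketch only: the pKa values, the function names, and the short test sequences are illustrative assumptions and are not taken from the present disclosure, and a sequence-based estimate ignores the three-dimensional environment of each residue.

```python
# Assumed, illustrative pKa values for ionizable groups (not from the disclosure).
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq, ph):
    """Henderson-Hasselbalch estimate of the net charge of a peptide at a given pH."""
    # Terminal groups: one free amino terminus and one free carboxyl terminus.
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE["Nterm"]))
    charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE["Cterm"] - ph))
    for aa in seq:
        if aa in PKA_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[aa]))
        elif aa in PKA_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return charge


def estimate_pi(seq, lo=0.0, hi=14.0, iterations=60):
    """Find the pH of zero net charge by bisection (charge falls as pH rises)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical peptides: an aspartate replaced by a lysine.
wild_type = "MADEGKL"
mutant = "MAKEGKL"
shift = estimate_pi(mutant) - estimate_pi(wild_type)
```

In this sketch a positive `shift` indicates that the replacement is expected to raise the pI, which is the kind of change that the electrophoretic measurement would then confirm or refute.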
XI.A. Generation of an Engineered CAR LBD or CAR LBD Mutant
In an embodiment of the present invention, a unique CAR or CAR LBD polypeptide is generated. Such a mutant can facilitate purification and the study of the ligand-binding abilities of a CAR polypeptide.
As used in the following discussion, the terms “engineered CAR”, “engineered CAR LBD”, “CAR mutant”, and “CAR LBD mutant” refer to polypeptides having amino acid sequences which contain at least one mutation in the wild-type sequence. The terms also refer to CAR and CAR LBD polypeptides which are capable of exerting a biological effect in that they comprise all or a part of the amino acid sequence of an engineered CAR or CAR LBD polypeptide of the present invention, or cross-react with antibodies raised against an engineered CAR or CAR LBD polypeptide, or retain all or some or an enhanced degree of the biological activity of the engineered CAR or CAR LBD amino acid sequence or protein. Such biological activity can include the binding of small molecules in general, and the binding of Compound 1, in particular.
The terms “engineered CAR LBD” and “CAR LBD mutant” also include analogs of an engineered CAR LBD or CAR LBD polypeptide. By “analog” is intended that a DNA or polypeptide sequence can contain alterations relative to the sequences disclosed herein, yet retain all or some or an enhanced degree of the biological activity of those sequences. Analogs can be derived from genomic nucleotide sequences or from other organisms, or can be created synthetically. Those of skill in the art will appreciate that other analogs, as yet undisclosed or undiscovered, can be used to design and/or construct CAR LBD or CAR LBD mutant analogs. There is no need for a CAR LBD or CAR LBD mutant polypeptide to comprise all or substantially all of the amino acid sequence of SEQ ID NOs: 2 or 4. Shorter or longer sequences can be employed in the invention; shorter sequences are herein referred to as “segments”. Thus, the terms “engineered CAR LBD” and “CAR LBD mutant” also include fusion, chimeric or recombinant CAR LBD or CAR LBD mutant polypeptides and proteins comprising sequences of the present invention. Methods of preparing such proteins are disclosed herein above and are known in the art.
XI.A.1. Sequences That Are Substantially Identical to a CAR or CAR LBD Mutant Sequence of the Present Invention
Nucleic acids that are substantially identical to a nucleic acid sequence of a CAR or CAR LBD mutant of the present invention, e.g. allelic variants, genetically altered versions of the gene, etc., bind to a CAR or CAR LBD mutant sequence under stringent hybridization conditions. By using probes, particularly labeled probes of DNA sequences, one can isolate homologous or related genes. The source of homologous genes can be any organism, including, but not limited to primates; rodents, such as rats and mice; canines; felines; bovines; equines; yeast; and nematodes.
Among mammalian species, e.g. human and mouse, homologs can have substantial sequence similarity, i.e. at least 75% sequence identity between nucleotide sequences. Sequence similarity is calculated based on a reference sequence, which can be a subset of a larger sequence, such as a conserved motif, coding region, flanking region, etc. In one embodiment, a reference sequence is at least about 18 nucleotides (nt) long, in another embodiment at least about 30 nt long, and can extend to the complete sequence that is being compared. Algorithms for sequence analysis are known in the art, such as BLAST, described in Altschul et al., 1990.
Percent identity or percent similarity of a DNA or peptide sequence can be determined, for example, by comparing sequence information using the GAP computer program, available from the University of Wisconsin Genetics Computer Group (now part of Accelrys Inc, San Diego, Calif., United States of America). The GAP program utilizes the alignment method of Needleman et al., 1970, as revised by Smith et al., 1981. Briefly, the GAP program defines similarity as the number of aligned symbols (i.e., nucleotides or amino acids) that are similar, divided by the total number of symbols in the shorter of the two sequences. The preferred parameters for the GAP program are the default parameters, which do not impose a penalty for end gaps. See e.g., Schwartz et al., 1979; Gribskov et al., 1986.
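The ratio described above (matching aligned symbols divided by the number of symbols in the shorter sequence) can be illustrated with a short sketch. This is not the GAP program and does not perform an alignment; it assumes that two sequences have already been aligned and padded with gap characters, and it counts only exact matches as "similar", which is a simplification. The function name and the example sequences are hypothetical.

```python
def fraction_matching(aligned_a, aligned_b, gap="-"):
    """Matching aligned symbols divided by the length of the shorter ungapped sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Count columns where both sequences carry the same non-gap symbol.
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    # The denominator is the shorter of the two sequences, ignoring gap characters.
    shorter = min(
        len(aligned_a.replace(gap, "")), len(aligned_b.replace(gap, ""))
    )
    if shorter == 0:
        raise ValueError("sequences must contain at least one symbol")
    return matches / shorter


# Hypothetical pre-aligned pair: four matching columns, shorter sequence of five symbols.
score = fraction_matching("ACGT-A", "ACCTGA")
```

Because the denominator is the shorter sequence, unaligned overhangs on the longer sequence do not lower the score, which is consistent with default parameters that impose no penalty for end gaps.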
The term “similarity” is contrasted with the term “identity”. Similarity is defined as above; “identity”, however, refers to a nucleic acid or amino acid sequence having the same amino acid at the same relative position in a given family member of a gene family. Homology and similarity are generally viewed as broader terms than the term identity. Biochemically similar amino acids, for example leucine/isoleucine or glutamate/aspartate, can be present at the same position—these are not identical per se, but are biochemically “similar.” As disclosed herein, these are referred to as conservative differences or conservative substitutions. This differs from a conservative mutation at the DNA level, which changes the nucleotide sequence without making a change in the encoded amino acid, e.g. TCC to TCA, both of which encode serine.
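The distinction drawn above between a conservative substitution at the protein level and a nucleotide change that leaves the encoded amino acid unchanged can be sketched as follows. The sketch uses the standard genetic code; the set of "biochemically similar" pairs contains only the two examples named in the text, and the function names are illustrative assumptions.

```python
from itertools import product

# Standard genetic code, listed in T, C, A, G order for each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# Only the pairs named in the text; a fuller list would be a design choice.
SIMILAR_PAIRS = {frozenset("LI"), frozenset("ED")}


def classify_change(codon_before, codon_after):
    """Label a single-codon change as silent, conservative, or non-conservative."""
    before = CODON_TABLE[codon_before.upper()]
    after = CODON_TABLE[codon_after.upper()]
    if before == after:
        return "silent"  # nucleotide change, same amino acid
    if frozenset((before, after)) in SIMILAR_PAIRS:
        return "conservative"  # different but biochemically similar amino acid
    return "non-conservative"


def synonymous_codons(amino_acid):
    """All DNA codons that encode the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)
```

For example, `classify_change("TCC", "TCA")` reports a silent change because both codons encode serine, whereas a leucine-to-isoleucine change such as `classify_change("CTG", "ATT")` is reported as conservative.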
As used herein, DNA analog sequences are “substantially identical” to specific DNA sequences disclosed herein if: (a) the DNA analog sequence is derived from coding regions of the nucleic acid sequence shown in SEQ ID NOs: 1 or 3; or (b) the DNA analog sequence is capable of hybridization with DNA sequences of (a) under stringent conditions and which encode a biologically active CAR or CAR LBD gene product; or (c) the DNA sequences are degenerate as a result of alternative genetic code to the DNA analog sequences defined in (a) and/or (b). Substantially identical analog proteins and nucleic acids will have between about 70% and 80%, preferably between about 81% to about 90% or even more preferably between about 91% and 99% sequence identity with the corresponding sequence of the native protein or nucleic acid. Sequences having lesser degrees of identity but comparable biological activity are considered to be equivalents.
As used herein, “stringent conditions” refers to conditions of high stringency, for example 6×SSC, 0.2% polyvinylpyrrolidone, 0.2% Ficoll, 0.2% bovine serum albumin, 0.1% sodium dodecyl sulfate, 100 μg/ml salmon sperm DNA and 15% formamide at 68° C. For the purposes of specifying additional conditions of high stringency, preferred conditions comprise a salt concentration of about 200 mM and temperature of about 45° C. One example of stringent conditions is hybridization in 4×SSC, at 65° C., followed by a washing in 0.1×SSC at 65° C. for one hour. Another exemplary stringent hybridization scheme uses 50% formamide, 4×SSC at 42° C.
In contrast, nucleic acids having sequence similarity are detected by hybridization under lower stringency conditions. Thus, sequence identity can be determined by hybridization under lower stringency conditions, for example, at 50° C. or higher and 0.1×SSC (9 mM NaCl/0.9 mM sodium citrate) and the sequences will remain bound when subjected to washing at 55° C. in 1×SSC.
XI.A.2. Complementarity and Hybridization to an Engineered CAR or CAR LBD Mutant Sequence
As used herein, the term “functionally equivalent codon” is used to refer to codons that encode the same amino acid, such as the AGC and AGU codons for serine. CAR or CAR LBD-encoding nucleic acid sequences comprising SEQ ID NOs: 1 and 3, which have functionally equivalent codons are covered by the present invention. Thus, when referring to the sequence examples presented in SEQ ID NOs: 1 and 3, applicants contemplate substitution of functionally equivalent codons into the sequence example of SEQ ID NOs: 1 and 3. Thus, applicants are in possession of amino acid and nucleic acid sequences which include such substitutions but which are not set forth herein in their entirety for convenience.
It will also be understood by those of skill in the art that amino acid and nucleic acid sequences can include additional residues, such as additional N- or C-terminal amino acids or 5′ or 3′ nucleic acid sequences, and yet still be essentially as set forth in one of the sequences disclosed herein, so long as the sequence retains biological protein activity where polypeptide expression is concerned. The addition of terminal sequences particularly applies to nucleic acid sequences which can, for example, include various non-coding sequences flanking either of the 5′ or 3′ portions of the coding region or can include various internal sequences, i.e., introns, which are known to occur within genes.
XI.B. Biological Equivalents
The present invention envisions and includes biological equivalents of a CAR or CAR LBD mutant polypeptide of the present invention. The term “biological equivalent” refers to proteins having amino acid sequences which are substantially identical to the amino acid sequence of a CAR LBD mutant of the present invention and which are capable of exerting a biological effect in that they are capable of binding a small molecule, binding a co-regulator, homo- or heterodimerizing or cross-reacting with anti-CAR or CAR LBD mutant antibodies raised against a mutant CAR or CAR LBD polypeptide of the present invention.
For example, certain amino acids can be substituted for other amino acids in a protein structure without appreciable loss of interactive capacity with, for example, structures in the nucleus of a cell. Since it is the interactive capacity and nature of a protein that defines that protein's biological functional activity, certain amino acid sequence substitutions can be made in a protein sequence (or the nucleic acid sequence encoding it) to obtain a protein with the same, enhanced, or antagonistic properties. Such properties can be achieved by interaction with the normal targets of the protein, but this need not be the case, and the biological activity of the invention is not limited to a particular mechanism of action. It is thus in accordance with the present invention that various changes can be made in the amino acid sequence of a CAR or CAR LBD mutant polypeptide of the present invention or its underlying nucleic acid sequence without appreciable loss of biological utility or activity.
Biologically equivalent polypeptides, as used herein, are polypeptides in which certain, but not most or all, of the amino acids can be substituted. Thus, when referring to the sequence examples presented in SEQ ID NOs: 2 and 4, applicants envision substitution of codons that encode biologically equivalent amino acids, as described herein, into the sequence example of SEQ ID NOs: 2 and 4, respectively. Thus, applicants are in possession of amino acid and nucleic acid sequences which include such substitutions but which are not set forth herein in their entirety for convenience.
Alternatively, functionally equivalent proteins or peptides can be created via the application of recombinant DNA technology, in which changes in the protein structure can be engineered, based on considerations of the properties of the amino acids being exchanged, e.g. substitution of Ile for Leu. Changes designed by man can be introduced through the application of site-directed mutagenesis techniques, e.g., to introduce improvements to the antigenicity of the protein or to test a CAR or CAR LBD mutant polypeptide of the present invention in order to modulate co-regulator-binding or other activity, at the molecular level.
Amino acid substitutions, such as those which might be employed in modifying a CAR or CAR LBD mutant polypeptide of the present invention are generally, but not necessarily, based on the relative similarity of the amino acid side-chain substituents, for example, their hydrophobicity, hydrophilicity, charge, size, and the like. An analysis of the size, shape and type of the amino acid side-chain substituents reveals that arginine, lysine and histidine are all positively charged residues; that alanine, glycine and serine are all of similar size; and that phenylalanine, tryptophan and tyrosine all have a generally similar shape. Therefore, based upon these considerations, arginine, lysine and histidine; alanine, glycine and serine; and phenylalanine, tryptophan and tyrosine; are defined herein as biologically functional equivalents. Those of skill in the art will appreciate other biologically functional equivalent changes. It is implicit in the above discussion, however, that one of skill in the art can appreciate that a radical, rather than a conservative substitution is warranted in a given situation. Non-conservative substitutions in mutant CAR or CAR LBD polypeptides of the present invention are also an aspect of the present invention.
In making biologically functional equivalent amino acid substitutions, the hydropathic index of amino acids can be considered. Each amino acid has been assigned a hydropathic index on the basis of its hydrophobicity and charge characteristics. These are: isoleucine (+4.5); valine (+4.2); leucine (+3.8); phenylalanine (+2.8); cysteine (+2.5); methionine (+1.9); alanine (+1.8); glycine (−0.4); threonine (−0.7); serine (−0.8); tryptophan (−0.9); tyrosine (−1.3); proline (−1.6); histidine (−3.2); glutamate (−3.5); glutamine (−3.5); aspartate (−3.5); asparagine (−3.5); lysine (−3.9); and arginine (−4.5).
The importance of the hydropathic amino acid index in conferring interactive biological function on a protein is generally understood in the art (Kyte & Doolittle, 1982, incorporated herein by reference). It is known that certain amino acids can be substituted for other amino acids having a similar hydropathic index or score and still retain a similar biological activity. In making changes based upon the hydropathic index, the substitution of amino acids whose hydropathic indices are within ±2 of the original value is preferred, those within ±1 of the original value are particularly preferred, and those within ±0.5 of the original value are even more particularly preferred.
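The three preference tiers described above can be expressed as a simple lookup. The following sketch uses the hydropathic index values listed in this section, keyed by one-letter amino acid code; the function name and the convention of returning the tightest tolerance met are illustrative assumptions.

```python
# Hydropathic index values as listed in this section (one-letter codes).
HYDROPATHY = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9,
    "R": -4.5,
}


def hydropathy_tier(original, replacement):
    """Return the tightest tolerance (0.5, 1.0 or 2.0) the substitution meets, else None."""
    # Round to one decimal so that floating-point noise does not move a value
    # across a tier boundary.
    diff = round(abs(HYDROPATHY[original] - HYDROPATHY[replacement]), 1)
    for tolerance in (0.5, 1.0, 2.0):
        if diff <= tolerance:
            return tolerance
    return None


# Examples: isoleucine to valine, leucine to isoleucine, isoleucine to lysine.
tiers = [
    hydropathy_tier("I", "V"),
    hydropathy_tier("L", "I"),
    hydropathy_tier("I", "K"),
]
```

A return value of 0.5 corresponds to the "even more particularly preferred" tier, 1.0 to "particularly preferred", and 2.0 to "preferred"; `None` indicates that the substitution falls outside all three.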
It is also understood in the art that the substitution of like amino acids can be made effectively on the basis of hydrophilicity. U.S. Pat. No. 4,554,101, incorporated herein by reference, states that the greatest local average hydrophilicity of a protein, as governed by the hydrophilicity of its adjacent amino acids, correlates with its immunogenicity and antigenicity, i.e. with a biological property of the protein. It is understood that an amino acid can be substituted for another having a similar hydrophilicity value and still obtain a biologically equivalent protein.
As detailed in U.S. Pat. No. 4,554,101 to Hopp, the following hydrophilicity values have been assigned to amino acid residues: arginine (+3.0); lysine (+3.0); aspartate (+3.0±1); glutamate (+3.0±1); serine (+0.3); asparagine (+0.2); glutamine (+0.2); glycine (0); threonine (−0.4); proline (−0.5±1); alanine (−0.5); histidine (−0.5); cysteine (−1.0); methionine (−1.3); valine (−1.5); leucine (−1.8); isoleucine (−1.8); tyrosine (−2.3); phenylalanine (−2.5); tryptophan (−3.4).
In making changes based upon similar hydrophilicity values, the substitution of amino acids whose hydrophilicity values are within ±2 of the original value is preferred, those that are within ±1 of the original value are particularly preferred, and those within ±0.5 of the original value are even more particularly preferred.
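The same selection can be sketched for hydrophilicity by listing every residue whose value lies within a chosen tolerance of the original residue. The values below are those listed above, stored in tenths so that comparisons are exact; the ±1 ranges given for aspartate, glutamate and proline are ignored and only the central values are used, which is a simplifying assumption, as are the function and variable names.

```python
# Hydrophilicity values from the list above, in tenths (e.g. -18 means -1.8).
HYDROPHILICITY_TENTHS = {
    "R": 30, "K": 30, "D": 30, "E": 30, "S": 3, "N": 2, "Q": 2, "G": 0,
    "T": -4, "P": -5, "A": -5, "H": -5, "C": -10, "M": -13, "V": -15,
    "L": -18, "I": -18, "Y": -23, "F": -25, "W": -34,
}


def substitution_candidates(original, tolerance):
    """Residues whose hydrophilicity is within +/- tolerance of the original residue."""
    limit = int(round(tolerance * 10))
    reference = HYDROPHILICITY_TENTHS[original]
    return sorted(
        aa
        for aa, value in HYDROPHILICITY_TENTHS.items()
        if aa != original and abs(value - reference) <= limit
    )


# Candidates for replacing leucine at the tightest tolerance.
leucine_candidates = substitution_candidates("L", 0.5)
```

Widening the tolerance from 0.5 to 1 or 2 enlarges the candidate list, mirroring the progression from "even more particularly preferred" to "preferred" substitutions.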
While discussion has focused on functionally equivalent polypeptides arising from amino acid changes, it will be appreciated that these changes can be effected by alteration of the encoding DNA, taking into consideration also that the genetic code is degenerate and that two or more codons can code for the same amino acid.
Thus, it will also be understood that this invention is not limited to the particular amino acid and nucleic acid sequences of SEQ ID NOs: 1-4. Recombinant vectors and isolated DNA segments can therefore variously include a CAR or CAR LBD mutant polypeptide-encoding region itself, include coding regions bearing selected alterations or modifications in the basic coding region, or include larger polypeptides which nevertheless comprise a CAR or CAR LBD mutant polypeptide-encoding region or can encode biologically functional equivalent proteins or polypeptides which have variant amino acid sequences. Biological activity of a CAR or CAR LBD mutant polypeptide can be determined, for example, by employing binding assays known to those of skill in the art.
The nucleic acid segments of the present invention, regardless of the length of the coding sequence itself, can be combined with other DNA sequences, such as promoters, enhancers, polyadenylation signals, additional restriction enzyme sites, multiple cloning sites, other coding segments, polyhistidine encoding segments and the like, such that their overall length can vary considerably. It is therefore contemplated that a nucleic acid fragment of almost any length can be employed, with the total length preferably being limited by the ease of preparation and use in the intended recombinant DNA protocol. For example, nucleic acid fragments can be prepared which include a short stretch complementary to a nucleic acid sequence set forth in SEQ ID NOs: 1 and 3, such as about 10 nucleotides, and which are up to 10,000 or 5,000 base pairs in length. DNA segments with total lengths of about 4,000, 3,000, 2,000, 1,000, 500, 200, 100, and about 50 base pairs in length are also useful.
The DNA segments of the present invention encompass biologically functional equivalents of CAR or CAR LBD mutant polypeptides. Such sequences can arise as a consequence of codon redundancy and functional equivalency that are known to occur naturally within nucleic acid sequences and the proteins thus encoded. Alternatively, functionally equivalent proteins or polypeptides can be created via the application of recombinant DNA technology, in which changes in the protein structure can be engineered, based on considerations of the properties of the amino acids being exchanged. Changes can be introduced through the application of site-directed mutagenesis techniques, e.g., to introduce improvements to the antigenicity of the protein or to test variants of a CAR or CAR LBD mutant of the present invention in order to examine the degree of ligand-binding activity, or other activity at the molecular level. Various site-directed mutagenesis techniques are known to those of skill in the art and can be employed in the present invention.
The invention further encompasses fusion proteins and peptides wherein a CAR or CAR LBD mutant coding region of the present invention is aligned within the same expression unit with other proteins or peptides having desired functions, such as for purification or immunodetection purposes.
Recombinant vectors form important further aspects of the present invention. Particularly useful vectors are those in which the coding portion of the DNA segment is positioned under the control of a promoter. The promoter can be that naturally associated with a CAR gene, as can be obtained by isolating the 5′ non-coding sequences located upstream of the coding segment or exon, for example, using recombinant cloning and/or PCR technology and/or other methods known in the art, in conjunction with the compositions disclosed herein.
In other embodiments, certain advantages can be gained by positioning the coding DNA segment under the control of a recombinant, or heterologous, promoter. As used herein, a recombinant or heterologous promoter is a promoter that is not normally associated with a CAR gene in its natural environment. Such promoters can include promoters isolated from bacterial, viral, eukaryotic, or mammalian cells. Naturally, it will be important to employ a promoter that effectively directs the expression of the DNA segment in the cell type chosen for expression. The use of promoter and cell type combinations for protein expression is generally known to those of skill in the art of molecular biology (See e.g., Sambrook & Russell, 2001, specifically incorporated herein by reference). The promoters employed can be constitutive or inducible and can be used under the appropriate conditions to direct high level expression of the introduced DNA segment, such as is advantageous in the large-scale production of recombinant proteins or peptides. One exemplary promoter system contemplated for use in high-level expression is a T7 promoter-based system.
XII. The Role of the Three-Dimensional Structure of the CAR LBD in Solving Additional CAR Crystals
Because polypeptides can crystallize in more than one crystal form, the structural coordinates of a CAR LBD, or portions thereof, in complex with a co-regulator as provided by the present invention, are particularly useful in solving the structure of other crystal forms of CAR and the crystalline forms of other NRs and CARs. The coordinates provided in the present invention can also be used to solve the structure of CAR or CAR LBD mutants (such as those above), CAR LBD co-complexes, or the crystalline form of any other protein with significant amino acid sequence homology to any functional domain of CAR.
One method that can be employed for the purpose of solving additional CAR crystal structures is molecular replacement. See generally, Rossmann, 1972. In the molecular replacement method, an unknown crystal form, whether it is another crystal form of a CAR or a CAR LBD, (i.e. a CAR or a CAR LBD mutant), a CAR or a CAR LBD polypeptide in complex with another compound (i.e. a “co-complex”) or the crystal of some other protein with significant amino acid sequence homology to any functional region of the CAR LBD (e.g. another NR), can be determined using the CAR LBD structure coordinates provided in Tables 2-3. This method provides an accurate structural form for the unknown crystal more quickly and efficiently than attempting to determine such information ab initio.
In addition, in accordance with this invention, CAR or CAR LBD mutants can be crystallized in complex with known modulators, such as a co-regulator. The crystal structures of a series of such complexes can then be solved by molecular replacement and compared with that of wild-type CAR or the wild-type CAR LBD. Potential sites for modification within the various binding sites of the polypeptide can thus be conveniently identified. This information provides an additional tool for identifying efficient binding interactions, for example, increased hydrophobic interactions between the CAR LBD and a chemical entity or compound.
All of the complexes referred to in the present disclosure can be studied using X-ray diffraction techniques (See e.g., Blundell & Johnson, 1985) and can be refined using computer software, such as the X-PLOR™ program (Brünger, 1992; X-PLOR is available from Accelrys Inc, San Diego, Calif., United States of America). This information can thus be used to optimize known classes of CAR and CAR LBD ligands, and more importantly, to design and synthesize novel classes of CAR and CAR LBD ligands, including co-regulators.
The following Examples have been included to illustrate exemplary modes of the invention. Certain aspects of the following Examples are described in terms of techniques and procedures found or contemplated by the present inventors to work well in the practice of the invention. These Examples are exemplified through the use of standard laboratory practices of the inventors. In light of the present disclosure and the general level of skill in the art, those of skill will appreciate that the following Examples are intended to be exemplary only and that numerous changes, modifications, and alterations can be employed without departing from the spirit and scope of the invention.
A DNA fragment encoding residues 103-348 of a human CAR polypeptide (GenBank Accession No. Z30425) was amplified by the polymerase chain reaction (PCR) with a commercial kit (Stratagene, La Jolla, Calif., United States of America). The 5′ PCR primer included an N-terminal poly-histidine tag sequence (MKKGHHHHHHG; SEQ ID NO: 5) along with an NdeI endonuclease restriction site (CATATG), and the 3′ PCR primer contained a BamHI restriction site (GGATCC). The PCR primers used were 5′-CGGCGGCGCCATATGAAAAAAGGTCATCATCATCATCATCATGGTCCTGTGMCTGAGTMGGAGCMG-3′ (SEQ ID NO: 6) and 5′-CGGCGGCGCGGATCCTTAGCTGCAGATCTCCTGGAGCAGCGG-3′ (SEQ ID NO: 7). The amplified DNA fragment was inserted downstream of a T7 promoter from the pRSETA vector (Invitrogen Corp., Carlsbad, Calif., United States of America) at the NdeI-BamHI enzyme restriction sites. E. coli cells BL21 (DE3) transformed with the above expression vector were grown on a carbenicillin antibiotic agar plate (50 mg/L carbenicillin). A starter culture of 80 ml LB media (10 g/L Bacto-Tryptone, 5 g/L yeast extract, 5 g/L NaCl, QC with distilled water) with carbenicillin antibiotic (50 mg/L carbenicillin) was grown from one colony at 37° C., 250 rpm for four hours. Twelve 2 L shaker flasks with 1 L LB media and carbenicillin antibiotic (50 mg/L carbenicillin) were inoculated with 5 ml of the starter culture. Cells were grown at 23° C., 250 rpm for 16 hours to an OD600 of 2.0, and harvested by centrifugation. The pellet was completely resuspended with 20 ml extract buffer (150 mM NaCl, 50 mM imidazole pH 7.5) per liter of cells. The cells were sonicated for 5 minutes using a Sonicator Ultrasonic Processor XL-2015 (Heat Systems, Inc., Farmingdale, N.Y., United States of America) at 0° C. The lysed cells were centrifuged at 40,000 g for 40 minutes and the supernatant was loaded on a 50 ml Ni-agarose column.
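The primer design described above can be sanity-checked in a few lines. The following is a minimal sketch, not part of the disclosed method: it locates the NdeI and BamHI sites in the primers and translates the tag-encoding region. Only the unambiguous 5′ portion of the forward primer is used, and the helper names (`translate`, `reverse_complement`) are illustrative assumptions.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string codon by codon, ignoring a trailing partial codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Unambiguous 5' portion of the forward primer (SEQ ID NO: 6) as printed above.
FORWARD_5P = "CGGCGGCGCCATATGAAAAAAGGTCATCATCATCATCATCATGGTCCT"
# Reverse primer (SEQ ID NO: 7) as printed above.
REVERSE = "CGGCGGCGCGGATCCTTAGCTGCAGATCTCCTGGAGCAGCGG"

NDEI, BAMHI = "CATATG", "GGATCC"

nde_pos = FORWARD_5P.find(NDEI)
# The ATG inside the NdeI site (CATATG) serves as the start codon.
tag_peptide = translate(FORWARD_5P[nde_pos + 3:])

bam_pos = REVERSE.find(BAMHI)
# The reverse primer anneals to the coding strand, so the coding sequence is
# read from the reverse complement of the part that follows the BamHI site.
c_terminus = translate(reverse_complement(REVERSE[bam_pos + len(BAMHI):]))

print(tag_peptide)  # MKKGHHHHHHGP
print(c_terminus)   # PLLQEICS*
```

The translated forward region begins with the MKKGHHHHHHG tag of SEQ ID NO: 5, and the reverse primer places a stop codon (`*`) directly before the BamHI site.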
The column was washed with 250 ml Buffer A (50 mM imidazole pH 7.5, 150 mM NaCl), 100 ml of Buffer B (200 mM imidazole pH 7.5, 150 mM NaCl), and the protein eluted with a 300 ml gradient to Buffer B (500 mM imidazole pH 7.5, 150 mM NaCl). The peak, which eluted at 45% Buffer B, contained 60 mg of His-tagged CAR LBD protein.
This protein was diluted 5-fold in 10 mM Tris-Cl pH 8.0 to reduce the NaCl concentration before loading the entire sample on a 50 ml SP Sepharose FASTFLOW™ column (Pharmacia Biotech, now part of Amersham Biosciences Corp., Piscataway, N.J., United States of America). The column was washed with 200 ml Buffer S-A (10 mM Tris-Cl pH 8.0, 30 mM NaCl, 5 mM DTT, 1 mM EDTA pH 8.0) and the His-tagged CAR protein was eluted from the column by running a 300 ml increasing NaCl concentration gradient of Buffer S-B (10 mM Tris-Cl pH 8.0, 500 mM NaCl, 5 mM DTT, 1 mM EDTA pH 8.0). Peak fractions containing the CAR protein were pooled, and the protein was concentrated to 1 mg/ml in CENTRIPREP™ 30 concentrators (Millipore Corp., Bedford, Mass., United States of America). The protein yield was 4 mg/L cells grown. The protein was aliquoted into 10 mg aliquots at 1.0 mg/ml and stored on ice.
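The buffer arithmetic behind the dilution and gradient steps can be sketched as follows. This assumes that the Ni-column eluate carries the 150 mM NaCl of the nickel-column buffers, that the diluent contributes no NaCl, and that the gradient is linear. The function names are illustrative, not from the disclosure.

```python
def diluted(conc_mM, fold):
    """Concentration after an n-fold dilution into a diluent lacking the solute."""
    return conc_mM / fold


def gradient_conc(start_mM, end_mM, fraction):
    """Concentration at a given fraction (0 to 1) of a linear gradient."""
    return start_mM + fraction * (end_mM - start_mM)


# 150 mM NaCl in the Ni-column eluate, diluted 5-fold in 10 mM Tris-Cl pH 8.0.
nacl_after_dilution = diluted(150.0, 5)
print(nacl_after_dilution)  # 30.0

# NaCl halfway through the 300 ml gradient from Buffer S-A (30 mM) to S-B (500 mM).
midpoint = gradient_conc(30.0, 500.0, 0.5)
print(midpoint)  # 265.0
```

Under these assumptions the 5-fold dilution brings the sample to 30 mM NaCl, the same concentration as Buffer S-A.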
The purified CAR LBD protein (10 mg) was complexed with Compound 1 (10 mM in DMSO) in a 1:5 molar ratio and incubated on ice for 1 hour. The CAR LBD/Compound 1 protein complex was concentrated to 4 mg/ml in a CENTRIPREP™ 30 unit and stored on ice until needed for crystallization efforts.
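As a rough guide to the complexation step, the volume of ligand stock needed for a given molar ratio can be estimated as below. This is a sketch only: the protein molecular weight of 30,000 Da is a hypothetical placeholder (the disclosure does not state it), and the 1:5 ratio is read here as protein:ligand.

```python
def ligand_stock_volume_ul(protein_mg, protein_mw_da, ligand_per_protein, stock_mM):
    """Volume of ligand stock (in microliters) for a given protein amount and molar ratio."""
    protein_umol = protein_mg / protein_mw_da * 1000.0  # mg / (g/mol) = mmol; x1000 = umol
    ligand_umol = protein_umol * ligand_per_protein
    return ligand_umol / stock_mM * 1000.0              # umol / (mmol/L) = mL; x1000 = uL


ASSUMED_MW_DA = 30_000.0  # hypothetical molecular weight of the His-tagged CAR LBD

volume_ul = ligand_stock_volume_ul(
    protein_mg=10.0,          # from the text
    protein_mw_da=ASSUMED_MW_DA,
    ligand_per_protein=5.0,   # 1:5 protein:ligand (assumed reading)
    stock_mM=10.0,            # Compound 1 stock in DMSO, from the text
)
print(f"{volume_ul:.1f} ul of 10 mM Compound 1 stock")
```

With the assumed molecular weight this comes to roughly 167 μl of stock. The figure scales inversely with the true molecular weight.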
CAR/Compound 1 crystals were grown at 4° C. in hanging drops containing 1 μl of the protein-ligand solution disclosed in Example 1 and 1 μl of well buffer (100-400 mM sodium potassium tartrate, pH 7.1-7.4). Crystals grew to a size of 100-200 μm within several weeks. Before data collection, crystals were transiently mixed with well buffer containing an additional 14% ethylene glycol and 7% glycerol, and then flash frozen in liquid nitrogen.
Orthorhombic CAR/ligand crystals formed in the P212121 space group, with a=82.3 Å, b=116.8 Å, c=131.9 Å. Each asymmetric unit contained four CAR LBDs and four ligands. The crystals had a solvent content of 40%.
Crystals were screened with a Rigaku R-Axis IV detector (Rigaku International Corp., Tokyo, Japan), and data sets were collected with a MAR CCD detector at the IMCA 17ID beam line at Argonne National Labs (Argonne, Ill., United States of America). The observed reflections were reduced, merged, and scaled with DENZO™ and SCALEPACK™ software in the HKL2000 package (Otwinowski, 1993).
Structures were determined by molecular replacement methods with the CCP4 AMORE™ program (Collaborative Computational Project, 1994; Navaza, 1994) using the poly-alanine model of the conserved region of VDR LBD. Coordinates for this model are presented in Table 3.
The best fitting solution generated with the AMORE™ program gave a correlation coefficient of 30% and an R-factor of 50%. The phases generated from molecular replacement were extensively refined and improved with solvent flattening, histogram matching, and NCS averaging as implemented in the CCP4 DM and DMMULTI programs (Cowtan, 1994). Model building proceeded with QUANTA™ (available from Accelrys Inc, San Diego, Calif., United States of America), and refinement progressed with CNX (Brünger et al., 1998), and involved multiple cycles of manual rebuilding.
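For readers unfamiliar with the two agreement statistics quoted above, the following minimal sketch shows how an R-factor and a correlation coefficient are conventionally computed from observed and calculated structure-factor amplitudes. It assumes the two sets are already on a common scale. The exact definitions used inside the AMORE™ program may differ, and the amplitudes below are hypothetical.

```python
import math


def r_factor(f_obs, f_calc):
    """R = sum(| |Fo| - |Fc| |) / sum(|Fo|), for amplitudes on a common scale."""
    return sum(abs(o - c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)


def correlation_coefficient(f_obs, f_calc):
    """Pearson correlation between observed and calculated amplitudes."""
    n = len(f_obs)
    mean_o = sum(f_obs) / n
    mean_c = sum(f_calc) / n
    cov = sum((o - mean_o) * (c - mean_c) for o, c in zip(f_obs, f_calc))
    var_o = sum((o - mean_o) ** 2 for o in f_obs)
    var_c = sum((c - mean_c) ** 2 for c in f_calc)
    return cov / math.sqrt(var_o * var_c)


# Hypothetical amplitudes for four reflections (not data from this disclosure).
f_obs = [10.0, 20.0, 30.0, 40.0]
f_calc = [12.0, 18.0, 33.0, 37.0]

r = r_factor(f_obs, f_calc)
cc = correlation_coefficient(f_obs, f_calc)
print(f"R-factor = {r:.2f}, correlation = {cc:.3f}")
```

A lower R-factor and a higher correlation both indicate better agreement between the search model and the data. The initial molecular replacement values reported above are typical of a starting solution that still requires extensive phase improvement and rebuilding.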
The structure of CAR in complex with the antagonist Compound 1 was determined. The statistics of the structure are summarized in Table 1.
Surface area was calculated with the Connolly MS program (Connolly, 1983) and the MVP program (Lambert, 1997). The binding pocket volumes were calculated with the program GRASP (Nicholls et al., 1991), using the program MVP to close openings to solvent. The sequence alignments were generated with the MVP program.
Screening of synthetic compound libraries with the purified CAR LBD protein by a Fluorescence Resonance Energy Transfer (FRET) Ligand Sensing Assay (Parks et al., 1999) was conducted to identify molecules that alter the basal interaction between a coactivator peptide and the CAR LBD protein. Briefly, the purified human CAR LBD protein was biotinylated and labeled with streptavidin-conjugated fluorophore allophycocyanin. The labeled CAR LBD protein was incubated with a test compound and with a peptide that included the second LXXLL binding motif of the nuclear coactivator SRC-1 (GenBank Accession No. U59302; amino acids 676-700) that was labeled with europium chelate. Data were collected with a WALLAC VICTOR™ fluorescence reader (available from PerkinElmer Life Sciences Inc., Boston, Mass., United States of America) in a time-resolved mode, and the fluorescence ratio was calculated. Compound 1 was identified from the screen as an inverse agonist that reduces the basal fluorescent signal, indicating that the CAR LBD/SRC-1 interaction was reduced below background levels. Standard dose-response curves were generated with the CAR LBD protein plus Compound 1, and the EC50 was determined to be 15 nM.
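The shape of a dose-response curve for an inverse agonist can be illustrated with a four-parameter logistic model. This is a sketch under stated assumptions: only the 15 nM EC50 comes from the text. The plateau values, the Hill slope of 1, and the function names are hypothetical, and the disclosure does not specify which curve-fitting model was used.

```python
def fret_ratio(acceptor_counts, donor_counts):
    """Ratio of acceptor to donor emission from a time-resolved read."""
    return acceptor_counts / donor_counts


def four_param_logistic(conc_nM, top, bottom, ec50_nM, hill=1.0):
    """Signal falls from `top` (basal) to `bottom` as the concentration rises."""
    return bottom + (top - bottom) / (1.0 + (conc_nM / ec50_nM) ** hill)


EC50_NM = 15.0          # from the text
TOP, BOTTOM = 1.0, 0.2  # hypothetical normalized plateaus

for conc in (0.15, 1.5, 15.0, 150.0, 1500.0):
    signal = four_param_logistic(conc, TOP, BOTTOM, EC50_NM)
    print(f"{conc:8.2f} nM -> normalized signal {signal:.3f}")

half_max = four_param_logistic(EC50_NM, TOP, BOTTOM, EC50_NM)
```

By construction, the signal at the EC50 lies halfway between the two plateaus, which is how the EC50 is read off a fitted curve.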
2-(benzhydrylamino)-1-(2-phenylethyl)-1H-benzimidazole-6-carboxamide (Compound 1) was synthesized as follows. A solution of 3-fluoro-4-nitrobenzoic acid (1.28 g; 6.9 mmol) in 10 mL anhydrous N,N-dimethylformamide was treated with [O-(7-azabenzotriazol-1-yl)-1,1,3,3-tetramethyluronium hexafluorophosphate] (2.6 g; 6.9 mmol) followed by N,N-diisopropylethylamine (3.6 ml; 20.7 mmol). After shaking for 5 min, the mixture was added to polystyrene Rink amide AM resin (1.0 g; 0.69 mmol/g; 0.69 mmol), and the reaction was rotated at 25° C. for 18 h. The reaction solution was drained, and the resin was washed sequentially with N,N-dimethylformamide (3×), dichloromethane (3×), methanol (2×), and dichloromethane (3×). The dried resin was treated with 15.2 ml of a 0.5 M phenethylamine in N-methylpyrrolidinone solution and then rotated at 70° C. for 15 hours. The cooled reaction was drained, and the resin was washed sequentially with N,N-dimethylformamide (3×), dichloromethane (3×), methanol (2×), and dichloromethane (3×). The resin was treated with 3.8 ml of 2.0 M SnCl2 dihydrate in N-methylpyrrolidinone solution and rotated at 25° C. for 24 hours. The reaction was drained and the resin washed sequentially with 30% ethylenediamine (3×), N,N-dimethylformamide (3×), dichloromethane (3×), methanol (2×), and dichloromethane (3×). The dried diamine resin was treated with 7.6 ml of a 0.5 M benzhydryl isothiocyanate in N-methylpyrrolidinone solution and 7.6 ml of a 1.0 M diisopropylcarbodiimide in N-methylpyrrolidinone solution. After rotating at 80° C. for 24 h the reaction was cooled to 25° C., drained, and the resin was washed sequentially with N,N-dimethylformamide (3×), dichloromethane (3×), methanol (2×), and dichloromethane (3×). The resin was treated with 30 ml 95% trifluoroacetic acid (TFA) in water and rotated at 25° C. for 3 hours. The resin was drained and washed with dichloromethane. The filtrate was concentrated in vacuo to give an oil.
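The reagent excesses used in the solid-phase steps above can be tabulated relative to the resin loading. The short sketch below uses only the volumes and concentrations stated in the text. The dictionary layout and variable names are illustrative.

```python
# Resin loading: 1.0 g at 0.69 mmol/g.
RESIN_MMOL = 1.0 * 0.69

# Amount of each reagent in mmol (ml x mol/L = mmol for the solutions).
reagents_mmol = {
    "3-fluoro-4-nitrobenzoic acid": 6.9,
    "phenethylamine": 15.2 * 0.5,
    "SnCl2 dihydrate": 3.8 * 2.0,
    "benzhydryl isothiocyanate": 7.6 * 0.5,
    "diisopropylcarbodiimide": 7.6 * 1.0,
}

# Molar equivalents relative to the resin-bound substrate.
equivalents = {name: mmol / RESIN_MMOL for name, mmol in reagents_mmol.items()}

for name, eq in equivalents.items():
    print(f"{name:32s} {eq:5.1f} equiv")
```

Each solution-phase reagent is used in roughly 5- to 11-fold excess over the resin-bound substrate, as is common in solid-phase synthesis where excess reagent is simply washed away.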
The oil was redissolved in dichloromethane and the solution was washed twice with saturated sodium bicarbonate (NaHCO3). The organic layer was dried (Na2SO4), filtered, and concentrated in vacuo. The crude product was triturated with Et2O/hexanes, and the solid was collected by filtration to give 333 mg (98% yield) of the title compound as an off-white solid: 1H NMR (DMSO-d6, 400 MHz) δ 7.68 (m, 2 H), 7.63 (d, 1 H, J=8.4), 7.54 (dd, 1 H, J=8.0, 1.2), 7.40-7.00 (m, 17 H), 6.36 (d, 1 H, J=8), 4.42 (t, 2 H, J=7.4), 2.97 (t, 2 H, J=7.4); MS (ESP+) m/e 447 (MH+).
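The reported characterization data can be cross-checked against the molecular formula. The formula C29H26N4O used below is not stated in the text; it is derived here from the compound name and should be treated as an assumption of this sketch. The atomic masses are standard values rounded to eight decimal places.

```python
# Molecular formula derived from the name of Compound 1 (assumption of this sketch).
FORMULA = {"C": 29, "H": 26, "N": 4, "O": 1}

MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
AVERAGE = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}
PROTON_MASS = 1.00727647  # mass added on protonation to give MH+


def formula_mass(formula, masses):
    """Sum of atomic masses weighted by the atom counts in the formula."""
    return sum(count * masses[element] for element, count in formula.items())


mono = formula_mass(FORMULA, MONOISOTOPIC)
avg = formula_mass(FORMULA, AVERAGE)
mh_plus = mono + PROTON_MASS

# Proton counts for each 1H NMR signal, in the order listed in the text.
nmr_integrals = [2, 1, 1, 17, 1, 2, 2]

print(f"monoisotopic mass {mono:.4f}, average mass {avg:.3f}")
print(f"MH+ = {mh_plus:.4f} (nominal m/e {round(mh_plus)})")
print(f"1H NMR protons: {sum(nmr_integrals)} of {FORMULA['H']}")
```

Under this formula, the nominal MH+ of 447 agrees with the reported mass spectrum, and the NMR integrals account for all 26 hydrogens.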
The references listed below as well as all references cited in the specification are incorporated herein by reference to the extent that they supplement, explain, provide a background for or teach methodology, techniques and/or compositions employed herein.
Implementation and applications. Acta Crystallogr A 50:210-20.
It will be understood that various details of the invention can be changed without departing from the scope of the invention. Furthermore, the foregoing description is for the purpose of illustration only, and not for the purpose of limitation, the invention being defined by the claims.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/US04/23092 | 7/16/2004 | WO | | 1/18/2006 |

Number | Date | Country |
---|---|---|
60488415 | Jul 2003 | US |