The present invention relates to modification of nonhuman immunoglobulins to enhance their efficacy, specificity and safety.
Molecular biology over the last 10 years has elucidated the genome of humans, several animals, plants, bacteria, viruses and fungi. The genomic DNA sequences are deposited in a public database accessible for research and production of recombinant proteins. A major tool to access these data is the so-called NIH BLAST search that can detect genetic relationships in the genome of organisms. These searches are possible because the genetic code is universal for organisms, with some exceptions.
The universality of the genetic code led Crick in 1968 [Crick, F. (1968) J. Mol. Biol. 38: 367-379] to propose that the “allocation of codons to amino acids was entirely a matter of chance”. Since then, variations in codon usage have been discovered that suggest that the genetic code was not a “frozen accident” but has evolved over time [Knight, R. D. et al (1999) TIBS 24:241-247]. It is commonly believed that as life began, there were fewer amino acids than the current count of 20; a code less complex than the triplet code accommodates a smaller variety of amino acids. As has been previously proposed [Kohler, H. et al (2001) J. Mol. Rec., 14: 269-272] it is possible that primordial life operated with a binary genetic code consisting of two chemically different nucleotides, e.g., a purine and a pyrimidine. However, such a genetic code can only accommodate two different amino acids. This line of reasoning has led to the suggestion that early codon relationships exist between nucleotides and amino acids [Root-Bernstein, R. S. (1983), J. Theor. Biol. 100: 99-106].
The evolution of the genetic code has been reviewed recently [Knight, R. et al. (1999) TIBS, 24: 241-247]. It has been observed that a relation between codon usage and amino acid can be recognized in the nucleotide base occurring in the second position of the triplet code. In particular, the second base of the triplet code appears to sort the 20 amino acids into two hydropathy groups, i.e., purines A and G are associated with positive hydropathy, and pyrimidines C and U with negative hydropathy. Hydrophobic Tyr and Trp and hydrophilic Ser and Met are exceptions.
Further support for a binary code is found in the code ambiguity that occurs in primitive organisms which use variants of the standard code [Kohler, H. et al. (2001) J. Mol. Rec., 14: 269-272]. The fungus Candida uses the codon CUG to encode serine, whereas in the standard code CUG encodes leucine [Santos, M. et al. (1995) Nucleic. Acids Res. 23: 1481-1486]. This result suggests a hidden binary code in CUG, called “zero”, whereby serine and leucine derive from the same binary code. Similarly, Saccharomyces cerevisiae uses four of six codons, which in the standard code encode leucine, to encode threonine [Zimmer T. et al. (1995) Yeast, 11: 33-41]. Again, leucine and threonine are imputed to share the same binary code of “zero”.
Still further support for a primordial binary code comes from a sharing of the hidden binary code by different amino acids that control the folding of proteins. Analysis of the helix kinks in transmembrane proteins [Yohannan S., et al. (2004) Proc. Natl Acad. Sci USA 101: 959-963] shows that, in addition to proline in the position of bending, alanine, valine, phenylalanine and threonine are present. This set of amino acids carries the binary code “zero”. Evidently, important structural functions in protein folding preserve a primordial genetic code and are remnants of a primordial binary code.
Recently, an artificially ambiguous genetic code has been made [Pezo V. et al (2004) Proc. Natl Acad. Sci USA 101: 8593-8597] that confers growth advantage. The aminoacyl-tRNA synthetase for isoleucine has been mutated to allow both isoleucine and valine to use the same code triplet. Thus, a laboratory experiment has re-created an evolutionary ancient ambiguous code that uses an identical binary (zero) code word for isoleucine and valine.
Protein integration in the yeast genome has been compared with evolutionary distance to the protein's ortholog in C. elegans [Fraser, H. B. et al. (2002) Science 296: 750-752]. A correlation was detected between the evolutionary distance and the number of interactions, showing that the more network interactions a protein has, the smaller the evolutionary distance, i.e., the more slowly the protein evolves. As the evolutionary rates of proteins and their networks are being discovered, it is increasingly apparent that protein-protein interacting surfaces are highly conserved in evolution. These observations imply that network building is closely linked to gene duplication and mutational specification, suggesting that network building may have its origin at the onset of functional diversity. There is accumulating evidence that protein networks are an integral part of general protein evolution. Inasmuch as protein folding determines protein function, it also dictates protein contacts.
Translation of a protein sequence into a binary genetic code provides evidence for a primordial binary genetic code based on the hydropathy relationship of the second base in the coding nucleotide triplet. The binary code sequesters the amino acids into two categories, disregarding side chain differences in charge, size and hydropathy. Such a binary code better elucidates the evolution of proteins and their networks than the younger triplet code. In addition, the reduction of protein sequence data to a binary code set simplifies data management and analysis.
However, translation of the current genetic code back into the putative ancient binary code introduces errors, because of the ambiguity of codon usage for the amino acid serine. Four codons for serine have a pyrimidine, C, in the second base position, whereas the two other codons have a purine, G, in that position. Thus, in a reduction to the binary code from an amino acid sequence, the resulting binary sequence does not mirror the coding DNA sequence.
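The serine ambiguity described above can be made concrete with a short sketch. Python is an assumption here (the specification names no programming language), and the names `second_base_bit` and `SERINE_CODONS` are hypothetical. The sketch labels a purine in the second codon position as "1" and a pyrimidine as "0", then applies that rule to the six standard serine codons.

```python
# Illustrative sketch only; function and variable names are hypothetical.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U", "T"}  # T is accepted so DNA codons work as well


def second_base_bit(codon: str) -> str:
    """Return '1' if the second base is a purine, '0' if it is a pyrimidine."""
    if len(codon) != 3:
        raise ValueError(f"not a triplet codon: {codon!r}")
    base = codon[1].upper()
    if base in PURINES:
        return "1"
    if base in PYRIMIDINES:
        return "0"
    raise ValueError(f"unknown base {base!r} in codon {codon!r}")


# The six serine codons of the standard genetic code (RNA alphabet).
SERINE_CODONS = ["UCU", "UCC", "UCA", "UCG", "AGU", "AGC"]

serine_bits = {codon: second_base_bit(codon) for codon in SERINE_CODONS}
print(serine_bits)
```

Four of the six codons map to "0" and two map to "1", so a serine residue read from a protein sequence does not by itself reveal which binary value its codon carried.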
It has been noted that codon ambiguity can confer adaptive advantages in growth yield [Pezo V. et al., supra]. At the level of protein structure, similar ambiguities in amino acid usage have been documented. For example, kinks in helical conformations can be produced by a number of different amino acids, including the canonical proline, that all share the same binary code [Yohannan S., supra]. This example of amino acid ambiguity that does not produce variations in protein folds suggests that homology searches executed in the binary code achieve higher fidelity than searches at the level of a degenerate codon language, i.e., the current triplet code.
It is argued herein that searching for evolutionarily related genes with identical function, so-called ortholog genes, is more effective at the level of protein structure than at the level of the coding sequence, due to over-specification in the triplet code. Accordingly, in the present invention, a binary genetic sequence is extracted from a known amino acid sequence, because the function of a given gene is ultimately determined at the level of amino acids in the protein. This facilitates computerized homology searches. Moreover, it can enable the development of ortholog proteins that retain essential folding and hydropathic properties.
The principles of the present invention are applied to the problem of “humanization” of nonhuman antibodies. Typically, antibodies immunoreactive with a given antigen are initially obtained in the laboratory from nonhuman models. Great interest lies in “humanizing” such antibodies, i.e., modifying them so that they are both efficacious and tolerated by a human host. One approach proposes to find homologous human and nonhuman, typically murine, variable framework regions, on the theory that close chemical similarity in the framework regions is most important for providing toleration in the host while retaining affinity [See, e.g., Queen, C. et al, (1989) Proc. Natl Acad. Sci USA 86: 10029-10033 and U.S. Pat. Nos. 5,585,089, 6,180,370, and 7,022,500 (issued to Queen et al.)]. A second approach argues that similarly structured CDRs (complementarity-determining regions) will point to human frameworks that also support mouse CDRs with good retention of affinity [Hwang, W. et al. (2005) Methods, 36: 35-42]. Recently described is a laborious third approach that involves generation of a library of humanized heavy and light chain pairs from corresponding murine heavy and light chains. The humanized pairs are then screened for antigen binding [U.S. Pat. No. 7,087,409 (issued to Barbas, III et al.)]. Yet another proposal calls for ranking amino acid similarities between non-human CDRs and human CDRs obtained from a library of human antibody sequences, without need for comparing framework sequences [U.S. Pat. No. 6,881,557 (issued to Foote)].
We propose herein a new paradigm in which a simplified genetic code is employed to identify preexisting human variable region immuno-sequences having a high degree of homology with nonhuman sequences, with that information being used to generate minimally altered modifications of the nonhuman sequences.
Based on insights obtained from our study of the putative primordial genetic code, binary strings can be constructed and employed to identify and generate humanized immunoglobulins. In this method, only the second base position of each triplet codon encoding a variable region amino acid of the nonhuman antibody is considered. A value of “1” is arbitrarily assigned to purine bases (A and G) appearing in the second position, which are imputed to encode amino acids having positive hydropathy; and the value “0” is assigned to pyrimidines (U and C), which encode amino acids having negative hydropathy. Human immunoglobulin gene databases can be screened with the summarizing nonhuman binary strings obtained previously to identify those nucleotide sequences having the greatest homology. Finally, whenever a mismatch occurs between the two strings, the amino acid at that position can be replaced with one having the correct hydropathic properties. The number of modifications necessitated by this method can be quite small. In addition, the inherent ambiguity in the encoding nucleotide and amino acid sequences affords great flexibility in the design of novel fusion proteins.
Previously, a computerized search of a human genomic antibody database for orthologs to a known nonhuman antibody (NHA) would typically require a detailed knowledge of both the human and nonhuman antibody genomes. This would require a determination of the pertinent genomic sequences of the antibody clone that secretes the NHA. To this end, the amino acid residue sequences of the heavy and/or light chain variable regions of the NHA would be determined by conventional methods, e.g., trypsin digest, N-terminal or C-terminal microsequencing, mass spectrometry, and the like. The amino acid sequences thereby obtained would then be used to generate a random or reduced set of DNA oligonucleotides that would be used to probe the clone by hybridization. The genomic sequences bracketed by hybridizing probes would then be amplified by PCR, and the amplified DNA would be sequenced by conventional techniques to deduce the native codons encoding the NHA. The resulting nucleotide sequences would then be used to search for orthologs within the human database.
The present invention maintains that the previous approach over-specifies the problem and generates a frequently unnecessary level of detail. The present invention is simpler and avoids some of the more tedious recombinant DNA and analytical techniques outlined above. Algorithms for performing the present invention are illustrated with reference to
Referring to
The single amino acid sequence obtained thereby is then converted into a two-valued (binary) string according to the primordial genetic code shown in Table 1. (1c)
As shown in Table 1, those amino acids encoded by a pyrimidine base in the second position of the standard triplet code are assigned the value “0”. These include Ser, Pro, Thr, Ala, Phe, Leu, Ile, Met and Val. Those amino acids “encoded” by a purine base in the second position are assigned the value “1”. These include Tyr, His, Gln, Asn, Lys, Asp, Glu, Cys, Trp, Arg, and Gly. (Strictly speaking, the codons for Ser are ambiguous since four are in the pyrimidine class and two are in the purine class; therefore, the predominating code is selected, which is the same as for Leu.)
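The translation step can be sketched in a few lines. This is a minimal illustration, not the implementation of the invention: Python is assumed, the one-letter residue codes are a convenience (the text uses three-letter names), and `BINARY_CODE` and `to_binary` are hypothetical names. The two residue groups are taken directly from the lists given above, with Ser placed in the "0" group as stated.

```python
# Hypothetical helper; the grouping follows the residue lists in the text.
ZERO_RESIDUES = "SPTAFLIMV"    # Ser Pro Thr Ala Phe Leu Ile Met Val
ONE_RESIDUES = "YHQNKDECWRG"   # Tyr His Gln Asn Lys Asp Glu Cys Trp Arg Gly

BINARY_CODE = {aa: "0" for aa in ZERO_RESIDUES}
BINARY_CODE.update({aa: "1" for aa in ONE_RESIDUES})


def to_binary(protein: str) -> str:
    """Translate a one-letter amino acid sequence into a two-valued string."""
    bits = []
    for aa in protein.upper():
        if aa not in BINARY_CODE:
            raise ValueError(f"unsupported residue: {aa!r}")
        bits.append(BINARY_CODE[aa])
    return "".join(bits)


# Arbitrary six-residue peptide chosen for illustration (not from the source).
print(to_binary("MKTAYW"))  # prints 010011
```

Because every one of the 20 standard amino acids falls into exactly one group, the translation is a simple per-residue lookup and the output has the same length as the input.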
The two-valued representation, sometimes referred to herein as a “probe”, is then used to screen, preferably by computer, a human antibody gene database that has likewise been converted to a two-valued representation [Kohler H. supra]. This process permits selection of those human framework sequences, sometimes referred to herein as “reference” sequences, showing the greatest degree of homology with the NHA framework sequences. (1d) A distance-matching algorithm that looks for the sequence with the minimal Hamming distance from the probe [Dictionary of Algorithms and Data Structures, NIST, Black, P., ed.] is employed to identify the closest match.
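A minimal sketch of such a minimal-distance search follows. It assumes Python, assumes that the probe and the database entries have already been aligned to equal length, and uses made-up binary strings and entry names; `hamming` and `closest_references` are hypothetical names, and a practical search would also have to handle alignment and length differences.

```python
def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def closest_references(probe: str, references: dict) -> tuple:
    """Return (minimal distance, sorted names of entries at that distance).

    Entries whose length differs from the probe are skipped in this sketch.
    """
    scored = {
        name: hamming(probe, bits)
        for name, bits in references.items()
        if len(bits) == len(probe)
    }
    if not scored:
        raise ValueError("no reference has the same length as the probe")
    best = min(scored.values())
    return best, sorted(name for name, d in scored.items() if d == best)


# Toy data for illustration only; these strings are not real antibody data.
probe = "0100110101"
toy_database = {
    "ref_a": "0100110111",
    "ref_b": "1100100101",
    "ref_c": "0011001010",
}
print(closest_references(probe, toy_database))  # prints (1, ['ref_a'])
```

Returning every entry tied at the minimal distance, rather than a single winner, leaves the final choice among equally close reference sequences to the practitioner.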
Based on the homology results obtained by the comparison of human and NHA binary strings, it may or may not be necessary to modify the NHA framework and/or CDR sequences to make it/them identical with the corresponding human framework and/or CDR sequences, thereby “humanizing” them. (1e) Typically, modification efforts are confined to replacing the amino acids in the non-identical positions of the NHA sequence(s) with the amino acids appearing in the corresponding positions of the reference sequence(s). Such modifications can be performed in situ by conventional recombinant DNA techniques, e.g., homologous recombination, point mutation with synthetic oligonucleotides, and the like, which alter the genome of the NHA-secreting clone and introduce the necessary changes. See, e.g., J. Sambrook, et al., Molecular Cloning: A Laboratory Manual, 3rd ed., Cold Spring Harbor Laboratory Press, New York, 2001, and protocol updates available online from CSH Protocols Online.
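The substitution step can be illustrated as follows. This sketch adopts one possible reading of the procedure, namely that a residue is replaced only where the binary values of the NHA and reference sequences differ; that reading, the choice of Python, the one-letter residue codes, the names `bit` and `humanize`, and the two short example peptides are all assumptions made for illustration.

```python
ZERO = set("SPTAFLIMV")      # residues assigned "0" in the text
ONE = set("YHQNKDECWRG")     # residues assigned "1" in the text


def bit(aa: str) -> str:
    """Binary value of a single one-letter residue."""
    if aa in ZERO:
        return "0"
    if aa in ONE:
        return "1"
    raise ValueError(f"unsupported residue: {aa!r}")


def humanize(nha: str, reference: str) -> tuple:
    """Replace NHA residues with reference residues at binary mismatches.

    Returns the modified sequence and a list of
    (1-based position, original residue, replacement residue).
    """
    if len(nha) != len(reference):
        raise ValueError("sequences must be aligned to equal length")
    out, changes = [], []
    for pos, (n, r) in enumerate(zip(nha.upper(), reference.upper()), start=1):
        if bit(n) != bit(r):
            out.append(r)
            changes.append((pos, n, r))
        else:
            out.append(n)
    return "".join(out), changes


# Arbitrary peptides for illustration; they are not antibody sequences.
modified, changes = humanize("MKTAYW", "MKSAFW")
print(modified, changes)  # prints MKTAFW [(5, 'Y', 'F')]
```

In this toy case position 3 (Thr versus Ser) differs at the amino acid level but shares the binary value "0", so it is left untouched, while position 5 (Tyr versus Phe) differs in binary value and is replaced. Under this reading the number of required modifications stays small.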
Alternatively, expression vectors individually containing the nucleotide sequences that encode the respective FRs and/or CDRs of the NHA can be altered by conventional techniques, followed by replication and excision with restriction enzymes to afford sufficient amounts of the desired DNA segments containing modifications. Direct PCR amplification of oligonucleotides encoding the NHA FRs and/or CDRs, employing primers containing the desired point mutation(s), can also be used to generate sufficient amounts of appropriately modified DNA segments. The desired humanized immunoglobulins can be expressed by the cell using standard transformation and culture techniques. See, e.g., Sambrook, vide supra.
DNA segments encoding the (un)modified NHA FRs and/or CDRs can be reassembled by standard ligation techniques to generate a complete antibody gene template. (1f) Alternatively, a full-length template can be synthesized directly, with or without modification, based on the information obtained above. The template can then be introduced into a cell capable of expressing the humanized antibody, optionally employing an expression vector that contains the template. (1g)
An alternative aspect of the invention is depicted in
The invention is further illustrated with reference to a specific example for humanizing the murine T15 antibody [BLAST, gi|346800]. The variable heavy chain FRs of the T15 antibody, when represented as a single, contiguous amino acid sequence (absent the CDRs), have the following formula:
In the formula, the two diagonals indicate the positions where different parts of the framework are conjoined. The three regions correspond to positions 1-30, 36-51, and 57-94 of the standard Kabat alignment of heavy chains. The amino acids in boldface indicate those residues differing from the closest-match human antibody, vide infra.
The single T15 FR sequence translates according to the code shown in Table 1 into the following binary string (where numbers in boldface correspond to the aforementioned amino acid residues):
A computerized minimal-distance algorithm that searches a BLAST human genome database for closest matches to the T15 sequence identifies an amino acid sequence, designated 12-2′CL, having the following formula and binary translation:
Based on these results, it is desired to modify the T15 variable heavy chain FR sequences to incorporate the amino acid residues found in 12-2′CL:
The present invention has been described with reference to certain examples for purposes of explanation and clarity of understanding. It should be appreciated by the skilled practitioner that obvious improvements and modifications can be practiced within the scope of the appended claims.
This application claims the benefit of U.S. Provisional Application 60/708,154, filed Aug. 15, 2005, the disclosure of which is incorporated herein by reference.
Number | Date | Country
---|---|---
60708154 | Aug 2005 | US